Publication Abstract

Title
Comparing supervised classification methods for prediction of substrate type using multibeam acoustic and legacy grain-size data
Publication Abstract

Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multi-beam data, in the form of bathymetry and acoustic backscatter, to inform the mapping of seabed substrates. Making the best use of existing data is important because of the significant costs involved in undertaking multi-beam and ground truth surveys.

This study compares the performance of a range of supervised classification techniques for predicting substrate type from multi-beam acoustic data. The study site is an area of the North Sea off the north-east coast of England, ranging from 55 to 100 m in depth. A total of 258 ground truth samples obtained from a legacy dataset of the British Geological Survey (BGS Legacy Particle Size Analysis uncontrolled data export (2011), British Geological Survey, www.bgs.ac.uk) are classified into four substrate classes. The exact vintage of the samples is unknown; however, all samples were collected prior to the introduction of GPS and substantial positional errors are to be expected. The multi-beam bathymetry and backscatter data were collected as part of the Civil Hydrography Programme. The bathymetry and mean backscatter, gridded to a 10 m resolution, provided the primary input features. A range of secondary features were derived from the backscatter and bathymetry grids, making a total of 15 input features.
Six supervised classification techniques are tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forests and Bayesian Decision Rules. Each classifier is trained multiple times using different subsets of input features. The predictive performance of each model is validated using a separate test set of ground truth data set aside prior to the analysis. Model performances are tested for statistical significance against a baseline represented by a very simple model (Nearest Neighbour predictions on bathymetry and backscatter) to assess the benefits gained by using more sophisticated algorithms and incorporating more input features.
The best performing models achieved accuracies of around 80% on the test set, which, considering the limitations of the data, is a satisfactory result. Tree-based methods and Bayesian decision rules were the best performing techniques. The models that used all 15 input features generally did not perform well, which highlights the need for some means of feature selection. It is also worth considering the computational cost involved, as the time taken during the training phase varied widely between methods.
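
The validation scheme described above (a simple Nearest Neighbour baseline on two features, a held-out test set, and a significance comparison against the baseline) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the data are synthetic, the two standardised features merely stand in for bathymetry and backscatter, the 194/64 train/test split is arbitrary, a 7-nearest-neighbour model stands in for a "more sophisticated" classifier, and the exact McNemar test is one plausible choice of significance test (the abstract does not say which test was used).

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Return the majority class among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of test samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test, using only samples where exactly one model is correct."""
    b = sum(pa == t and pb != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    c = sum(pa != t and pb == t for t, pa, pb in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # the two models never disagree in correctness
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Synthetic stand-in data: 258 samples, four classes, two standardised features.
# The class centres and noise level are hypothetical.
rng = random.Random(0)
centres = {0: (-1.0, -1.0), 1: (-1.0, 1.0), 2: (1.0, -1.0), 3: (1.0, 1.0)}
X, y = [], []
for _ in range(258):
    label = rng.randrange(4)
    cx, cy = centres[label]
    X.append((rng.gauss(cx, 0.8), rng.gauss(cy, 0.8)))
    y.append(label)

# Test set put aside before any model fitting (split size is an assumption).
train_X, train_y = X[:194], y[:194]
test_X, test_y = X[194:], y[194:]

baseline_pred = [knn_predict(train_X, train_y, x, k=1) for x in test_X]
candidate_pred = [knn_predict(train_X, train_y, x, k=7) for x in test_X]

print("baseline accuracy :", round(accuracy(test_y, baseline_pred), 3))
print("candidate accuracy:", round(accuracy(test_y, candidate_pred), 3))
print("McNemar exact p   :", round(mcnemar_exact_p(test_y, baseline_pred, candidate_pred), 3))
```

A paired test such as McNemar's is used in this sketch because both models are scored on the same test samples, so only the samples on which they disagree carry information about which one is better.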
Publication Internet Address of the Data
Publication Authors
David Stephens* and Markus Diesing*
Publication Date
May 2013
Publication Reference
GeoHab 2013, Rome, Italy, 6-10 May 2013
Publication DOI: https://doi.org/