Utilizing the random forest algorithm and interpretable machine learning to inform post-stratification of commercial fisheries data
Published in | Fisheries research Vol. 281; p. 107253 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V. 01.01.2025 |
Summary: | Federal groundfish fisheries off Alaska are managed based on near-real-time estimates of catch generated using a combination of data from the North Pacific Groundfish and Pacific Halibut Observer Program, which deploys observers and Electronic Monitoring systems into the fisheries to sample catch, and industry-reported information. Catch is carefully monitored against limits that are based on biological constraints or quota allocations, or that are set to control discard amounts. However, estimates of fish discarded at sea (not retained for sale) can have large variance due to factors such as fishing behavior, species-specific vulnerability to fishing, and sample sizes. Post-stratification is a statistical approach widely used to improve the precision of catch estimates within a population because it controls for variance without relying on covariates known prior to sampling, which can be costly to collect or unknown. Strategic use of post-stratification may increase the precision of estimates when compared to designs without post-stratification. However, choosing fishery characteristics to define post-strata can be difficult due to the high dimensionality of fishery data and the complexity of creating post-strata that are optimized for multiple species. We propose a novel application of random forest classification and design-based estimation to explore multivariate post-stratification designs. These designs were evaluated by selecting the best-performing trees from an ensemble using design-based estimation metrics. Results showed a large improvement in the precision of estimates when the best-performing trees were used to label data and create post-strata. Moreover, through the use of subject matter expertise to evaluate the best-performing trees, this method identified combinations of covariates that were not considered in previous estimation designs, and it allows for exploration and testing of alternative post-strata designs that could be implemented in a management system. |
ISSN: | 0165-7836 |
DOI: | 10.1016/j.fishres.2024.107253 |
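
The summary describes scoring candidate post-stratification designs with design-based precision metrics. The sketch below illustrates that general idea only, and it is not the authors' method: instead of trees from a random forest, it compares two hand-written labelings (by gear and by area), and it assumes simple random sampling of trips with known population counts per post-stratum. All trip records, covariate names, and population counts are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, variance


def srs_variance(values, population_size):
    """Estimated variance of the sample mean under simple random
    sampling without replacement (no post-stratification)."""
    n = len(values)
    return (1 - n / population_size) * variance(values) / n


def poststratified_estimate(values, labels, stratum_sizes):
    """Post-stratified mean and its estimated variance, conditional on
    the realized sample size in each post-stratum."""
    total = sum(stratum_sizes.values())
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    estimate, var = 0.0, 0.0
    for label, size in stratum_sizes.items():
        sample = groups[label]
        if len(sample) < 2:
            raise ValueError(f"post-stratum {label!r} needs at least 2 samples")
        weight = size / total  # W_h = N_h / N
        estimate += weight * mean(sample)
        var += weight**2 * (1 - len(sample) / size) * variance(sample) / len(sample)
    return estimate, var


# Hypothetical sampled trips: discard weight per trip plus two covariates.
trips = [
    {"gear": "trawl", "area": "east", "discard": 9.0},
    {"gear": "trawl", "area": "west", "discard": 11.0},
    {"gear": "trawl", "area": "east", "discard": 10.0},
    {"gear": "trawl", "area": "west", "discard": 12.0},
    {"gear": "longline", "area": "east", "discard": 1.0},
    {"gear": "longline", "area": "west", "discard": 2.0},
    {"gear": "longline", "area": "east", "discard": 3.0},
    {"gear": "longline", "area": "west", "discard": 2.0},
]

# Hypothetical candidate designs: a labeling rule plus the number of
# trips in the whole fishery that fall in each post-stratum.
designs = {
    "gear": (lambda t: t["gear"], {"trawl": 40, "longline": 60}),
    "area": (lambda t: t["area"], {"east": 50, "west": 50}),
}

discards = [t["discard"] for t in trips]
baseline_variance = srs_variance(discards, population_size=100)

# Score every candidate design by the estimated variance of its estimator.
results = {}
for name, (label_fn, sizes) in designs.items():
    labels = [label_fn(t) for t in trips]
    results[name] = poststratified_estimate(discards, labels, sizes)

# Keep the design with the smallest estimated variance.
best_design = min(results, key=lambda name: results[name][1])

print(f"no post-stratification: variance = {baseline_variance:.3f}")
for name, (estimate, var) in results.items():
    print(f"{name}: mean = {estimate:.3f}, variance = {var:.3f}")
print(f"best design: {best_design}")
```

With these made-up numbers, labeling by gear yields a smaller estimated variance than either the area labeling or the unstratified estimator, because discard amounts differ far more between gears than within them. In a tree-based setting, the label function would presumably be replaced by the leaf each trip falls into, but that step is not shown here.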