Text classification to streamline online wildlife trade analyses

Bibliographic Details
Published in	PLoS ONE Vol. 16; no. 7; p. e0254007
Main Authors	Stringham, Oliver C; Moncayo, Stephanie; Hill, Katherine G. W; Toomes, Adam; Mitchell, Lewis; Ross, Joshua V; Cassey, Phillip
Format	Journal Article
Language	English
Published	San Francisco: Public Library of Science (PLoS), 09.07.2021
Subjects	Advertising; Biodiversity; Biology and Life Sciences; Biosecurity; Birds; Classification; Classifiers; Computer and Information Sciences; Conservation; Context; Data collection; Data processing; Datasets; Electronic commerce; Evaluation; Food; Funding; Image classification; Internet; Learning algorithms; Machine learning; Natural language processing; Poultry; Reptiles & amphibians; Sensitivity analysis; Social Sciences; Streamlining; Text categorization; Text processing; Unstructured data; Websites; Wild animal trade; Wildlife conservation; Wildlife trade; Australia

More Information
Summary:	Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question ‘how much data is required to have an adequately performing model?’, we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife trade related online data.
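
The workflow the summary describes (train a text classifier on labelled listings, then retrain on progressively smaller samples and track the change in performance) can be illustrated with a minimal sketch. This is not the authors' code or their suite of models: it assumes a simple multinomial Naive Bayes classifier written in plain Python, and the example listings, labels, function names and sample fractions are all invented for illustration.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return 1 (relevant) or 0 (irrelevant) using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        # Smoothed prior, so a class missing from a small sample still scores.
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def f1_score(y_true, y_pred):
    """F1 for the positive (relevant) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sensitivity_analysis(train_docs, train_labels, test_docs, test_labels,
                         fractions, seed=0):
    """Retrain on random subsamples of the training set and record test F1."""
    rng = random.Random(seed)
    indices = list(range(len(train_docs)))
    results = {}
    for frac in fractions:
        k = max(1, round(frac * len(indices)))
        subset = rng.sample(indices, k)
        model = train_nb([train_docs[i] for i in subset],
                         [train_labels[i] for i in subset])
        preds = [predict_nb(model, doc) for doc in test_docs]
        results[frac] = f1_score(test_labels, preds)
    return results


# Hypothetical listings: 1 = relevant (a live bird for sale), 0 = irrelevant.
train_docs = [
    "hand raised cockatiel for sale", "pair of budgies for sale",
    "young galah parrot tame", "lorikeet breeding pair",
    "large bird cage for sale", "parrot seed mix bulk",
    "aviary wire and nest boxes", "wanted bird cage stand",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_docs = ["tame cockatiel pair", "cage and seed",
             "budgies for sale", "bird cage stand"]
test_labels = [1, 0, 1, 0]

results = sensitivity_analysis(train_docs, train_labels, test_docs,
                               test_labels, fractions=[0.25, 0.5, 1.0])
for frac, score in results.items():
    print(f"training fraction {frac:.2f}: F1 = {score:.2f}")
```

On a real dataset the same loop would normally be repeated over several random subsamples per fraction and summarised, since a single draw at a small fraction can be unrepresentative.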
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0254007