Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification

Bibliographic Details
Published in	Journal of the Association for Information Science and Technology, Vol. 67, no. 10, pp. 2464-2476
Main Authors	Suominen, Arho; Toivanen, Hannes
Format	Journal Article
Language	English
Published	Blackwell Publishing Ltd 01.10.2016
Subjects	automatic classification; Cartography; Classification; Data processing; Delineation; Indexing; Learning; machine learning; text mining; Texts
Summary:	The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised‐learning classification, and discuss the advantages and disadvantages of this approach vis‐à‐vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning‐based classification frameworks of scientific knowledge, as they typically try to fit new‐to‐the‐world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large‐scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large‐scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.
Bibliography:	Appendix S1. Wordcloud representation of the latent topics presented. Appendix S2. Full sized version of FIG 2. ArticleID:ASI23596; ark:/67375/WNG-F4TBPN15-J; istex:3DB898B1273F06373EE0F4845C80C37A89C7B5B2
ISSN:	2330-1635, 2330-1643
DOI:	10.1002/asi.23596