A Straightforward Author Profiling Approach in MapReduce

Bibliographic Details
Published in	Advances in Artificial Intelligence -- IBERAMIA 2014, pp. 95-107
Main Authors	Maharjan, Suraj; Shrestha, Prasha; Solorio, Thamar; Hasan, Ragib
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	Early Bird; Hadoop Distributed File System; Natural Language Processing; Runtime Performance; Statistical Machine Translation

Summary:	Most natural language processing tasks deal with large amounts of data, which take a long time to process. A larger dataset and a good set of features help produce better results, but larger volumes of text and high-dimensional feature sets also mean slower performance. Natural language processing and distributed computing are therefore a good match. In the PAN 2013 competition, test runtimes for author profiling ranged from several minutes to several days. Most author profiling systems available now are inaccurate, slow, or both. Our system, written entirely in MapReduce, employs nearly 3 million features and still finishes the task in a fraction of the time taken by state-of-the-art systems, and with better accuracy. Our system demonstrates that distributed systems make sense when dealing with a huge amount of data, a large number of features, or both.
ISBN:	9783319120263; 3319120263
ISSN:	0302-9743; 1611-3349
DOI:	10.1007/978-3-319-12027-0_8