S-176 SOCcer 2.0 and SOCcer in the Field: Moving from coding occupation after data collection to coding in real time by study subjects
Published in | Occupational and environmental medicine (London, England) Vol. 78; no. Suppl 1; pp. A152 - A153 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | London: BMJ Publishing Group Ltd, 01.10.2021 |
Summary: | Objective: Free-text job descriptions from lifetime occupational history questionnaires are the starting point for nearly all occupational exposure assessment activities in epidemiologic studies. This information is used to code job descriptions into standardized occupational classification (SOC) systems. We describe updates to SOCcer, an algorithm that incorporates natural language processing to automatically code job descriptions to SOC-2010. Methods: We recently released SOCcer 2.0, which improves on the original algorithm by (1) expanding the training data set to include job descriptions from population-based epidemiologic studies and (2) revising the scoring algorithm to account for nonlinearity in the classifiers. However, perfect prediction is not possible, because there is no gold-standard approach on which to train the algorithm and because participants' job descriptions are often brief and may fit multiple codes. We have adapted SOCcer for use during data collection, so that study participants can serve as their own coders when completing a web-based occupational questionnaire. SOCcer reads the participant's open-ended job title and task responses in real time and proposes a short list of best-fitting SOC-2010 codes for each job; the participant reviews the list and selects the code that best fits the job. Results: In a validation set of 11,943 jobs, SOCcer's highest-scoring code agreed with a consensus expert-assigned code for 50% of jobs at the 6-digit level and 63% at the 3-digit level. Agreement increased linearly with algorithm score. The expert's code was among SOCcer's top 3 scoring codes for >70% of the jobs, supporting the use of a short list of codes for study participants to review. Pilot testing is underway. Conclusion: Automated coding, especially in real time, has the potential to substantially reduce the effort needed to code jobs in large epidemiologic studies and to improve the accuracy of the codes. |
---|---|
Bibliography: | 28th International Symposium on Epidemiology in Occupational Health (EPICOH 2021) |
ISSN: | 1351-0711, 1470-7926 |
DOI: | 10.1136/OEM-2021-EPI.417 |
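The summary's workflow and evaluation can be illustrated with a small sketch: show a participant the top-scoring codes, then measure how often the highest-scoring code matches an expert's code at the 6-digit and 3-digit levels and how often the expert's code appears in the top 3. This is a minimal sketch assuming SOCcer has already produced a score for each candidate code. SOCcer's own scoring model is not reproduced. The helper names `shortlist`, `truncate`, and `evaluate`, along with the toy scores, are hypothetical and are not part of SOCcer.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration only; SOCcer's own scoring model is not
# reproduced here. Each job is represented by a dict mapping candidate SOC-2010
# codes (e.g. "47-2061") to scores assumed to come from an upstream classifier.


def shortlist(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-scoring codes, best first (the list a participant would review)."""
    # Sort by descending score; break ties by code string so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [code for code, _ in ranked[:k]]


def truncate(code: str, digits: int) -> str:
    """Keep the first `digits` digits of a SOC code, e.g. ("47-2061", 3) -> "472"."""
    return code.replace("-", "")[:digits]


def evaluate(
    predictions: Sequence[Dict[str, float]],
    expert_codes: Sequence[str],
    k: int = 3,
) -> Dict[str, float]:
    """Compare the algorithm's codes with expert-assigned codes, one entry per job.

    Assumes every job has at least one scored candidate code.
    """
    n = len(expert_codes)
    agree_6 = agree_3 = in_top_k = 0
    for scores, expert in zip(predictions, expert_codes):
        ranked = shortlist(scores, k)
        best = ranked[0]  # highest-scoring code
        agree_6 += int(best == expert)  # full 6-digit match
        agree_3 += int(truncate(best, 3) == truncate(expert, 3))  # 3-digit match
        in_top_k += int(expert in ranked)  # expert code anywhere in the short list
    return {
        "agree_6digit": agree_6 / n,
        "agree_3digit": agree_3 / n,
        f"expert_in_top{k}": in_top_k / n,
    }


if __name__ == "__main__":
    # Toy scores for two jobs, invented for illustration (not study data).
    jobs = [
        {"47-2061": 0.62, "47-2031": 0.55, "47-1011": 0.20},
        {"29-1171": 0.45, "29-1141": 0.40, "31-1014": 0.30},
    ]
    experts = ["47-2061", "29-1141"]
    print(shortlist(jobs[1]))  # ['29-1171', '29-1141', '31-1014']
    print(evaluate(jobs, experts))
    # {'agree_6digit': 0.5, 'agree_3digit': 1.0, 'expert_in_top3': 1.0}
```

In the toy second job, the top code misses the expert's code at 6 digits but matches it at 3 digits (the SOC-2010 minor group). The expert's code still appears in the short list. This is the situation in which letting the participant pick from the top few codes could recover the correct code.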