A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task

Bibliographic Details
Published in: AMIA ... Annual Symposium proceedings, Vol. 2007, pp. 620-624
Main Authors: Rekapalli, Hari Krishna; Cohen, Aaron M; Hersh, William R
Format: Journal Article
Language: English
Published: American Medical Informatics Association, United States, 11 October 2007
ISSN: 1942-597X
EISSN: 1559-4076

Abstract
Objective: Identify the set of features that best explained the variation in Mean Average Passage Precision (MAPP), the performance measure of the TREC 2006 Genomics information extraction task.
Methods: A multivariate regression model was built by backward elimination, as a function of generalized features common to all the algorithms used by TREC 2006 Genomics Track participants.
Results: The regression analysis found four factors that were collectively associated with variation in MAPP: (1) normalization of keywords in the query; (2) use of the Entrez Gene thesaurus to look up synonymous terms; (3) the unit of text retrieved by each participant's IR algorithm; and (4) the way a passage was defined.
Conclusion: These plausible hypotheses, generated by exploratory data analysis, help in understanding the results of the TREC 2006 Genomics passage extraction task. The approach has general value for analyzing the results of similar common challenge tasks.
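The backward-elimination approach described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state the elimination criterion, so this sketch assumes AIC from an ordinary least squares fit, and it uses plain Python instead of a statistics package. The feature names (`normalize_query`, `entrez_gene_synonyms`, `noise_a`, `noise_b`) are hypothetical binary stand-ins, and the data are synthetic, not TREC 2006 results.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def aic(rows, y, features):
    """AIC of an OLS fit of y on an intercept plus the named features (assumed criterion)."""
    X = [[1.0] + [r[f] for f in features] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    rss = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2 for x, yi in zip(X, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * p


def backward_eliminate(rows, y, features):
    """Greedily drop the feature whose removal lowers AIC most; stop when no removal helps."""
    current = list(features)
    best = aic(rows, y, current)
    while current:
        candidates = [(aic(rows, y, [f for f in current if f != drop]), drop) for drop in current]
        score, drop = min(candidates)
        if score >= best:
            break
        best = score
        current.remove(drop)
    return current, best


# Hypothetical toy data: 60 "runs", each described by binary feature flags.
FEATURES = ["normalize_query", "entrez_gene_synonyms", "noise_a", "noise_b"]
rng = random.Random(2007)
rows = [{f: float(rng.random() < 0.5) for f in FEATURES} for _ in range(60)]
# Made-up response: only the first two flags actually affect the score.
y = [0.02 + 0.05 * r["normalize_query"] + 0.04 * r["entrez_gene_synonyms"] + rng.gauss(0, 0.005)
     for r in rows]

kept, score = backward_eliminate(rows, y, FEATURES)
print(kept)
```

With this synthetic data the two flags that drive the made-up score are retained; whether each noise column is dropped depends on the random draw, because AIC can keep an irrelevant predictor by chance. A p-value threshold is an equally common stopping rule for backward elimination.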
Affiliation (Rekapalli, Hari Krishna): Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
Copyright: 2007 AMIA. All rights reserved.
Discipline: Medicine
PMCID: PMC2655837
Genre: Research Support, U.S. Gov't, Non-P.H.S.; Journal Article; Comparative Study
Peer reviewed: Yes
License: Open Access; verbatim copying and redistribution of this article are permitted in all media for any purpose.
PMID: 18693910
Journal abbreviation: AMIA Annu Symp Proc
Subject terms: Abstracting and Indexing as Topic; Algorithms; Databases as Topic; Genomics; Information Storage and Retrieval; Multivariate Analysis; Regression Analysis; Subject Headings; Vocabulary, Controlled
Links:
https://www.ncbi.nlm.nih.gov/pubmed/18693910
https://www.proquest.com/docview/70136558
https://pubmed.ncbi.nlm.nih.gov/PMC2655837