A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task
Published in | AMIA ... Annual Symposium proceedings Vol. 2007; pp. 620 - 624 |
---|---|
Main Authors | Rekapalli, Hari Krishna; Cohen, Aaron M; Hersh, William R |
Format | Journal Article |
Language | English |
Published | United States: American Medical Informatics Association, 11 October 2007 |
ISSN | 1942-597X 1559-4076 |
Abstract | OBJECTIVE: Identify the set of features that best explained the variation in Mean Average Passage Precision (MAPP), the performance measure of the TREC 2006 Genomics information extraction task.
METHODS: A multivariate regression model was built using a backward-elimination approach as a function of certain generalized features that were common to all the algorithms used by TREC 2006 Genomics Track participants.
RESULTS: Our regression analysis found that the following four factors were collectively associated with variation in MAPP: (1) normalization of keywords in the query; (2) use of the Entrez Gene thesaurus for synonym look-up; (3) the unit of text retrieved by the respective IR algorithms; and (4) the way a passage was defined.
CONCLUSION: These reasonably likely hypotheses, generated by an exploratory data analysis, are informative in understanding the results of the TREC 2006 Genomics passage extraction task. This approach has general value for analyzing the results of similar common challenge tasks. |
---|---|
Author | Rekapalli, Hari Krishna (Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA); Cohen, Aaron M; Hersh, William R |
Copyright | 2007 AMIA - All rights reserved. 2007 |
Discipline | Medicine |
ExternalDocumentID | PMC2655837 18693910 |
Genre | Research Support, U.S. Gov't, Non-P.H.S.; Journal Article; Comparative Study |
IsPeerReviewed | true |
IsScholarly | true |
License | This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose |
PMID | 18693910 |
SubjectTerms | Abstracting and Indexing as Topic; Algorithms; Databases as Topic; Genomics; Information Storage and Retrieval; Multivariate Analysis; Regression Analysis; Subject Headings; Vocabulary, Controlled |
URI | https://www.ncbi.nlm.nih.gov/pubmed/18693910 https://www.proquest.com/docview/70136558 https://pubmed.ncbi.nlm.nih.gov/PMC2655837 |
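
The abstract describes fitting a multivariate regression of MAPP on system features and pruning it by backward elimination. The record does not state the elimination criterion or the feature coding, so the following is only a minimal sketch under assumptions: it uses AIC as the drop criterion (an assumption, not necessarily the authors' choice), binary 0/1 feature indicators, and entirely synthetic data. The feature names echo the four factors in the abstract, plus two hypothetical extras; the coefficients and noise level are arbitrary and are not results from the study.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * v for x, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_aic(rows, y, features):
    """Fit y ~ intercept + features by ordinary least squares and return the AIC."""
    x = [[1.0] + [row[f] for f in features] for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(c * v for c, v in zip(beta, r))) ** 2 for r, t in zip(x, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * k


def backward_eliminate(rows, y, features):
    """Start from the full model; repeatedly drop the feature whose removal lowers AIC most."""
    kept = list(features)
    best = ols_aic(rows, y, kept)
    while kept:
        trials = [(ols_aic(rows, y, [f for f in kept if f != g]), g) for g in kept]
        aic, drop = min(trials)
        if aic >= best:  # no single removal improves the model: stop
            break
        best = aic
        kept.remove(drop)
    return kept


# Hypothetical feature names: the first four echo the abstract, the last two are invented.
FEATURES = ["query_normalization", "entrez_gene_synonyms", "retrieval_unit",
            "passage_definition", "hypothetical_extra_a", "hypothetical_extra_b"]
TRUE_FEATURES = FEATURES[:4]

# Synthetic "runs": a full factorial design over the six binary features (64 rows).
rng = random.Random(0)
rows = [{f: float((i >> j) & 1) for j, f in enumerate(FEATURES)} for i in range(64)]
# Synthetic MAPP-like response: only the first four features have an (arbitrary) effect.
y = [0.02 + 0.05 * sum(r[f] for f in TRUE_FEATURES) + rng.gauss(0.0, 0.01) for r in rows]

kept = backward_eliminate(rows, y, FEATURES)
print(kept)
```

In this synthetic setup the four features that carry signal survive elimination, while the two extras may or may not be dropped depending on the noise draw. A p-value threshold on each coefficient is another common stopping rule for backward elimination; AIC is used here only because it needs nothing beyond the standard library.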