An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI
Published in | Information research Vol. 30; no. iConf; pp. 610 - 617 |
---|---|
Main Authors | Choi, Kahyun; Kang, Gyuri |
Format | Journal Article |
Language | English |
Published | University of Borås, 11.03.2025 |
Subjects | dataset evaluation; Digital library; natural language processing; poetry; responsible AI |
Abstract | Introduction. AI technologies, such as theme classification and named entity recognition, enhance digital library accessibility. However, they may introduce biases if training datasets lack adequate representation. For instance, prior AI models for poetry classification overlooked dataset diversity, raising concerns about representation. To address this issue, this study assesses the dataset representation and examines potential issues in AI model design for poetry collections. Method. We annotated and published the race and ethnicity of poets in an American poetry collection curated by poets.org, which was recently used to train a poetry theme classification system. We then examined the diversity of the collection using these annotations. Analysis. We compared the racial/ethnic composition of the collection to U.S. Census data and conducted group-exclusive top word analysis, popular theme analysis, and entropy-based analysis of theme distribution diversity to evaluate linguistic and thematic diversity. Results. Our findings indicate that most underrepresented groups are well-represented in the collection, except for Latino/a/x American poets. Furthermore, we found that poems from underrepresented groups increase the collection’s linguistic and thematic diversity. Conclusions. To design responsible AI that embraces diversity, it is essential to assess dataset representation and support non-standard English and diverse themes beyond those popular with the general population. |
---|---|
Author | Kang, Gyuri; Choi, Kahyun |
Author_xml | 1. Choi, Kahyun (University of Illinois Urbana-Champaign); 2. Kang, Gyuri (Indiana University Bloomington) |
ContentType | Journal Article |
DOI | 10.47989/ir30iConf47263 |
Discipline | Library & Information Science |
EISSN | 1368-1613 |
EndPage | 617 |
ISSN | 1368-1613 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | iConf |
Language | English |
License | https://creativecommons.org/licenses/by-nc-nd/4.0 |
OpenAccessLink | https://doaj.org/article/2744d60cf0544c50812b3e33245e2c18 |
PageCount | 8 |
PublicationCentury | 2000 |
PublicationDate | 2025-03-11T00:00:00+01:00 |
PublicationDateYYYYMMDD | 2025-03-11 |
PublicationDate_xml | – month: 03 year: 2025 text: 2025-03-11T00:00:00+01:00 day: 11 |
PublicationDecade | 2020 |
PublicationTitle | Information research |
PublicationYear | 2025 |
Publisher | University of Borås |
Publisher_xml | – name: University of Borås |
StartPage | 610 |
SubjectTerms | dataset evaluation; Digital library; natural language processing; poetry; responsible AI |
Title | An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI |
URI | https://publicera.kb.se/ir/article/view/47263 https://doaj.org/article/2744d60cf0544c50812b3e33245e2c18 |
Volume | 30 |
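
The abstract mentions an entropy-based analysis of theme distribution diversity. This record does not give the authors' exact formulation, so the following is only a minimal sketch of one common way to measure it, assuming Shannon entropy over theme labels. The function name `theme_entropy` and the theme labels are hypothetical and are not taken from the paper or from the poets.org collection.

```python
import math
from collections import Counter


def theme_entropy(themes):
    """Return the Shannon entropy (in bits) of a list of theme labels.

    Higher values mean the poems are spread more evenly across themes.
    Assumption: one theme label per poem; the paper may define this differently.
    """
    counts = Counter(themes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical labels, invented purely for illustration.
concentrated = ["love", "love", "love", "nature"]
spread = ["love", "nature", "identity", "migration"]

print(theme_entropy(concentrated) < theme_entropy(spread))
```

Under this sketch, a group of poems whose themes are spread evenly scores higher than one dominated by a single theme, which is the sense in which entropy can serve as a rough indicator of thematic diversity.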