An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI

Bibliographic Details
Published in Information research Vol. 30; no. iConf; pp. 610-617
Main Authors Choi, Kahyun, Kang, Gyuri
Format Journal Article
Language English
Published University of Borås 11.03.2025
Abstract
Introduction. AI technologies, such as theme classification and named entity recognition, enhance digital library accessibility. However, they may introduce biases if training datasets lack adequate representation. For instance, prior AI models for poetry classification overlooked dataset diversity, raising concerns about representation. To address this issue, this study assesses the dataset representation and examines potential issues in AI model design for poetry collections.
Method. We annotated and published the race and ethnicity of poets in an American poetry collection curated by poets.org, which was recently used to train a poetry theme classification system. We then examined the diversity of the collection using these annotations.
Analysis. We compared the racial/ethnic composition of the collection to U.S. Census data and conducted group-exclusive top word analysis, popular theme analysis, and entropy-based analysis of theme distribution diversity to evaluate linguistic and thematic diversity.
Results. Our findings indicate that most underrepresented groups are well-represented in the collection, except for Latino/a/x American poets. Furthermore, we found that poems from underrepresented groups increase the collection’s linguistic and thematic diversity.
Conclusions. To design responsible AI that embraces diversity, it is essential to assess dataset representation and support non-standard English and diverse themes beyond those popular with the general population.
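The three analyses named in the abstract can be illustrated with a minimal Python sketch. This record does not give the authors' formulas, so everything here is an assumption: the function names, the use of Shannon entropy in bits, the simple share ratio against census figures, and the definition of a "group-exclusive" word as one that no other group uses. The group labels and all values are made-up toy data, not results from the study.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of theme labels.

    Assumption: the paper's "entropy-based analysis" may use a different
    base or normalization; this is the textbook definition.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def representation_ratio(collection_share, census_share):
    """Group's share of the collection divided by its census share.

    A value below 1 suggests under-representation relative to the census.
    """
    return collection_share / census_share


def group_exclusive_top_words(tokens_by_group, top_n=3):
    """For each group, the most frequent words that no other group uses."""
    result = {}
    for group, tokens in tokens_by_group.items():
        # Vocabulary of every other group.
        others = set()
        for other, other_tokens in tokens_by_group.items():
            if other != group:
                others.update(other_tokens)
        counts = Counter(t for t in tokens if t not in others)
        result[group] = [word for word, _ in counts.most_common(top_n)]
    return result


# Hypothetical toy inputs, for illustration only.
themes_varied = ["love", "love", "nature", "death"]
themes_uniform = ["love", "love", "love", "love"]
toy_tokens = {
    "group_a": ["river", "river", "moon", "sun"],
    "group_b": ["moon", "city", "city"],
}

if __name__ == "__main__":
    print(shannon_entropy(themes_varied))    # more varied themes, higher entropy
    print(shannon_entropy(themes_uniform))   # a single theme, zero entropy
    print(representation_ratio(0.10, 0.20))  # hypothetical shares
    print(group_exclusive_top_words(toy_tokens))
```

Under these assumptions, a group whose poems spread evenly over many themes scores a higher entropy than a group concentrated on a few popular themes, which is one way the thematic-diversity claim in the Results could be quantified.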
Author Choi, Kahyun (University of Illinois Urbana-Champaign)
Kang, Gyuri (Indiana University Bloomington)
ContentType Journal Article
DOI 10.47989/ir30iConf47263
Discipline Library & Information Science
EISSN 1368-1613
EndPage 617
ISSN 1368-1613
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue iConf
Language English
License https://creativecommons.org/licenses/by-nc-nd/4.0
OpenAccessLink https://doaj.org/article/2744d60cf0544c50812b3e33245e2c18
PageCount 8
PublicationDate 2025-03-11T00:00:00+01:00
PublicationTitle Information research
PublicationYear 2025
Publisher University of Borås
StartPage 610
SubjectTerms dataset evaluation
Digital library
natural language processing
poetry
responsible AI
Title An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI
URI https://publicera.kb.se/ir/article/view/47263
https://doaj.org/article/2744d60cf0544c50812b3e33245e2c18
Volume 30