Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks
Published in | arXiv.org |
---|---|
Format | Paper; Journal Article |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 17.08.2022 |
ISSN | 2331-8422 |
DOI | 10.48550/arxiv.2102.12592 |
Abstract | Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants' satisfaction with their computational notebook. |
---|---|
Published Version | ACM Trans. Comput.-Hum. Interact. 29, 2, Article 17 (April 2022), 33 pages |
Author | April Yi Wang; Dakuo Wang; Jaimie Drozdal; Michael Muller; Soya Park; Justin D. Weisz; Xuye Liu; Lingfei Wu; Casey Dugan |
Related Links | Published paper (access to full text may be restricted): https://doi.org/10.1145/3489465; paper in arXiv: https://doi.org/10.48550/arXiv.2102.12592 |
Copyright | 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Genre | Working Paper/Pre-Print |
OpenAccessLink | https://arxiv.org/abs/2102.12592 |
Subjects | Automation; Computer Science - Human-Computer Interaction; Documentation; Scientists; Source code |