A deep learning approach to identifying source code in images and video
Published in | 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) pp. 376 - 386 |
---|---|
Main Authors | Ott, Jordan; Atchison, Abigail; Harnack, Paul; Bergh, Adrienne; Linstead, Erik |
Format | Conference Proceeding |
Language | English |
Published | New York, NY, USA: ACM, 28.05.2018 |
Series | ACM Conferences |
ISBN | 9781450357166 (ISBN-13), 1450357164 (ISBN-10) |
ISSN | 2574-3864 |
DOI | 10.1145/3196398.3196402 |
Abstract | While substantial progress has been made in mining code on an Internet scale, efforts to date have been overwhelmingly focused on data sets where source code is represented natively as text. Large volumes of source code available online and embedded in technical videos have remained largely unexplored, due in part to the complexity of extraction when code is represented with images. Existing approaches to code extraction and indexing in this environment rely heavily on computationally intense optical character recognition. To improve the ease and efficiency of identifying this embedded code, as well as identifying similar code examples, we develop a deep learning solution based on convolutional neural networks and autoencoders. Focusing on Java for proof of concept, our technique is able to identify the presence of typeset and handwritten source code in thousands of video images with 85.6%-98.6% accuracy based on syntactic and contextual features learned through deep architectures. When combined with traditional approaches, this provides a more scalable basis for video indexing that can be incorporated into existing software search and mining tools. |
---|---|
Author | Ott, Jordan; Atchison, Abigail; Linstead, Erik; Harnack, Paul; Bergh, Adrienne |
Author affiliations | 1. Ott, Jordan (ott109@mail.chapman.edu), Chapman University; 2. Atchison, Abigail (atchi102@mail.chapman.edu), Chapman University; 3. Harnack, Paul (harna100@mail.chapman.edu), Chapman University; 4. Bergh, Adrienne (abergh@chapman.edu), Chapman University; 5. Linstead, Erik (linstead@chapman.edu), Chapman University |
CODEN | IEEPAD |
Copyright | 2018 ACM |
Discipline | Computer Science |
EndPage | 386 |
ExternalDocumentID | 8595221 |
Genre | orig-research |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Keywords | deep learning; video mining; convolutional neural networks; programming tutorials |
License | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org |
MeetingName | ICSE '18: 40th International Conference on Software Engineering |
OpenAccessLink | https://dl.acm.org/doi/pdf/10.1145/3196398.3196402 |
PageCount | 11 |
StartPage | 376 |
SubjectTerms | Computer systems organization -- Architectures -- Other architectures -- Neural networks; Computing methodologies -- Machine learning -- Machine learning approaches; Convolutional neural networks; Data mining; Deep learning; Information systems -- Information retrieval -- Specialized information retrieval -- Multimedia and multimodal retrieval -- Video search; Optical character recognition software; programming tutorials; Software and its engineering -- Software notations and tools -- Software libraries and repositories; Tutorials; video mining |
URI | https://ieeexplore.ieee.org/document/8595221 |