A structured hardware software architecture for peptide based diagnosis - Sub-string matching problem with limited tolerance

Bibliographic Details
Published in 7th International Conference on Information and Automation for Sustainability, pp. 1-7
Main Authors Vidanagamachchi, Sugandima M.; Devapriya Dewasurendra, S.; Ragel, Roshan G.; Niranjan, Mahesan
Format Conference Proceeding
Language English
Published IEEE 01.12.2014
ISSN 2151-1802
DOI 10.1109/ICIAFS.2014.7069624

Abstract Inferring proteins from complex peptide samples in a shotgun proteomics workflow places extreme demands on computational resources: it requires very high processing throughput, rapid processing rates and reliable results. This is exacerbated by the fact that, in general, a given protein cannot be defined by a fixed sequence of amino acids, owing to the existence of splice variants and isoforms of that protein. Protein inference can therefore be treated as the problem of identifying sequences of amino acids with some limited tolerance. Two problems arise from this: a) because of these (permitted) variations, the applicability of exact string matching methodologies is questionable, and b) it is difficult to define a reference (peptide/amino acid) sequence for a set of proteins that are functionally indistinguishable but vary in some features. This paper presents a model-based hardware acceleration of a structured and practical inference approach, developed and validated to solve the inference problem in a mass spectrometry experiment of realistic size. The approach starts from an examination of the known set of splice variants and isoforms of a target protein to identify the Greatest Common Stable Substring (GCSS) of amino acids and the Substrings Subject to Limited Variation (SSLV), together with their respective locations on the GCSS. The hypothesis made here is that the SSLV appear inside complete peptides and do not cut across peptide boundaries. We then define and solve the Sub-string Matching Problem with Limited Tolerance (SMPLT) using the Bit-Split Aho-Corasick Algorithm with Limited Tolerance (BSACLT), which we define and automate. The approach is validated on identified peptides in a labelled and clustered data set from UniProt.
A model-based hardware/software co-design strategy is used to accelerate the computational workflow of the protein inference problem described above. Identification of Baylisascaris procyonis infection was used as an application instance. The workflow can be generalised to any inexact multiple pattern matching application by replacing the patterns, in a clustered and distributed environment that permits a distance between member strings to account for permitted deviations such as substitutions, insertions and deletions. The co-designed workflow achieved a speed-up of up to 70 times compared with a similar workflow run purely on the processor used for the co-design.
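To make the idea of matching with limited tolerance concrete, the following is a minimal software sketch, not the BSACLT automaton or the hardware design described in the abstract. It swaps in a plain dynamic-programming approximate substring search (often attributed to Sellers) that tolerates up to a given number of substitutions, insertions and deletions. The function names, the example peptide strings and the tolerance values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def approx_end_positions(pattern: str, text: str, max_dist: int) -> List[Tuple[int, int]]:
    """Return (end_index, distance) for each position in `text` where some
    substring ending there is within `max_dist` edits of `pattern`.

    Illustrative stand-in only: this is a plain edit-distance scan, not BSACLT.
    """
    m = len(pattern)
    # prev[i] = fewest edits between pattern[:i] and a substring of text ending
    # just before the current character; entry 0 stays 0 so that a match may
    # start anywhere in the text.
    prev = list(range(m + 1))
    hits: List[Tuple[int, int]] = []
    for j, ch in enumerate(text):
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra residue present in the text
                curr[i - 1] + 1,     # pattern residue missing from the text
            )
        if curr[m] <= max_dist:
            hits.append((j, curr[m]))
        prev = curr
    return hits


def scan(patterns: Dict[str, str], text: str, max_dist: int) -> Dict[str, List[Tuple[int, int]]]:
    """Run the tolerant search for several named patterns (hypothetical helper)."""
    return {name: approx_end_positions(p, text, max_dist) for name, p in patterns.items()}


if __name__ == "__main__":
    # Made-up residue strings; "ACQE" differs from "ACDE" by one substitution.
    print(scan({"variant_region": "ACDE"}, "GGACQEGG", max_dist=1))
```

This scan costs time proportional to the pattern length times the text length for each pattern, which is the kind of per-pattern cost that automaton-based multi-pattern methods and hardware acceleration aim to avoid.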
Author affiliations
Vidanagamachchi, Sugandima M. (Dept. of Comput. Eng., Univ. of Peradeniya, Peradeniya, Sri Lanka)
Devapriya Dewasurendra, S. (Dept. of Comput. Eng., Univ. of Peradeniya, Peradeniya, Sri Lanka)
Ragel, Roshan G. (Dept. of Comput. Eng., Univ. of Peradeniya, Peradeniya, Sri Lanka)
Niranjan, Mahesan (Dept. of Electron. & Comput. Sci., Univ. of Southampton, Southampton, UK)
Discipline Engineering
EISBN 1479945986
9781479945986
Subjects Amino acids; Automata; Hardware; Peptides; Proteins; Software
URI https://ieeexplore.ieee.org/document/7069624