A simple guide to de novo transcriptome assembly and annotation


Bibliographic Details
Published in Briefings in Bioinformatics, Vol. 23, no. 2
Main Authors Raghavan, Venket; Kraft, Louis; Mesny, Fantin; Rigerte, Linda
Format Journal Article
Language English
Published England, Oxford University Press, 10 March 2022
Oxford Publishing Limited (England)
Abstract A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Authors Raghavan, Venket (vraghav@mpibpc.mpg.de)
Kraft, Louis (louis.kraft@mpibpc.mpg.de; ORCID 0000-0002-6465-4973)
Mesny, Fantin
Rigerte, Linda
Copyright The Author(s) 2022. Published by Oxford University Press.
DOI 10.1093/bib/bbab563
Discipline Biology
EISSN 1477-4054
PMCID PMC8921630
Genre Journal Article
Review
ISSN 1467-5463
1477-4054
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords annotation; assembly; RNA-seq; tools; de novo; transcriptome
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Notes Venket Raghavan and Louis Kraft are joint first coauthors.
Fantin Mesny and Linda Rigerte are joint second coauthors.
OpenAccessLink https://dx.doi.org/10.1093/bib/bbab563
PMID 35076693
Zhao (2022031506250617800_ref37) 2018; 8
Schurch (2022031506250617800_ref133) 2016; 22
Lagesen (2022031506250617800_ref14
References_xml – volume: 32
  start-page: 2210
  issue: 14
  year: 2016
  ident: 2022031506250617800_ref83
  article-title: rnaQUAST: a quality assessment tool for de novo transcriptome assemblies
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw218
– volume: 20
  start-page: 138
  year: 2014
  ident: 2022031506250617800_ref9
  article-title: Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq
  publication-title: Med Sci Monit Basic Res
  doi: 10.12659/MSMBR.892101
– volume: 19
  start-page: 32
  issue: 1
  year: 2018
  ident: 2022031506250617800_ref24
  article-title: De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species
  publication-title: BMC Genomics
  doi: 10.1186/s12864-017-4379-x
– volume: 9
  start-page: 357
  issue: 4
  year: 2012
  ident: 2022031506250617800_ref93
  article-title: Fast gapped-read alignment with Bowtie 2
  publication-title: Nat Methods
  doi: 10.1038/nmeth.1923
– volume: 14
  issue: 12
  year: 2013
  ident: 2022031506250617800_ref202
  article-title: TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes
  publication-title: Genome Biol
  doi: 10.1186/gb-2013-14-12-r134
– year: 2016
  ident: 2022031506250617800_ref217
  article-title: A review of bioinformatic pipeline frameworks
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbw020
– volume: 2
  start-page: 9
  issue: 1
  year: 2013
  ident: 2022031506250617800_ref246
  article-title: Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data
  publication-title: Gigascience
  doi: 10.1186/2047-217X-2-9
– volume: 8
  issue: 7
  year: 2019
  ident: 2022031506250617800_ref230
  article-title: CWL-airflow: a lightweight pipeline manager supporting common workflow language
  publication-title: Gigascience
  doi: 10.1093/gigascience/giz084
– volume: 17
  start-page: 16
  issue: 1
  year: 2016
  ident: 2022031506250617800_ref55
  article-title: CIDANE: comprehensive isoform discovery and abundance estimation
  publication-title: Genome Biol
  doi: 10.1186/s13059-015-0865-0
– volume: 41
  start-page: e74
  issue: 6
  year: 2013
  ident: 2022031506250617800_ref138
  article-title: CPAT: coding-potential assessment tool using an alignment-free logistic regression model
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkt006
– volume: 17
  issue: 2
  year: 2021
  ident: 2022031506250617800_ref222
  article-title: Using prototyping to choose a bioinformatics workflow management system
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1008622
– volume: 21
  start-page: 2213
  issue: 12
  year: 2011
  ident: 2022031506250617800_ref44
  article-title: Differential expression in RNA-seq: a matter of depth
  publication-title: Genome Res
  doi: 10.1101/gr.124321.111
– volume: 15
  start-page: 475
  issue: 7
  year: 2018
  ident: 2022031506250617800_ref243
  article-title: Bioconda: sustainable and comprehensive software distribution for the life sciences
  publication-title: Nat Methods
  doi: 10.1038/s41592-018-0046-7
– volume: 20
  start-page: 2044
  issue: 6
  year: 2019
  ident: 2022031506250617800_ref124
  article-title: Interpretation of differential gene expression results of RNA-seq data: review and integration
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bby067
– volume: 10
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref258
  article-title: A comprehensive examination of nanopore native RNA sequencing for characterization of complex transcriptomes
  publication-title: Nat Commun
  doi: 10.1038/s41467-019-11272-z
– volume: 18
  start-page: 1188
  issue: 6
  year: 2018
  ident: 2022031506250617800_ref16
  article-title: How complete are “complete” genome assemblies?-an avian perspective
  publication-title: Mol Ecol Resour
  doi: 10.1111/1755-0998.12933
– volume: 35
  year: 2019
  ident: 2022031506250617800_ref215
  article-title: JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty669
– volume-title: Linux in Easy Steps
  year: 2010
  ident: 2022031506250617800_ref241
– volume: 19
  issue: 1
  year: 2018
  ident: 2022031506250617800_ref102
  article-title: Limitations of alignment-free tools in total RNA-seq quantification
  publication-title: BMC Genomics
  doi: 10.1186/s12864-018-4869-5
– volume: 10
  start-page: 496
  year: 2019
  ident: 2022031506250617800_ref5
  article-title: Coding or noncoding, the converging concepts of RNAs
  publication-title: Front Genet
  doi: 10.3389/fgene.2019.00496
– volume: 10
  issue: 2
  year: 2021
  ident: 2022031506250617800_ref22
  article-title: Transcriptome annotation in the cloud: complexity, best practices, and cost
  publication-title: Gigascience
  doi: 10.1093/gigascience/giaa163
– volume-title: Zenodo
  year: 2013
  ident: 2022031506250617800_ref251
– volume: 28
  start-page: 3150
  issue: 23
  year: 2012
  ident: 2022031506250617800_ref86
  article-title: CD-HIT: accelerated for clustering the next-generation sequencing data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bts565
– volume: 46
  start-page: W84
  issue: W1
  year: 2018
  ident: 2022031506250617800_ref209
  article-title: PANNZER2: a rapid functional annotation web server
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky350
– volume: 28
  start-page: 3211
  issue: 24
  year: 2012
  ident: 2022031506250617800_ref40
  article-title: SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bts611
– volume: 30
  start-page: 1660
  issue: 12
  year: 2014
  ident: 2022031506250617800_ref63
  article-title: SOAPdenovo-trans: de novo transcriptome assembly with short RNA-Seq reads
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu077
– volume: 26
  year: 2010
  ident: 2022031506250617800_ref122
  article-title: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btp616
– volume: 114
  start-page: 6733
  issue: 13
  year: 2014
  ident: 2022031506250617800_ref175
  article-title: Short linear motifs: ubiquitous and functionally diverse protein interaction modules directing cell regulation
  publication-title: Chem Rev
  doi: 10.1021/cr400585q
– volume: 7
  issue: 10
  year: 2011
  ident: 2022031506250617800_ref151
  article-title: Accelerated profile HMM searches
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1002195
– volume: 12
  start-page: 352
  issue: 3
  year: 2021
  ident: 2022031506250617800_ref130
  article-title: Temporal dynamic methods for bulk RNA-Seq time series data
  publication-title: Genes (Basel)
  doi: 10.3390/genes12030352
– year: 2017
  ident: 2022031506250617800_ref231
  article-title: Full-stack genomics pipelining with GATK4 + WDL + Cromwell
  publication-title: ISCB Community Journal
– volume: 22
  start-page: 839
  issue: 6
  year: 2016
  ident: 2022031506250617800_ref133
  article-title: How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?
  publication-title: RNA
  doi: 10.1261/rna.053959.115
– volume: 2015
  start-page: 951
  issue: 11
  year: 2015
  ident: 2022031506250617800_ref12
  article-title: RNA sequencing and analysis
  publication-title: Cold Spring Harb Protoc
  doi: 10.1101/pdb.top084970
– volume: 11
  issue: 7
  year: 2016
  ident: 2022031506250617800_ref155
  article-title: The de novo transcriptome and its functional annotation in the seed beetle Callosobruchus maculatus
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0158565
– volume: 43
  year: 2015
  ident: 2022031506250617800_ref123
  article-title: limma powers differential expression analyses for RNA-sequencing and microarray studies
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkv007
– volume: 11
  start-page: 119
  issue: 1
  year: 2010
  ident: 2022031506250617800_ref147
  article-title: Prodigal: prokaryotic gene recognition and translation initiation site identification
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-11-119
– volume: 17
  start-page: 1009
  issue: 6
  year: 2016
  ident: 2022031506250617800_ref172
  article-title: Multiple sequence alignment modeling: methods and applications
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbv099
– volume: 22
  issue: 3
  year: 2021
  ident: 2022031506250617800_ref146
  article-title: CodAn: predictive models for precise identification of coding regions in eukaryotic transcripts
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbaa045
– volume: 37
  start-page: 3029
  issue: 18
  year: 2021
  ident: 2022031506250617800_ref111
  article-title: Fast and sensitive taxonomic assignment to metagenomic contigs
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btab184
– volume: 4
  start-page: 900
  year: 2015
  ident: 2022031506250617800_ref47
  article-title: The khmer software package: enabling efficient nucleotide sequence analysis
  publication-title: F1000Res
  doi: 10.12688/f1000research.6924.1
– volume: 155
  start-page: 95
  issue: 1
  year: 2012
  ident: 2022031506250617800_ref26
  article-title: Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly
  publication-title: Comp Biochem Physiol C Toxicol Pharmacol
  doi: 10.1016/j.cbpc.2011.05.012
– volume: 9
  start-page: 5133
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref50
  article-title: Improving in-silico normalization using read weights
  publication-title: Sci Rep
  doi: 10.1038/s41598-019-41502-9
– start-page: 621690
  year: 2015
  ident: 2022031506250617800_ref116
  article-title: The impact of normalization methods on RNA-seq data analysis
  publication-title: Biomed Res Int
– volume: 8
  issue: 5
  year: 2012
  ident: 2022031506250617800_ref195
  article-title: Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1002514
– volume: 18
  start-page: 762
  issue: 3
  year: 2017
  ident: 2022031506250617800_ref192
  article-title: A tissue-mapped axolotl de novo transcriptome enables identification of limb regeneration factors
  publication-title: Cell Rep
  doi: 10.1016/j.celrep.2016.12.063
– start-page: 55
  volume-title: The Gene Ontology Handbook
  year: 2017
  ident: 2022031506250617800_ref196
  doi: 10.1007/978-1-4939-3743-1_5
– volume: 20
  start-page: 257
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref35
  article-title: Improved metagenomic analysis with Kraken 2
  publication-title: Genome Biol
  doi: 10.1186/s13059-019-1891-0
– volume: 115
  start-page: 9726
  issue: 39
  year: 2018
  ident: 2022031506250617800_ref259
  article-title: Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA
  publication-title: Proc Natl Acad Sci U S A
  doi: 10.1073/pnas.1806447115
– volume: 22
  start-page: 1658
  issue: 13
  year: 2006
  ident: 2022031506250617800_ref107
  article-title: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btl158
– volume: 14
  start-page: 417
  issue: 4
  year: 2017
  ident: 2022031506250617800_ref98
  article-title: Salmon provides fast and bias-aware quantification of transcript expression
  publication-title: Nat Methods
  doi: 10.1038/nmeth.4197
– volume: 8
  start-page: 1494
  issue: 8
  year: 2013
  ident: 2022031506250617800_ref45
  article-title: De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis
  publication-title: Nat Protoc
  doi: 10.1038/nprot.2013.084
– volume: 21
  start-page: 1720
  issue: 5
  year: 2020
  ident: 2022031506250617800_ref101
  article-title: Evaluation of seven different RNA-Seq alignment tools based on experimental data from the model plant Arabidopsis thaliana
  publication-title: Int J Mol Sci
  doi: 10.3390/ijms21051720
– volume: 6
  start-page: 78
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref59
  article-title: Alternative splicing and cancer: a systematic review
  publication-title: Signal Transduct Target Ther
  doi: 10.1038/s41392-021-00486-7
– volume: 50
  start-page: 7
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref253
  article-title: De novo transcriptome assembly, functional annotation and differential gene expression analysis of juvenile and adult E. fetida, a model oligochaete used in ecotoxicological studies
  publication-title: Biol Res
  doi: 10.1186/s40659-017-0114-y
– volume: 30
  start-page: 1191
  issue: 8
  year: 2020
  ident: 2022031506250617800_ref67
  article-title: RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes
  publication-title: Genome Res
  doi: 10.1101/gr.260174.119
– year: 2021
  ident: 2022031506250617800_ref89
  article-title: TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly
  doi: 10.1101/2021.02.18.431773
– volume: 10
  start-page: 33
  year: 2021
  ident: 2022031506250617800_ref221
  article-title: Sustainable data analysis with Snakemake
  publication-title: F1000Res
  doi: 10.12688/f1000research.29032.2
– volume: 26
  start-page: 1134
  issue: 8
  year: 2016
  ident: 2022031506250617800_ref81
  article-title: TransRate: reference-free quality assessment of de novo transcriptome assemblies
  publication-title: Genome Res
  doi: 10.1101/gr.196469.115
– volume: 12
  start-page: 953
  issue: 7
  year: 2021
  ident: 2022031506250617800_ref90
  article-title: Pincho: a modular approach to high quality de novo transcriptomics
  publication-title: Genes (Basel)
  doi: 10.3390/genes12070953
– volume: 21
  start-page: 621
  issue: 2
  year: 2021
  ident: 2022031506250617800_ref201
  article-title: TOA: a software package for automated functional annotation in non-model plant species
  publication-title: Mol Ecol Resour
  doi: 10.1111/1755-0998.13285
– volume: 20
  year: 2019
  ident: 2022031506250617800_ref214
  article-title: OrthoFinder: phylogenetic orthology inference for comparative genomics
  publication-title: Genome Biol
  doi: 10.1186/s13059-019-1832-y
– volume: 8
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref14
  article-title: RNA-Seq methods for transcriptome analysis
  publication-title: Wiley Interdiscip Rev RNA
  doi: 10.1002/wrna.1364
– year: 2021
  ident: 2022031506250617800_ref120
  article-title: R: a language and environment for statistical computing
– volume: 38
  start-page: 500
  issue: 5
  year: 2006
  ident: 2022031506250617800_ref239
  article-title: GenePattern 2.0
  publication-title: Nat Genet
  doi: 10.1038/ng0506-500
– volume: 9
  start-page: 6222
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref19
  article-title: De novo transcriptome assembly and functional annotation in five species of bats
  publication-title: Sci Rep
  doi: 10.1038/s41598-019-42560-9
– volume: 8
  year: 2019
  ident: 2022031506250617800_ref125
  article-title: Robust and efficient identification of biomarkers from RNA-seq data using median control chart
  publication-title: F1000Res
  doi: 10.12688/f1000research.17351.1
– start-page: 723
  volume-title: Evolutionary Genomics
  year: 2019
  ident: 2022031506250617800_ref220
  doi: 10.1007/978-1-4939-9074-0_24
– volume: 49
  start-page: D325
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref180
  article-title: The gene ontology resource: enriching a GOld mine
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1113
– volume: 4
  issue: 10
  year: 2008
  ident: 2022031506250617800_ref157
  article-title: The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1000160
– volume-title: Python: A dynamic, open source programming language
  year: 2021
  ident: 2022031506250617800_ref242
– volume: 106
  start-page: 494
  issue: 4
  year: 2018
  ident: 2022031506250617800_ref244
  article-title: High-performance computing service for bioinformatics and data science
  publication-title: J Med Libr Assoc
  doi: 10.5195/jmla.2018.512
– volume: 29
  start-page: i326
  issue: 13
  year: 2013
  ident: 2022031506250617800_ref66
  article-title: IDBA-Tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btt219
– volume: 17
  start-page: 938
  issue: 3
  year: 2020
  ident: 2022031506250617800_ref70
  article-title: IsoTree: a new framework for de novo transcriptome assembly from RNA-seq reads
  publication-title: IEEE/ACM Trans Comput Biol Bioinform
  doi: 10.1109/TCBB.2018.2808350
– volume: 10
  start-page: 57
  issue: 1
  year: 2009
  ident: 2022031506250617800_ref8
  article-title: RNA-Seq: a revolutionary tool for transcriptomics
  publication-title: Nat Rev Genet
  doi: 10.1038/nrg2484
– start-page: 527
  volume-title: Bioinformatics and Biomedical Engineering
  year: 2015
  ident: 2022031506250617800_ref247
  doi: 10.1007/978-3-319-16480-9_51
– volume: 10
  start-page: 421
  issue: 1
  year: 2009
  ident: 2022031506250617800_ref159
  article-title: BLAST+: architecture and applications
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-10-421
– volume: 2
  start-page: 139
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref132
  article-title: RNA sequencing data: Hitchhiker’s guide to expression analysis
  publication-title: Annu Rev Biomed Data Sci
  doi: 10.1146/annurev-biodatasci-072018-021255
– volume: 17
  issue: 12
  year: 2017
  ident: 2022031506250617800_ref200
  article-title: Sma3s: a universal tool for easy functional annotation of proteomes and transcriptomes
  publication-title: Proteomics
  doi: 10.1002/pmic.201700071
– volume: 18
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref99
  article-title: Evaluation and comparison of computational tools for RNA-seq isoform quantification
  publication-title: BMC Genomics
  doi: 10.1186/s12864-017-4002-1
– volume: 34
  start-page: i884
  issue: 17
  year: 2018
  ident: 2022031506250617800_ref33
  article-title: fastp: an ultra-fast all-in-one FASTQ preprocessor
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty560
– volume: 15
  year: 2014
  ident: 2022031506250617800_ref121
  article-title: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
  publication-title: Genome Biol
  doi: 10.1186/s13059-014-0550-8
– start-page: 149
  volume-title: Evolutionary Genomics
  year: 2019
  ident: 2022031506250617800_ref194
  doi: 10.1007/978-1-4939-9074-0_5
– year: 2021
  ident: 2022031506250617800_ref149
  article-title: Borf: improved ORF prediction in de-novo assembled transcriptome annotation
  doi: 10.1101/2021.04.12.439551
– volume: 39
  start-page: D19
  issue: Database issue
  year: 2011
  ident: 2022031506250617800_ref250
  article-title: The sequence read archive
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkq1019
– volume: 35
  year: 2019
  ident: 2022031506250617800_ref118
  article-title: Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty895
– volume: 44
  start-page: D733
  issue: D1
  year: 2016
  ident: 2022031506250617800_ref140
  article-title: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkv1189
– volume: 47
  start-page: W636
  issue: W1
  year: 2019
  ident: 2022031506250617800_ref150
  article-title: The EMBL-EBI search and sequence analysis tools APIs in 2019
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkz268
– volume: 21
  year: 2021
  ident: 2022031506250617800_ref216
  article-title: Signal, bias, and the role of transcriptome assembly quality in phylogenomic inference
  publication-title: BMC Ecol Evol
  doi: 10.1186/s12862-021-01772-2
– volume: 17
  start-page: 10
  issue: 1
  year: 2011
  ident: 2022031506250617800_ref30
  article-title: Cutadapt removes adapter sequences from high-throughput sequencing reads
  publication-title: EMBnet J
  doi: 10.14806/ej.17.1.200
– volume: 14
  start-page: 130
  issue: 2
  year: 2015
  ident: 2022031506250617800_ref134
  article-title: Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis
  publication-title: Brief Funct Genomics
  doi: 10.1093/bfgp/elu035
– volume: 29
  start-page: 15
  issue: 1
  year: 2013
  ident: 2022031506250617800_ref94
  article-title: STAR: ultrafast universal RNA-seq aligner
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bts635
– volume: 35
  start-page: 2856
  issue: 16
  year: 2019
  ident: 2022031506250617800_ref110
  article-title: MMseqs2 desktop and local web server app for fast, interactive sequence searches
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty1057
– volume: 10
  start-page: 1053
  issue: 1
  year: 2020
  ident: 2022031506250617800_ref84
  article-title: The Rhinella arenarum transcriptome: de novo assembly, annotation and gene prediction
  publication-title: Sci Rep
  doi: 10.1038/s41598-020-57961-4
– volume: 428
  start-page: 726
  issue: 4
  year: 2016
  ident: 2022031506250617800_ref191
  article-title: BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences
  publication-title: J Mol Biol
  doi: 10.1016/j.jmb.2015.11.006
– start-page: 75
  volume-title: Multiple Sequence Alignment Methods
  year: 2014
  ident: 2022031506250617800_ref156
  doi: 10.1007/978-1-62703-646-7_5
– volume: 28
  start-page: 1166
  issue: 8
  year: 2012
  ident: 2022031506250617800_ref238
  article-title: Unipro UGENE: a unified bioinformatics toolkit
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bts091
– volume: 6
  year: 2016
  ident: 2022031506250617800_ref211
  article-title: FAMSA: fast and accurate multiple sequence alignment of huge protein families
  publication-title: Sci Rep
  doi: 10.1038/srep33964
– volume: 3
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref25
  article-title: Sequencing error profiles of Illumina sequencing instruments
  publication-title: NAR Genom Bioinform
– volume: 15
  start-page: 410
  issue: 7
  year: 2014
  ident: 2022031506250617800_ref62
  article-title: Corset: enabling differential gene expression analysis for de novo assembled transcriptomes
  publication-title: Genome Biol
– volume: 28
  year: 2000
  ident: 2022031506250617800_ref189
  article-title: KEGG: Kyoto encyclopedia of genes and genomes
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/28.1.27
– volume: 305
  start-page: 567
  issue: 3
  year: 2001
  ident: 2022031506250617800_ref193
  article-title: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes
  publication-title: J Mol Biol
  doi: 10.1006/jmbi.2000.4315
– volume: 20
  start-page: 591
  issue: 2
  year: 2020
  ident: 2022031506250617800_ref197
  article-title: EnTAP: bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes
  publication-title: Mol Ecol Resour
  doi: 10.1111/1755-0998.13106
– volume-title: Handbook of Hidden Markov Models in Bioinformatics
  year: 2008
  ident: 2022031506250617800_ref169
  doi: 10.1201/9781420011807
– volume: 323
  start-page: 133
  issue: 5910
  year: 2009
  ident: 2022031506250617800_ref257
  article-title: Real-time DNA sequencing from single polymerase molecules
  publication-title: Science
  doi: 10.1126/science.1162986
– volume: 18
  year: 2017
  ident: 2022031506250617800_ref119
  article-title: False discovery rates: a new deal
  publication-title: Biostatistics
– volume: 23
  start-page: 897
  issue: 0
  year: 2017
  ident: 2022031506250617800_ref237
  article-title: Galaksio, a user friendly workflow-centric front end for Galaxy
  publication-title: EMBnet J
  doi: 10.14806/ej.23.0.897
– volume: 32
  start-page: 3047
  issue: 19
  year: 2016
  ident: 2022031506250617800_ref28
  article-title: MultiQC: summarize analysis results for multiple tools and samples in a single report
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw354
– volume: 18
  start-page: 476
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref174
  article-title: fLPS: fast discovery of compositional biases for the protein universe
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-017-1906-3
– volume: 32
  start-page: 896
  issue: 9
  year: 2014
  ident: 2022031506250617800_ref127
  article-title: Normalization of RNA-seq data using factor analysis of control genes or samples
  publication-title: Nat Biotechnol
  doi: 10.1038/nbt.2931
– volume: 55
  issue: 100792
  year: 2021
  ident: 2022031506250617800_ref249
  article-title: De novo transcriptome assembly for Pachygrapsus marmoratus, an intertidal brachyuran crab
  publication-title: Mar Genomics
– volume: 12
  start-page: S5
  issue: S10
  year: 2011
  ident: 2022031506250617800_ref43
  article-title: Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-12-S10-S5
– volume: 7
  start-page: e8206
  year: 2019
  ident: 2022031506250617800_ref131
  article-title: consensusDE: an R package for assessing consensus of multiple RNA-seq algorithms with RUV correction
  publication-title: PeerJ
  doi: 10.7717/peerj.8206
– volume: 49
  start-page: D266
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref178
  article-title: CATH: increased structural coverage of functional space
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1079
– volume: 22
  start-page: 56
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref51
  article-title: 3′-5′ crosstalk contributes to transcriptional bursting
  publication-title: Genome Biol
  doi: 10.1186/s13059-020-02227-5
– volume: 11
  start-page: e0157022
  issue: 6
  year: 2016
  ident: 2022031506250617800_ref128
  article-title: SARTools: a DESeq2- and EdgeR-based R pipeline for comprehensive differential analysis of RNA-Seq data
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0157022
– volume: 30
  start-page: 1236
  issue: 9
  year: 2014
  ident: 2022031506250617800_ref176
  article-title: InterProScan 5: genome-scale protein function classification
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu031
– volume: 20
  start-page: 631
  issue: 11
  year: 2019
  ident: 2022031506250617800_ref7
  article-title: RNA sequencing: the teenage years
  publication-title: Nat Rev Genet
  doi: 10.1038/s41576-019-0150-2
– volume: 46
  start-page: W537
  issue: W1
  year: 2018
  ident: 2022031506250617800_ref23
  article-title: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky379
– volume: 22
  start-page: 96
  issue: 2
  year: 2021
  ident: 2022031506250617800_ref3
  article-title: Gene regulation by long non-coding RNAs and its biological functions
  publication-title: Nat Rev Mol Cell Biol
  doi: 10.1038/s41580-020-00315-9
– volume: 12
  start-page: 67
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref21
  article-title: A de novo transcriptomics approach reveals genes involved in Thrips tabaci resistance to spinosad
  publication-title: Insects
  doi: 10.3390/insects12010067
– volume: 12
  start-page: 444
  issue: 1
  year: 2011
  ident: 2022031506250617800_ref208
  article-title: WebMGA: a customizable web server for fast metagenomic sequence analysis
  publication-title: BMC Genomics
  doi: 10.1186/1471-2164-12-444
– volume: 122
  issue: 1
  year: 2018
  ident: 2022031506250617800_ref6
  article-title: Overview of next-generation sequencing technologies
  publication-title: Curr Protoc Mol Biol
  doi: 10.1002/cpmb.59
– volume: 31
  start-page: 926
  issue: 6
  year: 2015
  ident: 2022031506250617800_ref163
  article-title: UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu739
– volume: 18
  start-page: 366
  issue: 4
  year: 2021
  ident: 2022031506250617800_ref160
  article-title: Sensitive protein alignments at tree-of-life scale using DIAMOND
  publication-title: Nat Methods
  doi: 10.1038/s41592-021-01101-x
– volume: 34
  start-page: 2115
  issue: 8
  year: 2017
  ident: 2022031506250617800_ref183
  article-title: Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper
  publication-title: Mol Biol Evol
  doi: 10.1093/molbev/msx148
– volume: 554
  start-page: 50
  issue: 7690
  year: 2018
  ident: 2022031506250617800_ref103
  article-title: The axolotl genome and the evolution of key tissue formation regulators
  publication-title: Nature
  doi: 10.1038/nature25458
– volume: 35
  start-page: 316
  issue: 4
  year: 2017
  ident: 2022031506250617800_ref224
  article-title: Nextflow enables reproducible computational workflows
  publication-title: Nat Biotechnol
  doi: 10.1038/nbt.3820
– volume: 8
  issue: 5
  year: 2019
  ident: 2022031506250617800_ref58
  article-title: De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers
  publication-title: Gigascience
  doi: 10.1093/gigascience/giz039
– volume: 25
  start-page: 2078
  issue: 16
  year: 2009
  ident: 2022031506250617800_ref95
  article-title: The sequence alignment/map format and SAMtools
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btp352
– volume: 1
  issue: 2
  year: 2003
  ident: 2022031506250617800_ref252
  article-title: The what and whys of DOIs
  publication-title: PLoS Biol
  doi: 10.1371/journal.pbio.0000057
– volume: 38
  start-page: 276
  issue: 3
  year: 2020
  ident: 2022031506250617800_ref229
  article-title: The nf-core framework for community-curated bioinformatics pipelines
  publication-title: Nat Biotechnol
  doi: 10.1038/s41587-020-0439-x
– volume: 46
  start-page: D435
  issue: D1
  year: 2018
  ident: 2022031506250617800_ref179
  article-title: Gene3D: extensive prediction of globular domains in proteins
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx1069
– volume: 49
  start-page: D389
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref78
  article-title: OrthoDB in 2020: evolutionary and functional annotations of orthologs
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1009
– volume: 49
  issue: 17
  ident: 2022031506250617800_ref203
  article-title: TRAPID 2.0: a web application for taxonomic and functional analysis of de novo transcriptomes
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkab565
– volume: 35
  start-page: 3100
  issue: 9
  year: 2007
  ident: 2022031506250617800_ref141
  article-title: RNAmmer: consistent and rapid annotation of ribosomal RNA genes
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkm160
– volume: 7
  start-page: 1559
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref100
  article-title: Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data
  publication-title: Sci Rep
  doi: 10.1038/s41598-017-01617-3
– volume: 15
  start-page: 357
  issue: 1
  year: 2014
  ident: 2022031506250617800_ref49
  article-title: NeatFreq: reference-free data reduction and coverage normalization for de novo sequence assembly
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-014-0357-3
– volume: 9
  start-page: 16
  issue: 1
  year: 2016
  ident: 2022031506250617800_ref233
  article-title: Visual programming for next-generation sequencing data analytics
  publication-title: BioData Min
  doi: 10.1186/s13040-016-0095-3
– volume: 12
  start-page: 254
  issue: 3
  year: 2013
  ident: 2022031506250617800_ref143
  article-title: Non-coding RNAs in homeostasis, disease and stress responses: an evolutionary perspective
  publication-title: Brief Funct Genomics
  doi: 10.1093/bfgp/elt016
– volume: 6
  start-page: 195
  issue: 3
  year: 2018
  ident: 2022031506250617800_ref135
  article-title: Modeling and analysis of RNA-seq data: a review from a statistical perspective
  publication-title: Quant Biol
  doi: 10.1007/s40484-018-0144-7
– volume: 32
  start-page: 3351
  issue: 21
  year: 2016
  ident: 2022031506250617800_ref129
  article-title: MetaCycle: an integrated R package to evaluate periodicity in large scale data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw405
– volume: 12
  start-page: 87
  issue: 2
  year: 2011
  ident: 2022031506250617800_ref54
  article-title: RNA sequencing: advances, challenges and opportunities
  publication-title: Nat Rev Genet
  doi: 10.1038/nrg2934
– volume: 20
  start-page: 81
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref72
  article-title: TransLiG: a de novo transcriptome assembler that uses line graph iteration
  publication-title: Genome Biol
  doi: 10.1186/s13059-019-1690-7
– volume: 49
  start-page: D899
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref165
  article-title: FlyBase: updates to the Drosophila melanogaster knowledge base
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1026
– volume: 37
  start-page: 420
  issue: 4
  year: 2019
  ident: 2022031506250617800_ref173
  article-title: SignalP 5.0 improves signal peptide predictions using deep neural networks
  publication-title: Nat Biotechnol
  doi: 10.1038/s41587-019-0036-z
– volume: 34
  start-page: 3265
  issue: 19
  year: 2018
  ident: 2022031506250617800_ref112
  article-title: Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty378
– volume: 22
  start-page: 322
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref117
  article-title: Strategies for detecting and identifying biological signals amidst the variation commonly found in RNA sequencing data
  publication-title: BMC Genomics
  doi: 10.1186/s12864-021-07563-9
– volume: 9
  start-page: 10513
  issue: 18
  year: 2019
  ident: 2022031506250617800_ref85
  article-title: The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras
  publication-title: Ecol Evol
  doi: 10.1002/ece3.5571
– volume: 21
  start-page: 373
  issue: 4
  year: 2011
  ident: 2022031506250617800_ref60
  article-title: RNA structure and the mechanisms of alternative splicing
  publication-title: Curr Opin Genet Dev
  doi: 10.1016/j.gde.2011.04.001
– volume: 18
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref114
  article-title: SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes
  publication-title: Genome Biol
– volume: 16
  start-page: 603
  issue: 7
  year: 2019
  ident: 2022031506250617800_ref152
  article-title: Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold
  publication-title: Nat Methods
  doi: 10.1038/s41592-019-0437-4
– volume: 15
  start-page: 553
  issue: 12
  year: 2014
  ident: 2022031506250617800_ref82
  article-title: Evaluation of de novo transcriptome assemblies from RNA-Seq data
  publication-title: Genome Biol
  doi: 10.1186/s13059-014-0553-5
– volume-title: Stack Overflow Developer Survey
  year: 2020
  ident: 2022031506250617800_ref226
– volume: 8
  issue: 7
  year: 2013
  ident: 2022031506250617800_ref205
  article-title: TCW: transcriptome computational workbench
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0069401
– volume: 12
  issue: 2
  year: 2016
  ident: 2022031506250617800_ref57
  article-title: BinPacker: packing-based de novo transcriptome assembly from RNA-seq data
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1004772
– volume: Chapter 3
  issue: 1
  year: 2013
  ident: 2022031506250617800_ref154
  article-title: An introduction to sequence similarity (“homology”) searching
  publication-title: Curr Protoc Bioinformatics
  doi: 10.1002/0471250953.bi0301s42
– volume: 30
  start-page: 2114
  issue: 15
  year: 2014
  ident: 2022031506250617800_ref34
  article-title: Trimmomatic: a flexible trimmer for Illumina sequence data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu170
– volume: 6
  year: 2018
  ident: 2022031506250617800_ref88
  article-title: The Oyster River Protocol: a multi-assembler and kmer approach for de novo transcriptome assembly
  publication-title: PeerJ
  doi: 10.7717/peerj.5428
– volume: 48
  start-page: D265
  issue: D1
  year: 2020
  ident: 2022031506250617800_ref199
  article-title: CDD/SPARCLE: the conserved domain database in 2020
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkz991
– volume: 374
  issue: 1786
  year: 2019
  ident: 2022031506250617800_ref254
  article-title: Realizing the potential of full-length transcriptome sequencing
  publication-title: Philos Trans R Soc Lond B Biol Sci
  doi: 10.1098/rstb.2019.0097
– volume: 28
  start-page: 1086
  issue: 8
  year: 2012
  ident: 2022031506250617800_ref64
  article-title: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bts094
– volume: 29
  start-page: 1152
  issue: 7
  year: 2019
  ident: 2022031506250617800_ref207
  article-title: OMA standalone: orthology inference among public and custom genomes and transcriptomes
  publication-title: Genome Res
  doi: 10.1101/gr.243212.118
– volume: 6
  issue: 57
  year: 2021
  ident: 2022031506250617800_ref232
  article-title: The targets R package: a dynamic make-like function-oriented pipeline toolkit for reproducibility and high-performance computing
  publication-title: J Open Source Softw
  doi: 10.21105/joss.02959
– volume: 8
  start-page: 1874
  year: 2019
  ident: 2022031506250617800_ref27
  article-title: Falco: high-speed FastQC emulation for quality control of sequencing data
  publication-title: F1000Res
  doi: 10.12688/f1000research.21142.1
– volume: 33
  start-page: 2583
  issue: 16
  year: 2017
  ident: 2022031506250617800_ref248
  article-title: MISA-web: a web server for microsatellite prediction
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btx198
– volume: 16
  start-page: 30
  issue: 1
  year: 2015
  ident: 2022031506250617800_ref71
  article-title: Bridger: a new framework for de novo transcriptome assembly using RNA-seq data
  publication-title: Genome Biol
  doi: 10.1186/s13059-015-0596-2
– volume: 11
  start-page: 220
  issue: 12
  year: 2010
  ident: 2022031506250617800_ref115
  article-title: From RNA-seq reads to differential expression results
  publication-title: Genome Biol
  doi: 10.1186/gb-2010-11-12-220
– volume: 12
  start-page: 323
  issue: 1
  year: 2011
  ident: 2022031506250617800_ref96
  article-title: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-12-323
– volume: 15
  start-page: 2147
  issue: 8
  year: 2013
  ident: 2022031506250617800_ref168
  article-title: Pico-PLAZA, a genome database of microbial photosynthetic eukaryotes
  publication-title: Environ Microbiol
  doi: 10.1111/1462-2920.12174
– volume: 26
  start-page: 1134
  issue: 8
  year: 2016
  ident: 2022031506250617800_ref80
  article-title: TransRate: reference-free quality assessment of de novo transcriptome assemblies
  publication-title: Genome Res
  doi: 10.1101/gr.196469.115
– volume: 34
  start-page: 525
  issue: 5
  year: 2016
  ident: 2022031506250617800_ref97
  article-title: Near-optimal probabilistic RNA-seq quantification
  publication-title: Nat Biotechnol
  doi: 10.1038/nbt.3519
– volume: 10
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref223
  article-title: Streamlining data-intensive biology with workflow systems
  publication-title: Gigascience
  doi: 10.1093/gigascience/giaa140
– volume-title: Common workflow language
  year: 2016
  ident: 2022031506250617800_ref225
– volume: 5
  issue: e2988
  year: 2017
  ident: 2022031506250617800_ref87
  article-title: Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies
  publication-title: PeerJ
– year: 2019
  ident: 2022031506250617800_ref204
  article-title: Transcriptome computational workbench (TCW): analysis of single and comparative transcriptomes
  doi: 10.1101/733311
– volume: 7
  start-page: 909
  issue: 11
  year: 2010
  ident: 2022031506250617800_ref65
  article-title: De novo assembly and analysis of RNA-seq data
  publication-title: Nat Methods
  doi: 10.1038/nmeth.1517
– volume: 49
  start-page: D192
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref41
  article-title: Rfam 14: expanded coverage of metagenomic, viral and microRNA families
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1047
– volume: 10
  start-page: 317
  year: 2019
  ident: 2022031506250617800_ref11
  article-title: Single-cell RNA-seq technologies and related computational data analysis
  publication-title: Front Genet
  doi: 10.3389/fgene.2019.00317
– volume: 12
  issue: 10
  year: 2017
  ident: 2022031506250617800_ref31
  article-title: BBMerge – accurate paired shotgun read merging via overlap
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0185056
– volume: 30
  year: 2014
  ident: 2022031506250617800_ref212
  article-title: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu033
– volume: 47
  start-page: D309
  issue: D1
  year: 2019
  ident: 2022031506250617800_ref184
  article-title: eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky1085
– volume: 45
  start-page: W12
  issue: W1
  year: 2017
  ident: 2022031506250617800_ref137
  article-title: CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx428
– volume: 4
  start-page: 48
  issue: 1
  year: 2015
  ident: 2022031506250617800_ref29
  article-title: Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads
  publication-title: Gigascience
  doi: 10.1186/s13742-015-0089-y
– volume: 21
  start-page: 630
  issue: 10
  year: 2020
  ident: 2022031506250617800_ref1
  article-title: mRNAs, proteins and the emerging principles of gene expression control
  publication-title: Nat Rev Genet
  doi: 10.1038/s41576-020-0258-4
– volume: 30
  year: 2013
  ident: 2022031506250617800_ref210
  article-title: MAFFT multiple sequence alignment software version 7: improvements in performance and usability
  publication-title: Mol Biol Evol
  doi: 10.1093/molbev/mst010
– volume: 18
  start-page: 392
  issue: 8
  year: 2020
  ident: 2022031506250617800_ref18
  article-title: De novo transcriptome assembly and gene expression profiling of the copepod Calanus helgolandicus feeding on the PUA-producing diatom Skeletonema marinoi
  publication-title: Mar Drugs
  doi: 10.3390/md18080392
– volume: 46
  start-page: D8
  issue: D1
  year: 2018
  ident: 2022031506250617800_ref161
  article-title: Database resources of the National Center for Biotechnology Information
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx1095
– volume: 23
  start-page: 1282
  issue: 10
  year: 2007
  ident: 2022031506250617800_ref164
  article-title: UniRef: comprehensive and non-redundant UniProt reference clusters
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btm098
– volume: 26
  start-page: 1721
  issue: 12
  year: 2016
  ident: 2022031506250617800_ref36
  article-title: Centrifuge: rapid and sensitive classification of metagenomic sequences
  publication-title: Genome Res
  doi: 10.1101/gr.210641.116
– volume: 32
  start-page: 2577
  issue: 17
  year: 2016
  ident: 2022031506250617800_ref79
  article-title: DOGMA: domain-based transcriptome and proteome quality assessment
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw231
– volume: 29
  start-page: 644
  issue: 7
  year: 2011
  ident: 2022031506250617800_ref46
  article-title: Full-length transcriptome assembly from RNA-Seq data without a reference genome
  publication-title: Nat Biotechnol
  doi: 10.1038/nbt.1883
– volume: 14
  start-page: 318
  issue: 3
  year: 2005
  ident: 2022031506250617800_ref219
  article-title: Rule-based workflow management for bioinformatics
  publication-title: VLDB J
  doi: 10.1007/s00778-005-0153-9
– volume: 117
  start-page: 3224
  issue: 10
  year: 2020
  ident: 2022031506250617800_ref144
  article-title: Expanding the Chinese hamster ovary cell long noncoding RNA transcriptome using RNASeq
  publication-title: Biotechnol Bioeng
  doi: 10.1002/bit.27467
– volume: 573
  start-page: 149
  issue: 7772
  year: 2019
  ident: 2022031506250617800_ref218
  article-title: Workflow systems turn raw data into scientific knowledge
  publication-title: Nature
  doi: 10.1038/d41586-019-02619-z
– volume: 8
  issue: 9
  year: 2019
  ident: 2022031506250617800_ref56
  article-title: rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data
  publication-title: Gigascience
  doi: 10.1093/gigascience/giz100
– volume: 8
  start-page: 186
  issue: 3
  year: 1998
  ident: 2022031506250617800_ref32
  article-title: Base-calling of automated sequencer traces using phred. II. Error probabilities
  publication-title: Genome Res
  doi: 10.1101/gr.8.3.186
– volume: 49
  start-page: D480
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref162
  article-title: UniProt: the universal protein knowledgebase in 2021
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1100
– volume: 16
  start-page: 71
  issue: 2
  year: 2015
  ident: 2022031506250617800_ref4
  article-title: RNA-mediated epigenetic regulation of gene expression
  publication-title: Nat Rev Genet
  doi: 10.1038/nrg3863
– volume: 20
  start-page: 92
  issue: 1
  year: 2019
  ident: 2022031506250617800_ref13
  article-title: Next-generation genome annotation: we still struggle to get it right
  publication-title: Genome Biol
  doi: 10.1186/s13059-019-1715-2
– year: 2015
  ident: 2022031506250617800_ref136
  article-title: RNAseq by total RNA library identifies additional RNAs compared to poly(A) RNA library
  publication-title: Biomed Res Int
  doi: 10.1155/2015/862130
– volume: 11
  start-page: 128
  issue: 8
  year: 2010
  ident: 2022031506250617800_ref234
  article-title: The missing graphical user interface for genomics
  publication-title: Genome Biol
– volume: 19
  start-page: 45
  issue: 1
  year: 2018
  ident: 2022031506250617800_ref2
  article-title: The emerging complexity of the tRNA world: mammalian tRNAs beyond protein synthesis
  publication-title: Nat Rev Mol Cell Biol
  doi: 10.1038/nrm.2017.77
– volume: 29
  start-page: 2933
  issue: 22
  year: 2013
  ident: 2022031506250617800_ref139
  article-title: Infernal 1.1: 100-fold faster RNA homology searches
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btt509
– volume: 17
  start-page: 13
  year: 2016
  ident: 2022031506250617800_ref91
  article-title: A survey of best practices for RNA-seq data analysis
  publication-title: Genome Biol
  doi: 10.1186/s13059-016-0881-8
– volume: 21
  start-page: 148
  issue: 1
  year: 2020
  ident: 2022031506250617800_ref113
  article-title: Compacta: a fast contig clustering tool for de novo assembled transcriptomes
  publication-title: BMC Genomics
  doi: 10.1186/s12864-020-6528-x
– volume: 36
  start-page: 3420
  issue: 10
  year: 2008
  ident: 2022031506250617800_ref186
  article-title: High-throughput functional annotation and data mining with the Blast2GO suite
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkn176
– volume: 25
  start-page: 1224
  issue: 6
  year: 2016
  ident: 2022031506250617800_ref17
  article-title: The power and promise of RNA-seq in ecology and evolution
  publication-title: Mol Ecol
  doi: 10.1111/mec.13526
– volume: 8
  issue: 1
  year: 2018
  ident: 2022031506250617800_ref37
  article-title: Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion
  publication-title: Sci Rep
– volume: 48
  start-page: D762
  issue: D1
  year: 2020
  ident: 2022031506250617800_ref166
  article-title: WormBase: a modern model organism information resource
  publication-title: Nucleic Acids Res
– volume: 15
  start-page: 403
  issue: 2
  year: 2014
  ident: 2022031506250617800_ref236
  article-title: Dissemination of scientific software with Galaxy ToolShed
  publication-title: Genome Biol
  doi: 10.1186/gb4161
– volume: 35
  start-page: 1960
  issue: 11
  year: 2019
  ident: 2022031506250617800_ref92
  article-title: TPMCalculator: one-step software to quantify mRNA abundance of genomic features
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty896
– volume: 12
  start-page: 671
  issue: 10
  year: 2011
  ident: 2022031506250617800_ref15
  article-title: Next-generation transcriptome assembly
  publication-title: Nat Rev Genet
  doi: 10.1038/nrg3068
– volume: 21
  issue: 1
  year: 2020
  ident: 2022031506250617800_ref255
  article-title: Opportunities and challenges in long-read sequencing data analysis
  publication-title: Genome Biol
  doi: 10.1186/s13059-020-1935-5
– volume: 11
  issue: 10
  year: 2016
  ident: 2022031506250617800_ref74
  article-title: SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0163962
– volume: 21
  start-page: 153
  issue: 1
  year: 2020
  ident: 2022031506250617800_ref145
  article-title: Pan-tissue transcriptome analysis of long noncoding RNAs in the American beaver Castor canadensis
  publication-title: BMC Genomics
  doi: 10.1186/s12864-019-6432-4
– volume-title: guigolab/FA-nf: 0.3.1 release
  year: 2021
  ident: 2022031506250617800_ref206
– volume: 7
  start-page: 9
  issue: 1
  year: 2020
  ident: 2022031506250617800_ref20
  article-title: De novo transcriptome assembly and annotation for gene discovery in avocado, macadamia and mango
  publication-title: Sci Data
  doi: 10.1038/s41597-019-0350-9
– volume: 21
  start-page: 18
  issue: 1
  year: 2021
  ident: 2022031506250617800_ref61
  article-title: Error, noise and bias in de novo transcriptome assemblies
  publication-title: Mol Ecol Resour
  doi: 10.1111/1755-0998.13156
– volume: 46
  start-page: D1190
  issue: D1
  year: 2018
  ident: 2022031506250617800_ref167
  article-title: PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx1002
– volume-title: The gene ontology handbook
  year: 2016
  ident: 2022031506250617800_ref182
– volume-title: The Linux Command Line: A Complete Introduction
  year: 2019
  ident: 2022031506250617800_ref240
– volume: 1
  year: 2017
  ident: 2022031506250617800_ref126
  article-title: Importing transcript abundance datasets with tximport
  publication-title: Dim Txi Inf Rep Sample1
– volume-title: Sequence - Evolution - Function: Computational Approaches in Comparative Genomics
  year: 2003
  ident: 2022031506250617800_ref153
  doi: 10.1007/978-1-4757-3783-7
– volume: 14
  start-page: 103
  issue: 2
  year: 2007
  ident: 2022031506250617800_ref52
  article-title: Transcriptional noise and the fidelity of initiation by RNA polymerase II
  publication-title: Nat Struct Mol Biol
  doi: 10.1038/nsmb0207-103
– volume: 49
  start-page: D373
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref185
  article-title: OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa1007
– volume: 14
  start-page: 755
  issue: 9
  year: 1998
  ident: 2022031506250617800_ref171
  article-title: Profile hidden Markov models
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/14.9.755
– volume: 28
  start-page: 2520
  issue: 19
  year: 2012
  ident: 2022031506250617800_ref227
  article-title: Snakemake–a scalable bioinformatics workflow engine
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bts480
– volume: 215
  start-page: 403
  issue: 3
  year: 1990
  ident: 2022031506250617800_ref158
  article-title: Basic local alignment search tool
  publication-title: J Mol Biol
  doi: 10.1016/S0022-2836(05)80360-2
– ident: 2022031506250617800_ref228
– volume: 110
  start-page: 1038
  issue: 5
  year: 2016
  ident: 2022031506250617800_ref235
  article-title: Models and simulations as a service: exploring the use of Galaxy for delivering computational models
  publication-title: Biophys J
  doi: 10.1016/j.bpj.2015.12.041
– volume: 28
  start-page: 1947
  issue: 11
  year: 2019
  ident: 2022031506250617800_ref188
  article-title: Toward understanding the origin and evolution of cellular organisms
  publication-title: Protein Sci
  doi: 10.1002/pro.3715
– volume: 9
  start-page: 29
  issue: Suppl 1
  year: 2015
  ident: 2022031506250617800_ref10
  article-title: Advanced applications of RNA sequencing and challenges
  publication-title: Bioinform Biol Insights
– volume: 21
  start-page: 352
  issue: Suppl 10
  year: 2020
  ident: 2022031506250617800_ref245
  article-title: ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-020-03565-8
– volume: 14
  start-page: 1097
  issue: 6
  year: 2014
  ident: 2022031506250617800_ref256
  article-title: A first look at the Oxford Nanopore MinION sequencer
  publication-title: Mol Ecol Resour
  doi: 10.1111/1755-0998.12324
– volume: 84
  start-page: 4355
  issue: 13
  year: 1987
  ident: 2022031506250617800_ref170
  article-title: Profile analysis: detection of distantly related proteins
  publication-title: Proc Natl Acad Sci U S A
  doi: 10.1073/pnas.84.13.4355
– start-page: 137
  volume-title: RNA Bioinformatics
  year: 2015
  ident: 2022031506250617800_ref38
  doi: 10.1007/978-1-4939-2291-8_8
– volume: 38
  issue: 12
  year: 2010
  ident: 2022031506250617800_ref53
  article-title: Biases in Illumina transcriptome sequencing caused by random hexamer priming
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkq224
– volume: 49
  start-page: D412
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref177
  article-title: Pfam: the protein families database in 2021
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa913
– volume-title: RECOMB international workshop on comparative genomics
  year: 2017
  ident: 2022031506250617800_ref213
– volume: 7
  issue: 8
  year: 2012
  ident: 2022031506250617800_ref39
  article-title: Selective depletion of rRNA enables whole transcriptome profiling of archival fixed tissue
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0042882
– start-page: 227
  volume-title: Gene Prediction
  year: 2019
  ident: 2022031506250617800_ref77
  doi: 10.1007/978-1-4939-9173-0_14
– volume: 18
  start-page: 762
  issue: 3
  year: 2017
  ident: 2022031506250617800_ref75
  article-title: A tissue-mapped axolotl de novo transcriptome enables identification of limb regeneration factors
  publication-title: Cell Rep
  doi: 10.1016/j.celrep.2016.12.063
– volume: 489
  start-page: 57
  issue: 7414
  year: 2012
  ident: 2022031506250617800_ref104
  article-title: An integrated encyclopedia of DNA elements in the human genome
  publication-title: Nature
  doi: 10.1038/nature11247
– volume: 9
  issue: 6
  year: 2013
  ident: 2022031506250617800_ref105
  article-title: Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs
  publication-title: PLoS Genet
  doi: 10.1371/journal.pgen.1003569
– volume: 18
  start-page: 324
  issue: 1
  year: 2017
  ident: 2022031506250617800_ref48
  article-title: An improved filtering algorithm for big read datasets and its application to single-cell assembly
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-017-1724-7
– volume: 15
  issue: 8
  year: 2020
  ident: 2022031506250617800_ref73
  article-title: De novo sequence assembly requires bioinformatic checking of chimeric sequences
  publication-title: PLoS One
– volume: 49
  start-page: D545
  issue: D1
  year: 2021
  ident: 2022031506250617800_ref187
  article-title: KEGG: integrating viruses and cellular organisms
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa970
– volume: 316
  start-page: 1484
  issue: 5830
  year: 2007
  ident: 2022031506250617800_ref142
  article-title: RNA maps reveal new RNA classes and a possible function for pervasive transcription
  publication-title: Science
  doi: 10.1126/science.1138341
– volume: 25
  start-page: 25
  issue: 1
  year: 2000
  ident: 2022031506250617800_ref181
  article-title: Gene ontology: tool for the unification of biology. The gene ontology consortium
  publication-title: Nat Genet
  doi: 10.1038/75556
– volume: 31
  start-page: 2199
  issue: 13
  year: 2015
  ident: 2022031506250617800_ref198
  article-title: Annocript: a flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btv106
– volume: 43
  start-page: e78
  issue: 12
  year: 2015
  ident: 2022031506250617800_ref148
  article-title: Identification of protein coding regions in RNA transcripts
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkv227
– volume: 48
  start-page: D498
  issue: D1
  year: 2020
  ident: 2022031506250617800_ref190
  article-title: The Reactome pathway knowledgebase
  publication-title: Nucleic Acids Res
– volume: 35
  start-page: 1026
  issue: 11
  year: 2017
  ident: 2022031506250617800_ref108
  article-title: MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
  publication-title: Nat Biotechnol
  doi: 10.1038/nbt.3988
– volume: 18
  start-page: S181
  issue: Suppl 1
  year: 2002
  ident: 2022031506250617800_ref69
  article-title: Splicing graphs and EST assembly problem
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/18.suppl_1.S181
– volume: 24
  start-page: 1258
  issue: 6
  year: 2019
  ident: 2022031506250617800_ref106
  article-title: Alternative splicing, RNA-seq and drug discovery
  publication-title: Drug Discov Today
  doi: 10.1016/j.drudis.2019.03.030
– volume: 41
  start-page: D590
  issue: Database issue
  year: 2013
  ident: 2022031506250617800_ref42
  article-title: The SILVA ribosomal RNA gene database project: improved data processing and web-based tools
  publication-title: Nucleic Acids Res
– volume: 409
  start-page: 860
  issue: 6822
  year: 2001
  ident: 2022031506250617800_ref76
  article-title: Initial sequencing and analysis of the human genome
  publication-title: Nature
  doi: 10.1038/35057062
– volume: 20
  start-page: 698
  issue: Suppl 25
  year: 2019
  ident: 2022031506250617800_ref68
  article-title: DTA-SiST: de novo transcriptome assembly by using simplified suffix trees
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-019-3272-9
– volume: 9
  issue: 1
  year: 2018
  ident: 2022031506250617800_ref109
  article-title: Clustering huge protein sequence sets in linear time
  publication-title: Nat Commun
  doi: 10.1038/s41467-018-04964-5
SubjectTerms Annotations
Assembly
Gene sequencing
Genome
Genomes
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation
Proteins
Review
Ribonucleic acid
RNA
Sequence Analysis, RNA - methods
Transcriptome
Transcriptomes
Workflow
Title A simple guide to de novo transcriptome assembly and annotation
URI https://www.ncbi.nlm.nih.gov/pubmed/35076693
https://www.proquest.com/docview/2640678765
https://www.proquest.com/docview/2622660589
https://pubmed.ncbi.nlm.nih.gov/PMC8921630
Volume 23