Summary: Microproteins encoded by small open reading frames comprise the “dark matter” of proteomes. Although microproteins have been detected in diverse organisms from all three domains of life, many more remain to be identified, and only a few have been functionally characterized. In this comprehensive study of intergenic small open reading frames (ismORFs, 15–70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identify 67,297 clusters of ismORFs subject to purifying selection. Expression of tagged Escherichia coli microproteins is detected for 11 of the 16 tested, validating the predictions. Although the ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins, some globular folds, oligomeric structures, and possible interactions with proteins encoded by neighboring genes are predicted. Complete information on the predicted microprotein families, including evidence of transcription and translation, and structure predictions are available as an easily searchable resource for investigation of microprotein functions.

• Thousands of novel bacterial microproteins predicted, easily accessible as a resource
• Most microproteins are lineage-specific, revealing diversity of bacterial proteomes
• Comparative genome analysis suggests de novo emergence of numerous microproteins
• Microprotein structures, oligomerization, and interactions predicted

This study uncovers thousands of microprotein families encoded by intergenic small open reading frames (smORFs) in bacterial genomes and provides a publicly available resource featuring structural predictions, predicted interactions between microproteins and larger proteins, and experimental validation data.
ISSN:1097-2765
1097-4164
DOI:10.1016/j.molcel.2025.01.025