Research in Computational Molecular Biology: 26th Annual International Conference, RECOMB 2022, San Diego, CA, USA, May 22-25, 2022, Proceedings
This book constitutes the proceedings of the 26th Annual Conference on Research in Computational Molecular Biology, RECOMB 2022, held in San Diego, CA, USA in May 2022. The 17 regular and 23 short papers presented were carefully reviewed and selected from 188 submissions. The papers report on origin...
Main Author | |
---|---|
Format | eBook Conference Proceeding |
Language | English |
Published | Cham: Springer Nature, 2022; Springer International Publishing AG; Springer International Publishing |
Edition | 1 |
Series | Lecture Notes in Computer Science |
Table of Contents:
- Intro -- Preface -- To Benny, to the RECOMB Community in Memory of Benny -- Organization -- Contents -- Extended Abstracts -- Unsupervised Integration of Single-Cell Multi-omics Datasets with Disproportionate Cell-Type Representation -- 1 Introduction -- 2 Method -- 2.1 Unbalanced Optimal Transport of SCOTv2 -- 2.2 Extending SCOTv2 for Multi-domain Alignment -- 2.3 Embedding with the Coupling Matrix -- 2.4 Heuristic Process for Self-tuning Hyperparameters -- 3 Experimental Setup -- 3.1 Datasets -- 3.2 Evaluation Metrics and Baseline Methods -- 4 Results -- 4.1 SCOTv2 Gives High-Quality Alignments Consistently Across All Single-Datasets -- 4.2 Hyperparameter Self-tuning Aligns Well Without Depending on Orthogonal Correspondence Information -- 4.3 SCOTv2 Scales Well with Increasing Number of Samples -- 5 Discussion -- References -- Semi-supervised Single-Cell Cross-modality Translation Using Polarbear -- 1 Introduction -- 1.1 Related Work -- 2 Methods -- 2.1 Polarbear Model -- 2.2 Hyperparameter Tuning -- 2.3 Performance Measures -- 2.4 Single-Cell Data Pre-processing -- 2.5 Cluster-Level Analysis -- 3 Results -- 3.1 Polarbear Accurately Translates Between Single-Cell Data Domains -- 3.2 Polarbear Generalizes to New Cell Types -- 3.3 Polarbear Can Match Corresponding Cells Across Modalities -- 4 Discussion -- References -- Transcription Factor-Centric Approach to Identify Non-recurring Putative Regulatory Drivers in Cancer -- 1 Introduction -- 2 Data and Methods -- 2.1 ICGC Simple Somatic Mutations and Gene Expression Data -- 2.2 Promoter and Enhancer Data -- 2.3 Defining the Effects of Mutations on TF Binding, and the Significance of These Effects -- 2.4 Analytical and Simulation-Based Approaches to Compute the Significance of Mutation Effects on TF Binding -- 2.5 Integrating Results Across All Regulatory Regions of a Gene -- 3 Results
- 3.1 Integrated Analysis Across Regulatory Regions Identifies 54 Genes with Significant TF Binding Changes Due to Mutations in Regulatory DNA -- 3.2 Genes with Significant Mutations in Their Regulatory Regions Show Large Expression Differences in Mutated Versus Non-mutated Samples -- 4 Discussion -- 5 Acknowledgements, Code Availability, and Supplemental Materials -- References -- DeepMinimizer: A Differentiable Framework for Optimizing Sequence-Specific Minimizer Schemes -- 1 Introduction -- 2 Related Work -- 3 Methods -- 3.1 Background -- 3.2 Search Space Reparameterization -- 3.3 Proxy Objective -- 3.4 Specification of TemplateNet -- 3.5 Specification of the Divergence Measure -- 4 Results -- 5 Conclusion -- A Proof of Proposition 1 -- B Other Empirical Results -- References -- MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs -- 1 Introduction -- 2 Methods -- 2.1 Step 0: Assemble Reads into Contigs and Construct the Assembly Graph -- 2.2 Step 1: Identify Contigs with Single-Copy Marker Genes -- 2.3 Step 2: Order Single-Copy Marker Genes and Estimate the Number of Initial Bins -- 2.4 Step 3: Bin Contigs with Single-copy Marker Genes -- 2.5 Step 4: Bin Remaining Contigs Using Label Propagation -- 3 Experimental Setup -- 3.1 Datasets and Tools -- 3.2 Evaluation Metrics -- 4 Results and Discussion -- 4.1 Benchmarks Using SimHC+ Dataset -- 4.2 Benchmarks Using Real Datasets -- 5 Discussion and Conclusion -- A Appendix -- References -- A Fast, Provably Accurate Approximation Algorithm for Sparse Principal Component Analysis Reveals Human Genetic Variation Across the World -- 1 Introduction -- 1.1 Our Contributions -- 1.2 Prior Work -- 2 Materials and Methods -- 2.1 The ThreSPCA Algorithm -- 2.2 Data -- 2.3 Experiments -- 3 Results -- 3.1 ThreSPCA Reveals Genetic Diversity Across the World
- 3.2 Interpretability of ThreSPCA Informed Variants -- 3.3 Comparing ThreSPCA to State-of-the-Art -- 4 Discussion -- Appendix 1.A SPCA via Thresholding: Discussions and Proofs -- Appendix 1.B Additional Experiments -- Appendix 1.B.1 Simulated Studies -- Appendix 1.B.2 Experiments on 1KG data -- Appendix 1.B.3 Comparing ThreSPCA with the State-of-the-Art -- References -- Gene Set Priorization Guided by Regulatory Networks with p-values through Kernel Mixed Model -- 1 Introduction -- 2 Method -- 2.1 Background -- 2.2 Method -- 3 Simulation Experiments -- 3.1 Competing Methods -- 3.2 General Data Generation Process -- 3.3 Results -- 4 Study of Transcriptome Association of Alzheimer's Disease -- 5 Conclusion -- A Additional Simulation Experiments -- B Covaraite Regressing -- References -- Real-Valued Group Testing for Quantitative Molecular Assays -- 1 Introduction -- 1.1 Problem Statement and Contribution -- 2 Methods -- 2.1 Notation -- 2.2 Overview of the Matrix Design and Decoding Algorithms -- 2.3 Constructing Matrices for Real-Valued Group Testing -- 3 Results -- 3.1 Comparison of Matrix Properties with Existing Approaches -- 3.2 Effectiveness on Simulated Data -- 3.3 Effectiveness in Wet Lab -- 4 Conclusion -- References -- On the Effect of Intralocus Recombination on Triplet-Based Species Tree Estimation -- 1 Introduction -- 1.1 Key Definitions -- 1.2 Inference Methods -- 1.3 Multispecies Coalescent with Recombination -- 1.4 Estimating Sequence Distances -- 2 Inconsistency of R^* -- 2.1 Statement and Overview -- 2.2 Key Lemmas -- 2.3 Proof of Theorem 1 -- 3 Simulation Study -- 4 Discussion -- References -- QT-GILD: Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data -- 1 Introduction -- 2 Quartet Imputation Problem -- 2.1 Problem Definition -- 3 Experimental Study -- 3.1 Datasets
- 3.2 Generating Incomplete Gene Trees -- 3.3 Species Tree Estimation Methods -- 3.4 Measurements -- 4 Results and Discussion -- 4.1 Results on 15-Taxon Dataset -- 4.2 Results on 37-Taxon Mammalian Simulated Dataset -- 4.3 Results on Biological Dataset -- 4.4 Running Time -- 5 Conclusions -- References -- Safety and Completeness in Flow Decompositions for RNA Assembly -- 1 Introduction -- 1.1 Safety Framework for Addressing Multiple Solutions -- 1.2 Safety in Flow Decomposition for RNA Assembly -- 1.3 Our Results -- 2 Preliminaries and Notations -- 3 Characterization of Safe and Complete Paths -- 4 Simple Verification and Enumeration Algorithms -- 5 Experimental Evaluation -- 5.1 Datasets -- 5.2 Evaluation Metrics -- 5.3 Implementation and Environment Details -- 5.4 Results -- 6 Conclusion -- References -- NetMix2: Unifying Network Propagation and Altered Subnetworks -- 1 Introduction -- 2 Methods -- 2.1 Altered Subnetwork Problem -- 2.2 Network Propagation and the Propagation Family -- 2.3 NetMix2 -- 2.4 Scores-Only and Network-Only Baselines -- 3 Results -- 3.1 Somatic Mutations in Cancer -- 4 Discussion -- References -- Multi-modal Genotype and Phenotype Mutual Learning to Enhance Single-Modal Input Based Longitudinal Outcome Prediction -- 1 Introduction -- 2 Related Work -- 3 Proposed Method -- 3.1 Problem Formulation -- 3.2 Notation -- 3.3 Longitudinal Predictive Model -- 4 Experiments -- 4.1 Experimental Setup -- 4.2 Experimental Results -- 5 Conclusion -- References -- Fast, Flexible, and Exact Minimum Flow Decompositions via ILP -- 1 Introduction -- 1.1 Minimum Flow Decomposition in Multiassembly -- 1.2 Limitations of Current ILP Solutions -- 1.3 Our Contributions -- 2 Preliminaries -- 3 ILP Formulations -- 3.1 Minimum Flow Decomposition -- 3.2 Subpath Constraints -- 3.3 Inexact Flow -- 4 Experiments -- 5 Conclusions -- References
- Co-linear Chaining with Overlaps and Gap Costs -- 1 Introduction -- 2 Concepts and Definitions -- 2.1 Co-linear Chaining Problem with Overlap and Gap Costs -- 2.2 Anchored Edit Distance -- 2.3 Graph Representation of Alignment -- 3 Our Algorithms -- 4 Proof of Equivalence -- 4.1 Details of Lemma 2 Proof -- 5 Implementation -- 6 Evaluation -- References -- The Complexity of Approximate Pattern Matching on de Bruijn Graphs -- 1 Introduction -- 1.1 Technical Background and Our Results -- 2 NP-Completeness of Problem 1 on de Bruijn Graphs -- 2.1 Reduction -- 3 Hardness for Problem 2 on de Bruijn Graphs -- 3.1 Proof of Correctness -- 4 Discussion -- References -- ProTranslator: Zero-Shot Protein Function Prediction Using Textual Description -- 1 Introduction -- 2 Methods -- 2.1 Problem Definition -- 2.2 Embedding GO Functions Based on the Textual Description -- 2.3 Embedding Proteins Based on Sequence, Description and Network -- 2.4 Protein Function Prediction Based on GO Embeddings and Protein Embeddings -- 2.5 Annotate Novel Functions, Sparse Functions and Gene Sets to Pathways -- 2.6 Text Generation by the Protein Sequence Features -- 3 Experimental Setup -- 3.1 Calculating Similarities Between GO Functions -- 3.2 Datasets and Evaluation -- 3.3 Comparison Approaches -- 4 Results -- 4.1 Gene Ontology Term Description Similarity Reflects Function Annotation Similarity -- 4.2 ProTranslator Enables Protein Function Prediction in the Zero-Shot Setting -- 4.3 ProTranslator Obtains Substantial Improvement in the Few-Shot Setting -- 4.4 ProTranslator Annotated Genes to Pathways by Only Using the Pathway Description -- 4.5 ProTranslator Generates Text Description for a Gene Set -- 4.6 Ablation Experiment -- 5 Conclusion and Discussion -- References -- Short Papers -- Single-Cell Multi-omic Velocity Infers Dynamic and Decoupled Gene Regulation -- References
- Ultrafast and Interpretable Single-Cell 3D Genome Analysis with Fast-Higashi