Machine Learning and Knowledge Discovery in Databases. Research Track European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part III

The multi-volume set LNAI 12975-12979 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2021, held during September 13-17, 2021. The conference was originally planned to take place in Bilbao, Spain, but was changed to an online event due to the COVID-19 pandemic.

Bibliographic Details
Main Authors: Oliver, Nuria; Pérez-Cruz, Fernando; Kramer, Stefan; Read, Jesse; Lozano, Jose A.
Format: eBook
Language: English
Published: Cham: Springer International Publishing AG, 2021
Edition: 1
Series: Lecture Notes in Computer Science

Table of Contents:
  • Intro -- Preface -- Organization -- Contents - Part III -- Generative Models -- Deep Conditional Transformation Models -- 1 Introduction -- 1.1 Transformation Models -- 1.2 Related Work and Our Contribution -- 2 Model and Network Definition -- 2.1 Model Definition -- 2.2 Network Definition -- 2.3 Penalization -- 2.4 Bijectivity and Monotonicity Constraints -- 2.5 Interpretability and Identifiability Constraints -- 3 Numerical Experiments -- 4 Application -- 4.1 Movie Reviews -- 4.2 UTKFace -- 4.3 Benchmark Study -- 5 Conclusion and Outlook -- References -- Disentanglement and Local Directions of Variance -- 1 Introduction -- 2 Related Work -- 3 Disentanglement, PCA and VAEs -- 3.1 Preliminaries -- 3.2 Disentanglement in a PCA Setting -- 3.3 PCA Behavior in Variational Autoencoders -- 4 Measuring Induced Variance and Consistency -- 4.1 Ground-Truth Factor Induced Variance -- 4.2 Local Directions of Variance -- 4.3 Consistency of Encodings -- 5 Experimental Setup -- 5.1 Datasets -- 5.2 Models -- 6 Results -- 6.1 The Effect of Different Per-Factor Contributions -- 6.2 The Effect of Non-global Variance Structure in the Data -- 6.3 The Effect of Non-global Variance Structure in the Models -- 7 Conclusions -- References -- Neural Topic Models for Hierarchical Topic Detection and Visualization -- 1 Introduction -- 2 Visual and Hierarchical Neural Topic Model -- 2.1 Generative Model -- 2.2 Parameterizing Path Distribution and Level Distribution -- 2.3 Parameterizing Word Distribution -- 2.4 Visualizing the Topic Tree -- 2.5 Dynamically Growing the Topic Tree -- 2.6 Autoencoding Variational Inference -- 3 Experiments -- 3.1 Tree-Structure and Visualization Quantitative Evaluation -- 3.2 Topic Coherence and Running Time Comparison -- 3.3 Visualization Qualitative Evaluation -- 4 Related Work -- 5 Conclusion -- References
  • Semi-structured Document Annotation Using Entity and Relation Types -- 1 Introduction -- 2 Related Work -- 3 Problem Statement -- 4 Proposed Approach -- 4.1 Document Structure Recovery Using Generative PGM -- 4.2 Document Structure Annotation Using PLP -- 4.3 Entity and Relation Discovery -- 5 Experiments -- 6 Conclusions -- References -- Learning Disentangled Representations with the Wasserstein Autoencoder -- 1 Introduction -- 2 Importance of Total Correlation in Disentanglement -- 2.1 Total Correlation -- 2.2 Total Correlation in ELBO -- 3 Is WAE Naturally Good at Disentangling? -- 3.1 WAE -- 3.2 TCWAE -- 3.3 Estimators -- 4 Experiments -- 4.1 Quantitative Analysis: Disentanglement on Toy Data Sets -- 4.2 Qualitative Analysis: Disentanglement on Real-World Data Sets -- 5 Conclusion -- References -- Search and Optimization -- Which Minimizer Does My Neural Network Converge To? -- 1 Introduction -- 2 Background -- 3 Impact of Initialization -- 4 Impact of Adaptive Optimization -- 5 Impact of Stochastic Optimization -- 6 Beyond Strong Overparameterization -- 7 Experiments -- 8 Related Work -- 9 Discussion -- References -- Information Interaction Profile of Choice Adoption -- 1 Introduction -- 1.1 Contributions -- 2 Related Work -- 3 InterRate -- 3.1 Problem Definition -- 3.2 Likelihood -- 3.3 Proof of Convexity -- 4 Experimental Setup -- 4.1 Kernel Choice -- 4.2 Parameters Learning -- 4.3 Background Noise in the Data -- 4.4 Evaluation Criteria -- 4.5 Baselines -- 5 Results -- 5.1 Synthetic Data -- 5.2 Real Data -- 6 Discussion -- 7 Conclusion -- References -- Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks -- 1 Introduction -- 2 Related Work -- 2.1 Slimmable Neural Networks -- 2.2 Neural Architecture Search -- 2.3 Channel Pruning -- 3 Methodology -- 3.1 Problem Formulation -- 3.2 Proposed Approach: Joslim
  • 3.3 Relation to Existing Approaches -- 4 Experiments -- 4.1 Experimental Setup -- 4.2 Performance Gains Introduced by Joslim -- 4.3 Ablation Studies -- 5 Conclusion -- References -- A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization -- 1 Introduction -- 2 Preliminaries -- 3 Variance Controlled SVRG with a Combined Unbiased/Biased Estimation -- 3.1 Weighted Unbiased Estimator Analysis -- 3.2 Biased Estimator Analysis -- 3.3 Convergence Analysis for Smooth Non-convex Optimization -- 3.4 Scaling Batch Samples -- 3.5 Best of Two Worlds -- 4 Application -- 5 Discussion -- References -- Very Fast Streaming Submodular Function Maximization -- 1 Introduction -- 2 Related Work -- 3 The Three Sieves Algorithm -- 4 Experimental Evaluation -- 4.1 Batch Experiments -- 4.2 Streaming Experiments -- 5 Conclusion -- References -- Dep-L0: Improving L0-Based Network Sparsification via Dependency Modeling -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Sparse Structure Learning -- 3.2 Group Sparsity -- 3.3 Gate Partition -- 3.4 Neural Dependency Modeling -- 4 Experiments -- 4.1 CIFAR10 Results -- 4.2 CIFAR100 Results -- 4.3 ImageNet Results -- 4.4 Study of Learned Sparse Structures -- 4.5 Run-Time Comparison -- 5 Conclusion and Future Work -- References -- Variance Reduced Stochastic Proximal Algorithm for AUC Maximization -- 1 Introduction -- 2 AUC Formulation -- 3 Method -- 4 Convergence Analysis -- 4.1 Bounding the Variance -- 4.2 Proof of Theorem 1 -- 4.3 Complexity Analysis -- 5 Experiment -- 5.1 VRSPAM Has Lower Variance -- 5.2 VRSPAM Has Faster Convergence -- 6 Conclusion -- References -- Robust Regression via Model Based Methods -- 1 Introduction -- 2 Related Work -- 3 Robust Regression and Applications -- 4 Robust Regression via MBO -- 5 Stochastic Alternating Direction Method of Multipliers -- 5.1 SADM
  • 5.2 Inner ADMM -- 5.3 Convergence -- 6 Experiments -- 6.1 Time and Objective Performance Comparison -- 6.2 Robustness Analysis -- 6.3 Classification Performance -- 7 Conclusion -- References -- Black-Box Optimizer with Stochastic Implicit Natural Gradient -- 1 Introduction -- 2 Notation and Symbols -- 3 Implicit Natural Gradient Optimization -- 3.1 Optimization with Exponential-Family Sampling -- 3.2 Implicit Natural Gradient -- 4 Update Rule for Gaussian Sampling -- 4.1 Stochastic Update -- 4.2 Direct Update for and -- 5 Convergence Rate -- 6 Optimization for Discrete Variable -- 7 Empirical Study -- 7.1 Evaluation on Synthetic Continuous Test Benchmarks -- 7.2 Evaluation on RL Test Problems -- 7.3 Evaluation on Discrete Test Problems -- 8 Conclusions -- References -- More General and Effective Model Compression via an Additive Combination of Compressions -- 1 Introduction -- 2 Related Work -- 3 Compression via an Additive Combination as Constrained Optimization -- 4 Optimization via a Learning-Compression Algorithm -- 4.1 Exactly Solvable C Step -- 5 Experiments on CIFAR10 -- 5.1 Q+P: Quantization Plus Pruning -- 5.2 Q+L: Quantization Plus Low-Rank -- 5.3 L+P: Low-Rank Plus Pruning -- 6 Experiments on ImageNet -- 7 Conclusion -- References -- Hyper-parameter Optimization for Latent Spaces -- 1 Introduction -- 2 Background and Related Work -- 3 Hyper-parameter Optimization for Latent Spaces in Recommender Systems -- 3.1 The Nelder-Mead Approach -- 4 Empirical Evaluation -- 4.1 Baselines and Evaluation Protocol -- 4.2 Experiments on Real-World Data -- 4.3 Experiments on Synthetic Data -- 5 Conclusions -- References -- Bayesian Optimization with a Prior for the Optimum -- 1 Introduction -- 2 Background -- 2.1 Bayesian Optimization -- 2.2 Tree-Structured Parzen Estimator -- 3 BO with a Prior for the Optimum -- 3.1 BOPrO Priors -- 3.2 Model
  • 3.3 Pseudo-posterior -- 3.4 Model and Pseudo-posterior Visualization -- 3.5 Acquisition Function -- 3.6 Putting It All Together -- 4 Experiments -- 4.1 Prior Forgetting -- 4.2 Comparison Against Strong Baselines -- 4.3 The Spatial Use-Case -- 5 Related Work -- 6 Conclusions and Future Work -- A Prior Forgetting Supplementary Experiments -- B Mathematical Derivations -- B.1 EI Derivation -- B.2 Proof of Proposition 1 -- C Experimental Setup -- D Spatial Real-World Application -- E Multivariate Prior Comparison -- F Misleading Prior Comparison -- G Comparison to Other Baselines -- H Prior Baselines Comparison -- I -Sensitivity Study -- J -Sensitivity Study -- References -- Rank Aggregation for Non-stationary Data Streams -- 1 Introduction -- 2 Preliminaries and Notation -- 2.1 Modeling Evolving Preferences: Evolving Mallows Model -- 3 Unbalanced Borda for Stream Ranking -- 3.1 Sample Complexity for Returning 0 on Average -- 3.2 Sample Complexity for Returning 0 with High Probability -- 3.3 Choosing Optimally -- 4 Generalizing Voting Rules -- 5 Experiments -- 5.1 Rank Aggregation for Dynamic Preferences with uBorda -- 5.2 Condorcet Winner in Dynamic Preferences -- 6 Conclusions -- References -- Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction -- 1 Introduction -- 1.1 Adaptive Optimization Methods -- 1.2 Regularized Optimization Methods -- 1.3 Motivation -- 1.4 Outline of Contents -- 1.5 Notations and Technical Background -- 2 Algorithm -- 2.1 Closed-Form Solution -- 2.2 Concrete Examples -- 3 Convergence and Regret Analysis -- 4 Experiments -- 4.1 Experiment Setup -- 4.2 Adam vs. Group Adam -- 4.3 Adagrad vs. Group Adagrad -- 4.4 Discussion -- 5 Conclusion -- References -- Fast Conditional Network Compression Using Bayesian HyperNetworks -- 1 Introduction -- 2 Related Work -- 3 Preliminaries
  • 3.1 Bayesian Neural Networks