DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems
Extracting actionable insight from complex unlabeled scientific data is an open challenge and key to unlocking data-driven discovery in science. Complementary and alternative to supervised machine learning approaches, unsupervised physics-based methods based on behavior-driven theories hold great pr...
Saved in:
Main Authors | , , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
25.09.2019
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Extracting actionable insight from complex unlabeled scientific data is an
open challenge and key to unlocking data-driven discovery in science.
Complementary and alternative to supervised machine learning approaches,
unsupervised physics-based methods based on behavior-driven theories hold great
promise. Due to computational limitations, practical application on real-world
domain science problems has lagged far behind theoretical development. We
present our first step towards bridging this divide - DisCo - a
high-performance distributed workflow for the behavior-driven local causal
state theory. DisCo provides a scalable unsupervised physics-based
representation learning method that decomposes spatiotemporal systems into
their structurally relevant components, which are captured by the latent local
causal state variables. Complex spatiotemporal systems are generally highly
structured and organize around a lower-dimensional skeleton of coherent
structures, and in several firsts we demonstrate the efficacy of DisCo in
capturing such structures from observational and simulated scientific data. To
the best of our knowledge, DisCo is also the first application software
developed entirely in Python to scale to over 1000 machine nodes, providing
good performance along with ensuring domain scientists' productivity. We
developed scalable, performant methods optimized for Intel many-core processors
that will be upstreamed to open-source Python library packages. Our capstone
experiment, using newly developed DisCo workflow and libraries, performs
unsupervised spacetime segmentation analysis of CAM5.1 climate simulation data,
processing an unprecedented 89.5 TB in 6.6 minutes end-to-end using 1024 Intel
Haswell nodes on the Cori supercomputer obtaining 91% weak-scaling and 64%
strong-scaling efficiency. |
---|---|
DOI: | 10.48550/arxiv.1909.11822 |