On the choice of training data for machine learning of geostrophic mesoscale turbulence
Format | Journal Article |
---|---|
Language | English |
Published | 02.07.2023 |
Summary: 'Data' plays a central role in data-driven methods, but it is seldom
the focus of investigations of machine learning algorithms applied to Earth
System Modeling problems. Here we consider the case of eddy-mean interaction in
rotating stratified turbulence in the presence of lateral boundaries, a problem
of relevance to ocean modeling, where the eddy fluxes contain dynamically inert
rotational components that are expected to contaminate the learning process. A
common choice in the literature is to learn from the divergence of the eddy
fluxes. We provide theoretical arguments and numerical evidence that learning
from the eddy fluxes with the rotational component appropriately filtered out
results in models with comparable or better skill, but substantially improved
robustness. If we simply want a data-driven model to have predictive skill, then
the choice and/or quality of data may not be critical; but we argue that careful
data choice is highly desirable, and perhaps even necessary, if we want to
leverage data-driven methods to aid in discovering unknown or hidden physical
processes within the data itself.

DOI: 10.48550/arxiv.2307.00734
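The summary turns on the split of an eddy flux into a divergent part, whose divergence acts on the mean state, and a rotational part, which is divergence-free and so does not. A minimal sketch of one standard way to separate them (a Helmholtz decomposition) follows, assuming a doubly periodic domain so that the split can be done with FFTs. This is a simplification: the paper concerns domains with lateral boundaries, where the decomposition needs boundary conditions and is not unique, and the sketch is not the authors' filtering procedure. The function names and test fields are hypothetical.

```python
import numpy as np


def wavenumbers(n, length):
    """Angular wavenumbers (kx, ky) on an n x n doubly periodic grid.

    Arrays are indexed [y, x]. For even n the Nyquist wavenumber is set to
    zero, a common choice that keeps spectral derivatives real-valued.
    """
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    if n % 2 == 0:
        k[n // 2] = 0.0
    kx, ky = np.meshgrid(k, k, indexing="xy")
    return kx, ky


def flux_divergence(fx, fy, length):
    """Spectral divergence of the flux (fx, fy)."""
    kx, ky = wavenumbers(fx.shape[0], length)
    div_hat = 1j * kx * np.fft.fft2(fx) + 1j * ky * np.fft.fft2(fy)
    return np.real(np.fft.ifft2(div_hat))


def divergent_part(fx, fy, length):
    """Curl-free part of (fx, fy), i.e. grad(chi) with lap(chi) = div(f).

    The rotational (divergence-free) part and the uniform mean flux, which
    has zero divergence on a periodic domain, are both discarded.
    """
    kx, ky = wavenumbers(fx.shape[0], length)
    k2 = kx**2 + ky**2
    kdotf = kx * np.fft.fft2(fx) + ky * np.fft.fft2(fy)
    # proj = (k . f_hat) / |k|^2, set to zero wherever |k| = 0
    proj = np.where(k2 > 0, kdotf / np.where(k2 > 0, k2, 1.0), 0.0)
    return np.real(np.fft.ifft2(kx * proj)), np.real(np.fft.ifft2(ky * proj))


# Demo with hypothetical analytic fields on [0, 2*pi)^2.
n, length = 64, 2.0 * np.pi
x = np.arange(n) * length / n
X, Y = np.meshgrid(x, x, indexing="xy")

# Divergent flux grad(chi) with chi = cos(2x) sin(y)
div_x, div_y = -2.0 * np.sin(2 * X) * np.sin(Y), np.cos(2 * X) * np.cos(Y)
# Rotational flux (-d psi/dy, d psi/dx) with psi = sin(x) cos(y)
rot_x, rot_y = np.sin(X) * np.sin(Y), np.cos(X) * np.cos(Y)

fx, fy = div_x + rot_x, div_y + rot_y
gx, gy = divergent_part(fx, fy, length)

print("max error in recovered divergent flux:",
      np.max(np.abs(gx - div_x)), np.max(np.abs(gy - div_y)))
print("max |div| of rotational flux:",
      np.max(np.abs(flux_divergence(rot_x, rot_y, length))))
```

In this sketch `flux_divergence` returns the same field for the filtered and the unfiltered flux, which illustrates why learning from the divergence sidesteps the rotational component, while learning from the flux itself requires a filtering step of this kind.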