Analysis and Modeling of Geodetic Data Based on Machine Learning

Bibliographic Details
Published in Applied Mathematics and Nonlinear Sciences, Vol. 9, no. 1
Main Author Wu, Tong
Format Journal Article
Language English
Published Sciendo 01.01.2024
Summary: This paper underscores the significance of earth deformation observation in analyzing earth tide curves and predicting earthquakes, positioning it as a cornerstone of Earth observation technology. We address the task of detecting and diagnosing anomalies in geodetic data. Using Python for data preprocessing, our approach identifies missing values, categorizes them by their spatial occurrence, and imputes them with spline interpolation and autoregressive prediction. This process preserves the integrity of the dataset for subsequent analysis and modeling, supporting the precision and reliability of geodetic data analysis in Earth science research. To expand the data set, we propose three models. Model I adds Gaussian noise to the data. Model II resamples the data. Model III uses machine learning methods to learn the internal regularities of the data and generates new data by self-prediction. For each model, we discuss its advantages and disadvantages. Finally, we structurally fuse the three models to complete the data augmentation. To extract the noise, we denoise the data set with a DB4 wavelet transform and take the removed component as the noise. We then compute descriptive statistics of the noise and fit its probability distribution with a Laplace distribution to obtain an accurate noise model. We extract features of the data in both the time domain and the frequency domain. First, 17 features are extracted in the time domain; the discrete Fourier transform is then used to convert the data to the frequency domain, where 13 more features are extracted. Each data sample is therefore encoded as a feature vector of length 30. We first use a decision tree as the baseline recognition model to select features. Logistic Regression, KNN, Naive Bayes, and SVM are then used to build recognition models. Finally, we use the Voting ensemble learning method to fuse the models, achieving an accuracy of 86% on the test set.
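The Laplace fitting step mentioned in the summary can be illustrated with a minimal sketch. The record does not say which estimator the author used; the code below assumes a maximum-likelihood fit, for which the location is the sample median and the scale is the mean absolute deviation from that median. The function names (`fit_laplace`, `laplace_pdf`, `sample_laplace`) are hypothetical, and the synthetic samples only stand in for the residual noise that a wavelet denoising step would produce.

```python
import math
import random
import statistics


def fit_laplace(residuals):
    """Maximum-likelihood Laplace fit; returns (location, scale).

    Assumption: MLE is used. The location estimate is the sample median
    and the scale estimate is the mean absolute deviation from it.
    """
    if not residuals:
        raise ValueError("need at least one residual")
    mu = statistics.median(residuals)
    b = sum(abs(r - mu) for r in residuals) / len(residuals)
    return mu, b


def laplace_pdf(x, mu, b):
    """Laplace density with location mu and scale b (b must be > 0)."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def sample_laplace(rng, mu, b):
    """Draw one Laplace(mu, b) sample.

    The difference of two independent exponentials with mean b
    follows a Laplace(0, b) distribution.
    """
    return mu + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


# Synthetic stand-in for noise extracted from a signal (not real geodetic data).
rng = random.Random(0)
noise = [sample_laplace(rng, 0.0, 0.5) for _ in range(20000)]

mu_hat, b_hat = fit_laplace(noise)
print(f"location = {mu_hat:.4f}, scale = {b_hat:.4f}")
```

With real data, `noise` would be the difference between the raw series and its denoised version, and the fitted density from `laplace_pdf` could be compared against a histogram of that difference to judge the quality of the fit.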
ISSN:2444-8656
DOI:10.2478/amns-2024-0691