Anomalous source detection in high-dimensional sequence data

Bibliographic Details
Main Authors: Sommer, Matthew Michael; Axten, Kellen K; Boswell, Aaron; Colon, Brendan Cruz; Thalken, Jason L
Format: Patent
Language: English
Published: 03.09.2024
Summary: Devices and techniques are generally described for evaluation of text data using large n-grams. In various examples, a first vector may be generated for first text data, wherein each element of the vector comprises a value indicating whether the first text data includes a respective n-gram included in a corpus of text data. First label data indicating that a user associated with the first text data has connected to a first computer-implemented service more than a threshold number of times during a past time period may be determined. A first machine learning model may be trained based at least in part on the first vector and the first label data. The first machine learning model may be used to determine a first probability associated with a first n-gram of the first vector. In some examples, at least a first user associated with the first n-gram may be determined.
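
The steps in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the record does not say which tokenization, n-gram size, threshold, or model is used, so word bigrams, a threshold of 5 connections, a small hand-written logistic regression, and every name and data value below (`records`, `user_a`, `presence_vector`, and so on) are assumptions made for the example. The per-n-gram probability is read off here by scoring a vector that contains only that n-gram, which is one plausible reading of "a first probability associated with a first n-gram".

```python
import math

N = 2          # assumed n-gram size (the record only says "large n-grams")
THRESHOLD = 5  # assumed connection-count threshold

# Hypothetical data: (user id, text data, connections in the past period)
records = [
    ("user_a", "alpha beta gamma", 12),
    ("user_b", "alpha beta delta", 9),
    ("user_c", "zeta eta theta", 1),
    ("user_d", "zeta eta iota", 0),
]

def ngrams(text, n):
    """Return the set of word n-grams contained in text."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_vocabulary(texts, n):
    """Sorted list of every n-gram that occurs anywhere in the corpus."""
    vocab = set()
    for text in texts:
        vocab |= ngrams(text, n)
    return sorted(vocab)

def presence_vector(text, vocab, n):
    """One element per corpus n-gram: 1 if the text includes it, else 0."""
    present = ngrams(text, n)
    return [1 if gram in present else 0 for gram in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=300, lr=0.5):
    """Full-batch gradient descent for logistic regression (stand-in model)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def ngram_probability(gram, vocab, w, b):
    """Model output for a vector in which only `gram` is present."""
    return sigmoid(b + w[vocab.index(gram)])

def users_with_ngram(records, gram, n):
    """Users whose text data includes the given n-gram."""
    return [user for user, text, _ in records if gram in ngrams(text, n)]

vocab = build_vocabulary([text for _, text, _ in records], N)
X = [presence_vector(text, vocab, N) for _, text, _ in records]
# Label is 1 when the user connected more than THRESHOLD times.
y = [1 if count > THRESHOLD else 0 for _, _, count in records]
w, b = train_logistic(X, y)

p_frequent = ngram_probability("alpha beta", vocab, w, b)
p_rare = ngram_probability("zeta eta", vocab, w, b)
print(p_frequent, p_rare, users_with_ngram(records, "alpha beta", N))
```

In this toy data, the bigram that appears only in the text of frequently connecting users receives a probability above 0.5, and the bigram seen only with infrequent users falls below it. A real corpus with large n-grams would yield a very high-dimensional, sparse vector, so a sparse representation and a library model would likely replace the dense lists used here.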
Bibliography: Application Number: US202117541833