Anomalous source detection in high-dimensional sequence data

Bibliographic Details
Main Authors	Sommer, Matthew Michael; Axten, Kellen K; Boswell, Aaron; Colon, Brendan Cruz; Thalken, Jason L
Format	Patent
Language	English
Published	03.09.2024
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS
Summary:	Devices and techniques are generally described for evaluation of text data using large n-grams. In various examples, a first vector may be generated for first text data, wherein each element of the vector comprises a value indicating whether the first text data includes a respective n-gram included in a corpus of text data. First label data indicating that a user associated with the first text data has connected to a first computer-implemented service more than a threshold number of times during a past time period may be determined. A first machine learning model may be trained based at least in part on the first vector and the first label data. The first machine learning model may be used to determine a first probability associated with a first n-gram of the first vector. In some examples, at least a first user associated with the first n-gram may be determined.
Bibliography:	Application Number: US202117541833
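
The pipeline outlined in the summary (n-gram presence vector, connection-count label, trained model, per-n-gram probability, user lookup) can be illustrated with a minimal sketch. This is not the patented implementation: the record does not name the model, so a Laplace-smoothed per-n-gram estimate of P(label = 1 | n-gram present) is swapped in as a stand-in. The use of character n-grams, the function names, the sample records, the n-gram length of 6, and the threshold of 5 are all assumptions made for illustration.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams of length n found in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_vocabulary(corpus, n):
    """Sorted list of every n-gram that occurs anywhere in the corpus."""
    vocab = set()
    for text in corpus:
        vocab |= char_ngrams(text, n)
    return sorted(vocab)


def presence_vector(text, vocab, n):
    """Binary vector: element j is 1 if text contains vocab[j], else 0."""
    grams = char_ngrams(text, n)
    return [1 if g in grams else 0 for g in vocab]


def label_from_connections(connection_count, threshold):
    """1 if the user connected more than `threshold` times, else 0."""
    return 1 if connection_count > threshold else 0


def fit_ngram_probabilities(vectors, labels, alpha=1.0):
    """Stand-in model: smoothed P(label = 1 | n-gram present) per n-gram."""
    n_features = len(vectors[0])
    present = [0] * n_features
    present_and_positive = [0] * n_features
    for vec, y in zip(vectors, labels):
        for j, v in enumerate(vec):
            if v:
                present[j] += 1
                present_and_positive[j] += y
    return [
        (pos + alpha) / (cnt + 2 * alpha)
        for pos, cnt in zip(present_and_positive, present)
    ]


def users_with_ngram(ngram, records):
    """Users whose text data contains the given n-gram."""
    return sorted({user for user, text, _ in records if ngram in text})


# Hypothetical records: (user, text data, connections in the past period).
records = [
    ("alice", "client-alpha/1.0", 12),
    ("bob", "client-alpha/1.0", 9),
    ("carol", "client-beta/2.3", 1),
    ("dave", "client-beta/2.3", 0),
]
N = 6          # assumed n-gram length
THRESHOLD = 5  # assumed connection-count threshold

vocab = build_vocabulary([text for _, text, _ in records], N)
vectors = [presence_vector(text, vocab, N) for _, text, _ in records]
labels = [label_from_connections(count, THRESHOLD) for _, _, count in records]
ngram_probability = dict(zip(vocab, fit_ngram_probabilities(vectors, labels)))
```

Here `ngram_probability["-alpha"]` gives the estimated probability tied to one n-gram, and `users_with_ngram("-alpha", records)` returns the users associated with it. A production system would likely use a sparse representation and a trained classifier rather than this counting estimate.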