Document classification of files on the client side before upload

Bibliographic Details
Main Authors Farmer, II, William J; Kesavan, Sreenidhi Narayanamangalathu; Bilenkin, Dimitri; Palanivelu, Karthikeyan; Mangalik, Siddharth; Jackson, William Clayton
Format Patent
Language English
Published 02.04.2024
Summary: A method for classifying a document in real-time is disclosed. The method includes identifying one or more sections of the document likely to contain text based on a contrast between dark space and light space in an image of the document. Optical character recognition is performed within the identified sections of the document to identify a set of words within each identified section of the document. The sets of words are extracted from the identified sections of the document, and a subset of the sets of words is selected for classifying the document based on a preconfigured option. The document is then classified by inputting the selected subset of words into one or more machine learning models. The method includes transmitting the document and the determined classification of the document to an external server.
Bibliography: Application Number: US202117223922
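
Illustration: The steps named in the summary (contrast-based section detection, per-section OCR, word-subset selection, classification) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented method: the row-wise dark-pixel test, the function names (find_text_bands, select_words, classify_document), the option names ("all", "first_n", "first_section"), and all thresholds are hypothetical. The OCR engine and the machine learning model are passed in as plain callables rather than tied to any real library, and the final upload to an external server is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a grayscale image as rows of pixel values, 0 = black, 255 = white.
Image = Sequence[Sequence[int]]


def find_text_bands(image: Image, dark_below: int = 128,
                    min_dark_fraction: float = 0.1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) bands of rows with enough dark pixels.

    Hypothetical stand-in for "contrast between dark space and light space":
    a row counts as likely text if its share of dark pixels meets a threshold.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(image):
        dark = sum(1 for px in row if px < dark_below)
        is_text_row = len(row) > 0 and dark / len(row) >= min_dark_fraction
        if is_text_row and start is None:
            start = y                      # a new band begins
        elif not is_text_row and start is not None:
            bands.append((start, y))       # the current band ends
            start = None
    if start is not None:                  # band runs to the last row
        bands.append((start, len(image)))
    return bands


def select_words(word_sets: List[List[str]], option: str, n: int = 50) -> List[str]:
    """Pick the subset of words to classify on, based on a preconfigured option."""
    words = [w for section in word_sets for w in section]
    if option == "all":
        return words
    if option == "first_n":
        return words[:n]
    if option == "first_section":
        return list(word_sets[0]) if word_sets else []
    raise ValueError(f"unknown option: {option!r}")


def classify_document(image: Image,
                      ocr: Callable[[Image], List[str]],
                      model: Callable[[List[str]], str],
                      option: str = "first_n", n: int = 50) -> str:
    """Detect text bands, OCR each band, select words, and classify."""
    bands = find_text_bands(image)
    word_sets = [ocr(image[top:bottom]) for top, bottom in bands]
    return model(select_words(word_sets, option, n))
```

Restricting OCR to the detected bands and classifying on a subset of words are the two places where this sketch reduces work, which is in keeping with the summary's description of classifying in real time before the document is transmitted.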