Feature Reweighting in Text Classifier Generation Using Unlabeled Data
Main Authors | , , , , , , |
---|---|
Format | Patent |
Language | English |
Published | 28.10.2021 |
Summary: | A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term frequency value is normalized by dividing it by the total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data. |
---|---|
Bibliography: | Application Number: US202016860565 |
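The pipeline in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. Every function name is hypothetical, and several steps the summary leaves open are filled with assumed choices: whitespace tokenization; idf computed as log(N / df), with df counting the unlabeled documents in which a term's normalized tf is nonzero; a filter that drops terms whose idf is at or below a hypothetical `min_idf` threshold; a transform that divides each remaining idf by the largest one; a default weight of 0 for labeled-data terms absent from the unlabeled-data weights; and a nearest-centroid classifier with cosine similarity standing in for the unspecified text classifier.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the summary does not specify one).
    return text.lower().split()


def normalized_tf(doc):
    # Term frequency divided by the total number of terms in the document.
    terms = tokenize(doc)
    total = len(terms)
    return {t: c / total for t, c in Counter(terms).items()} if total else {}


def feature_weights(unlabeled_docs, min_idf=0.0):
    # Step 1: normalized tf for every term of every unlabeled document.
    tfs = [normalized_tf(d) for d in unlabeled_docs]
    n_docs = len(tfs)
    # Step 2: idf = log(N / df), where df counts documents with a nonzero tf.
    df = Counter(t for tf in tfs for t in tf)
    idf = {t: math.log(n_docs / d) for t, d in df.items()}
    # Step 3 (assumed rule): filter out terms with idf <= min_idf (very common terms).
    kept = {t: v for t, v in idf.items() if v > min_idf}
    # Step 4 (assumed transform): scale the remaining idf values into (0, 1].
    top = max(kept.values(), default=0.0)
    return {t: v / top for t, v in kept.items()} if top > 0 else {}


def reweight(doc, weights, default=0.0):
    # Multiply a labeled document's normalized tf by the unlabeled-data weight.
    return {t: v * weights.get(t, default) for t, v in normalized_tf(doc).items()}


def cosine(a, b):
    # Cosine similarity between two sparse term -> value dictionaries.
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labeled, weights):
    # Hypothetical classifier: average the re-weighted vectors of each label.
    sums, counts = {}, Counter()
    for text, label in labeled:
        sums.setdefault(label, Counter()).update(reweight(text, weights))
        counts[label] += 1
    return {lab: {t: v / counts[lab] for t, v in vec.items()}
            for lab, vec in sums.items()}


def predict(centroids, text, weights):
    # Assign the label whose centroid is most similar to the re-weighted text.
    vec = reweight(text, weights)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


if __name__ == "__main__":
    unlabeled = [
        "the team won the match",
        "the market fell today",
        "the striker scored a goal",
        "stocks and the market rallied",
    ]
    weights = feature_weights(unlabeled)  # "the" occurs everywhere, so idf 0, filtered
    labeled = [("the team scored", "sports"), ("the market rallied", "finance")]
    centroids = train_centroids(labeled, weights)
    print(predict(centroids, "the striker scored", weights))  # → sports
```

In this toy run, "the" appears in every unlabeled document, so its idf is 0 and it is filtered out; in the labeled data it then falls back to the default weight of 0 and stops influencing the centroids. "striker" never occurs in the labeled data yet still receives a weight learned from the unlabeled documents, which is the kind of signal the summary describes importing from unlabeled data.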