INTELLIGENT SYSTEM THAT DYNAMICALLY IMPROVES ITS KNOWLEDGE AND CODE-BASE FOR NATURAL LANGUAGE UNDERSTANDING
Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating from a pool of documents, a set of statistical models comprising one or more entries each indicat...
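Main Authors | Brenier Jason; Erle Schuyler D; Saxena Tripti; Voigt Rob; Callahan Brendan D; Long Jessica D; Munro Robert; King Gary C; Krawczyk Stefan |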
---|---|
Format | Patent |
Language | English |
Published | 05.04.2018 |
Subjects | CALCULATING, COMPUTING, COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS |
Abstract | Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating from a pool of documents, a set of statistical models comprising one or more entries each indicating a likelihood of appearance of a character/letter sequence in the pool of documents; receiving a set of rules comprising rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to unambiguously tokenize the document; and outputting the divided tokens for natural language processing. |
---|---|
Author | Voigt Rob; King Gary C; Long Jessica D; Krawczyk Stefan; Munro Robert; Saxena Tripti; Erle Schuyler D; Callahan Brendan D; Brenier Jason |
Author_xml | – fullname: Brenier Jason – fullname: Erle Schuyler D – fullname: Saxena Tripti – fullname: Voigt Rob – fullname: Callahan Brendan D – fullname: Long Jessica D – fullname: Munro Robert – fullname: King Gary C – fullname: Krawczyk Stefan |
ContentType | Patent |
DBID | EVB |
DatabaseName | esp@cenet |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: EVB name: esp@cenet url: http://worldwide.espacenet.com/singleLineSearch?locale=en_EP sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Medicine Chemistry Sciences Physics |
ExternalDocumentID | US2018095946A1 |
GroupedDBID | EVB |
ID | FETCH-epo_espacenet_US2018095946A13 |
IEDL.DBID | EVB |
IngestDate | Fri Jul 19 13:05:06 EDT 2024 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-epo_espacenet_US2018095946A13 |
Notes | Application Number: US201715596855 |
OpenAccessLink | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20180405&DB=EPODOC&CC=US&NR=2018095946A1 |
ParticipantIDs | epo_espacenet_US2018095946A1 |
PublicationCentury | 2000 |
PublicationDate | 20180405 |
PublicationDateYYYYMMDD | 2018-04-05 |
PublicationDate_xml | – month: 04 year: 2018 text: 20180405 day: 05 |
PublicationDecade | 2010 |
PublicationYear | 2018 |
RelatedCompanies | Voigt Rob; King Gary C; Long Jessica D; Krawczyk Stefan; Munro Robert; Saxena Tripti; Idibon, Inc; Erle Schuyler D; Callahan Brendan D; Brenier Jason |
RelatedCompanies_xml | – name: King Gary C – name: Brenier Jason – name: Long Jessica D – name: Krawczyk Stefan – name: Voigt Rob – name: Saxena Tripti – name: Callahan Brendan D – name: Munro Robert – name: Erle Schuyler D – name: Idibon, Inc |
SourceID | epo |
SourceType | Open Access Repository |
SubjectTerms | CALCULATING, COMPUTING, COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS |
URI | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20180405&DB=EPODOC&locale=&CC=US&NR=2018095946A1 |
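
The tokenization method summarized in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation; it is a toy reading of the abstract under several assumptions that are not in the record: the pool documents are whitespace-delimited, a "likelihood" is a plain relative frequency, "high likelihood" means meeting a fixed threshold, the rules are a set of valid token strings, and "fail to unambiguously tokenize" means the rules yield zero or more than one full segmentation. All function names (`build_model`, `promote_rules`, `rule_segmentations`, `model_segmentation`, `tokenize`) are hypothetical.

```python
import math
from collections import Counter


def build_model(pool):
    """Relative frequency of each whitespace-delimited sequence in the pool (assumed model)."""
    counts = Counter(seq for doc in pool for seq in doc.split())
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def promote_rules(model, rules, threshold):
    """Add every model entry whose likelihood meets the threshold to the rule set."""
    return set(rules) | {seq for seq, p in model.items() if p >= threshold}


def rule_segmentations(text, rules, limit=2):
    """Collect up to `limit` ways to split `text` entirely into rule tokens."""
    results = []

    def walk(start, acc):
        if len(results) >= limit:
            return
        if start == len(text):
            results.append(list(acc))
            return
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in rules:
                acc.append(text[start:end])
                walk(end, acc)
                acc.pop()

    walk(0, [])
    return results


def model_segmentation(text, model, floor=1e-9):
    """Highest-likelihood split by dynamic programming; unknown single characters get `floor`."""
    max_len = max((len(seq) for seq in model), default=1)
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            p = model.get(piece, floor if len(piece) == 1 else 0.0)
            if p > 0.0 and best[start][0] > -math.inf:
                score = best[start][0] + math.log(p)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [piece])
    return best[len(text)][1]


def tokenize(document, rules, model):
    """Use the rules when they give exactly one split; otherwise fall back to the model."""
    tokens = []
    for chunk in document.split():
        options = rule_segmentations(chunk, rules)
        tokens.extend(options[0] if len(options) == 1 else model_segmentation(chunk, model))
    return tokens


# Toy data, invented for illustration only.
pool = ["now here", "now here", "no where", "now"]
model = build_model(pool)
rules = promote_rules(model, {"no", "where"}, threshold=0.25)
print(tokenize("nowhere nohere", rules, model))
```

In this toy run, "nowhere" can be split by the rules in two ways, so the frequency model decides, while "nohere" has a single rule-based split and never consults the model. The exhaustive search in `rule_segmentations` is fine for short chunks but can be slow on long inputs; a production system would likely count segmentations with dynamic programming instead.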