INTELLIGENT SYSTEM THAT DYNAMICALLY IMPROVES ITS KNOWLEDGE AND CODE-BASE FOR NATURAL LANGUAGE UNDERSTANDING

Bibliographic Details
Main Authors Brenier Jason, Erle Schuyler D, Saxena Tripti, Voigt Rob, Callahan Brendan D, Long Jessica D, Munro Robert, King Gary C, Krawczyk Stefan
Format Patent
Language English
Published 05.04.2018
Abstract Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating from a pool of documents, a set of statistical models comprising one or more entries each indicating a likelihood of appearance of a character/letter sequence in the pool of documents; receiving a set of rules comprising rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to unambiguously tokenize the document; and outputting the divided tokens for natural language processing.
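The pipeline in the abstract (build statistical models from a document pool, promote high-likelihood entries to rules, then tokenize with rules first and fall back to the models) can be illustrated with a minimal sketch. This is not the patented implementation: the function names `build_model`, `promote`, and `tokenize`, the word-frequency model, the `threshold` value, and the `max_len` limit are all assumptions chosen for illustration.

```python
from collections import Counter

def build_model(docs):
    """Assumed model: relative frequency of each whitespace-delimited sequence in the pool."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

def promote(model, rules, threshold):
    """Return the rule set extended with every model entry at or above the (assumed) threshold."""
    return set(rules) | {seq for seq, p in model.items() if p >= threshold}

def tokenize(text, rules, model, max_len=12):
    """Split text into tokens: rules decide when unambiguous, the model decides otherwise."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # All candidate sequences starting at position i, up to max_len characters.
        candidates = [text[i:j] for j in range(i + 1, min(len(text), i + max_len) + 1)]
        by_rule = [c for c in candidates if c in rules]
        if len(by_rule) == 1:
            token = by_rule[0]  # exactly one rule matches: unambiguous
        else:
            # No rule or several rules match: pick the most likely candidate,
            # preferring the shorter one on ties (a single character if nothing is known).
            pool = by_rule or candidates
            token = max(pool, key=lambda c: (model.get(c, 0.0), -len(c)))
        tokens.append(token)
        i += len(token)
    return tokens

# Hypothetical document pool and seed rule, for illustration only.
model = build_model(["new york is big", "york is old", "new is new"])
rules = promote(model, {"ny"}, threshold=0.25)
print(tokenize("newyorkis", rules, model))  # ['new', 'york', 'is']
```

In this toy run, "new" and "is" are promoted to rules because their frequency exceeds the threshold, while "york" is recovered from the statistical model alone because no rule covers it.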
ExternalDocumentID US2018095946A1
Notes Application Number: US201715596855
OpenAccessLink https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20180405&DB=EPODOC&CC=US&NR=2018095946A1
RelatedCompanies Idibon, Inc; Voigt Rob; King Gary C; Long Jessica D; Krawczyk Stefan; Munro Robert; Saxena Tripti; Erle Schuyler D; Callahan Brendan D; Brenier Jason
SubjectTerms CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS