System and method for extracting text from portable document format data

Described herein is a computer-implemented method. The method includes accessing, by a computer system including a processing unit, portable document format (PDF) data defining a plurality of glyphs, classifying the plurality of glyphs into one or more glyph sets, and calculating an extended glyph...

Main Authors YANCHENA VADIM, SCHWIBERT STEFAN, IGUARO, HELENE
Format Patent
Language Chinese
English
Published 02.09.2022
Abstract Described herein is a computer-implemented method. The method includes accessing, by a computer system including a processing unit, portable document format (PDF) data defining a plurality of glyphs, classifying the plurality of glyphs into one or more glyph sets, and calculating an extended glyph bounding box for each glyph. Each glyph set is processed to determine one or more text regions, each text region associated with one or more glyphs from the glyph set, the one or more glyphs having commonly overlapping extended bounding boxes.
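The steps named in the abstract (classify glyphs into sets, extend each glyph's bounding box, group glyphs whose extended boxes overlap) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `Glyph` fields, classification by font name and size, padding each box by a fraction of the glyph height (`pad_ratio`), and reading "commonly overlapping" as transitive overlap (connected components).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    # Hypothetical glyph record; field names are illustrative only.
    char: str
    x0: float
    y0: float
    x1: float
    y1: float
    font: str
    size: float


def extended_box(g, pad_ratio=0.3):
    # Assumption: extend the box on every side by a fraction of the glyph height.
    pad = pad_ratio * (g.y1 - g.y0)
    return (g.x0 - pad, g.y0 - pad, g.x1 + pad, g.y1 + pad)


def boxes_overlap(a, b):
    # Axis-aligned rectangles (x0, y0, x1, y1) overlap when they intersect on both axes.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def text_regions(glyphs, pad_ratio=0.3):
    # Step 1: classify glyphs into sets (assumed key: font name and size).
    glyph_sets = defaultdict(list)
    for g in glyphs:
        glyph_sets[(g.font, g.size)].append(g)

    regions = []
    for members in glyph_sets.values():
        # Step 2: compute an extended bounding box for each glyph in the set.
        boxes = [extended_box(g, pad_ratio) for g in members]

        # Step 3: union-find over pairs whose extended boxes overlap.
        parent = list(range(len(members)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if boxes_overlap(boxes[i], boxes[j]):
                    parent[find(i)] = find(j)

        # Each connected component becomes one text region.
        groups = defaultdict(list)
        for i, g in enumerate(members):
            groups[find(i)].append(g)
        regions.extend(groups.values())
    return regions
```

Calling `text_regions` on a list of `Glyph` objects returns a list of regions, each a list of glyphs from a single glyph set. The pairwise comparison is quadratic in the size of each set; a spatial index would be the usual replacement for large pages, but that is outside what the abstract describes.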
Author YANCHENA VADIM
SCHWIBERT STEFAN
IGUARO, HELENE
Author_xml – fullname: YANCHENA VADIM
– fullname: SCHWIBERT STEFAN
– fullname: IGUARO, HELENE
ContentType Patent
DBID EVB
DatabaseName esp@cenet
DatabaseTitleList
Database_xml – sequence: 1
  dbid: EVB
  name: esp@cenet
  url: http://worldwide.espacenet.com/singleLineSearch?locale=en_EP
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
Chemistry
Sciences
Physics
DocumentTitleAlternate System and method for extracting text from portable document format data (Chinese title: 用于从可移植文档格式数据提取文本的系统及方法)
ExternalDocumentID CN114998918A
GroupedDBID EVB
ID FETCH-epo_espacenet_CN114998918A3
IEDL.DBID EVB
IngestDate Fri Aug 30 05:42:48 EDT 2024
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language Chinese
English
LinkModel DirectLink
MergedId FETCHMERGED-epo_espacenet_CN114998918A3
Notes Application Number: CN202210195184
OpenAccessLink https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20220902&DB=EPODOC&CC=CN&NR=114998918A
ParticipantIDs epo_espacenet_CN114998918A
PublicationCentury 2000
PublicationDate 20220902
PublicationDateYYYYMMDD 2022-09-02
PublicationDate_xml – month: 09
  year: 2022
  text: 20220902
  day: 02
PublicationDecade 2020
PublicationYear 2022
RelatedCompanies CANVA PTY LTD
RelatedCompanies_xml – name: CANVA PTY LTD
Score 3.551802
Snippet Described herein is a computer implemented method. The method includes accessing, by a computer system including a processing unit, portable document format...
SourceID epo
SourceType Open Access Repository
SubjectTerms CALCULATING
COMPUTING
COUNTING
PHYSICS
Title System and method for extracting text from portable document format data
URI https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20220902&DB=EPODOC&locale=&CC=CN&NR=114998918A
linkProvider European Patent Office