System and method for extracting text from portable document format data
Main Authors | YANCHENA VADIM; SCHWIBERT STEFAN; IGUARO, HELENE |
---|---|
Format | Patent |
Language | Chinese, English |
Published | 02.09.2022 |
Subjects | |
Abstract | Described herein is a computer-implemented method. The method includes accessing, by a computer system including a processing unit, portable document format (PDF) data defining a plurality of glyphs, classifying the plurality of glyphs into one or more glyph sets, and calculating an extended glyph bounding box for each glyph. Each glyph set is processed to determine one or more text regions, each text region associated with one or more glyphs from the glyph set, the one or more glyphs having commonly overlapping extended bounding boxes. |
---|---|
Author | YANCHENA VADIM; SCHWIBERT STEFAN; IGUARO, HELENE |
Discipline | Medicine Chemistry Sciences Physics |
DocumentTitleAlternate | 用于从可移植文档格式数据提取文本的系统及方法 |
ExternalDocumentID | CN114998918A |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Notes | Application Number: CN202210195184 |
OpenAccessLink | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20220902&DB=EPODOC&CC=CN&NR=114998918A |
RelatedCompanies | CANVA PTY LTD |
SubjectTerms | CALCULATING; COMPUTING; COUNTING; PHYSICS |
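
The steps named in the abstract (classify glyphs into sets, compute an extended bounding box per glyph, group glyphs whose extended boxes overlap into text regions) can be illustrated with a minimal Python sketch. This is not the patented implementation: the record does not state the classification criterion, how far each box is extended, or the precise meaning of "commonly overlapping". The sketch assumes glyphs are classified by font name and size, boxes are padded by a fraction of the font size, and overlap is treated transitively (connected components). The names `Glyph`, `extended_box`, `classify`, `text_regions`, `extract_text`, and `pad_ratio` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical glyph record: character, bounding box, font attributes."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float
    font: str
    size: float


def extended_box(g, pad_ratio=0.3):
    # Assumption: extend the box on every side by a fraction of the font size.
    pad = g.size * pad_ratio
    return (g.x0 - pad, g.y0 - pad, g.x1 + pad, g.y1 + pad)


def boxes_overlap(a, b):
    # Axis-aligned rectangles overlap when they intersect on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def classify(glyphs):
    # Assumption: glyphs sharing font name and size belong to one glyph set.
    sets = defaultdict(list)
    for g in glyphs:
        sets[(g.font, g.size)].append(g)
    return list(sets.values())


def text_regions(glyph_set, pad_ratio=0.3):
    # Union-find over glyph indices: glyphs whose extended boxes overlap,
    # directly or through a chain of overlaps, end up in the same region.
    parent = list(range(len(glyph_set)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    boxes = [extended_box(g, pad_ratio) for g in glyph_set]
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    regions = defaultdict(list)
    for i, g in enumerate(glyph_set):
        regions[find(i)].append(g)
    return list(regions.values())


def extract_text(glyphs):
    # One string per text region, glyphs ordered left to right.
    out = []
    for glyph_set in classify(glyphs):
        for region in text_regions(glyph_set):
            ordered = sorted(region, key=lambda g: g.x0)
            out.append("".join(g.char for g in ordered))
    return out


sample = [
    Glyph("H", 0, 0, 5, 10, "FontA", 10),
    Glyph("i", 6, 0, 9, 10, "FontA", 10),
    Glyph("X", 100, 0, 105, 10, "FontA", 10),
    Glyph("Z", 6, 0, 9, 10, "FontB", 10),
]
regions = extract_text(sample)
```

In this sample, "H" and "i" share a glyph set and their padded boxes overlap, so they form one region. "X" is in the same set but too far away, and "Z" overlaps "H" spatially but falls in a different glyph set, so each stands alone. The pairwise comparison is quadratic in the size of a glyph set; a spatial index would be the usual replacement for large pages.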