Computational Processing of the Portuguese Language: 8th International Conference, PROPOR 2008, Aveiro, Portugal, September 8-10, 2008: Proceedings

This book constitutes the thoroughly refereed proceedings of the 8th International Conference on Computational Processing of the Portuguese Language, PROPOR 2008, held in Aveiro, Portugal, in September 2008. The 21 revised full papers and 16 revised short papers presented were carefully reviewed and s...

Bibliographic Details
Main Authors Carbonell, Jaime G, Siekmann, Jörg, Teixeira, António, Quaresma, Paulo
Format eBook
Language English
Published Berlin Springer Berlin / Heidelberg 2008
Springer
Edition 1
Series Lecture Notes in Computer Science
ISBN 3540859799
9783540859796

Table of Contents:
  • Answering Portuguese Questions -- XisQuê: An Online QA Service for Portuguese -- Using Semantic Prototypes for Discourse Status Classification -- Using System Expectations to Manage User Interactions -- Adaptive Modeling and High Quality Spectral Estimation for Speech Enhancement -- On the Voiceless Aspirated Stops in Brazilian Portuguese -- Comparison of Phonetic Segmentation Tools for European Portuguese -- Spoltech and OGI-22 Baseline Systems for Speech Recognition in Brazilian Portuguese -- Development of a Speech Recognizer with the Tecnovoz Database -- Dynamic Language Modeling for the European Portuguese -- An Approach to Natural Language Equation Reading in Digital Talking Books -- Topic Segmentation in a Media Watch System -- Author Index
  • Intro -- Preface -- Organization -- Table of Contents -- Event Detection by HMM, SVM and ANN: A Comparative Study -- Frication and Voicing Classification -- A Spoken Dialog System Speech Interface Based on a Microphone Array -- PAPEL: A Dictionary-Based Lexical Ontology for Portuguese -- Comparing Window and Syntax Based Strategies for Semantic Extraction -- The Mitkov Algorithm for Anaphora Resolution in Portuguese -- Semantic Similarity, Ontologies and the Portuguese Language: A Close Look at the subject -- Boundary Refining Aiming at Speech Synthesis Applications -- Evolutionary-Based Design of a Brazilian Portuguese Recording Script for a Concatenative Synthesis System -- DIXI - A Generic Text-to-Speech System for European Portuguese -- European Portuguese Articulatory Based Text-to-Speech: First Results -- Statistical Machine Translation of Broadcast News from Spanish to Portuguese -- Combining Multiple Features for Automatic Text Summarization through Machine Learning -- Some Experiments on Clustering Similar Sentences of Texts in Portuguese -- Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning -- Learning Coreference Resolution for Portuguese Texts -- Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament -- Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data -- A Platform of Distributed Speech Recognition for the European Portuguese Language -- Supporting e-Learning with Language Technology for Portuguese -- ParaMT: A Paraphraser for Machine Translation -- Second HAREM: New Challenges and Old Wisdom -- Floresta Sintá(c)tica: Bigger, Thicker and Easier -- The Identification and Description of Frozen Prepositional Phrases through a Corpus-Oriented Study -- CorrefSum: Referencial Cohesion Recovery in Extractive Summaries
  • Semantic Similarity, Ontologies and the Portuguese Language: A Close Look at the subject -- Introduction -- Problems on Mapping between Ontologies -- SiSe Measure -- Other Approaches on Mapping between Ontologies -- Conclusions -- References -- Boundary Refining Aiming at Speech Synthesis Applications -- Introduction and Problem Statement -- Segmentation Based on Hidden Markov Models -- Boundary Refining -- Context-Dependent Boundary Refining -- Experimental Results -- Concluding Remarks -- References -- Evolutionary-Based Design of a Brazilian Portuguese Recording Script for a Concatenative Synthesis System -- Introduction -- Grapheme-to-Phoneme Conversion -- Prediction of Prosodic Patterns -- Lexical Classification -- Segmentation into Phrases -- Sentence Classification -- Prosodic Annotation -- Application to the Brazilian Portuguese Language -- Feature Vector Representation -- Automatic Selection Based on Genetic Algorithms -- Experimental Results -- Conclusions and Future Work -- References -- DIXI - A Generic Text-to-Speech System for European Portuguese -- Introduction -- Tecnovoz Project -- System Flexibility -- Data-Driven Approaches -- Paper Organization -- Corpora Building -- Corpus Design -- Phonetic Segmentation and Multi-level Utterance Descriptions -- Linguistic Analysis -- Speaker-Adapted Prosodic Models -- Grapheme to Phone -- System Architecture -- Text Splitter -- Text Normalizer -- Part-of-Speech Tagging -- Prosodic Phrasing -- Phonological and Phonetic Descriptions -- Acoustic Synthesis -- Conclusions -- References -- European Portuguese Articulatory Based Text-to-Speech: First Results -- Introduction -- Articulatory Phonology -- TADA - TAsk Dynamics Application -- System Architecture and Strategies -- Linguistic Processing -- Gestural Model for European Portuguese -- Synthesizer -- Synthesis Example -- Identification Test
  • Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data -- Introduction -- Baseline Transcription Systems -- TV Broadcast News Transcription System -- The Telephone Speech Recognizer -- Corpora Description -- TV Broadcast News Corpus (TVBN) -- Fixed Telephone Corpus (FT) -- Mobile Telephone Corpus (MT) -- Radio Broadcast Corpus (RB) -- Detection of Telephone Segments in Radio Broadcast -- Automatic Transcription of Radio Telephone Speech -- Baseline Systems Performance -- Robust Network Training -- Future Work and Challenges -- Conclusions -- References -- A Platform of Distributed Speech Recognition for the European Portuguese Language -- Introduction -- Current Speech Recognition System -- System Architecture -- Network -- Application on the Client -- Server Configuration -- Portability to an Embedded System -- PLP Component -- ForwardMLP Component -- Results -- Conclusions -- References -- Supporting e-Learning with Language Technology for Portuguese -- Introduction -- The Corpus -- The Keyword Extractor -- The Glossary Candidate Detector -- Semantic Search Tool -- Integration in the LMS and Extrinsic Evaluation -- Conclusions -- References -- ParaMT: A Paraphraser for Machine Translation -- Introduction -- Support Verb Constructions -- Machine Translation Problem Evidence -- ParaMT Resources and Methodology -- Paraphrases for Machine Translation -- Preliminary Quantitative Evaluation -- Conclusions -- References -- Second HAREM: New Challenges and Old Wisdom -- Introduction -- "Old", Persistent, Features of HAREM -- Improvements -- The New HAREM Collection and Its Annotation -- New Challenges -- References -- Floresta Sintá(c)tica: Bigger, Thicker and Easier -- Introduction -- Bigger: The "Selva" -- Thicker -- Easier: Milhafre -- Concluding Remarks -- References
  • Conclusions -- References -- Statistical Machine Translation of Broadcast News from Spanish to Portuguese -- Introduction -- Corpora Description -- Audio Corpora -- Text Corpora -- Parallel Text Corpora -- Spanish Broadcasts News Recognizer -- Introduction -- Reference Platform -- Vocabulary and Lexical Model -- Alignment and Training of Acoustic Model -- Language Model -- Evaluation -- Machine Translation -- Summary and Future Work -- References -- Combining Multiple Features for Automatic Text Summarization through Machine Learning -- Introduction -- SuPor-2 Features -- Features Based on Complex Networks -- Feature Selection -- Classifiers -- Assessment of eAS Using Multiple Features -- Determining the Best Feature Set and Classifier -- Comparison to Other Summarizers -- Final Remarks -- References -- Some Experiments on Clustering Similar Sentences of Texts in Portuguese -- Introduction -- Related Work -- The Clustering Framework -- Experimental Evaluation -- The Corpus -- The Evaluation Measures -- Experimental Results -- Conclusions -- References -- Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning -- Introduction -- Entropy Guided Transformation Learning -- Decision Trees -- Transformation-Based Learning -- Part-of-Speech Tagging Using ETL -- Experiments -- Mac-Morpho Corpus -- Tycho Brahe Corpus -- Conclusions -- References -- Learning Coreference Resolution for Portuguese Texts -- Introduction -- Related Work -- A Coreference Resolution Approach for Portuguese -- Evaluation -- Final Remarks -- References -- Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament -- Introduction -- Corpora Collection -- Textual Corpora -- Audio Corpora -- Baseline Transcription System -- Domain Adaptation -- Vocabulary and Lexical Model -- Language Model -- Acoustic Model -- Conclusions -- References
  • Results for Bigram LM Obtained from the Corpora Transcriptions
  • Intro -- Title Page -- Preface -- Organization -- Table of Contents -- Event Detection by HMM, SVM and ANN: A Comparative Study -- Introduction -- Event-Based System Description -- Baseline HMM Classifier -- SVM Classifier -- Speech Event Detection by Non Negative Matrix Deconvolution -- Hybrid SVM/HMM Speech Event Detector -- ANN Classifier -- Conclusions -- References -- Frication and Voicing Classification -- Introduction -- Background -- Motivation -- Speech Data -- European Portuguese -- British English -- Dividing the Data -- Extraction of Reference f0 -- f0 Determination Algorithms -- Combining f0 Tracks -- Duration Analysis -- Method -- Results -- Conclusions -- References -- A Spoken Dialog System Speech Interface Based on a Microphone Array -- Introduction -- The Virtual Butler System -- Spoken Dialog System -- Microphone Array Front-End -- Experimental Evaluation -- Conclusions -- References -- PAPEL: A Dictionary-Based Lexical Ontology for Portuguese -- Introduction -- Related Work -- Relation Extraction from MRDs -- Related Resources -- Building PAPEL -- Relations -- Parsing the Definitions -- The Results -- Regression Testing -- Detailed Example: Causation -- The Patterns -- Results -- Conclusions and Further Work -- References -- Comparing Window and Syntax Based Strategies for Semantic Extraction -- Introduction -- Earlier Comparisons between Both Approaches -- Window-Based Contexts -- Syntax-Based Contexts -- Dependency Parsing with Generic Regular Expressions -- Lexico-Syntactic Contexts -- Experiments -- Corpus -- Vector Similarity -- Initial List of Seed Proper Nouns -- Results -- Conclusion -- References -- The Mitkov Algorithm for Anaphora Resolution in Portuguese -- Introduction -- The Mitkov's Algorithm -- Adapting Mitkov's Algorithm for PR in Brazilian Portuguese -- Assessing RAPM -- Final Remarks -- References
  • The Identification and Description of Frozen Prepositional Phrases through a Corpus-Oriented Study -- Introduction -- Traditional Criteria for the Description of Frozen PPs Structural Patterns -- Sets of Frozen Adverbial PPs -- Quantitative Results -- Concluding Remarks and Future Work -- References -- CorrefSum: Referencial Cohesion Recovery in Extractive Summaries -- Introduction -- The CorrefSum System -- Experiments and Evaluation -- Conclusions -- References -- Answering Portuguese Questions -- Architecture of Esfinge -- Experimental Setup -- Evaluation and Discussion of the Results -- References -- XisQuê: An Online QA Service for Portuguese -- Introduction -- The Underlying QA System -- Performance -- Timeliness -- Appropriateness -- Conclusion -- References -- Using Semantic Prototypes for Discourse Status Classification -- Introduction -- Classification Experiments -- References -- Using System Expectations to Manage User Interactions -- Introduction -- Using Expectations in Parser Selection -- Evaluation and Results -- Conclusions and Future Work -- References -- Adaptive Modeling and High Quality Spectral Estimation for Speech Enhancement -- Introduction -- Signal and Noise Modeling and PSD Estimation -- Results -- Conclusion -- References -- On the Voiceless Aspirated Stops in Brazilian Portuguese -- Introduction -- Stop Consonants -- Analysis Procedure -- Results of VOT Analysis and Discussion -- Conclusions -- Comparison of Phonetic Segmentation Tools for European Portuguese -- References -- Spoltech and OGI-22 Baseline Systems for Speech Recognition in Brazilian Portuguese -- Introduction -- UFPAdic: A Pronunciation Dictionary for BP -- Building Language Models from CETENFolha -- Front-End and Acoustic Modeling -- Spoltech Corpus -- Baseline Results