An integrated architecture for information extraction from documents in Turkish

Adalı, Şerif

dc.contributor.advisor	Sönmez, Coşkun A
dc.contributor.author	Adalı, Şerif
dc.contributor.authorID	504012098
dc.contributor.department	Computer Engineering
dc.date.accessioned	2024-03-21T06:15:11Z
dc.date.available	2024-03-21T06:15:11Z
dc.date.issued	2009-12-25
dc.description	Thesis (Ph.D.) -- İstanbul Technical University, Institute of Science and Technology, 2009
dc.description.abstract	In this study, ontology-based information extraction and document layout analysis techniques are integrated to extract domain-specific events and entities. The proposed "Concept Zoning" technique makes extraction concepts easy to define and increases the portability of the information extraction (IE) system: it requires only concept definitions, unlike approaches that rely on large sets of linguistic patterns. The proposed architecture works well when applied to restricted-domain applications. It also successfully detects data in tabular, list, or itemized form. In the case of an unknown event, concept similarity is calculated by comparing the concepts in the input document against the concepts in the ontology, and new attributes, key concept nodes, and concept properties are incrementally added to the knowledge base by the user. The domain ontology is enriched by adding newly discovered instances. Experimental results indicate that a high-performance document processing system has to cover a sufficient number of lexical resources, extraction concepts, and document models. In addition, document layout analysis is used to detect unknown entity types, and the approach verifies extracted information and the relations among them by using key values defined for each domain event.
dc.description.degree	Ph. D.
dc.identifier.uri	http://hdl.handle.net/11527/24668
dc.language.iso	en_US
dc.publisher	Institute of Science and Technology
dc.sdg.type	Goal 9: Industry, Innovation and Infrastructure
dc.subject	information extraction
dc.subject	bilgi çıkarımı
dc.subject	natural language processing
dc.subject	doğal dil işleme
dc.subject	architecture
dc.subject	mimari
dc.title	An integrated architecture for information extraction from documents in Turkish
dc.title.alternative	Türkçe belgelerden bilgi çıkarımı için tümleşik bir mimari
dc.type	Doctoral Thesis

Files

Original bundle

Name: 504012098.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format

Collections

FBE - Computer Engineering Graduate Program - Doctorate