
Technologies for Advanced Knowledge Extraction
The project TAKE aims to adapt, develop and utilize a range of language and knowledge technologies for the gradual automatic extraction of knowledge from the World Wide Web. Rule-based and statistical methods for language processing will be combined to systematically extend a body of formalized knowledge.
The central technology for this endeavor is semantically driven advanced information extraction, especially relation extraction, i.e., the detection of instances of semantic relations in large volumes of text. The relevant relations may belong to several classes, such as facts, definitions, events, citations and opinions.
In TAKE, information extraction is not viewed as a pragmatic shortcut for getting at least something out of natural language texts, but rather as a method for gradually approaching the unsolved problem of text understanding in a systematic and controlled way.
Existing bodies of formalized linguistic knowledge, such as lexicons, morphologies and grammars, will be utilized, as will tools for statistical processing.
The developed methods, architectures and systems will be tested and demonstrated in two knowledge domains:
- scientific/technological literature in a selected field of research, i.e., language technology, and
- general biographical texts.
Publications of the project are listed below.
TAKE is funded under contract 01IW08003 by the Federal Ministry of Education and Research.
Project Managers:
Hans Uszkoreit,
Ulrich Schäfer
Systems
Since the ACL HLT 2011 conference, the ACL Anthology Searchbench has been available online at http://aclasb.dfki.de. It is also reachable via the ACL Anthology start page itself.
The Searchbench combines semantic, full-text and bibliographic search in more than 25,000 Computational Linguistics papers of the ACL Anthology from the past 47 years, including the complete Computational Linguistics journal.
Highlights are:
- Semantic statements search: you can search for subject-predicate-object triples in millions of sentences. Predicates also match their synonyms, and passive constructions and sentence negation are taken into account.
The semantic statements search can also be used as an online domain term glossary, e.g., for a term such as dependency parsing.
- Combination with bibliographic and full-text filters
- Autosuggest search fields and faceted search
- Search result and filter URLs can be bookmarked or emailed
- Display of search result sentences in the original PDF layout. This requires the Adobe Acrobat Reader browser plug-in with Preferences/Search/"external highlight server" enabled, and it does not work well on older, scanned papers (although the page should always be correct). For details, see Help at the bottom left of the Searchbench user interface.
- New graphical citation browser: it shows words from citation sentences on the edges between nodes, which represent papers. You can click on edges, or right-click on paper nodes, to see the citation sentences in context and highlight them in the PDF. Links to external paper search tools are generated if a cited paper is not in the Anthology.
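To make the semantic statements search more concrete, here is a minimal Python sketch of how a subject-predicate-object query might be matched against pre-parsed statements: predicate synonyms are treated as equivalent, passive statements are turned into active-voice triples, and negated sentences are skipped by default. Everything in it is an assumption for illustration only. The `Statement` record, the `SYNONYMS` table and the `search` function are hypothetical and do not describe how the Searchbench is actually implemented, and the hard part (parsing millions of sentences into such records) is taken as already done.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical synonym table mapping predicate lemmas to a canonical form.
SYNONYMS = {
    "employ": "use",
    "apply": "use",
    "introduce": "propose",
}


def canonical(predicate: str) -> str:
    """Return the canonical form of a predicate lemma (itself if unlisted)."""
    return SYNONYMS.get(predicate, predicate)


@dataclass(frozen=True)
class Statement:
    """A hypothetical pre-parsed statement extracted from one sentence."""
    subject: str             # surface (grammatical) subject
    predicate: str           # lemma of the main verb
    obj: str                 # object, or the by-agent in a passive
    passive: bool = False    # e.g. "CRFs were applied by Smith et al."
    negated: bool = False    # e.g. "We do not use CRFs."
    sentence: str = ""


def normalize(st: Statement) -> Statement:
    """Canonicalize the predicate and turn passives into active-voice triples."""
    subj, obj = (st.obj, st.subject) if st.passive else (st.subject, st.obj)
    return Statement(subj, canonical(st.predicate), obj, False, st.negated, st.sentence)


def search(statements: Iterable[Statement],
           subject: Optional[str] = None,
           predicate: Optional[str] = None,
           obj: Optional[str] = None,
           include_negated: bool = False) -> List[str]:
    """Return sentences matching the query triple; None acts as a wildcard."""
    want_pred = canonical(predicate) if predicate is not None else None
    hits = []
    for st in map(normalize, statements):
        if st.negated and not include_negated:
            continue  # a negated sentence does not assert the relation
        if subject is not None and st.subject.lower() != subject.lower():
            continue
        if want_pred is not None and st.predicate != want_pred:
            continue
        if obj is not None and st.obj.lower() != obj.lower():
            continue
        hits.append(st.sentence)
    return hits


# Toy corpus of hypothetical statements.
corpus = [
    Statement("we", "employ", "CRFs", sentence="We employ CRFs for tagging."),
    Statement("CRFs", "apply", "Smith et al.", passive=True,
              sentence="CRFs were applied by Smith et al."),
    Statement("we", "use", "CRFs", negated=True, sentence="We do not use CRFs."),
    Statement("we", "propose", "a parser", sentence="We propose a parser."),
]

print(search(corpus, predicate="use", obj="CRFs"))
# ['We employ CRFs for tagging.', 'CRFs were applied by Smith et al.']
```

The query for "use" finds the synonym "employ" and the passive "were applied by", while the negated sentence is left out. Normalizing at query time keeps the sketch short; a system working over millions of sentences would more plausibly normalize once at indexing time and look triples up in an index.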
The Searchbench itself requires a recent web browser (Firefox 3.6 or higher,
Safari 5, Opera 11, Chrome 12, IE 8/9) with JavaScript enabled.
The Searchbench is not perfect; it is a milestone in the ongoing research project TAKE. OCR and NLP errors were not corrected manually. Missing author affiliation data for 2010 and 2011 papers will be added later.
Nevertheless, we hope you will find it a useful tool for your scientific work as well. Your feedback is welcome ("Feedback" button at the bottom left)!
The TAKE Searchbench team: Ulrich Schäfer, Bernd Kiefer, Christian Spurk, Jörg Steffen and Rui Wang
...with thanks to all others who have contributed to this endeavor (see "About" at the bottom left).
The Searchbench has been developed in the context of the BMBF-funded
project TAKE, the DFG Cluster of Excellence on Multimodal Computing
and Interaction (M2CI) and the international DELPH-IN collaboration.
A previous version of the ACL Anthology Searchbench is described in the ACL-2011 paper "The ACL Anthology Searchbench".
TAKE Publications
2012
- Ulrich Schäfer, Bernd Kiefer, Christian Spurk, Jörg Steffen, Rui Wang, Benjamin Weitz, Magdalena Wolska: The Searchbench - Combining Sentence-semantic, Full-text and Bibliographic Search in Digital Libraries. LIBER Quarterly, Vol. 22, no. 4 (2012) 285-309, ISSN: 1435-5205, e-ISSN: 2213-056X. February 2013.
- Melanie Reiplinger, Ulrich Schäfer, Magdalena Wolska: Extracting Glossary Sentences from Scholarly Articles: A Comparative Evaluation of Pattern Bootstrapping and Deep Analysis. Proceedings of the ACL-2012 Main Conference Workshop on Rediscovering 50 Years of Discoveries, pages 55-65. Jeju Island, Republic of Korea, 2012.
- Ulrich Schäfer, Jonathon Read, Stephan Oepen: Towards an ACL Anthology Corpus with Logical Document Structure. An Overview of the ACL 2012 Contributed Task. Proceedings of the ACL-2012 Main Conference Workshop on Rediscovering 50 Years of Discoveries, pages 88-97. Jeju Island, Republic of Korea, 2012.
- Ulrich Schäfer, Benjamin Weitz: Combining OCR Outputs for Logical Document Structure Markup. Technical Background to the ACL 2012 Contributed Task. Proceedings of the ACL-2012 Main Conference Workshop on Rediscovering 50 Years of Discoveries, pages 104-109. Jeju Island, Republic of Korea, 2012.
- Benjamin Weitz, Ulrich Schäfer: A Graphical Citation Browser for the ACL Anthology. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), pages 1718-1722, ISBN 978-2-9517408-7-7, ELRA, Istanbul, Turkey, 2012.
- Ulrich Schäfer: Satzsemantische Suche - präziser Finden mit der TAKE Searchbench [Sentence-semantic search: finding more precisely with the TAKE Searchbench]. DOK.magazin - Technologien, Strategien & Service für das digitale Dokument, volume 2/2012, pages 28-31, ISSN 1864-8398. Dasing, Germany, 2012.
- Ulrich Schäfer, Magdalena Wolska: Automatische Terminologie-, Taxonomie- und Glossarextraktion [Automatic terminology, taxonomy and glossary extraction]. DOK.magazin - Technologien, Strategien & Service für das digitale Dokument, volume 6/2012, pages 62-65, ISSN 1864-8398. Dasing, Germany, 2012.
- Ulrich Schäfer, Christian Spurk, Jörg Steffen: A fully Coreference-annotated Corpus of Scholarly Papers from the ACL Anthology. Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1059-1070, Mumbai, India, 2012. Annotation guidelines: HTML, PDF (A4), PDF (letter). Annotated data: zip archive.
2011
- Hans Uszkoreit. Learning relation extraction grammars with minimal human intervention: strategy, results, insights and plans. Book chapter in: Computational Linguistics and Intelligent Text Processing; Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, Springer LNCS 6609, pages 106-126, Tokyo, Japan, 2011.
- Peter Adolphs, Martin Theobald, Ulrich Schäfer, Hans Uszkoreit, Gerhard Weikum: YAGO-QA: Answering Questions by Structured Knowledge Queries. Proceedings of the Fifth IEEE International Conference on Semantic Computing (ICSC-2011), September 2011, pages 158-161, IEEE Computer Society, Los Alamitos, CA, USA.
- Ulrich Schäfer, Bernd Kiefer, Christian Spurk, Jörg Steffen, Rui Wang. The ACL Anthology Searchbench. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), System Demonstrations, pages 7-13, 2011. Portland, OR, USA.
- Cailing Dong, Ulrich Schäfer. Ensemble-style Self-training on Citation Classification. Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), pages 623-631, 2011. Chiang Mai, Thailand.
- Magdalena Wolska, Ulrich Schäfer, The Nghia Pham. Bootstrapping a Domain-specific Terminological Taxonomy from Scientific Text. Proceedings of the 9th International Conference on Terminology and Artificial Intelligence (TIA), pages 17-23, Paris, France, 2011.
- Ulrich Schäfer, Bernd Kiefer. Advances in Deep Parsing of Scholarly Paper Content. Book chapter in: Raffaella Bernardi, Sally Chambers, Björn Gottfried, Frédérique Segond, Ilya Zaihrayeu (eds.): Advanced Language Technologies for Digital Libraries. ISBN 978-3-642-23159-9, Springer LNCS Theoretical Computer Science Series, LNCS 6699, pages 135-153, 2011.
- Antske Fokkens. Metagrammar Engineering: Towards systematic exploration of implemented grammars. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), pages 1066-1076, 2011. Portland, OR, USA.
- Weiwei Sun. A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), pages 1385-1394, 2011. Portland, OR, USA.
- Weiwei Sun, Jia Xu. Enhancing Chinese word segmentation using unlabeled data. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 970-979, 2011. Edinburgh, Scotland, UK.
- Hans-Ulrich Krieger, Bernd Kiefer. Converting CCGs into Typed Feature Structure Grammars. Proceedings of the 18th International Conference on Head-Driven Phrase Structure Grammar (HPSG-2011), CSLI Publications, Seattle, WA, August 22-25, 2011.
- Feiyu Xu, Hong Li, Yi Zhang, Hans Uszkoreit, Sebastian Krause. Minimally Supervised Domain-Adaptive Parse Reranking for Relation Extraction. Proceedings of the 12th International Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland, 2011.
- Yi Zhang, Hans-Ulrich Krieger. Large-Scale Corpus-Driven PCFG Approximation of an HPSG. Proceedings of the 12th International Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland, 2011.
- Peter Adolphs, Anton Benz, Núria Bertomeu, Xiwen Cheng, Tina Klüwer, Hans Uszkoreit, Feiyu Xu. Conversational Agents in a Virtual World. Proceedings of the 34th Annual German Conference on Artificial Intelligence (KI-2011), Berlin, 2011.
- Peter Adolphs, Feiyu Xu, Hans Uszkoreit, Hong Li. Dependency Graphs as a Generic Interface between Parsers and Relation Extraction Rule Learning. Proceedings of the 34th Annual German Conference on Artificial Intelligence (KI-2011), Berlin, 2011.
- Arif Bramantoro, Ulrich Schäfer, Toru Ishida. Pipelining Software and Services for Language Processing. Book chapter 16 in: Toru Ishida and Donghui Lin (eds.): The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability. ISBN 978-3-642-21177-5, Springer Cognitive Technologies Series, pages 247-262, 2011.
2010
- Ulrich Schäfer, Christian Spurk: TAKE Scientist's Workbench: Semantic Search and Citation-based Visual Navigation in Scholar Papers. Proceedings of the Fourth IEEE International Conference on Semantic Computing (ICSC-2010), pages 317-324, ISBN 978-0-7695-4154-9, September 2010, IEEE Computer Society, Los Alamitos, CA, USA.
- Hans-Ulrich Krieger, Ulrich Schäfer: DL Meet FL: A Bidirectional Mapping between Ontologies and Linguistic Knowledge. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 588-596, August 2010, Beijing, China.
- Ulrich Schäfer, Uwe Kasterka: Scientific Authoring Support: A Tool to Navigate in Typed Citation Graphs. Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2010) Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids (CL&W-2010), pages 7-14, June 2010, Los Angeles, CA.
- Arif Bramantoro, Ulrich Schäfer, Toru Ishida: Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC-2010), pages 3506-3511, ISBN 2-9517408-6-7, May 2010, Valletta, Malta.
- Peter Adolphs, Xiwen Cheng, Tina Klüwer, Hans Uszkoreit, and Feiyu Xu. Question answering biographic information and social network powered by the semantic web. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), Valletta, Malta. European Language Resources Association (ELRA). 2010.
- Kathrin Eichler and Günter Neumann. DFKI KeyWE: Ranking keyphrases extracted from scientific articles. In Proceedings of the Fifth International Workshop on Semantic Evaluation (SemEval-2010), held at ACL, Uppsala, Sweden. Association for Computational Linguistics. 2010.
- Alejandro Figueroa. Surface language models for discovering temporally anchored definitions on the web - Producing chronologies as answers to definition questions. In WEBIST 2010 - Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST-2010), Valencia, Spain. INSTICC Press. 2010.
- Yu Fu, Feiyu Xu, and Hans Uszkoreit. Determining the origin and structure of person names. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), Valletta, Malta. European Language Resources Association (ELRA). 2010.
- Brigitte Jörg, Hans Uszkoreit, and Alastair Burt. LT World: Ontology and reference information portal. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), Valletta, Malta. European Language Resources Association (ELRA). 2010.
- Brigitte Jörg. CERIF: The common European research information format. Insight into the CERIF 2008-1.1 release. In Maximilian Stempfhuber and Nils Thiedemann, editors, Connecting Science with Society. The Role of Research Information in a Knowledge-Based Society. International Conference on Current Research Information Systems (CRIS-2010), Aalborg, Denmark. University of Aalborg. 2010.
- Brigitte Jörg. CERIF: The common European research information format model. Data Science Journal (DSJ), CRISs for European e-Infrastructure, 9:24-31. 2010.
- Tina Klüwer, Peter Adolphs, Feiyu Xu, Hans Uszkoreit, and Xiwen Cheng. Talking NPCs in a virtual game world. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-2010), System Demonstrations, Uppsala, Sweden. Association for Computational Linguistics. 2010.
- Tina Klüwer, Hans Uszkoreit, and Feiyu Xu. Using syntactic and semantic based relations for dialogue act recognition. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China. Association for Computational Linguistics, Tsinghua University Press. 2010.
- Peter Spyns, Geert van Grootel, Brigitte Jörg, and Stijn Christiaens. Realising a Flemish government innovation information portal with business semantics management. In Maximilian Stempfhuber and Nils Thiedemann, editors, Connecting Science with Society. The Role of Research Information in a Knowledge-Based Society. International Conference on Current Research Information Systems (CRIS-2010), Aalborg, Denmark. University of Aalborg. 2010.
- Weiwei Sun. Improving Chinese semantic role labeling with rich syntactic features. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-2010), Short Papers, Uppsala, Sweden. Association for Computational Linguistics. 2010.
- Weiwei Sun. Semantics-driven shallow parsing for Chinese semantic role labeling. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-2010), Short Papers, pages 103-108, Uppsala, Sweden. Association for Computational Linguistics. 2010.
- Feiyu Xu, Hans Uszkoreit, Sebastian Krause, and Hong Li. Boosting relation extraction with limited closed-world knowledge. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China. Association for Computational Linguistics, Tsinghua University Press. 2010.
2009
- Brigitte Jörg. CERIF: Common European research information format - formal contextual relations to guide through the maze of research information. In Oleg Cvik, editor, Proceedings of the International Conference Research Information Systems in the EU, in Conjunction with Standardization and Compatibility, pages 6-17, Bratislava, Slovakia. Centrum vedecko-technickych informacii SR. 2009.
- Hans Uszkoreit. Linguistics in Computational Linguistics: Observations and Predictions. In Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics, Athens, Greece. 2009.
- Hans Uszkoreit, Feiyu Xu, and Hong Li. Analysis and improvement of minimally supervised machine learning for relation extraction. In Proceedings of the International Conference on Applications of Natural Language to Information Systems (NLDB 2009), Saarbrücken, Germany. 2009.
- Geert van Grootel, Peter Spyns, Stijn Christiaens, and Brigitte Jörg. Business semantics management supports government innovation information portal. In: On the Move to Meaningful Internet Systems: OTM 2009 Workshops. Volume 5872 of Lecture Notes in Computer Science, pages 757-766. Springer. 2009.
- Zygmunt Vetulani and Hans Uszkoreit, editors. Human Language Technology. Challenges of the Information Society. Volume 5603 of Lecture Notes in Artificial Intelligence. Springer, Heidelberg. 2009.