BLENDEET: Cross-document Cross-lingual Event Extraction and Tracking

 




Overview

  • Award Nos. IIS-0953149 and IIS-1523198
  • Duration: 2010-2018
  • Title: Cross-Document Cross-Lingual Event Extraction and Tracking
  • Institution: Rensselaer Polytechnic Institute
  • Abstract

This research project defines several new extensions to the state-of-the-art Information Extraction (IE) paradigm beyond slot filling, and achieves more accurate, salient, complete, concise and coherent extraction results by exploiting dynamic background knowledge and cross-document cross-lingual event ranking and tracking. The approach consists of cross-document inference, prediction of and reasoning about unknown implicit event times, cross-document entity coreference resolution with global contexts, centroid entity detection, event attribute extraction and graph-based clustering algorithms for redundancy and contradiction detection, automatic new event clustering and zero-shot transfer learning, abstractive summary generation based on extraction results, and name translation with comparable corpora and cross-lingual co-training. This project develops state-of-the-art Information Extraction and Knowledge Base Population techniques for extracting, linking and tracking entities, relations, attributes and events from unstructured data in a wide range of domains, genres and languages. These methods strongly influence the philosophy of combining symbolic and distributional semantic representations for IE. This new paradigm extends extraction capability to hundreds of low-resource languages and thousands of entity and event types.

The broader impacts of the project are two-fold. The experimental research is linked to educational activities, including project-related curriculum development. The project also supports seven PhD students through completion, involves non-CS undergraduate students through utility evaluation and corpus annotation, and delivers many conference tutorials, research software repositories and demos.

  • Research Challenges
    • Cross-document event coreference resolution
    • Event ranking by salience and novelty
    • Event organization by participant, time, and place
    • Name translation
    • Knowledge Discovery for IE
    • Domain Adaptation techniques for Information Extraction
  • Point of Contact

Prof. Heng Ji (jih@rpi.edu)

  • Acknowledgement

This material is based upon work supported by the U.S. National Science Foundation under Grant Nos. IIS-0953149 and IIS-1523198. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

  • Date of Last Update: 07/10/2018