The goal of this research project is to define several new extensions to the state-of-the-art Information Extraction paradigm beyond ‘slot filling’, and to achieve more accurate, salient, complete, concise and coherent extraction results by exploiting dynamic background knowledge and cross-document, cross-lingual event ranking and tracking. The approach consists of cross-document inference, prediction of and reasoning about unknown, implicit event times, cross-document entity coreference resolution with global contexts, centroid entity detection, event attribute extraction and graph-based clustering algorithms for redundancy and contradiction detection, automatic new event clustering and active learning, abstractive summary generation based on extraction results, and name translation with comparable corpora and cross-lingual co-training.
The broader impacts of the project are two-fold. First, the experimental research is linked to educational activities, including project-related curriculum development. The project supports two PhD students and two undergraduate students in each of the five years, involves non-CS undergraduate students through utility evaluation and corpus annotation, and attracts elementary school and high school students through tutorials, regular research seminars and an extensive summer workshop. Second, the results of this project will benefit E-Science and E-Learning by extracting and tracking related knowledge from scientific literature and from learning materials used in elementary schools and high schools.