NIST TAC Knowledge Base Population (KBP2015) Entity Discovery and Linking Track

Participants can obtain the data from LDC after completing registration:

 

  • Core Data Sets:
  • KB:
    • 4/20/2015 LDC2015E42 TAC KBP 2015 Tri-lingual Entity Discovery and Linking Knowledge Base: Official 2015 KB, as downloaded by LDC from :BaseKB Gold in January 2015, with no further processing applied.
    • 5/6/2015 LDC2015E43 TAC KBP 2015 Tri-lingual Entity Discovery and Linking Knowledge Base Entries Creation Algorithm: LDC algorithm for generating human-readable versions of KB entries for use during linking annotation.

 

  • Training and Evaluation Corpora:     
    • 7/17/2015 LDC2015E72 TAC KBP 2015 English Entity Discovery Sample Data             
    • 7/22/2015 LDC2015E75 TAC KBP 2015 Tri-lingual Entity Discovery & Linking Training Data
    • 8/17/2015 LDC2015E?? TAC KBP 2015 English Entity Discovery Evaluation Gold Standard Entity Mentions            
    • 9/1/2015 LDC2015R?? TAC KBP 2015 Tri-lingual Entity Discovery and Linking Evaluation Source Corpus              
    • 9/30/2015 LDC2015R?? TAC KBP 2015 Tri-lingual Entity Discovery and Linking Evaluation Queries
    • 10/5/2015 LDC2015E?? TAC KBP 2015 Tri-lingual Entity Discovery and Linking Evaluation Queries and Gold Standard Knowledge Base Links

 

  • Pilot Evaluation:
  • 6/18/2015 LDC2015E44 TAC KBP 2015 Tri-Lingual Entity Discovery and Linking Pilot Gold Standard Knowledge Base Links V1.1: 2015 Tri-lingual Entity Discovery & Linking pilot gold standard entity mentions, together with the source documents, KB links, NIL equivalence class clusters, and entity type information for each.
  • 4/27/2015 LDC2015E61 TAC KBP 2015 Tri-Lingual Entity Discovery and Linking Pilot Source Corpus: 15 source documents for the Tri-lingual EDL pilot.

    

  • Training Data from Previous Years:
    • 3/20/2015 LDC2015E17 TAC KBP Chinese Entity Linking Comprehensive Training and Evaluation Data 2011 - 2014: All Chinese Entity Linking training and evaluation data from 2011-2014, including queries, KB links, equivalence class clusters for NIL entities, and entity type information for each of the queries.
    • 2/12/2015 LDC2015E18 TAC KBP Spanish Entity Linking - Comprehensive Training and Evaluation Data 2012 - 2014: All Spanish Entity Linking training and evaluation data from 2012-2014, including queries, KB links, equivalence class clusters for NIL entities, and entity type information for each of the queries.
    • 3/17/2015 LDC2015E19 TAC KBP English Entity Linking - Comprehensive Training and Evaluation Data 2009 - 2013: All English Entity Linking training and evaluation data from 2009-2013, including queries, KB links, equivalence class clusters for NIL entities, and entity type information for each of the queries, plus IAA results.
    • 4/8/2015 LDC2015E45 TAC KBP Comprehensive English Source Corpora 2009-2014: Complete set of English source documents for Regular, Temporal, Sentiment, and Surprise Slot Filling from 2009-2014.
    • 3/31/2015 LDC2015E20 TAC KBP English Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014: Partial training data and complete evaluation data for 2014 English Entity Discovery & Linking, including queries, KB links, equivalence class clusters for NIL entities, and entity type information for each of the queries.
    • LDC2014T12 Abstract Meaning Representation (AMR) Annotation Release 1.0
    • LDC2012T21 Annotated English Gigaword

 

  • Other data sets which may help entity mention extraction and coreference resolution:
  • LDC2013E64 DEFT Phase 1 ERE Annotation R3 V3
  • LDC2014E31 DEFT ERE English Discussion Forum Annotation V3
  • LDC2014E113 DEFT ERE Chinese Discussion Forum Annotation
  • LDC2014E114 DEFT ERE Chinese and English Parallel Annotation V2
  • 7/2/2015 LDC2015E29 DEFT Rich ERE English Training Annotation V2
  • 7/2/2015 LDC2015E68 DEFT Rich ERE English Training Annotation R2 V2
  • 7/8/2015 LDC2015E71 DEFT Spanish Light ERE Training Data V2
  • LDC2006T06 ACE 2005 Multilingual Training Corpus
  • LDC2005T09 ACE 2004 Multilingual Training Corpus
  • LDC2004T09 TIDES Extraction (ACE) 2003 Multilingual Training Data
  • LDC2003T11 ACE-2 Version 1.0
  • LDC2013T19 OntoNotes Release 5.0

 

  • Other data sets which may help name translation:
  • LDC2014T18 ACE 2007 Multilingual Training Corpus
  • LDC2009T11 REFLEX Entity Translation Training/DevTest
  • LDC2005T34 Chinese <-> English Named Entity Lists v 1.0

 

  • Other data sets which may help parsing:
  • LDC2013T21 Chinese Treebank 8.0
  • LDC2012E109 BOLT Phase 1 Chinese Treebank DF Part 1
  • LDC2012E120 BOLT Phase 1 Chinese Treebank DF Part 2 Version 2.0
  • LDC2012E130 BOLT Phase 1 Chinese Treebank DF Part 3
  • LDC2013E32 BOLT Phase 1 Chinese Treebank DF Part 4
  • LDC2015T06 GALE Chinese-English Parallel Aligned Treebank -- Training
  • LDC2007T02 English Chinese Translation Treebank v 1.0

 

DTDs:

Entity Linking DTD

Source Data DTD

Knowledge Base DTD
 

Other useful resources:

Wiki-based linguistically grounded knowledge base: http://www.gabormelli.com/RKB/