The PARC 700 Dependency Bank
The PARC 700 Dependency Bank consists of 700 sentences which were randomly extracted from section 23 of the UPenn Wall Street Journal treebank, parsed with our LFG grammar, and given gold-standard annotations of grammatical dependency relations by manual correction and extension. Average sentence length: 19.8 words; average number of relation triples: 65.4.
- Development set: sentence numbers ending in 1 or 6 (parc numbering)
- Test set: all other sentences
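The split rule above can be sketched in a few lines of Python. This is only an illustration, not part of the distribution: the function names are hypothetical, and the usage example assumes the sentences are numbered consecutively from 1 to 700 in the PARC numbering, which should be checked against the released files.

```python
from typing import Iterable, List, Tuple


def is_development(sentence_number: int) -> bool:
    """Return True if the PARC sentence number ends in 1 or 6."""
    return sentence_number % 10 in (1, 6)


def split_sentences(numbers: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Partition sentence numbers into (development, test) lists."""
    dev: List[int] = []
    test: List[int] = []
    for n in numbers:
        (dev if is_development(n) else test).append(n)
    return dev, test


# Assumption: sentences are numbered 1..700 with no gaps.
dev, test = split_sentences(range(1, 701))
print(len(dev), len(test))
```

Under that numbering assumption, the rule places one fifth of the sentences in the development set and the rest in the test set.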
The corpus is freely available for research and evaluation purposes. Please contact us directly if you intend to use the corpus for commercial applications.
We would like to thank Ted Briscoe, Mick Burke, Aoife Cahill, John Carroll, Rebecca Watson, and Tomas By for corrections to the original release.
Downloadables
Documentation
- Dependency bank documentation, including a description of the file format, the methodology used to produce the dependency bank, and an extensive explanation of the grammatical dependency functions and other features appearing in the dependency bank.
- Documentation for the Emacs library for displaying and pruning dependency structures.
References
- Tracy H. King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M. Kaplan (2003). The PARC 700 Dependency Bank. In Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora, held at the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL'03), Budapest.
- Stefan Riezler, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson (2002). Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), Philadelphia, PA.
- Richard Crouch, Ronald M. Kaplan, Tracy H. King, and Stefan Riezler (2002). A Comparison of Evaluation Metrics for a Broad Coverage Stochastic Parser. In Proceedings of the Workshop on "Parseval and Beyond" at the 3rd International Conference on Language Resources and Evaluation (LREC'02), Las Palmas, Spain.
For more information and references, visit the NLTT page or contact Tracy Holloway King.
Last modified: , Tracy Holloway King