Identifying missing dictionary entries with frequency-conserving context models
J. R. Williams, E. M. Clark, J. P. Bagrow, C. M. Danforth, and P. S. Dodds
Physical Review E, 92, 042808, 2015
Times cited: 8
Logline: The lexical edge of language: We show how to find phrases that aren't in the dictionary but perhaps should be.
Abstract:
In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability in the presence of ordered symbolic data (e.g., text, speech, and genes). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary (an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal definitions), we develop highly effective filters for the identification of meaningful, missing phrase entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough lexical extraction technique and expanding our knowledge of the defined English lexicon of phrases.
BibTeX:
@Article{williams2015c, author = {Williams, Jake Ryland and Clark, Eric M. and Bagrow, James P. and Danforth, Christopher M. and Dodds, Peter Sheridan}, title = {Identifying missing dictionary entries with frequency-conserving context models}, journal = {Physical Review E}, year = {2015}, key = {language}, volume = {92}, pages = {042808}, note = {Available online at \href{https://arxiv.org/abs/1503.02120}{https://arxiv.org/abs/1503.02120}}, }