Research topics

Statistical Estimation of Vocabulary Size Including "Unseen" Words.

The number of unique words in children’s speech is one of the most basic statistics indicating their language development. It is difficult, however, to accurately evaluate the number of unique words in a child’s growing corpus over time when the sample size is limited. This study proposes a novel technique to estimate the latent number of words from a series of words uttered by children. The technique utilizes statistical properties of the number of types as a function of the number of sampled tokens. We tested the practical effectiveness of the proposed method in empirical analyses of cross-sectional and longitudinal samples. The converging empirical evidence suggests that the proposed estimator improves the accuracy of vocabulary size estimation over a naïve type-counting estimator. Utilizing this efficient estimator, we propose a new sampling scheme for vocabulary assessment that has lower cost and higher accuracy than existing methods.
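To make the idea of "the number of types as a function of the number of sampled tokens" concrete, the following is a minimal Python sketch. It is not the estimator proposed in this study, whose details are not given here. It computes the type-token curve for a toy utterance and, as a stand-in for an estimator that accounts for unseen words, applies the standard bias-corrected Chao1 lower bound. The function names and the sample sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def type_token_curve(tokens):
    """Return the number of distinct types observed after each token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def chao1(tokens):
    """Bias-corrected Chao1 lower-bound estimate of the total number of types.

    Stand-in only: this is a standard unseen-species estimator,
    NOT the estimator proposed in the study described above.
    """
    counts = Counter(tokens)                  # frequency of each type
    freq_of_freq = Counter(counts.values())   # how many types occur k times
    f1 = freq_of_freq.get(1, 0)               # types seen exactly once
    f2 = freq_of_freq.get(2, 0)               # types seen exactly twice
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy sample (11 tokens)
sample = "the dog saw the cat and the cat saw a bird".split()

curve = type_token_curve(sample)   # types as a function of tokens
naive = curve[-1]                  # naive type count: observed types only
estimate = chao1(sample)           # observed types plus an unseen-type correction

print(curve)
print(naive, estimate)
```

The naïve count reports only the types that happened to appear in the sample, while an estimator of this kind adds a correction based on how many types were seen only once or twice. The curve itself shows how slowly new types accumulate as more tokens are sampled.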

Keywords

Vocabulary growth; Small sample size; Number of latent types; Type–token ratio

Japanese title: A Method for Estimating Vocabulary Size Including Unobserved Words

Related papers (see also other publications)

