Meticulous study of the English language required an annotated text corpus. The Penn Treebank[5] was one of the most widely used such corpora. It consisted of IBM computer manuals, transcribed telephone conversations, and other texts, together containing over 4.5 million words of American English, annotated using both part-of-speech tagging and syntactic bracketing.[6]
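To illustrate what part-of-speech tagging combined with syntactic bracketing looks like, the following minimal sketch reads one bracketed tree and lists its tagged words. The example sentence is invented for illustration and is not taken from the Penn Treebank; the function names `parse_brackets` and `tagged_words` are hypothetical, and the parser assumes well-formed input with no empty elements or traces.

```python
def parse_brackets(text):
    """Parse a bracketed tree into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = [tokens[pos]]  # constituent or part-of-speech label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse_node())
            else:
                node.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return parse_node()


def tagged_words(node):
    """Collect (word, tag) pairs from the leaves of a parsed tree."""
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, list):
            pairs.extend(tagged_words(child))
    return pairs


# Invented example sentence (assumption), using common Treebank-style labels.
tree = parse_brackets("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
print(tagged_words(tree))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]
```

Here the innermost brackets carry the part-of-speech tags (DT, NN, VBD), while the outer brackets (S, NP, VP) carry the syntactic structure.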
Analysis of Japanese sentence corpora found that sentence length follows a log-normal distribution.[7]
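A quantity is log-normally distributed when its logarithm is normally distributed, so one simple check is to take the logarithm of each sentence length and inspect the moments of the result. The sketch below does this on synthetic data; the lengths are generated at random and are not real corpus data, the parameters 3.0 and 0.5 are arbitrary assumptions, and `log_moments` is a hypothetical helper rather than the method used in the cited study.

```python
import math
import random
import statistics


def log_moments(lengths):
    """Return (mean, stdev, skewness) of the natural log of each length."""
    logs = [math.log(n) for n in lengths]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    # Sample skewness of the logs; near 0 if the logs are roughly normal.
    skew = sum(((x - mu) / sigma) ** 3 for x in logs) / len(logs)
    return mu, sigma, skew


# Synthetic stand-in for sentence lengths in words (assumption, not corpus data).
rng = random.Random(0)
lengths = [max(1, round(rng.lognormvariate(3.0, 0.5))) for _ in range(5000)]

mu, sigma, skew = log_moments(lengths)
print(f"log-mean={mu:.3f} log-stdev={sigma:.3f} log-skewness={skew:.3f}")
```

A log-skewness close to zero is consistent with log-normality but does not prove it; a fuller analysis would compare the whole distribution, for example with a quantile plot or a goodness-of-fit test.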
Modeling language acquisition
During language acquisition, children are largely exposed only to positive evidence:[8] they receive evidence for what is a correct form, but none for what is incorrect.[9] This was a limitation for the models of the time, because the deep learning models available today did not exist in the late 1980s.[10]
It has been shown that languages can be learned from simple input presented incrementally as the child develops better memory and a longer attention span,[11] which explained the long period of language acquisition in human infants and children.[11]
Robots have been used to test linguistic theories.[12] To let robots learn as children might, researchers built models based on an affordance model in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure.
Using the Price equation and Pólya urn dynamics, researchers have created a system that not only predicts future linguistic evolution but also gives insight into the evolutionary history of modern-day languages.[13]
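For readers unfamiliar with Pólya urn dynamics, the sketch below simulates the basic process: an item is drawn with probability proportional to its current count and is returned together with one extra copy, so frequent items tend to become more frequent. Treating the urn's colors as competing linguistic variants is an assumption made here for illustration; the names `polya_urn`, `variant_a`, and `variant_b` are hypothetical, and this is not the system described in the cited work.

```python
import random


def polya_urn(initial_counts, steps, rng):
    """Simulate a Pólya urn: draw a variant with probability proportional
    to its count, then return it along with one extra copy."""
    counts = dict(initial_counts)
    for _ in range(steps):
        variants = list(counts)
        weights = [counts[v] for v in variants]
        drawn = rng.choices(variants, weights=weights, k=1)[0]
        counts[drawn] += 1  # reinforcement: the drawn variant gains a copy
    return counts


# Two hypothetical competing variants, each starting with one token.
rng = random.Random(42)
final = polya_urn({"variant_a": 1, "variant_b": 1}, steps=1000, rng=rng)
total = sum(final.values())
shares = {v: c / total for v, c in final.items()}
print(shares)
```

Rerunning with different seeds gives very different final shares, which is the characteristic feature of this process: early random draws are amplified rather than averaged out.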
Chomsky's theories
Attempts have been made to determine how an infant learns a "non-normal grammar", in the sense of Chomsky normal form.[9]
^ Taylor, Ann (2003). "1". Treebanks. Springer Netherlands. pp. 5–22.
^ Furuhashi, S. & Hayakawa, Y. (2012). "Lognormality of the Distribution of Japanese Sentence Lengths". Journal of the Physical Society of Japan. 81 (3): 034004. Bibcode: 2012JPSJ...81c4004F. doi: 10.1143/JPSJ.81.034004.
^ a b Braine, M.D.S. (1971). "On two types of models of the internalization of grammars". In D.I. Slobin (Ed.), The ontogenesis of grammar: A theoretical perspective. New York: Academic Press.