Difference between revisions of "Ngram frequency"

From Glottopedia

Latest revision as of 16:38, 18 July 2014

Ngram frequency is the mean, or summed, frequency of all fragments (ngrams) of a given length contained in a word. The most commonly used variant is bigram frequency, which uses fragments of length 2: the word 'dog' contains 2 bigrams, 'do' and 'og'. Bigram frequency is considered to be a measure of orthographic regularity and normally correlates negatively with response times in psycholinguistic experiments, that is, words with a higher bigram frequency tend to be responded to faster.
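As a rough illustration of the definition above, the following Python sketch extracts the bigrams of a word and computes their mean or summed frequency. The function names and the four-word list are invented for this example, and raw occurrence counts in that word list stand in for frequencies; a real study would take its counts from a proper corpus, and may weight them differently.

```python
from collections import Counter


def bigrams(word):
    """Return all adjacent letter pairs of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_counts(words):
    """Count how often each bigram occurs across a list of words."""
    counts = Counter()
    for w in words:
        counts.update(bigrams(w))
    return counts


def bigram_frequency(word, counts, summed=False):
    """Return the mean (default) or summed count of the bigrams in `word`."""
    values = [counts[b] for b in bigrams(word)]
    if not values:
        # Words shorter than two letters contain no bigrams.
        return 0.0
    total = sum(values)
    return float(total) if summed else total / len(values)


# Toy word list, invented for illustration only (an assumption, not real data).
corpus = ["dog", "dot", "log", "fog"]
counts = bigram_counts(corpus)

print(bigrams("dog"))                               # ['do', 'og']
print(bigram_frequency("dog", counts))              # 2.5
print(bigram_frequency("dog", counts, summed=True)) # 5.0
```

In this toy list 'do' occurs twice and 'og' three times, so 'dog' has a summed bigram frequency of 5 and a mean of 2.5.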

It is not unusual to pad the word with a 'space' (boundary) character on each side, which gives the first and last characters of the word a special status. The word 'dog' then becomes '_dog_' and contains 4 bigrams: '_d', 'do', 'og' and 'g_'.
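The padding step can be sketched in a few lines of Python. The function name `padded_bigrams` and the choice of '_' as the default boundary marker are assumptions made for this example; any character that does not occur inside words would do.

```python
def padded_bigrams(word, boundary="_"):
    """Return the bigrams of a word after adding one boundary marker per side."""
    padded = boundary + word + boundary
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


print(padded_bigrams("dog"))  # ['_d', 'do', 'og', 'g_']
```

With padding, a word of k letters always yields k + 1 bigrams, so even a single-letter word contributes two.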

Ngram frequency with N = 1 is equal to character frequency, and N = 3 is commonly referred to as trigram frequency. Larger values of N are rarely used.
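The same extraction generalises to any N. The sketch below is a minimal version with a hypothetical helper `ngrams`; with N = 1 it returns the individual characters, and with N = 3 the trigrams.

```python
def ngrams(word, n):
    """Return all contiguous fragments of length n in a word, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A word shorter than n has no fragments of that length.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(ngrams("dog", 1))  # ['d', 'o', 'g']
print(ngrams("dog", 2))  # ['do', 'og']
print(ngrams("dog", 3))  # ['dog']
print(ngrams("dog", 4))  # []
```

A word of k letters contains k - N + 1 unpadded ngrams, which may be one reason large values of N are of limited use for short words.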

In the auditory domain, the equivalent of the bigram is the diphone, a sequence of two phonemes. Mean diphone frequency can be considered a crude measure of phonological regularity.
