Law

Definition
The philosophy of science defines the term scientific law as a meaningful universal hypothesis which is systematically connected to other hypotheses in the field and at the same time well corroborated by relevant empirical data (cf. Bunge 1967). A law is called universal because it is valid at all times, everywhere, and for all objects within its scope. A well-known example is the law of gravitation in physics. A law can be said to be a statement representing universal patterns in the world (the phenomenological type of law) or universal mechanisms (the representational or mechanistic type). While the first type relates two or more variables to each other without specifying the origin of this relation (black box), the second includes such a specification. A system of laws is called a theory. The value of theories and their components, the laws, lies not only in their role as containers of scientific knowledge but also in the fact that there can be no explanation without at least one law: a valid scientific explanation (the so-called deductive-nomological explanation) is a subsumption under laws, taking boundary conditions into account. A specific form of the deductive-nomological explanation is the functional explanation, which follows an extended scheme and is possible only under special conditions (self-organizing systems such as biological evolution and language). Laws must not be confused with rules, which are either prescriptive or descriptive tools without any explanatory power; hence grammars and similar formalisms cannot explain anything either. Another significant difference is that rules can be violated, whereas laws (in the scientific sense) cannot.

Laws in the Study of Language and Text
In quantitative linguistics, the exact science of language and text, three kinds of universal laws are known. The first kind takes the form of probability distributions, i.e. it makes predictions about the number of units with a given property. A well-known example of this kind is the Zipf-Mandelbrot Law (the status of the corresponding phenomenon has been discussed since the days of George K. Zipf, who was the first to systematically study quantitative properties of language from a scientific point of view). The law relates (a) the frequency of a word in a given text (in any language) to the number of words with the given frequency (called the frequency spectrum), and (b) the frequency of a word to its rank (called the rank-frequency distribution). The first formulation by Zipf stated that about one half of the distinct words (types) of a text have frequency one (the so-called hapax legomena), a third of the rest have frequency two (dis legomena), a quarter of the rest occur three times in the text, etc. Zipf called it the harmonic law. It was later modified and corrected by Benoit Mandelbrot (known outside linguistics for his fractal geometry). He derived the law from the assumption that languages organize their lexicons in such a way that the most frequent words become the shortest ones, using an optimization method (Lagrange multipliers) under the condition that the information of each code element must be greater than zero. This resulted in the famous formula (1), which has the form of a rank-frequency distribution: if the words are arranged according to their frequency, the most frequent word is assigned rank one, and so on. The formula gives the frequency a word should have at a given rank:

$$f(r) = \frac{K}{(b+r)^\gamma}$$

where f(r) is the frequency, r the rank, b and γ are parameters, and K is a normalizing constant. Since the seminal works of Zipf and Mandelbrot, numerous laws have been found. Other examples of distributional laws are (in morphology and lexicon) the distribution of length, polysemy, synonymy, age, part-of-speech etc., (in syntax) the frequency distribution of syntactic constructions, the distribution of their complexity, depth of embedding, information, and position in the mother constituent, (in semantics) the distribution of the lengths of definition chains in semantic networks, semantic diversification, etc. Any property and any linguistic unit studied so far displays a characteristic probability distribution.

The second kind of law is called the functional type, because these laws link two (or more) variables, i.e. properties. An illustrative example of this kind is Menzerath's Law (in the literature also called the Menzerath-Altmann Law), which relates the size of linguistic constituents to the size of the corresponding construct. Thus, the (mean) length of the syllables of a word depends on the number of syllables the word consists of; the (mean) length of the clauses in a sentence depends on the length of the sentence (measured in terms of the number of clauses it consists of). The most general form of this law is given by the formula:

$$y = Ax^be^{(-cx)} ,$$

where y is the mean length of the constituents, x the length of the construct, and A, b, and c are parameters. The parameters of this law are mainly determined by the level of the units under study; they increase from the level of sound length gradually to the sentence and supra-sentence level. Fig. 1 gives an impression of a typical curve.



Fig. 1: The functional dependence of mean syllable length on word length in Hungarian. The line represents the prediction by the law; the marks show the coordinates of the empirical data points.
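
As a rough numerical illustration of formulas (1) and (2), the following Python sketch evaluates the Zipf-Mandelbrot rank-frequency formula and Menzerath's law. The function names and all parameter values are hypothetical and chosen only to show the shape of the curves; they are not estimates from any corpus. The sketch also assumes that K normalizes the frequencies over a finite number of ranks so that they sum to the text length.

```python
import math


def zipf_mandelbrot(n_ranks, b, gamma, n_tokens=1.0):
    """Expected frequencies f(r) = K / (b + r)**gamma for r = 1..n_ranks.

    Assumption: K is chosen so that the frequencies sum to n_tokens.
    """
    weights = [(b + r) ** -gamma for r in range(1, n_ranks + 1)]
    K = n_tokens / sum(weights)
    return [K * w for w in weights]


def menzerath(x, A, b, c):
    """Mean constituent length y = A * x**b * exp(-c * x) for construct length x."""
    return A * x ** b * math.exp(-c * x)


# Hypothetical parameters, for illustration only.
freqs = zipf_mandelbrot(n_ranks=1000, b=2.7, gamma=1.1, n_tokens=10_000)
print([round(f, 1) for f in freqs[:5]])

# Mean syllable length for words of 1 to 6 syllables (hypothetical values).
syllable_lengths = [menzerath(x, A=4.0, b=-0.3, c=0.02) for x in range(1, 7)]
print([round(y, 2) for y in syllable_lengths])
```

With a negative b and a positive c, as in this example, the predicted constituent length falls as the construct gets longer, which is the qualitative behavior Menzerath's law describes.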

Other examples of functional laws are the dependence of word (or morph) frequency on word (or morph) length, of the frequency of syntactic constructions on their complexity, of polysemy on length, of length on age, etc.

The third kind of law is the developmental one. Here, a property is related to time. The best-known example is the Piotrowski Law, which represents the development (increase and/or decrease) of the proportion of new units or forms over time. This law describes a typical growth process and can be derived from a simple differential equation with the solution:

$$ p(t) = \frac{c}{1+a \cdot e^{-b \cdot t}} $$

where p is the proportion of new forms at time t, c is the saturation value, and a and b are empirical parameters. Fig. 2 shows the increase of the forms with /u/ at the cost of the older form with /a/ in the German word ward>wurde (/vart/ > /vurde/) in the time period from 1445 to 1925.



Fig. 2: Typical curve representing the replacement of a linguistic unit by a new one.
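
The growth process can be sketched in Python, assuming the logistic form p(t) = c / (1 + a·e^(-bt)) with c as the saturation value, as described in the text. The growth equation dp/dt = (b/c)·p·(c - p) used for the check is the standard logistic equation and is an assumption here, since the text does not spell out the differential equation. All parameter values are hypothetical and are not fitted to the ward > wurde data.

```python
import math


def piotrowski(t, a, b, c=1.0):
    """Proportion of new forms at time t: p(t) = c / (1 + a * exp(-b * t))."""
    return c / (1.0 + a * math.exp(-b * t))


def growth_rate(p, b, c=1.0):
    """Right-hand side of the assumed growth equation dp/dt = (b / c) * p * (c - p)."""
    return (b / c) * p * (c - p)


# Hypothetical parameters; t is measured in arbitrary time units.
a, b, c = 50.0, 0.02, 1.0
for t in (0, 120, 240, 360, 480):
    print(t, round(piotrowski(t, a, b, c), 4))

# Numerical check that the curve satisfies the assumed growth equation.
t, h = 200.0, 1e-4
slope = (piotrowski(t + h, a, b, c) - piotrowski(t - h, a, b, c)) / (2 * h)
print(abs(slope - growth_rate(piotrowski(t, a, b, c), b, c)) < 1e-8)
```

The parameter a fixes the starting proportion (p(0) = c / (1 + a)), while b controls how quickly the new form spreads.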

A variant of this third kind of law is based on (discrete) 'linguistic' time instead of (continuous) physical time. The simplest way to operationalize linguistic time is by reference to text position. In oral texts, there is a direct correspondence between the sequence of linguistic units and physical time intervals, while written texts map this correspondence in a slightly more indirect way. A typical example of this variant is the type-token ratio (TTR), which was, in the beginning, a single number (the quotient of the number of different words, the types, and the number of running words, the tokens) used to characterize the vocabulary richness of a text. Later, it became apparent that this value is inappropriate for several reasons. Instead, at each text position, the number of types that have occurred so far is counted, which yields a monotonically non-decreasing curve, because the number of distinct words used before a given text position cannot decrease in the course of the rest of the text. A straightforward theoretical derivation of a corresponding law was given by Gustav Herdan (Herdan, 1966), represented by the simple formula:

$$T = aL^b$$

where T is the number of types, L the number of tokens (= text position), and b an empirical parameter that characterizes the text. The parameter a is equal to unity if types and tokens are measured in terms of the same unit (as in almost all cases). The law is valid whether word-forms or lemmas are counted, just with a different value of b. This parameter is also an indicator of the morphological type of the language under study if word-forms are considered, because morphologically rich languages display a faster increase in word-form types than isolating languages. A problem of the TTR is that it is not independent of the overall text length. Therefore, more complicated formulae are used to take this influence into account, or quite different models (cf. Popescu, Altmann 2006, 2007) are applied. Recent investigations found that other linguistic units (letters, morphs, syntactic constructions, syntactic function types, etc.) show a similar behavior in their text dynamics. However, depending on the size of their inventory in the language (which may vary over several orders of magnitude; compare, e.g., the size of an alphabet or a phoneme system to the size of a lexicon), different models have to be used. The TTR of syntactic units, for example, follows the formula:

$$T = e^{-c}L^{b}e^{cL} = L^be^{c(L-1)} , c < 0$$

Fig. 3 shows a corresponding curve.



Fig. 3: The TTR of syntactic constructions in a text. The smooth line corresponds to the prediction by formula (5); the irregular line represents the empirical data.
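
The position-wise type count and the two TTR formulas can be sketched in Python as follows. The function names and the toy sentence are hypothetical, and `estimate_b` is only a crude one-point estimate that assumes a = 1; it is not the fitting procedure used in the literature.

```python
import math


def type_token_curve(tokens):
    """Number of distinct types seen up to each text position (1-based)."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


def herdan_types(L, b, a=1.0):
    """Herdan's formula T = a * L**b."""
    return a * L ** b


def syntactic_types(L, b, c):
    """T = L**b * exp(c * (L - 1)) with c < 0, the form given for syntactic units."""
    return L ** b * math.exp(c * (L - 1))


def estimate_b(curve):
    """Crude estimate of b from the last point, assuming a = 1: b = ln T / ln L.

    Requires a text of more than one token.
    """
    L, T = len(curve), curve[-1]
    return math.log(T) / math.log(L)


# Toy text, for illustration only.
tokens = "the cat saw the dog and the dog saw the cat".split()
curve = type_token_curve(tokens)
print(curve)

b = estimate_b(curve)
print(round(b, 3), round(herdan_types(len(tokens), b), 3))
```

Because a type can only be added to the set, never removed, the curve cannot decrease, which is the property the text relies on.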

There are many other examples of sequential regularities, e.g. rhythm, distances between like units, patterns of properties of units, fractal sequences of manifold properties (which nevertheless display typical time-series character), chaotic sequences, which can be measured in terms of Hurst's or Lyapunov's coefficients, runs of properties, and much more (cf. Altmann, 1980, Hrebicek 1997). Such dynamic patterns can be found on all levels of linguistic analysis, including semantics and pragmatics.

Theory Construction
Currently, there are two approaches to the construction of a linguistic theory (in the sense of the philosophy of science): (1) synergetic linguistics and (2) Wimmer's and Altmann's unified theory. The basic idea behind synergetic linguistics (cf. Köhler 1986, 2005) is to integrate the separate laws and hypotheses which have been found so far into a complex model which not only describes the linguistic phenomena but also provides a means to explain them. This is achieved by introducing the central axiom that language is a self-regulating and self-organizing system. An explanation of the existence, properties, and changes of linguistic (more generally, semiotic) systems is not possible without considering the (dynamic) interdependence of structure and function. The genesis and evolution of these systems must be attributed to repercussions of communication upon structure (cf. Bunge 1998 as opposed to Köhler/Martináková 1998). This axiom (i.e. the view of language as a system that develops in reaction to the properties and requirements of its environment by adaptation mechanisms, in analogy to biological evolution) makes it possible to set up a model on the basis of synergetics. The synergetic approach (cf. Haken/Graham 1971; Haken 1978) is a specific branch of systems theory (von Bertalanffy 1968) and can be characterized as an interdisciplinary approach to the modeling of certain dynamic aspects of systems which occur in different disciplines for different objects of investigation in an analogous way. What distinguishes it from other systems-theoretical approaches is its focus on the 'spontaneous' rise and development of structures.
Synergetic modeling in linguistics starts from axiomatically assumed requirements which a semiotic system must meet, such as the coding requirement (semiotic systems have to provide means to create meaningful expressions), the requirements of coding and decoding efficiency, of memory saving, of transmission security, of minimization of effort, and many others. These requirements can be subdivided into three kinds (cf. Köhler 1990, 181f): (1) language-constitutive requirements, (2) language-forming requirements, and (3) control-level requirements (the adaptation requirement, i.e. the need for a language to adapt itself to varying circumstances, and the opposite stability requirement). The second step is the determination of system levels, units, and variables which are of interest to the current investigation. In step three, relevant consequences, effects, and interrelations are determined. Here, the researcher sets up or systematizes hypotheses about dependences of variables on others, e.g. with increasing polytextuality of a lexical item its polysemy increases monotonically, or, the higher the position of a syntactic construction (i.e. the further to the right within its mother constituent), the less information it carries, etc. The fourth step consists of the search for functional equivalents and multi-functionalities. Step five is the mathematical formulation of the hypotheses set up so far, a precondition for any rigorous test, and step six is the empirical test of these mathematically formulated hypotheses. In this way, for each subsystem of language (i.e. the lexical, morphological, syntactical etc. subsystems), models of arbitrary complexity are formed. The elements, the system variables, represent linguistic units or their properties, while the specific links between these elements are universal hypotheses, which obtain the status of laws if they have been intensively tested and corroborated.
The other approach to theory construction in linguistics is Wimmer's and Altmann's unified theory. Integration of separately existing laws and hypotheses starts from a very general differential (alternatively: difference) equation and two likewise very general assumptions: (1) If y is a continuous linguistic variable (i.e. some property of a linguistic unit), then its change over time or with respect to another linguistic variable will in any case be determined by its current value. Hence, a corresponding mathematical model should be set up in terms of its relative change (dy/y). Consider, as an example, the change of word length in dependence on its frequency. We know that words become shorter if they are used more frequently, but a long word will be shortened to a greater extent than an already relatively short one. (2) The independent variable which has an effect on y also has to be taken into account in terms of its relative change (i.e., dx/x). In our example, it is not the absolute increase in usage of a word that causes its shortening but the relative one. The discrete approach is analogous; one considers the relative difference Δy_x/y_x. Hence, the general formulas are dy/y = g(x)dx and Δy_{x-1}/y_{x-1} = g(x). Based on various results in linguistics, it could be shown that for the continuous case it is sufficient to consider

$$
\frac{dy}{y-d} = \left( a_0+\sum_{i=1}^{k_1} \frac{a_{1i}}{(x-b_{1i})^{c_1}}+\sum_{i=1}^{k_2}\frac{a_{2i}}{(x-b_{2i})^{c_2}}+ \dots \right) dx
\quad \mbox{with} \ c_i \not= c_j \ \mbox{for} \ i \not= j \quad \left( \mbox{for} \ k_s = 0: \ \sum_{i=1}^{k_s} \frac{a_{si}}{(x-b_{si})^{c_s}} = 0 \right)
$$

and for the discrete case

$$
\frac{\Delta P_{x-1}}{P_{x-1}} = a_0+\sum_{i=1}^{k_1} \frac{a_{1i}}{(x-b_{1i})^{c_1}}+\sum_{i=1}^{k_2}\frac{a_{2i}}{(x-b_{2i})^{c_2}}+ \dots
$$

Both are well interpretable linguistically and yield the same results as the synergetic approach. The great majority of laws known up to now can be derived from the above equations (e.g. Menzerath's law, the Zipf-Mandelbrot law, Frumkina's law, all laws of length, diversification laws, TTR, synonymy, polysemy, and polytextuality laws, morphological productivity, vocabulary growth, Krylov's law, the law of change, etc.). The discrete and continuous approaches can be transformed into one another (cf. Mačutek, Altmann 2007) and yield all discrete probability distributions used in linguistics. The parameters are interpreted as specific language forces as known from synergetic linguistics.

Both models, the unified one and the synergetic one, turn out to be two representations of the same basic assumptions. The synergetic model allows easier treatment of multiple dependencies, for which partial differential equations must be used in the general model.
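
Two simple special cases may illustrate how known laws follow from the general equations. In the continuous case, keeping only the constant and one term gives dy/y = (a_0 + a_1/x) dx, whose solution y = A·x^(a_1)·e^(a_0·x) has the form of Menzerath's law (formula 2) with b = a_1 and c = -a_0. In the discrete case, reading ΔP_{x-1} as the forward difference P_x - P_{x-1} (an assumption of this sketch) and setting a_0 = -1 with a single term a/x gives the recurrence P_x = (a/x)·P_{x-1}, which yields the Poisson distribution. The Python sketch below checks both numerically; the function names and parameter values are hypothetical.

```python
import math


def integrate_relative_change(a0, a1, x0, y0, x_end, steps=10_000):
    """Integrate dy/dx = y * (a0 + a1 / x) from x0 to x_end with classical RK4."""
    def rhs(x, y):
        return y * (a0 + a1 / x)

    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


def closed_form(x, A, a0, a1):
    """Closed-form solution y = A * x**a1 * exp(a0 * x)."""
    return A * x ** a1 * math.exp(a0 * x)


def discrete_from_recurrence(a, n_max):
    """Solve P_x = (a / x) * P_{x-1} for x = 1..n_max.

    P_0 = exp(-a) is fixed by normalization over x = 0, 1, 2, ...
    """
    probs = [math.exp(-a)]
    for x in range(1, n_max + 1):
        probs.append(probs[-1] * a / x)
    return probs


# Hypothetical parameter values, chosen only for the check.
A, a0, a1 = 4.0, -0.02, -0.3
numeric = integrate_relative_change(
    a0, a1, x0=1.0, y0=closed_form(1.0, A, a0, a1), x_end=6.0
)
print(abs(numeric - closed_form(6.0, A, a0, a1)) < 1e-8)

probs = discrete_from_recurrence(a=2.0, n_max=10)
poisson = [math.exp(-2.0) * 2.0 ** x / math.factorial(x) for x in range(11)]
print(max(abs(p - q) for p, q in zip(probs, poisson)) < 1e-12)
```

Richer laws correspond to keeping more terms of the sums, at the cost of more parameters to interpret.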

Link
U Trier page on laws in Quantitative Linguistics