[[Category:BIOG|Lutosławski, Wincenty]]
[[Category:Quantitative Linguistics]]



The origin and development of modern quantitative linguistics are associated with the structuralist revolution of the first decades of the 20th century. Support for this notion can be found in the words of one of the creators of structuralism, J. N. Baudouin de Courtenay (1845–1929), who did not apply mathematical methods himself, but who, while conducting field studies, realised the virtues of a quantitative description of language and foresaw the advent of rigorous investigations into the laws of language. Citing J. Rozwadowski’s concept of the quantitative rules of language development (Rozwadowski 1909), he presented his view on the emerging relationships between the realm of numbers and "linguistic thought" (Baudouin de Courtenay 1927 [1990]: 549). His concept principally involves the semantic, syntactic, and morphological representation of the number, dimensions, and intensities of attributes, and thus does not touch upon the concept of statistical linguistics operating with frequencies or other expressly numerical features of language elements. Nonetheless, this scholar perceived analogies between the physical domain, defined by precise and formalised laws, and language. He realised that the contemporary level of linguistic and mathematical knowledge was inadequate for the formulation of exact linguistic laws. "I, personally, having considered the rigour and functional dependency of the laws of the world of physics and chemistry, would hesitate to call that a ‘law’ which I consider merely an exceptionally skilful generalisation applied to phenomena at large" (ibid. 547). However, he anticipated that such laws would also be formulated for linguistic relationships in the future: "[...] the time for genuine laws in the psycho-social realm in general, and first and foremost in the linguistic realm, is approaching: laws which can stand proudly beside those of the exact sciences, laws expressed in formulae of the absolute dependency of one quantity on another" (ibid. 560).

The roots of Polish quantitative linguistics go back further, though, to the period prior to this revolution. The scholar who may be recognised as its forerunner and one of the creators of stylometry was Wincenty Lutosławski (1863–1954). A graduate of the Technical University of Riga and the University of Dorpat, he was a lecturer at the University of Kazan and professor at the universities of Vilnius and Cracow (Jadacki 1998: 54–87; Chyl 1999: 12; Lutosławski 1933 [1994]). Having been educated in a German secondary school (Mitau/Mitawa/Jelgava in Latvia), having lectured at a Russian university, and being a classical philologist as well as a Pole experiencing Poland’s own peculiar form of diaspora (Poland did not regain formal statehood until 1918), he had a command of most European languages.<sup>1</sup> His main field of interest was Platonic philosophy, but he was also fascinated by messianic teachings, spiritualism, and the Polish national movement.

1 During his university years he claimed to speak nine languages (Lutosławski 1933 [1994]: 118–19). We might add here that his first wife was Sofía Pérez Eguía Y Casanova Lutosławska, a Spanish journalist, poet and novelist from Galicia.

The issues which today associate the work of Lutosławski with quantitative linguistics, and more precisely with the methodology of stylometry, arose from his studies of Plato. One of the classical problems of Hellenism, unresolved to this day, is the periodisation of Plato’s Dialogues. This is of vital significance for the interpretation of his legacy, as the chronological proximity (or remoteness) of the texts may suggest relationships in content (or the possible lack of such), which would consequently determine a reconstruction of the complete Platonic philosophical system (cf. Pawłowski, Pacewicz 2005).

Lutosławski decided to solve the problem of Platonic chronology. Inspired by the ideas of the Scottish philosopher L. Campbell (Lutosławski 1933 [1994]: 219–220), he worked out his own method based on the comparison of a great number of stylistic text characteristics. He was convinced that it would be possible to reconstruct the true order of Platonic writings solely using their stylistic features: “If an exact definition be possible of the notes which distinguish Plato’s style from the style of other writers, or by which a work written contemporaneously with the Laws differs from a work written at the time when Plato founded the Academy, then we may hope to ascertain the true order of Platonic dialogues according to the stylistic variations observed in them.” (Lutosławski 1897a [1983]: 65–66) A concise formulation of his method is the law of stylistic affinity, which states: “Of two works of the same author and of the same size, that is nearer in time to a third, which shares with it the greater number of stylistic peculiarities, provided that their different importance is taken into account, and that the number of observed peculiarities is sufficient to determine the stylistic character of all the three works.” (ibid. 152) We shall introduce the fundamentals of Lutosławski’s method below and then discuss the origin of stylometry in light of his achievement. What is new in this account, compared with earlier work, is the attention paid to Lutosławski’s role. Investigation indicates that it was most probably he who first introduced the term “stylometry” into scientific use (“This future science of stylometry [emphasis mine – AP] may improve our methods beyond the limits of imagination [...]” – Lutosławski 1897a [1983]: 193, cf. also Lutosławski 1896, 1897b and 1898) and, despite being unfamiliar with modern statistical tools and research on the quantitative structure of lexicon and text, he defined the majority of its cardinal rules.

Lutosławski’s method rests on a few premises, not always directly articulated, which he accepted on the basis of observation, research results, and intuition. The result of these efforts compares surprisingly well with the assumptions of modern stylometry, all the more so as the author was primarily interested in sequencing the works of Plato, while the question of their authenticity (and thus authorship) was secondary (cf. Lutosławski 1933 [1994]: 225; cf. also the discussion in Pawłowski, Pacewicz 2005). In Lutosławski’s view, the most important premises of the method of stylometry are:

  1. Reliable information about the dating of some writings by the author in question (e.g. Laws, considered Plato’s last text). This allows hypotheses about the evolution of his style to be worked out, verified, and applied to the disputed works.
  2. Existence of an individual style in the texts of every author and its independence of content: “Now the external form of a writer is his style, and it betrays him even if he for some reason may be professing thoughts very different from those which we usually associate with his name.” (Lutosławski 1897a [1983]: 64)
  3. Possibility of settling the question of an author’s disputed identity on the basis of the stylistic properties of his texts, considered as external characteristics: “There is no exaggeration in this pretension, since questions of identification are generally settled by purely external tests.” (ibid. 65)
  4. Analogy between stylometry and graphology, indicating the potential effectiveness of stylometric analysis. Lutosławski argued that if the uniqueness of handwriting is uncontroversial and officially recognised in legal practice, a similar distinctive power should be associated with the characteristics of style: “The identity of handwriting, consisting in many minute signs difficult of definition, is held to be so far ascertainable, that on an expert’s decision in such matters a man’s life may sometimes depend. The limited number of marks of identity contained in a signature is sufficient to decide its authenticity for all purposes. […] If handwriting can be so exactly determined as to afford certainty as to its identity, so also with style, since style is more personal and characteristic than handwriting.” (ibid. 65)
  5. Large but limited set of relevant stylistic features (peculiarities): “It may be objected that, since science style has an almost infinite number of characteristic notes, it cannot be reduced to one external formula. The answer is, that a like infinity of characteristics exists in every object of natural science, and that science is possible only through the distinction of essential marks from those which are unessential.” (ibid. 66). These features should appear in all the compared texts: “the number of observed peculiarities is sufficient to determine the stylistic character of all the three works.” (ibid. 152).
  6. Hierarchy of importance of the analysed stylistic features: “In order to draw our conclusions, we begin by recognising four degrees of importance, distinguishing stylistic peculiarities.” (ibid. 146).
  7. Possibility of quantifying and measuring the degree of similarity of texts based on the number of shared stylistic features. Lutosławski formulated this principle as the law of stylistic affinity: “Of two works of the same author […] that is nearer in time to a third, which shares with it the greater number of stylistic peculiarities […].” (ibid. 152).
  8. Superiority of techniques synthesising complex information about text and style: “[…] we needed a greater number of facts than has been known heretofore to any single author; but we found that five hundred peculiarities, selected at random from the special investigation, were sufficient for our purpose.” (ibid. 145). Text and style are considered here as very complex objects, and this kind of synthesis cannot be obtained with a traditional methodology: “But the definition of style requires a deeper study, because style is not, like handwriting, accessible to the senses.” (ibid.) It is worth emphasising that only many years after Lutosławski’s publication did certain coefficients of lexical richness and the methods of multidimensional scaling become fully operational as tools for effectively synthesising information about text.
  9. Unidirectional evolution of personal style during the whole period of authorial creativity (concerns only chronological research): “[…] that the style of some writers has changed in the course of years is a patent fact” (ibid. 64).
  10. Necessity of comparing samples of equal length: “Of two works of the same author and of the same size […]” (ibid. 152).

Armed with the above premises, Lutosławski believed that by comparing those dialogues whose dates were beyond dispute with disputable texts whose similarities with the former were numerically expressed, it was possible to establish a complete chronology of Plato’s works. Drawing on studies by other authors, he defined 500 characteristics of Plato’s style and sequenced the questionable dialogues on the basis of their appearance in 58,000 fragments (ibid. 74–139). Despite criticism, recent advances in Platonic studies (see: Brandwood 1990), and fundamental doubts as to any sort of periodisation of ancient texts, Lutosławski’s proposition enjoys recognition in some Hellenistic circles to this day. “Lutosławski’s sequence [...] was widely accepted in the twentieth century. [...] Today, Lutosławski’s canon is still functional, although it is being challenged by more recent research.” (Kubikowska 1999: 6; cf. also Zaborowski 2000: 50).
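The comparison described above can be illustrated with a minimal Python sketch. It is not a reconstruction of Lutosławski’s actual computation: the feature names, the numeric weights 1 to 4 standing in for the four degrees of importance, and the three works are all invented for illustration, and the works are assumed to be of equal size. The sketch only shows the general idea of the law of stylistic affinity, namely summing the weights of the peculiarities that a work shares with a securely dated reference text and ordering the works by that sum.

```python
from typing import Dict, List, Set

# Hypothetical weights for the four degrees of importance (1 = least,
# 4 = most important). Feature names and values are invented for illustration.
IMPORTANCE: Dict[str, int] = {
    "peculiarity_a": 1,
    "peculiarity_b": 2,
    "peculiarity_c": 3,
    "peculiarity_d": 4,
    "peculiarity_e": 2,
}


def affinity(work: Set[str], reference: Set[str]) -> int:
    """Weighted count of the peculiarities a work shares with the reference."""
    return sum(IMPORTANCE[name] for name in work & reference)


def order_by_affinity(works: Dict[str, Set[str]], reference: Set[str]) -> List[str]:
    """Titles sorted from lowest to highest affinity with the reference."""
    return sorted(works, key=lambda title: affinity(works[title], reference))


# Toy data: the reference text (assumed to be the latest work) shows every
# peculiarity; the works to be ordered show only some of them.
REFERENCE: Set[str] = set(IMPORTANCE)
WORKS: Dict[str, Set[str]] = {
    "work_a": {"peculiarity_a"},
    "work_b": {"peculiarity_a", "peculiarity_c", "peculiarity_d"},
    "work_c": {"peculiarity_b", "peculiarity_e"},
}

print(order_by_affinity(WORKS, REFERENCE))  # ['work_a', 'work_c', 'work_b']
```

Under the premise of unidirectional stylistic evolution, the work with the highest affinity to the late reference text would be read as the one closest to it in time.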

From the perspective of modern quantitative linguistics, Lutosławski’s technique is inadequate with respect to statistics, and the very question of periodising texts is in itself disputable. Multidimensional methods have supplanted most of the traditional solutions (cf. Wishart, Leach 1970), and connectionist techniques employing artificial neural networks are proving to be increasingly effective (cf. Tweedie et al. 1996). Nevertheless, certain elements of his work represent a permanent part of the development of not only classical philology, but quantitative linguistics as well. These include the fact that it was Lutosławski who first introduced the term “stylometry”, still in use today, and defined some of its general principles (presented above). As a typical representative of the positivistic view of science, he put great hope in it for the future. “This exceptional importance of one particular case will enable us to decide questions of authenticity and chronology of literary works with the same certainty as palaeographers now know the age and authenticity of manuscripts.” (Lutosławski 1897a [1983]: 193).

When analysing the actual influence of Lutosławski’s work on the development of stylometry as a division of quantitative linguistics, it is worth turning our attention to the absence of his work from modern published studies. If he is cited, the philological or material aspects are emphasised, while his methodology is wholly absent (e.g. Herdan 1966: 1). For A. Kenny, only Lutosławski’s treatment of the state of research into the chronology of Plato’s works was of value: “The work of these and other scholars was magisterially synthesized by the Polish scholar W. Lutosławski in his work of 1897, The Origin and Growth of Plato’s Logic.” (Kenny 1982: 3). Probably the only discussion of his method available so far appears in B. Pindlowa’s study (1994: 18–20, 161), but there, too, there is no synthesis of his methodological premises and postulates.

D. Holmes (1988), in a comprehensive and inspiring article on the history of stylometry, passes over Lutosławski’s contribution in silence, observing, “Mendenhall’s labours seemed to deter statisticians from following him, and there was a gap of some 30 years before G. Udny Yule in England and the American linguist George Zipf worked on alternative features of style.”<sup>2</sup> Despite all the reservations concerning Lutosławski, the expression “gap of some 30 years” stands in sharp contrast to that portion of his work devoted to the stylometric method. It is hard to explain the causes of this state of affairs. By way of comment, one could invoke the words of Lutosławski himself, referring to Platonic studies at the end of the 19th century: “As a Pole, the author may possibly be more impartial than the representatives of other nations more active in Platonic research. The works of British scholars are little known in Germany, and, on the other hand, many special German investigations are overlooked in France and Great Britain.” (Lutosławski 1897a [1983]: vii).

2 This is with reference to an 1887 article by T. C. Mendenhall devoted to the empirical frequency distributions of words. Neither the term nor the notion of stylometry appears in this article (Mendenhall 1887).

A provisional definition of the beginnings of stylometry would necessitate accepting a polygenetic concept consisting of a gradual crystallisation of the premises of this discipline based on the work of various researchers. Although it is difficult in a short article to discuss in detail all the studies which had a part in the process, we can mention the pioneers and most important creators of stylometry. In our opinion, this group would include, in chronological order, A. De Morgan<sup>3</sup>, W. Lutosławski, G.U. Yule (1938, 1944), G.K. Zipf (1935, 1949), F. Mosteller and D. Wallace (1978, 1984), as well as the authors of the first studies applying multidimensional analysis (cf. Holmes, Forsyth 1995). None of these can be assigned absolute priority in calling stylometry into being. Each, though, had a lesser or greater part in this process.

3 In a letter from 1851 he mentioned the relationship between the authorship of a text and mean word length. He also presented initial calculations of mean word length in the letters of St. Paul. He finally arrived at the conclusion that style allows distinguishing the author of a text even when the texts vary in topic. "I would have Greek, Latin or English tried and I should expect to find that one man writing on two different subjects agrees more closely with himself than two different men writing on the same subject." (De Morgan, 1882, quotation from a study by Williams (1970: 5)).
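De Morgan’s measure, mean word length, is simple enough to sketch in a few lines of Python. The function below is only an illustration under stated assumptions: the tokenisation rule (a word is any unbroken run of letters) and the function name are choices made here, not something taken from De Morgan’s letter.

```python
import re


def mean_word_length(text: str) -> float:
    """Average number of letters per word in the text.

    A word is taken to be any unbroken run of letters (an assumption made
    for this sketch); digits, punctuation and underscores separate words.
    """
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        raise ValueError("text contains no words")
    return sum(len(word) for word in words) / len(words)


print(mean_word_length("Greek, Latin or English"))  # 4.75
```

Comparing this value across texts by the same author and by different authors is the kind of test De Morgan proposed; a single average is of course a much coarser description of style than the many peculiarities used by Lutosławski.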


1863 Wincenty Lutosławski is born in Warsaw
1877–1881 secondary school (gymnasium) in Mitau/Mitawa (Jelgava), Latvia
1881–1883 studies at the Riga Polytechnic (Wilhelm Ostwald’s class)
1883–1884 travels in Europe (Switzerland, France, Italy, Austria)
1884–1885 studies of chemistry at the Dorpat (Tartu) University, candidate’s degree in chemistry
1884–1886 studies of philosophy at the Dorpat (Tartu) University under Gustav Teichmüller’s supervision, candidate’s degree in philosophy
1885–1886 studies of French philology at the École des Hautes Études in Paris, travel to Portugal, Spain and Morocco
1887–1888 studies of Plato under G. Teichmüller’s supervision in Dorpat (Tartu)
1887 master’s degree in philosophy at the Dorpat (Tartu) University
1888–1889 stay in Moscow, discovery of two unknown manuscripts of Giordano Bruno
1889–1890 stay in London
1890–1893 “dozent” at the Kazan University, lectures in logic, psychology and history of philosophy
1893–1894 stay in Spain, USA and England
1894–1895 stay in Drozdowo near Łomża (Poland)
1895–1898 stay in Spain and in England
1898–1899 stay in Finland, Sweden, Denmark and Germany
1898 doctor’s degree in philosophy at the Helsinki University
1899–1901 stay in Cracow
1899–1900 “Privatdozent” at the Jagiellonian University in Cracow
1901–1902 stay in Switzerland, lectures in Lausanne and Geneva
1904–1906 lectures at the University College in London
1907–1908 stay in the USA, numerous lectures
1908–1910 stay in Warsaw
1912–1916 lectures at the Geneva University
1916–1919 lectures at the University of Paris
1920–1933 professorship at the University of Vilna
1921 lectures on philosophy in Poznań
1923 lectures on philosophy in Warsaw and Lvov
1929 retirement at the University of Vilna
1931–1932 stay in France
1946–1948 lectures at the Jagiellonian University in Cracow
1954 Wincenty Lutosławski dies in Cracow

4 This selective CV is based on an extensive biography presented by J.J. Jadacki (1998: 54–57).


Adam Pawłowski: Glottometrics 8, 2004, 79 - 89