The origin and development of modern quantitative linguistics are associated with the structuralist revolution of the first decades of the 20th century. Support for this notion can be found in the words of one of the creators of structuralism, J. N. Baudouin de Courtenay (1845–1929), who did not apply mathematical methods himself, but who, while conducting field studies, realised the virtues of a quantitative description of language and foresaw the advent of rigorous investigations into the laws of language. Citing J. Rozwadowski’s concept of the quantitative rules of language development (Rozwadowski 1909), he presented his view on the emerging relationships between the realm of numbers and "linguistic thought" (Baudouin de Courtenay 1927: 549). His concept principally involves the semantic, syntactic, and morphological representation of the number, dimensions, and intensities of attributes, and thus does not touch upon the concept of statistical linguistics operating with frequencies or other expressly numerical features of language elements. Nonetheless, this scholar perceived analogies between the physical domain, defined by precise and formalised laws, and language. He realised that the contemporary level of linguistic and mathematical knowledge was inadequate for the formulation of exact linguistic laws: "I, personally, having considered the rigour and functional dependency of the laws of the world of physics and chemistry, would hesitate to call that a ‘law’ which I consider merely an exceptionally skilful generalisation applied to phenomena at large" (ibid. 547). However, he anticipated that such laws would also be formulated for linguistic relationships in the future: "[...] the time for genuine laws in the psycho-social realm in general, and first and foremost in the linguistic realm, is approaching: laws which can stand proudly beside those of the exact sciences, laws expressed in formulae of the absolute dependency of one quantity on another" (ibid. 560).
The roots of Polish quantitative linguistics go back further, though, to the period prior to this revolution. The scholar who may be recognised as its forerunner and one of the creators of stylometry was Wincenty Lutosławski (1863–1954). A graduate of the Technical University of Riga and the University of Dorpat, he was a lecturer at the University of Kazan and professor at the universities of Vilnius and Cracow (Jadacki 1998: 54–87; Chyl 1999: 12; Lutosławski 1933). Having been educated in a German secondary school (Mitau/Mitawa/Jelgava in Latvia), lecturing at a Russian university, and being a classical philologist, in addition to being a Pole experiencing Poland’s own peculiar form of Diaspora (Poland did not regain formal statehood until 1918), he had a command of most European languages¹. His main field of interest was Platonic philosophy, but he was also fascinated by messianic teachings, spiritualism, and the Polish national movement.
1 During his university years he claimed to speak nine languages (Lutosławski 1933: 118–19). We might add here that his first wife was Sofía Pérez Eguía y Casanova Lutosławska, a Spanish journalist, poet and novelist from Galicia.
The issues which today associate the work of Lutosławski with quantitative linguistics, and precisely the methodology of stylometry, arose from his studies of Plato. One of the classical problems of Hellenism, unresolved to this day, is the periodisation of Plato’s Dialogues. This is of vital significance for the interpretation of his legacy, as the chronological proximity (or remoteness) of the texts may suggest relationships in content (or the possible lack of such), which would consequently determine a reconstruction of the complete Platonic philosophical system (cf. Pawłowski, Pacewicz 2005).
Lutosławski decided to solve the problem of Platonic chronology. Inspired by the ideas of the Scottish philosopher L. Campbell (Lutosławski 1933: 219–220), he worked out his own method based on the comparison of a great number of stylistic text characteristics. He was convinced that it would be possible to reconstruct the true order of Platonic writings solely using their stylistic features: “If an exact definition be possible of the notes which distinguish Plato’s style from the style of other writers, or by which a work written contemporaneously with the Laws differs from a work written at the time when Plato founded the Academy, then we may hope to ascertain the true order of Platonic dialogues according to the stylistic variations observed in them.” (Lutosławski 1897a: 65–66) A concise formulation of his method is the law of stylistic affinity, which states that: “Of two works of the same author and of the same size, that is nearer in time to a third, which shares with it the greater number of stylistic peculiarities, provided that their different importance is taken into account, and that the number of observed peculiarities is sufficient to determine the stylistic character of all the three works.” (ibid. 152) We shall introduce the fundamentals of Lutosławski’s method below and then discuss the origin of stylometry in light of his achievement. What is new in this account, compared with earlier work, is the attention paid to Lutosławski’s role. Investigation indicates it was most probably he who first introduced the term "stylometry" into scientific use (“This future science of stylometry [emphasis mine – AP] may improve our methods beyond the limits of imagination [...]” – Lutosławski 1897a: 193, cf. also Lutosławski 1896, 1897b and 1898) and, despite being unfamiliar with modern statistical tools and research on the quantitative structure of lexicon and text, he defined the majority of its cardinal rules.
Lutosławski’s method rests on a few premises, not always directly articulated, which he accepted on the basis of observation, research results, and intuition. The result compares surprisingly well with the assumptions of modern stylometry, all the more so as the author was primarily interested in sequencing the works of Plato, while the question of their authenticity (and thus authorship) was secondary (cf. Lutosławski 1933: 225; cf. also the discussion in Pawłowski, Pacewicz 2005). In Lutosławski’s view, the most important premises of the method of stylometry are:
- Reliable information about the dating of some writings by the author in question (e.g. Laws, considered to be Plato’s last text). This makes it possible to work out and verify hypotheses concerning the evolution of his style and to apply them to the disputed works.
- Existence of an individual style in the texts of every author and its independence of content: “Now the external form of a writer is his style, and it betrays him even if he for some reason may be professing thoughts very different from those which we usually associate with his name.” (Lutosławski 1897a: 64)
- Possibility of settling the question of an author’s disputed identity on the grounds of the stylistic properties of his texts, considered as external characteristics: “There is no exaggeration in this pretension, since questions of identification are generally settled by purely external tests.” (ibid. 65)
- Analogy between stylometry and graphology, indicating the potential effectiveness of stylometric analysis. Lutosławski argued that if the uniqueness of handwriting is uncontroversial and officially recognised in legal practice, a similar distinctive power should be associated with the characteristics of style: “The identity of handwriting, consisting in many minute signs difficult of definition, is held to be so far ascertainable, that on an expert’s decision in such matters a man’s life may sometimes depend. The limited number of marks of identity contained in a signature is sufficient to decide its authenticity for all purposes. […] If handwriting can be so exactly determined as to afford certainty as to its identity, so also with style, since style is more personal and characteristic than handwriting.” (ibid. 65)
- Large but limited set of relevant stylistic features (peculiarities): “It may be objected that, since style has an almost infinite number of characteristic notes, it cannot be reduced to one external formula. The answer is, that a like infinity of characteristics exists in every object of natural science, and that science is possible only through the distinction of essential marks from those which are unessential.” (ibid. 66). These features should appear in all the compared texts: “the number of observed peculiarities is sufficient to determine the stylistic character of all the three works.” (ibid. 152).
- Hierarchy of importance of the analysed stylistic features: “In order to draw our conclusions, we begin by recognising four degrees of importance, distinguishing stylistic peculiarities.” (ibid. 146).
- Possibility of quantifying and measuring the degree of similarity of texts on the basis of the number of shared stylistic features. Lutosławski formulated this principle as the law of stylistic affinity: “Of two works of the same author […] that is nearer in time to a third, which shares with it the greater number of stylistic peculiarities […].” (ibid. 152).
- Superiority of techniques synthesising complex information about text and style: “[…] we needed a greater number of facts than has been known heretofore to any single author; but we found that five hundred peculiarities, selected at random from the special investigation, were sufficient for our purpose.” (ibid. 145). Text and style are considered here as very complex objects, and this kind of synthesis cannot be obtained with a traditional methodology: “But the definition of style requires a deeper study, because style is not, like handwriting, accessible to the senses.” (ibid.) It is worth emphasising that only many years after Lutosławski’s publication did certain coefficients of lexical richness and the methods of multidimensional scaling become fully operational as tools for effectively synthesising information about text.
- Unidirectional evolution of personal style during the whole period of authorial creativity (this concerns only chronological research): “[…] that the style of some writers has changed in the course of years is a patent fact” (ibid. 64).
- Necessity of comparing samples of equal length: “Of two works of the same author and of the same size […]” (ibid. 152).
Armed with the above premises, Lutosławski believed that it was possible to establish a complete chronology of Plato’s works by comparing the dialogues whose dates were beyond dispute with the disputed texts and expressing their similarities numerically. Drawing on studies by other authors, he defined 500 characteristics of Plato’s style and sequenced the questionable dialogues on the basis of the appearance of these characteristics in 58,000 fragments (ibid. 74–139). Despite criticism, recent advances in Platonic studies (see: Brandwood 1990), and fundamental doubts as to any sort of periodisation of ancient texts, Lutosławski’s proposition enjoys recognition in some Hellenist circles to this day. “Lutosławski’s sequence [...] was widely accepted in the twentieth century. [...] Today, Lutosławski’s canon is still functional, although it is being challenged by more recent research.” (Kubikowska 1999: 6; cf. also Zaborowski 2000: 50).
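The comparison behind the law of stylistic affinity can be illustrated with a minimal sketch in Python. It is not Lutosławski’s own procedure: the feature labels, the toy data, the function names, and the weights 1 to 4 assigned to the four degrees of importance are assumptions introduced here for illustration only, and the compared works are simply assumed to be of equal size.

```python
from typing import Dict, Set

# Assumed weights for the four degrees of importance (illustrative only).
DEGREE_WEIGHTS: Dict[int, int] = {1: 1, 2: 2, 3: 3, 4: 4}


def affinity(work: Set[str], reference: Set[str], degree: Dict[str, int]) -> int:
    """Weighted count of the stylistic peculiarities shared by two works."""
    shared = work & reference
    return sum(DEGREE_WEIGHTS[degree[p]] for p in shared)


def nearer_in_time(a: Set[str], b: Set[str], reference: Set[str],
                   degree: Dict[str, int]) -> str:
    """Return which of two equal-sized works is stylistically closer to the
    reference work, and hence presumed nearer to it in time."""
    score_a = affinity(a, reference, degree)
    score_b = affinity(b, reference, degree)
    if score_a == score_b:
        return "undecided"
    return "a" if score_a > score_b else "b"


# Hypothetical peculiarities p1..p5 with their degree of importance.
degree = {"p1": 1, "p2": 2, "p3": 3, "p4": 4, "p5": 1}
reference = {"p1", "p2", "p3", "p4"}   # work with an undisputed date
work_a = {"p1", "p2", "p5"}            # shares p1 and p2 with the reference
work_b = {"p4", "p5"}                  # shares only p4 with the reference

print(affinity(work_a, reference, degree))
print(affinity(work_b, reference, degree))
print(nearer_in_time(work_a, work_b, reference, degree))
```

In this toy example work_b shares fewer peculiarities with the reference than work_a, yet it scores higher because the single shared feature is of the highest degree of importance. This is the effect of the proviso “that their different importance is taken into account”.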
From the perspective of modern quantitative linguistics, Lutosławski’s technique is statistically inadequate, and the very question of periodising texts is in itself disputable. Multidimensional methods have supplanted most of the traditional solutions (cf. Wishart, Leach 1970), and connectionist techniques employing artificial neural networks are proving to be increasingly effective (cf. Tweedie et al. 1996). Nevertheless, certain elements of his work represent a permanent part of the development not only of classical philology, but of quantitative linguistics as well. Above all, it was Lutosławski who first introduced the term “stylometry”, still in use today, and defined some of its general principles (presented above). As a typical representative of the positivistic view of science, he put great hope in it for the future. “This exceptional importance of one particular case will enable us to decide questions of authenticity and chronology of literary works with the same certainty as palaeographers now know the age and authenticity of manuscripts.” (Lutosławski 1897a: 193).
When analysing the actual influence of Lutosławski’s work on the development of stylometry as a division of quantitative linguistics, it is worth noting the absence of his work from modern published studies. If he is cited, the philological or material aspects are emphasised, while his methodology is wholly absent (e.g. Herdan 1966: 1). For A. Kenny, only Lutosławski’s treatment of the state of research into the chronology of Plato’s works was of value: “The work of these and other scholars was magisterially synthesized by the Polish scholar W. Lutosławski in his work of 1897, The Origin and Growth of Plato’s Logic.” (Kenny 1982: 3). Probably the only discussion of his method available so far appears in B. Pindlowa’s study (1994: 18–20, 161), but there, too, there is no synthesis of his methodological premises and postulates.
D. Holmes (1998), in a comprehensive and inspiring article on the history of stylometry, passes over Lutosławski’s contribution in silence, observing, “Mendenhall’s labours seemed to deter statisticians from following him, and there was a gap of some 30 years before G. Udny Yule in England and the American linguist George Zipf worked on alternative features of style.”² Despite all the reservations concerning Lutosławski, the expression “gap of some 30 years” stands in sharp contrast to that portion of his work devoted to the stylometric method. It is hard to explain the causes of this state of affairs. By way of comment, one can only invoke the words of Lutosławski himself, referring to Platonic studies at the end of the 19th century: “As a Pole, the author may possibly be more impartial than the representatives of other nations more active in Platonic research. The works of British scholars are little known in Germany, and, on the other hand, many special German investigations are overlooked in France and Great Britain.” (Lutosławski 1897a: vii).
2 This is with reference to an 1887 article by T. C. Mendenhall devoted to the empirical frequency distributions of words. Neither the term nor the notion of stylometry appears in this article (Mendenhall 1887).
A provisional account of the beginnings of stylometry would necessitate accepting a polygenetic concept: a gradual crystallisation of the premises of this discipline based on the work of various researchers. Although it is difficult in a short article to discuss in detail all the studies which had a part in the process, we can mention the pioneers and most important creators of stylometry. In our opinion, this group would include, in chronological order, A. De Morgan³, W. Lutosławski, G.U. Yule (1938, 1944), G.K. Zipf (1935, 1949), F. Mosteller and D. Wallace (1978, 1984), as well as the authors of the first studies applying multidimensional analysis (cf. Holmes, Forsyth 1995). None of these can be assigned absolute priority in calling stylometry into being. Each, though, had a lesser or greater part in this process.
3 In a letter from 1851 he mentioned the relationship between the authorship of a text and mean word length. He also presented initial calculations of mean word length in the letters of St. Paul. He finally arrived at the conclusion that style allows distinguishing the author of a text even when the texts vary in topic. "I would have Greek, Latin or English tried and I should expect to find that one man writing on two different subjects agrees more closely with himself than two different men writing on the same subject." (De Morgan, 1882, quotation from a study by Williams (1970: 5)).
SHORT CURRICULUM VITAE OF WINCENTY LUTOSŁAWSKI⁴
|1863||Wincenty Lutosławski is born in Warsaw|
|1877–1881||secondary school (gymnasium) in Mitau/Mitawa (Jelgava), Latvia|
|1881–1883||studies at the Riga Polytechnic (Wilhelm Ostwald’s class)|
|1883–1884||travels in Europe (Switzerland, France, Italy, Austria)|
|1884–1885||studies of chemistry at the Dorpat (Tartu) University, candidate’s degree in chemistry|
|1884–1886||studies of philosophy at the Dorpat (Tartu) University under Gustav Teichmüller’s supervision, candidate’s degree in philosophy|
|1885–1886||studies of French philology at the École des Hautes Études in Paris, travel to Portugal, Spain and Morocco|
|1887–1888||studies of Plato under G. Teichmüller’s supervision in Dorpat (Tartu)|
|1887||master’s degree in philosophy at the Dorpat (Tartu) University|
|1888–1889||stay in Moscow, discovery of two unknown manuscripts of Giordano Bruno|
|1889–1890||stay in London|
|1890–1893||“dozent” at the Kazan University, lectures in logic, psychology and history of philosophy|
|1893–1894||stay in Spain, USA and England|
|1894–1895||stay in Drozdowo near Łomża (Poland)|
|1895–1898||stay in Spain and in England|
|1898–1899||stay in Finland, Sweden, Denmark and Germany|
|1898||doctor’s degree in philosophy at the Helsinki University|
|1899–1901||stay in Cracow|
|1899–1900||“Privatdozent” at the Jagiellonian University in Cracow|
|1901–1902||stay in Switzerland, lectures in Lausanne and Geneva|
|1904–1906||lectures at the University College in London|
|1907–1908||stay in the USA, numerous lectures|
|1908–1910||stay in Warsaw|
|1912–1916||lectures at the Geneva University|
|1916–1919||lectures at the University of Paris|
|1920–1933||professorship at the University of Vilna|
|1921||lectures on philosophy in Poznań|
|1923||lectures on philosophy in Warsaw and Lvov|
|1929||retirement at the University of Vilna|
|1931–1932||stay in France|
|1946–1948||lectures at the Jagiellonian University in Cracow|
|1954||Wincenty Lutosławski dies in Cracow|
4 This selective CV is based on an extensive biography presented by J.J. Jadacki (1998: 54–57).
- Baudouin de Courtenay, Jan. 1927. Ilościowość w myśleniu językowym [Quantity as a dimension of thought about language]. In: Symbolae Grammaticae in honorem Ioannis (Jan) Rozwadowski v.1. (Festschrift) Cracoviae: Gebethner & Wolff, 3–18. Reprint: Baudouin de Courtenay, J. 1990. Dzieła wybrane t.IV [Selected Writings, v.4]. Warszawa: PWN, 546–563.
- Brandwood L. 1990. The Chronology of Plato’s Dialogues. Cambridge: Cambridge University Press.
- Chyl S. 1999. Lutosławscy [The Lutosławski family]. Drozdowo-Zambrów: PWSM Zambrów.
- De Morgan S.E. 1882. Memoir of Augustus de Morgan, by his wife Sophia Elisabeth de Morgan, with Selection of his Letters. London: Longmans, Green, and Co.
- Herdan G. 1966. The Advanced Theory of Language as Choice and Chance. Berlin, Heidelberg, New York: Springer Verlag.
- Holmes D. 1998. The Evolution of Stylometry in Humanities Scholarship. Literary and Linguistic Computing 13/3. 111–117.
- Holmes D. & Forsyth R.S. 1995. The Federalist Revisited: New Directions in Authorship Attribution. Literary and Linguistic Computing 10/2. 111–127.
- Jadacki J.J. 1998. Wincenty Lutosławski, rozdział z dziejów myśli polskiej [Wincenty Lutosławski, a chapter from the history of Polish thought]. In: Klukowski B. Lutosławscy w kulturze polskiej. Drozdowo: Towarzystwo Przyjaciół Muzeum Przyrody, 54–87.
- Kenny A. 1982. The Computation of Style. London: Pergamon Press.
- Kubikowska E. 1999. Od redakcji [From the editors]. In: Platon, Dialogi, v.1 (przekład W. Witwicki). Kęty: Wydawnictwo Antyk, 3–8.
- Lutosławski W. 1896. Sur une nouvelle méthode pour déterminer la chronologie des dialogues de Platon (mémoire lu le 16 mai 1896 à l’Institut de France). Paris: H. Welter.
- Lutosławski W. 1897a. The origin and growth of Plato's logic. London, New York, Bombay: Longmans, Green and Co. Reprint: The origin and growth of Plato's logic. Hildesheim: Georg Olms Verlag, 1983.
- Lutosławski W. 1897b. On stylometry. Abstract of a paper read at the Oxford Philological Society on May 21st by Dr. W. Lutosławski, of Drozdowo, near Lomza, Poland. Classical Review 11. 284–286.
- Lutosławski W. 1898. Principes de stylométrie. Revue des études grecques 41. 61–81.
- Lutosławski W. 1933. Jeden łatwy żywot [One easy existence]. Warszawa: Hoesick. Reprint: Jeden łatwy żywot [One easy existence]. Kraków: Fundacja im. Wincentego Lutosławskiego 1994.
- Mendenhall T.C. 1887. The characteristic curves of composition. Science 11. 237–249.
- Mosteller F. & Wallace D. 1978. Deciding authorship. In: Tanur J.M.; Lehmann E.L. et al. (eds.). Statistics: a guide to the unknown (2nd ed.). San Francisco: Holden-Day, 207–219.
- Mosteller F. & Wallace D.L. 1984. Applied Bayesian and Classical Inference. New York, Berlin etc.: Springer Verlag.
- Pawłowski A. & Pacewicz A. 2005. Wincenty Lutosławski – philosophe, helléniste ou fondateur sous-estimé de la stylométrie? Historiographia Linguistica [to be published].
- Pindlowa W. 1994. Infometria w nauce o informacji [‘Infometry’ in the science of information]. Kraków: Universitas.
- Rozwadowski J. 1909. Ein quantitatives Gesetz der Sprachentwickelung. Indogermanische Forschungen XXV. 38–50.
- Tweedie F.J.; Singh S. & Holmes D.I. 1996. Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities 30. 1–10.
- Williams C.B. 1970. Style and vocabulary: numerical studies. London: Griffin.
- Wishart D. & Leach S. 1970. A multivariate analysis of Platonic prose rhythm. Computer Studies in the Humanities and Verbal Behaviour 3. 90–99.
- Yule G.U. 1938. On sentence-length as a statistical characteristic of style in prose: with application to two cases of disputed authorship. Biometrika 30. 363–390.
- Yule G.U. 1944. The Statistical Study of Literary Vocabulary. Cambridge: Cambridge University Press.
- Zaborowski R. 2000. Platon w ujęciu Wincentego Lutosławskiego (1863–1954) i Adama Krokiewicza (1890–1977) [Wincenty Lutosławski’s (1863–1954) and Adam Krokiewicz’s (1890–1977) views on Plato]. In: Zaborowski R. (Ed.). Filozofia i mistyka Wincentego Lutosławskiego [The philosophy and mysticism of Wincenty Lutosławski]. Warszawa: Stakroos, 47–83.
- Zipf G.K. 1935. The psycho-biology of language; an introduction to dynamic philology. Boston: Houghton Mifflin Company.
- Zipf G.K. 1949. Human behavior and the principle of least effort; an introduction to human ecology. Cambridge, Mass.: Addison-Wesley Press.
Adam Pawłowski: Glottometrics 8, 2004, 79 - 89