Jerzy Woronczak

From Glottopedia
Revision as of 16:02, 28 November 2007 by WikiLingua (Talk | contribs)


Jerzy Woronczak (1923–2003) was one of the most important Polish scholars in the field of quantitative linguistics and, in fact, one of the founders of this discipline in Poland.


9 Nov. 1923 Jerzy Woronczak is born in Radomsko;
1936–1939 secondary school in Radomsko, studies of Hebrew with Jakub Fajner (until 1940);
1945 matura (secondary-school certificate) at the gymnasium of Radomsko, arrival in Wrocław;
1945–1947 studies at the Faculty of Human Sciences, University of Wrocław (history);
1948–1952 studies at the Faculty of Human Sciences, University of Wrocław (Polish philology);
1945–1953 employment at the Wrocław University Library;
1953–2003 employment at the Institute of Literary Research, Polish Academy of Sciences, Wrocław section (Instytut Badań Literackich Polskiej Akademii Nauk, henceforth IBL PAN);
1952 master's degree at the University of Wrocław in Polish philology;
1959 doctor's degree at the IBL PAN, dissertation “Studia nad wierszem polskiego średniowiecza” (Studies of Polish medieval verse);
1961–2003 member of the editorial committee of a dictionary of 16th century Polish;
1962 publication of Woronczak’s study which conclusively settles the perennial dispute over the originality and chronology of the stanzas of Bogurodzica [the oldest Polish literary text] and indicates the filiations of its numerous versions;
1965 member of the editorial board of "Biblioteka Pisarzów Polskich" (Library of Early Polish Writers);
1965–1970 beginning of work on the Frequency Dictionary of Modern Polish (J. Woronczak led the project);
1968 chief of the editorial board of "Biblioteka Pisarzów Polskich" (Library of Early Polish Writers);
1966 habilitation at the IBL PAN, dissertation “Rękopis nr 149 Biblioteki Kapitulnej w Gnieźnie (Missale Plenarum z przełomu XI i XII w.). Opracowanie filologiczne” (Manuscript no. 149 from the Capitular Library in Gniezno, Missale Plenarum, turn of the 11th and 12th centuries. A Philological Analysis);
1967–1978 head of the Section of Medieval Texts at the IBL PAN;
1975–2001 lectures at the Institute of Polish Philology, University of Wrocław (European and Polish medieval literature and culture, linguistics, Judaism);
1978 beginning of Woronczak’s opus vitae – the edition of the Complete Works (Dzieła Wszystkie) of Jan Kochanowski;
1984 extraordinary professor at the IBL PAN;
1991 ordinary professor at the IBL PAN;
1993–1996 founder and chief of the Research Centre for the Culture and Languages of the Polish Jews at the Institute of Polish Philology, University of Wrocław;
1993 officially retired, but scientifically active until the end of his days;
winter 2003 Woronczak’s last lecture at a meeting of the Wrocław Philological Society;
6 Mar. 2003 death of Jerzy Woronczak in Wrocław.

Contribution to quantitative linguistics

There are three areas in which one must examine Jerzy Woronczak’s work in the field of quantitative linguistics (cf. Pawłowski, Sambor 2004, Kamińska-Szmaj 2004): The first pertains to the scientific merits of his own studies, the second to the knowledge he imparted to his students (master’s and doctor’s degree candidates), and the third to the value of the popularisation of his achievements against the background of the rather traditionalistic currents predominant in Polish linguistics.

Here it should be sufficient to state that during his active scientific career, J. Woronczak (as a participant, patron, reviewer, or adviser) took part in practically all the scientific initiatives in Poland connected with mathematical applications in linguistic research. With regard to the third area (popularisation), we must admit that the studies on the application of mathematics to problems of history and literary theory, which he undertook in the 1960s, testified not only to his vast knowledge, which defied precise labelling, but also to his civil courage. One should not forget that for the traditional representatives of the humanities of the time, combining the poetics and aesthetics of literature with mathematics was a sign of unaccountable intellectual bravado, with a dash of humbug. Such studies were treated as peculiarly eccentric, based on the appropriation of a methodology foreign to the discipline, leading at best to reductionism, and therefore to a simplification of complex linguistic material. Referring to the issue of numerical methods in prosody and versification (one of Woronczak’s favourite topics), M.R. Mayenowa remarked that “Statistical methods of versification analysis often arouse hostility even where there are no such basic reservations as to the principle; what is more, they arouse objections also there where, in their simplest form, they are always applied. [...] It seems that the basis of the protest is sheer psychological, and one should not wonder. The professional who has devoted many years to mastering the traditional language of his discipline can only with difficulty and great humility accept a situation in which someone discusses his discipline in another tongue.” (Mayenowa 1965: 170)

One of the reasons for the success of statistical linguistics in Poland was the fact that Woronczak was able to speak “in another tongue” about the traditional issues of philology and linguistics and set an example for the younger generation of scholars. Below we shall discuss the most important quantitative works of Jerzy Woronczak according to thematic groups. It is worth adding that in the 1970s he gradually returned to his original interests, namely early mediaeval history, antiquity, biblical studies and, most of all, Hebrew studies and the history of Polish Jews. From this time on, quantitative themes appeared mostly in the subjects of the theses and dissertations of his students, which, incidentally, predominantly involved the Bible and/or ancient texts.


It is worth recalling that the tradition of stylometric research in Poland goes back to the end of the 19th century. One of the fathers of stylometry was the Polish Hellenist W. Lutosławski, who coined the term “stylometry” and defined its general rules (Lutosławski 1897 [1983]). In the 1950s, W. Kuraszkiewicz, an expert in Slavonic studies, suggested using numerical measures of lexical richness (Kuraszkiewicz, Łukaszewicz 1951). His coefficient, like Guiraud’s, to which it is similar, has no practical significance today, but it played an important role in promoting mathematical methods among Polish linguists. Woronczak turned to the problems of stylometry at the beginning of the 1960s. In contrast to his predecessors, though, he applied significantly more refined and effective mathematical tools. It should be emphasized that his work had both a theoretical aspect (the analytical derivation of estimators) and a practical one (applications to real problems in linguistics). His goal was to find unbiased estimators of indices of lexical richness which were sensitive to lexical variety but independent of the length of the text fragment under investigation (Woronczak 1965b). His starting point was the so-called Good’s measures (Good 1953), which express the probability of randomly selecting m elements belonging to one and the same class in m independent samplings from a general population:

(1) $c_m = \sum_i p_i^m$, where $p_i$ is the probability of the i-th class in the population

Woronczak derived equations for the estimators c_m for m = 2 and m = 3:

(2) $c_2 = \frac{\sum_i f_i (f_i - 1)}{N (N - 1)}$

(3) $c_3 = \frac{\sum_i f_i (f_i - 1)(f_i - 2)}{N (N - 1)(N - 2)}$

where f_i is the frequency of the i-th word-form and N is the length of the sample.¹

Equation (4) is a generalisation of equations (2) and (3). Its author, though, did not recommend calculating its value for m > 3 (Woronczak 1976).

(4) $c_m = \frac{\sum_i f_i (f_i - 1) \cdots (f_i - m + 1)}{N (N - 1) \cdots (N - m + 1)}$

1 Equation (2) was also derived by G. Herdan. Both scholars demonstrated the similarity of c_2 and Yule’s K-characteristic.
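The estimators can be illustrated with a short Python sketch. It assumes the usual unbiased (falling-factorial) form, in which the sum of f_i (f_i - 1) ... (f_i - m + 1) over all word-forms is divided by N (N - 1) ... (N - m + 1); the function name `good_estimator` and the toy sentence are illustrative only and do not come from Woronczak's papers.

```python
from collections import Counter
from fractions import Fraction


def good_estimator(tokens, m):
    """Estimate c_m from a sample of word-forms.

    Assumed form: sum_i f_i (f_i - 1) ... (f_i - m + 1)
    divided by N (N - 1) ... (N - m + 1).
    """
    n = len(tokens)
    if n < m:
        raise ValueError("sample must contain at least m tokens")
    denominator = 1
    for k in range(m):
        denominator *= n - k
    numerator = 0
    for f in Counter(tokens).values():
        term = 1
        for k in range(m):
            term *= f - k      # becomes 0 as soon as f < m
        numerator += term
    return Fraction(numerator, denominator)


# Toy sample: N = 11, frequencies the=4, cat=2, saw=2, dog=2, and=1.
sample = "the cat saw the dog and the dog saw the cat".split()
c2 = good_estimator(sample, 2)   # 18 / 110
c3 = good_estimator(sample, 3)   # 24 / 990
```

Exact fractions are used so that the small example can be checked by hand; for real texts a floating-point result would normally be sufficient.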

Applying the parameters of Mandelbrot’s equation, including B, Woronczak then derived equations for the expected size of a given text’s vocabulary and the expected size of the class of words of a given frequency (ibid.). The estimators (2) and (3) were initially verified by their author (Woronczak 1965b), but he admitted in another article that the set-up of the test was not entirely satisfactory (Woronczak 1976: 167). For this reason they were subjected to further verification on an extensive corpus (Pawłowski 1994). The dynamics of change in the values of several indices of lexical richness were compared in a corpus of French literary texts (prose by Romain Gary), the length of which was gradually increased from 20,000 to 600,000 words. Each index was evaluated by the dispersion² of its value with increasing text length. A significant improvement in stability is already observed for log TTR, although the Dugast and Yule indices, as well as those of Woronczak, proved to be the most stable (Tab. 1).

2 Standard deviation divided by the mean.
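The dispersion defined in note 2 is the coefficient of variation. A minimal Python sketch follows; it assumes the population standard deviation (the text does not say whether the population or the sample formula was used), and the input values are hypothetical, not figures from the Romain Gary corpus.

```python
from statistics import mean, pstdev


def dispersion(values):
    """Standard deviation divided by the mean (coefficient of variation)."""
    m = mean(values)
    if m == 0:
        raise ValueError("dispersion is undefined for a zero mean")
    # pstdev is the population standard deviation (an assumption here).
    return pstdev(values) / m


# Hypothetical values of one index measured at successive text lengths.
index_values = [2, 4, 4, 4, 5, 5, 7, 9]
result = dispersion(index_values)
```

The lower the dispersion, the less the index drifts as the text grows, which is the sense in which an index is called stable above.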

Table 1. Indices of lexical richness and their dispersions [table image]

Woronczak (1976) also showed that there is a connection between the values of the estimators c_2 and c_3 and the lexical cohesion of a text. Analyzing the dynamics of the mean variations of these estimators with ever-increasing sample length N of a continuous text, he noticed that the estimators first increased in value with increasing N but then stabilized, even though N grew in geometric progression. The value of N at which the indices c_2 and c_3 become relatively stable (or reach their maximum values) marks precisely the limit of the lexical cohesion of the text, and at the same time indicates the average length of the fragments that are, to a certain degree, closed with respect to vocabulary and theme. The test which Woronczak conducted on texts by St. Fulgentius and St. Augustine confirmed this hypothesis. The Augustinian text, which was addressed to an uneducated social class and therefore written in a simple manner, produced an N limit of ca. 45 words, while that for the more difficult and literary Fulgentius text was ca. 128 words.
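The kind of profile described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the text is cut into consecutive non-overlapping fragments, N doubles at each step (2, 4, 8, ...), and the mean of c_2 over the fragments is recorded for each N. The fragmenting scheme, the function names and the toy input are hypothetical; the procedure in Woronczak (1976) may differ in detail.

```python
from collections import Counter
from fractions import Fraction


def c2(tokens):
    """Estimator (2) in its assumed form: sum f_i (f_i - 1) / (N (N - 1))."""
    n = len(tokens)
    pairs = sum(f * (f - 1) for f in Counter(tokens).values())
    return Fraction(pairs, n * (n - 1))


def mean_c2_by_length(tokens):
    """Mean c_2 over consecutive fragments of length N = 2, 4, 8, ..."""
    results = {}
    n = 2
    while n <= len(tokens):
        # Non-overlapping fragments; an incomplete tail is discarded.
        fragments = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
        results[n] = sum(c2(frag) for frag in fragments) / len(fragments)
        n *= 2
    return results


# Toy "text" of eight tokens drawn from a two-word vocabulary.
toy_text = list("ababaabb")
profile = mean_c2_by_length(toy_text)
```

On a real text one would inspect `profile` for the value of N beyond which the mean stops changing appreciably; that reading of the curve is left to the analyst in this sketch.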


The beginnings of corpus-based research in Poland must be associated with the preparation of the Frequency Dictionary of Modern Polish (Słownik Frekwencyjny Polszczyzny Współczesnej, hereinafter SFPW) in the 1960s and 1970s, modelled on the Juilland dictionaries (Kurcz et al. 1990). Woronczak was, next to J. Sambor, one of the chief initiators and authors of this undertaking, which lasted several years (Lewicki, Sambor 1969). The SFPW was compiled on the basis of a sample of 500,000 words encompassing five functional styles (genres): scientific texts, small press items, commentary on current affairs, literary prose, and drama. The fundamental indicators describing the frequency distribution of a lexeme across the stylistic categories were the Juilland measures F, D, and U. The empirical data contained in the SFPW became the basis for several analyses of Polish (see: Kamińska-Szmaj 1988, 1989, 1990; Sambor 1971; Hammerl 1989; Pawłowski 1999a, 1999b). It is worth mentioning that the SFPW corpus has since been converted to digital form and is available on the Internet (cf. Ogrodniczuk 2003).
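For readers unfamiliar with the Juilland measures, the following Python sketch shows how they are commonly computed for a single lexeme. It assumes subcorpora of equal size and the standard definitions F = total frequency, D = 1 - V / sqrt(n - 1) with V the coefficient of variation of the subfrequencies, and U = F · D. The function name and the example frequencies are hypothetical and are not taken from the SFPW.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland(subfrequencies):
    """Return (F, D, U) for one lexeme from its frequencies in n subcorpora.

    Assumes that all subcorpora have the same size.
    """
    n = len(subfrequencies)
    if n < 2:
        raise ValueError("at least two subcorpora are needed")
    total = sum(subfrequencies)              # F: overall frequency
    if total == 0:
        raise ValueError("the lexeme does not occur in the sample")
    variation = pstdev(subfrequencies) / mean(subfrequencies)
    d = 1 - variation / sqrt(n - 1)          # D: dispersion, between 0 and 1
    return total, d, total * d               # U: usage coefficient F * D


# Five subcorpora, as in a dictionary built on five functional styles.
even = juilland([4, 4, 4, 4, 4])      # evenly spread lexeme: D = 1
skewed = juilland([10, 0, 0, 0, 0])   # lexeme confined to one style: D = 0
```

D rewards lexemes that are spread evenly across the styles, so U discounts the raw frequency of words that are frequent in only one genre.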


Although Woronczak did not extensively apply this type of methodology, he was fully aware of the possibilities which multidimensional analysis had to offer in the taxonomy of textual objects. He knew the works of J. Czekanowski³, whom he met in Wrocław on several occasions during seminars on applied mathematics organized by H. Steinhaus. In 1962 he published a study in which multidimensional scaling in a rudimentary form was applied to establish the origin and filiations of Bogurodzica (the oldest Polish literary text). Using 56 text features, he classified all the surviving versions of Bogurodzica (dating from the 15th–17th centuries). This helped him conclusively settle the perennial dispute over the originality and chronology of Bogurodzica’s stanzas (Woronczak 1962 [1993]). Woronczak also mentioned his discussions with A. Kolmogorov⁴ on the topic of spatial representations of "linguistic objects" and encouraged the author of these lines to conduct a taxonomy of Polish poetic texts.

3 In the 40s Jan Czekanowski introduced multivariate methods in anthropology and linguistics (for further information see: Adam Pawłowski, Jan Czekanowski (1882–1965) – a pioneer of multidimensional taxonomy. To be published in one of the forthcoming issues of Glottometrics).

4 Most likely during a conference on the versification of Slavic languages organised in Warsaw by the Institute of Literary Research of the Polish Academy of Sciences in August of 1964 (see Mayenowa 1965).


The American linguist G. K. Zipf is recognized as the initiator of studies on the statistical laws of language. Other scholars, such as J.N. Baudouin de Courtenay (Baudouin de Courtenay 1927 [1990]: 549), also anticipated their appearance. The dependencies Zipf discovered between the frequencies of expressions, their lengths, number of meanings, and rank are generally known as "Zipf’s laws". They stimulated the search for other linguistic laws within the framework of a broad paradigm of systems theory or cognitive science (Hammerl, Sambor 1993).

Woronczak studied Zipf’s fundamental law, which describes the relationship between the rank of a word in a frequency list and its frequency (Woronczak 1967). Starting with the equations of Estoup, Joos, and Mandelbrot, he developed an analytical description of the quantitative structure of the vocabulary of a complete text, treating it as a sample from the general population of the language, and derived equations for the expected size of the vocabulary of a text with a length of N word-forms and for the expected number of words with a given frequency (ibid.: 2259). He also considered generalising the equations he obtained to an infinite text of length $N \to \infty$ and rank $N \to \infty$. It must be added that these generalisations have never been empirically verified and are deductive and theoretical in nature.
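The two expected quantities mentioned above can be illustrated numerically. The Python sketch below does not reproduce Woronczak's analytical formulas; it assumes a finite vocabulary whose probabilities follow a Zipf-Mandelbrot law, p_r proportional to (r + rho)^(-B), and applies the elementary sampling formulas for N independent draws. The names `rho`, `vocab_size` and all parameter values are assumptions made for the illustration.

```python
from math import comb, isclose


def zipf_mandelbrot(vocab_size, b, rho):
    """Probabilities p_r proportional to (r + rho) ** -b for r = 1..vocab_size."""
    weights = [(r + rho) ** -b for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_vocabulary(probs, n):
    """Expected number of distinct words in n independent draws."""
    return sum(1 - (1 - p) ** n for p in probs)


def expected_class_size(probs, n, k):
    """Expected number of words occurring exactly k times in n draws."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for p in probs)


# Hypothetical parameters: 50 word types, B = 1.1, rho = 2.0, text of 30 tokens.
probs = zipf_mandelbrot(vocab_size=50, b=1.1, rho=2.0)
n = 30
vocabulary = expected_vocabulary(probs, n)
spectrum = [expected_class_size(probs, n, k) for k in range(1, n + 1)]

# Sanity check: the frequency classes for k >= 1 add up to the vocabulary.
consistent = isclose(sum(spectrum), vocabulary)
```

A second check that follows from the same model is that the class sizes weighted by k add up to the text length N.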


As an expert on the literature, versification, and musical notation of the Middle Ages, Woronczak devoted many of his studies to research into texts in Old Polish (1958 [1993], 1960, 1965a, 1993) and in Old Czech (1963 [1993]). He approached this topic in his typical manner, i.e. both from a philological and a quantitative perspective. The statistical models he elaborated and the tests he employed were never goals in themselves, nor, consequently, were the linguistic materials he used merely a pretext for the abstract solutions often encountered in formalistic approaches. It is certainly this balance between philological-linguistic content and mathematical formalism which resulted in this aspect of Woronczak’s work becoming an especially valuable element of his scientific legacy. We will discuss here just some of his most representative works devoted to versification.

In 1960 his analysis of the distributions of the verse lengths of asyllabic Slavonic poetry of the 15th–16th centuries appeared (Woronczak 1960). For the sake of comparison he described the numerical distribution of the length of sentences in Polish prose; this proved to be a gamma distribution with a strong right-sided asymmetry. He then found that the variance in length of asyllabic verses was smaller than that of sentences in prose and decreased with time, which was an indication of the gradual formation of the Polish syllabic system. This gradual formation of Polish syllabic verse was the leitmotif of Woronczak’s studies of the writings of Biernat of Lublin (1958 [1993]).

In a controversy with Czech mediaevalists over the structure of the Old Czech versification in the Dalimil Chronicles (orig. Staročeská Kronika tak Řečeného Dalimila), Woronczak put forward the hypothesis that if one proceeded from the opening chapters of the chronicles towards their end, one would be able to observe a gradual development into prose, in that the structures of versification and rhythm would become less rigid (Woronczak 1963 [1993]). He maintained that the opening fragments of the chronicles, which speak of the pre-Christian era, temporally remote and unknown to the annalist, would be versified in a more orderly manner. He explained this phenomenon by two causes. First, the opening chapters may have contained quotations from a surviving oral literary tradition introduced into the text. One must remember that the majority of medieval texts were originally transmitted orally; they were easy to remember thanks to their regular, formulaic structure, which served not only an aesthetic but also, and perhaps foremost, a mnemonic function. Secondly, one could imagine that the author, writing of events remote in time and not familiar to the contemporary audience, might, as the need arose, alter the content to fit the linguistic form rather than the form to fit the content, making it in this way more splendid. The opposite situation would prevail in the last fragments, which present contemporary events that had not yet been consolidated into an oral tradition and that demanded adherence to facts, the rules of correct versification being of secondary importance.

Woronczak verified this hypothesis using the runs test, a technique for assessing the degree of randomness of a numerical series. The data he used were the lengths of successive verses. The tests confirmed the agreement of the hypothesis with the structure of the Dalimil Chronicles.
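A runs test on a series of verse lengths can be sketched in Python as follows. The sketch uses the Wald-Wolfowitz variant with the series dichotomised around its median and a normal approximation for the number of runs; whether Woronczak dichotomised the data in this way is an assumption, and the verse lengths below are invented for the example.

```python
from math import sqrt
from statistics import median


def runs_test(values):
    """Wald-Wolfowitz runs test around the median.

    Returns (observed runs, expected runs, z score).
    """
    centre = median(values)
    # Dichotomise the series; values equal to the median are dropped.
    signs = [v > centre for v in values if v != centre]
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0 or n1 + n2 < 3:
        raise ValueError("series too short or one-sided for a runs test")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - expected) / sqrt(variance)
    return runs, expected, z


# Hypothetical syllable counts of successive verses (not data from the chronicle).
verse_lengths = [8, 9, 8, 9, 8, 9, 8, 9]
runs, expected, z = runs_test(verse_lengths)
```

A z score far from zero indicates a non-random ordering: many more runs than expected point to regular alternation, many fewer to clustering of long and short verses.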


If one were to consider the number of his publications as the only criterion in evaluating Jerzy Woronczak’s achievements in the field of quantitative linguistics, the result would be modest. His determination to promote his achievements in international journals was also, by today’s standards, too slight and not proportional to their scientific value. But do these strictly utilitarian measures embrace the totality of scientific output? Time has shown that the main distinguishing features of Woronczak’s work are its depth, quality and originality. For in the overwhelming majority of cases, the Professor was able to find the optimal point of balance at which philological and linguistic issues do not disappear in a thicket of mathematical formalism, but preserve their cognitive value and freshness even for demanding specialists in the given discipline. And that is perhaps the last lesson which he taught his students.


  • Baudouin de Courtenay J. 1927. Ilościowość w myśleniu językowym [Quantity as a dimension of thought about language]. In: Symbolae Grammaticae in honorem Ioannis (Jan) Rozwadowski v. 1. (Festschrift) Cracoviae: Gebethner & Wolff, 3–18. Reprint: Baudouin de Courtenay J. 1990. Dzieła wybrane t. IV, 546–563 [Selected Writings, v. 4]. Warszawa: PWN.
  • Good I.J. 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40. 237–264.
  • Hammerl R. 1989. Metoda wyodrębniania słownika minimum (na materiale słownika frekwencyjnego polszczyzny współczesnej) [A method of establishing the minimum dictionary of Polish (on the data from the frequency dictionary of contemporary Polish)]. Poradnik Językowy 1989, 614–628.
  • Hammerl R. & Sambor J. 1993. O statystycznych prawach językowych [On statistical laws of language]. Warszawa: Polskie Towarzystwo Semiotyczne.
  • Kamińska-Szmaj I. 1988. Części mowy w słowniku i tekście pięciu stylów funkcjonalnych polszczyzny pisanej (na materiale słownika frekwencyjnego) [Parts of speech in the lexicon and text of five functional styles (genres) of written Polish (on the material of the frequency dictionary)]. Biuletyn Polskiego Towarzystwa Językoznawczego 41, 127–136.
  • Kamińska-Szmaj I. 1989. Charakterystyka statystyczno-stylistyczna części mowy [Stylo-statistical characteristics of parts of speech]. Polonica 14. 87–120.
  • Kamińska-Szmaj I. 1990. Różnice leksykalne między stylami funkcjonalnymi polszczyzny pisanej. Analiza statystyczna na materiale słownika frekwencyjnego [Lexical differences between the styles (genres) of written Polish. Statistical analysis based on the frequency dictionary of Polish]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego.
  • Kamińska-Szmaj I. (Ed.). 2004. Od starożytności do współczesności. Księga poświęcona pamięci profesora Jerzego Woronczaka [From antiquity to contemporary times. Festschrift for Jerzy Woronczak]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego.
  • Kuraszkiewicz W. & Łukaszewicz J. 1951. Ilość różnych wyrazów w zależności od długości tekstu [The frequency of different words as a function of text length]. Pamiętnik Literacki 42(1). 168–182.
  • Kurcz I., Lewicki A., Sambor J., Szafran K. & Woronczak J. 1990. Słownik frekwencyjny polszczyzny współczesnej, t. 1–2 [Frequency dictionary of contemporary Polish, v. 1–2]. Kraków: PAN, Instytut Języka Polskiego.
  • Lewicki A. & Sambor J. 1969. Projekt słownika frekwencyjnego współczesnego języka polskiego [The project of the frequency dictionary of present-day Polish]. Sprawozdania PAN 12/4, 90–103.
  • Lutosławski W. 1897. The origin and growth of Plato's logic. London, New York, Bombay: Longmans, Green and Co. Reprint: 1983. The origin and growth of Plato's logic. Hildesheim: Georg Olms Verlag.
  • Mayenowa M.R. 1965. Granice matematyzacji (w opisie wiersza) [The limits of mathematization (in verse description)]. Kultura i społeczeństwo 24. 170–173.
  • Ogrodniczuk M. 2003. Nowa edycja wzbogaconego korpusu słownika frekwencyjnego [A new enhanced edition of the frequency dictionary corpus]. In: Gajda S. (Ed.). 2003. Językoznawstwo w Polsce. Stan i perspektywy [Linguistics in Poland. Its present state and perspectives]. Opole: PAN, Komitet Językoznawstwa, 181–190.
  • Pawłowski A. 1994. Ein Problem der klassischen Stilforschung: Die Stabilität einiger Indikatoren des Lexikonumfangs. Zeitschrift für Empirische Textforschung 1. 67–74.
  • Pawłowski A. 1999a. Metodologiczne podstawy wykorzystania słowników frekwencyjnych w badaniu językowego obrazu świata [Methodological foundations for the use of frequency dictionaries in investigating the linguistic image of the world]. In: Pajdzińska A. & Krzyżanowski P. Przeszłość w językowym obrazie świata [Past in the linguistic image of the world]. Lublin: wyd. UMCS, 81–99.
  • Pawłowski A. 1999b. The Quantitative Approach in Cultural Anthropology: Application of Linguistic Corpora in the Analysis of Basic Color Terms. Journal of Quantitative Linguistics 6/3. 222–234.
  • Pawłowski A. & Sambor J. 2004. Jerzy Woronczak – twórca polskiej lingwistyki kwantytatywnej [Jerzy Woronczak – the founder of Polish quantitative linguistics]. In: Kamińska-Szmaj I. (Ed.). Od starożytności do współczesności. Księga poświęcona pamięci profesora Jerzego Woronczaka [From antiquity to contemporary times. Festschrift for Jerzy Woronczak]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego.
  • Sambor J. 1971. Z zagadnień gramatyki w słowniku frekwencyjnym współczesnego języka polskiego [On the problems of grammar in the frequency dictionary of modern Polish]. Biuletyn Polskiego Towarzystwa Językoznawczego 29. 117–129.
  • Woronczak J. 1960. Statistische Methoden in der Verslehre. In: Poetics – poetyka – poetika. Warszawa: PWN, IBL, 607–627.
  • Woronczak J. 1962 [1993]. Wstęp filologiczny do Bogurodzicy [Philological introduction to “Bogurodzica”]. In: M.R. Mayenowa. 1962. Liryka średniowieczna t. 1. BPP, Seria A, 1. Wrocław etc.: Ossolineum, 7–25. [Reprint in the volume: Woronczak J. 1993. Studia o literaturze średniowiecza i renesansu [Papers on the literature of Middle Ages and Renaissance]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego, 76–94].
  • Woronczak J. 1965a. Rytmika akcentowa sylabowca [Accentual rhythm of the syllable verse]. In: Mayenowa M.R. (red.). Poetyka i matematyka [Poetics and mathematics]. Warszawa: PIW, 72–78.
  • Woronczak J. 1965b. Metody obliczania wskaźników bogactwa słownikowego [Methods of calculating indices of the lexical richness of texts]. In: Mayenowa M.R. (red.). Poetyka i matematyka [Poetics and mathematics]. Warszawa: PIW, 145–165.
  • Woronczak J. 1967. On an attempt to generalize Mandelbrot’s distribution. In: To Honor Roman Jakobson vol. II. The Hague: Mouton, 2254–2268.
  • Woronczak J. 1976. O statystycznym określeniu spójności tekstu [On the statistical definition of text coherence]. In: Mayenowa M.R. (red.). Semantyka tekstu i języka [Semantics of text and language]. Wrocław: Ossolineum, 165–173.
  • Woronczak J. 1993. Studia o literaturze średniowiecza i renesansu [Papers on the literature of Middle Ages and Renaissance]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego.
  • Woronczak J. 1963 [1993]. Zasada budowy wiersza Kroniki Dalimila [The principle of construction of Dalimil Chronicles’ verse]. Pamiętnik Literacki 2. 1963. 469–478. Reprint in the volume: Studia o literaturze średniowiecza i renesansu [Papers on the literature of Middle Ages and Renaissance]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego, 67–75.
  • Woronczak J. 1958 [1993]. Z badań nad wierszem Biernata z Lublina [Studies of the verse of Biernat of Lublin]. Pamiętnik Literacki 3. 1958. 97–118. Reprint in the volume: Studia o literaturze średniowiecza i renesansu [Papers on the literature of Middle Ages and Renaissance]. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego, 139–156.


Adam Pawłowski: Glottometrics 8, 2004, 79-89