National Research University Higher School of Economics
Lexicostatistical studies in Khoisan II/1: How to make a Swadesh wordlist for Proto-Tuu
George Starostin
Moscow / Santa Fe Institute
Abstract
The paper is the first in a planned two-part series, whose main goals are to conduct a general lexicostatistical survey of the Tuu, or South Khoisan, family of languages; to reconstruct a reliable approximation of the Swadesh wordlist for Proto-Tuu; and to clarify certain as of yet unresolved issues about the internal classification of Tuu languages. In the first part of the study, I survey the main data sources, identify the main obstacles to historical reconstruction in the Tuu domain, and make observations on some aspects of Tuu diachronic phonology. The main bulk of the paper is actually represented by the Appendix, in which I attempt to reconstruct the equivalents of the first 50 Swadesh list items for the three intermediate nodes of the Tuu family (Proto-!Ui, Proto-Nossob, and Proto-Taa).
Keywords: South Khoisan languages; Tuu languages; click languages; lexicostatistics; basic lexicon; onomasiological reconstruction.
Abstract (translated from Russian)
G. S. Starostin. Lexicostatistical studies in Khoisan II/1: on constructing a Swadesh wordlist for Proto-Tuu
This paper, the first of two parts of the study, presents the results of a general lexicostatistical survey of the Tuu (= South Khoisan) language family, in the course of which the Swadesh wordlist for Proto-Tuu is partially reconstructed and a number of difficult points concerning the internal classification of Tuu languages are clarified. The present publication gives a brief overview of the sources, lists the main methodological problems associated with the diachronic study of Tuu languages, and offers comments on the historical phonology of these languages. Most of the paper is taken up by the Appendix, which attempts to reconstruct the first 50 items of the Swadesh list for the three intermediate nodes of the Tuu family (Proto-!Ui, Proto-Nossob, and Proto-Taa).
Keywords: South Khoisan languages; Tuu languages; click phonemes; lexicostatistics; basic lexicon; onomasiological reconstruction.
Introduction
Of all the different linguistic lineages commonly united under the umbrella term of “Khoisan”, the Tuu family (originally = Dorothea Bleek's “Southern Bushman” and Joseph Greenberg's “Southern Khoisan”, see Güldemann 2005a) has certain unique properties which simultaneously make it one of the most important and one of the most difficult groupings for any comparative-historical analysis of the Khoisan-speaking area. First, although the overall number of known Tuu languages is smaller than the respective number for Khoe (Glottolog, following Güldemann 2018, currently recognizes 8 different units as compared to 13 for Khoe; see glottolog.org/resource/languoid/id/tuuu1241 and glottolog.org/resource/languoid/id/khoe1241, respectively), observed grammatical and lexical differences between these languages on average exceed those observed between the various members of Khoe. Thus, lexicostatistical calculations show that, although the lowest observed percentages of matches within the Tuu family (e.g. 42% between |Xam and !Xóõ) are comparable to the respective lowest percentages within Khoe (e.g. 41% between Nama and Kxoe), the internal branching of Tuu on the whole is deeper and more complicated than the internal branching of the two major subfamilies of Khoe (Khoekhoe and Kalahari Khoe; see Starostin 2013: 355, 407 for details). Among other things, this implies more possibilities for various important diachronic discoveries during the reconstruction of Proto-Tuu, hardly imaginable for Proto-Khoe because of the relatively young age of both of its constituent branches. In fact, the divergence between some of the members of the Tuu family is so impressive that concerns have been voiced in the past about whether one may consider the common ancestry of all its members as an established fact (see e.g. Westphal 1962, 1971).
As of today, however, there seems to be a general consensus among all specialists working on Khoisan that the observed phonetic, lexical, and grammatical correlations between all the small sub-branches of Tuu are best interpreted in terms of genetic relationship rather than contact (Güldemann 2005b). In this paper, I proceed from the assumption that this relationship has already been safely established and needs no additional validation, which allows us to concentrate on issues of reconstruction and internal classification.
Second, based on scrupulous phonetic documentation and phonological analysis of those Tuu languages which have survived into the modern age (namely, !Xóõ and N|uu), this family has emerged as featuring one of the most complex sound systems in the entire Khoisan area. Thus, all known Tuu languages share no fewer than five types of click influxes, including the rare labial type ʘ (outside Tuu, it is only encountered in the ǂHõã language of the Ju-ǂHõã, or Kx'a, family), and at least some Tuu languages have more than 15 phonologically contrasting types of click accompaniments, a number unmatched by any Khoisan language outside of that particular family. Understanding the reasons which underlie this staggering complexity may provide an important insight into the evolutionary mechanisms of click systems in general, yet such an understanding is impossible to gain without a thorough diachronic study of the Tuu family as a whole.
On the negative side, unlike Khoe, the Tuu family is nearly extinct. The only survivors, as has already been mentioned, are the small dialectal cluster of !Xóõ (Taa) and N|uu, and even the latter is moribund and had, in fact, until recently been considered completely extinct (Sands et al. 2007). All of the data that we have on the other languages come from older sources, stretching across about 150 years of ethnographic and linguistic research and widely varying in phonetic and semantic accuracy. Some of these data collections are quite comprehensive, such as Wilhelm Bleek's and Lucy Lloyd's archive on |Xam; other doculects are less lucky, being represented by ultra-short grammatical sketches and minimal wordlists. What is common to most of them, however, is the general unreliability of phonetic notation, grammatical analysis, and semantic glossing -- revealed by a lack of consistency between recordings of the same language by different researchers (quite often, even by the same researcher over an extended period of time) and by comparison with more recent and more accurate notations by newer and more experienced generations of scholars.
Similar problems are encountered with other Khoisan groupings as well, since data on both Ju (North Khoisan) and Khoe (Central Khoisan) languages often come from the same researchers as data on Tuu (Lucy Lloyd, Wilhelm and Dorothea Bleek, etc.). However, very few languages belonging to either of these stocks are exclusively represented by archaic and unreliable data; and even when they are, they usually have very close linguistic relatives with more recent and/or more accurate descriptions against which the questionable data may be crosschecked (e.g. certain Central and Southern sub-dialects of Ju against Ju|'hoan, or the extinct !Ora against its much more prominent neighbor Nama). By contrast, Tuu languages such as |Xam or ||Xegwi, while certainly not linguistic isolates per se, are still quite separate and distinct linguistic units, and cross-checking their data with, for instance, the modern phonetic and lexical descriptions of N|uu would be like trying to ensure the correctness of one's transcription of Czech or Polish by comparing it with Russian (while also having a very vague understanding of the historical phonology and lexicology of Slavic languages in general).
Consequently, without access to more and better data (which is hardly likely, given the alleged extinction of most of those languages) our ability to properly and definitively reconstruct both the phonological system of Proto-Tuu and its lexical inventory is severely limited, and many problems will likely remain forever unresolved. Nevertheless, approximations are still possible, and any attempt to disentangle the complex web of genetic connections and areal interactions between Tuu languages and their other Khoisan neighbors is liable to shed at least some light on important diachronic processes, some of which may have even chronologically preceded the arrival of Bantu speakers into the area.
In this two-part paper, the next in an ongoing series on Khoisan lexicostatistics, I take up the challenge of conducting a full lexicostatistical survey of those Tuu languages which can actually be used for this purpose, as well as reconstructing Swadesh proto-wordlists for the three major linguistic clusters which constitute this family (!Ui, Nossob, and Taa) and ultimately for Proto-Tuu itself. A first attempt at Tuu lexicostatistics has already been published in Starostin 2013, along with provisional Proto-!Ui and Proto-Taa (but not Proto-Tuu) reconstructions for the 50-item “ultra-stable” subset of the Swadesh wordlist; this publication includes the revised and corrected results of that lexicostatistical survey and expands the reconstructions to include the Swadesh wordlist in its entirety.
The main bulk of both papers will consist of appendices, containing specific comments on individual Swadesh items (due to volume limitations, the wordlist will be split in two). As for the theoretical parts, the first paper will briefly outline the data sources, the methodology, and the main issues concerning phonetic and lexical reconstruction; the second will deal with the actual internal classification of Tuu and give a brief analysis of the reconstructed proto-wordlists.
The data
Of the eight units currently listed in Glottolog as distinct Tuu languages, relatively complete Swadesh wordlists may be assembled for five, but their respective quality varies significantly depending on the age and thoroughness of the source(s). Additionally, while data from such languages as ||Ku||e, ||Kxau, and others are clearly insufficient to include them in any statistical calculations, they may still be relevant for purposes of etymological study and reconstruction. Below I list first the principal languages (and/or dialects) included in the statistical procedure, and then the list of auxiliary sources which will be consulted in the process of reconstructing wordlists for Proto-!Ui, Proto-Taa, and Proto-Tuu.
|Xam
Sources. This formerly widespread language became largely extinct even prior to the extensive field research of Dorothea Bleek in the first half of the 20th century; most of our knowledge on its grammar and lexicon comes from the archival records of Wilhelm Bleek and Lucy Lloyd, many of which were originally published in Bleek & Lloyd 1911 and later included into D. Bleek's comparative dictionaries (Bleek 1929, 1956).
Dialects. Considering the overall expanse of the territories formerly populated by |Xam speakers and the fact that Bleek and Lloyd worked with a variety of informants (from Achterveld, Katkop, Strandberg, and other locations), dialectal diversity within the language must have been quite notable. However, precise differentiations are impossible without a meticulous study of the entire assembled text corpus. Lexicostatistical analysis of the data shows that there are relatively few Swadesh items transparently represented by two or more synonyms which could be thought of as representing different dialects; as for observed phonetic variation, it is not always clear when it should be ascribed to dialectal diversity or simply errors in transcription. For the purposes of the current study, we treat the entire Bleek-Lloyd corpus as a single “doculect”, while admitting that this is somewhat of a provisional simplification.
Quality. Transcription accuracy is always dubious, especially concerning the system of click accompaniments (see Traill 1995 for insightful comments on how to interpret various elements of Bleek and Lloyd's transcription system for |Xam). Semantic glossing is frequently questionable as well, but at least in many cases it may be checked against the large assembled text corpus.
N||ng - N|uu
Sources. This is a large dialectal cluster which, unlike |Xam, is represented by several very distinct doculects from sources widely varying in space and time. This means that, for lexicostatistical purposes, it is possible and advisable to build as many as three distinct wordlists: B.1 = ||Ng!ke (the dialect originally described by Dorothea Bleek; data published in Bleek 1929, 1956, and later separately in Bleek 2000), B.2 = ǂKhomani (the dialect originally described in Doke 1936 and Maingard 1937), B.3 = N|uu (the recently rediscovered variety spoken by several informants, with lexical data published in Crawhall 2004, Sands et al. 2007, Miller et al. 2009, Collins & Namaseb 2011; a complete Swadesh wordlist was kindly provided for the purposes of this study by Bonny Sands). For all of these dialects put together, we reserve the common name of N||ng as suggested in Güldemann 2017: 95.
Dialects. Unlike |Xam, the various attested dialects of this “macro-language” show quite a bit of lexical differentiation, though it is often difficult to tell how much of it is due to inaccurate semantic glossing, how much (especially in the case of N|uu) to very recent borrowings from other languages, and how much to gradual linguistic divergence after the original split of “Proto-N||ng”; for these reasons, as well as the relative incompleteness of the joint Doke/Maingard wordlist for ǂKhomani, any statistical discrepancies should be viewed with extreme caution.
Quality. Rather predictably, modern N|uu is one of the best transcribed representatives of Tuu; importantly, transcription quality in Doke 1936 and Maingard 1937 also seems superior to D. Bleek's data (thus, both sources consistently mark the palatal click ǂ, which in most cases remains undistinguished from alveolar ! in Bleek's records). Semantic glossing is assumed to be accurate for modern N|uu and can sometimes be checked against actual text examples for N||ng and ǂKhomani.
||Xegwi
Sources. This language, geographically somewhat isolated from the rest of the !Ui continuum, is represented by at least three significantly different doculects, marked respectively as: (a) ||Xegwi-B -- the earliest records collected by D. Bleek and published in Bleek 1929, 1956 (in her description the language is usually referred to as Batwa, a local Bantu term); (b) ||Xegwi-Z -- as described by D. Ziervogel in a brief grammar sketch (Ziervogel 1955); (c) ||Xegwi-LH -- as described by L. W. Lanham and D. P. Hallowes in two short papers (Lanham & Hallowes 1956a, 1956b).
Dialects. Judging by attested phonetic and lexical differences between the three doculects, a certain degree of dialectal diversity must have been present among ||Xegwi speakers. However, lexicostatistical discrepancies between the three sources are minimal (1-2 entries between ||Xegwi-Z and ||Xegwi-LH; slightly more between each of these and ||Xegwi-B, possibly because of less accurate semantic glossing in Bleek's earlier records). Given the incompleteness of the sources (for ||Xegwi-Z and ||Xegwi-LH, data have to be extracted from grammar sketches and short text examples rather than actual vocabularies), it makes sense to merge them into one wordlist.
Quality. Transcription quality seems to be surprisingly adequate in the case of ||Xegwi-LH: for instance, Lanham and Hallowes are among the first scholars to actually denote the presence of uvular phonemes and click accompaniments in any Khoisan language. Therefore, all data from ||Xegwi-B and ||Xegwi-Z, wherever possible, need to be cross-checked against ||Xegwi-LH.
|'Auni
Sources. This language, which used to represent the westernmost spread area of Tuu, is known exclusively from records by Dorothea Bleek (Bleek 1937; lexical data also printed in Bleek 1929, 1956).
Dialects. Some dialectal variety may be identified from Bleek's records, as the equivalents for various meanings occasionally differ between the earliest ones, collected in 1911 and partially published in 1929, and the later ones, collected in 1936 and published in Bleek 1937 and Bleek 1956. It is, however, often difficult to establish whether these discrepancies (around 4-5 of them are found in items on the Swadesh list) represent true dialectal variation or inaccurate semantic glossing on the part of the researcher. Additionally, it is unclear if there are sufficient grounds to count the idiolect which Bleek refers to as “Khatia” or “Xatia”, a very small amount of data for which was also collected by her in 1911 and published in Bleek 1956, as anything other than a minor sub-dialect of |'Auni. Finally, the occasional decision to regard |'Auni and |Haasi (see below) as dialects of a single language (e.g. in Glottolog 4.4) is hardly correct due to extremely significant lexical and grammatical differences between the two (e.g. around 20 mismatches on the Swadesh list).
Quality. Transcription quality is generally typical of D. Bleek's recordings for other Khoisan languages; external comparison raises serious doubts about the accuracy of click efflux transcription and slightly less serious doubts about that of click influxes.
|Haasi
Sources. This variety of Lower Nossob is solely known from records made by Robert Story of data from a single informant, Kabala (or Tatabesa), at the same Tweerivieren camp in 1936 where D. Bleek's data on |'Auni were collected; some of the |Haasi data were later published as part of Bleek 1956, but the complete manuscript did not officially see the light of day until Anthony Traill managed to rediscover and edit it for publication (Story 1999). Naturally, there is no dialectal variety to speak of here, but, as mentioned above, neither is there any reason to regard |Haasi as a bona fide “dialect” of |'Auni; both speech forms, as already noted by Traill in his preface to Story 1999, are more closely related to each other than to any other form of Tuu, yet both clearly have to be treated as different languages.
Quality. Although, in his own words, Story was a “complete amateur” and had no formal training in phonetics (Story 1999: 10), the overall quality of his transcription, at least at a rough glance, seems to be no worse than D. Bleek's or almost anybody else's at the time (e.g. he seems to have had a good ear for distinguishing between the palatal and alveolar clicks, with which quite a few other Khoisanologists seem to have struggled back then). The accuracy of his semantic notation can usually be confirmed by specific texts and phrases found in the manuscript. The worst problem is the scarcity of material: thus, as many as 40 Swadesh items cannot be recovered from extant data, which makes it impossible to offer reliable glottochronological datings for the moment of separation between |'Auni and |Haasi. That said, |Haasi data are of vital importance for attempting to at least partially reconstruct the basic lexicon of Proto-Nossob and, in turn, Proto-Tuu itself.
Taa (!Xóõ, Kakia, N|u||en)
Sources. Precisely three different varieties of Taa allow for the construction of more or less representative Swadesh wordlists. First and foremost among them is !Xóõ (more precisely, Lone Tree !Xóõ) as represented in Anthony Traill's now-classic and extensive dictionary of this particular dialect (Traill 1994, 2018). The other two are much older, dating back to D. Bleek's brief research on the language of the “Masarwa” (a generic pejorative Bantu term for the San) of Kakia in 1913, and on the language of the N|u||en of Nausanabitz in 1920 (most of the data were subsequently published in Bleek 1929 and Bleek 1956). Both of these speech varieties seem to have become extinct and, to the best of my knowledge, are not directly identified with any of the still living dialectal varieties of !Xóõ (such as described, e.g., in Naumann 2014); concerning the latter, although some research has been carried out on them, no significant amounts of lexical data have been published to allow for a proper lexicostatistical comparison between them and Traill's Lone Tree !Xóõ.
Dialects. Although all three varieties of Taa for which it is possible to produce more or less complete Swadesh wordlists show up to about 20% of lexical discrepancies in these wordlists, which would, under normal circumstances, clearly speak of them as three different languages, the widely varying quality of recorded data does not allow us to take these discrepancies at face value: Bleek herself admits that data on Kakia and N|u||en were collected in haste, and the probability of semantic and lexical inaccuracies in her records is fairly high. It is, therefore, possible that ultimately these two varieties are not nearly as distant from !Xóõ proper as are !Xóõ's own 20 or so sub-dialects, tentatively classified in Naumann 2014 on the basis of some phonetic and grammatical isoglosses observed over the course of a general survey. In any case, at this time a detailed lexicostatistically based phylogeny of Taa languages and/or dialects is impossible due to lack of data; a tentative reconstruction of the Swadesh wordlist for Proto-Taa, based on all available evidence, is, however, somewhat within reach.
Quality. Lone Tree !Xóõ expectedly boasts the highest quality of phonetic (and probably semantic) accuracy among all South Khoisan languages, possibly second only to N|uu (for which, however, published data are far more limited) -- all due to the extensive research of Anthony Traill. Nevertheless, the huge discrepancy between the quality of Traill's data and everything else should not lead anyone into the fallacy of conflating Traill's Lone Tree !Xóõ with Proto-Taa itself, at least not when lexical reconstruction is involved. In terms of phonetics, there is little, if anything, that data from Kakia or N|u||en could contribute in light of Traill's clearly superior, and extremely detailed, description of Taa phonology (comparison with Bleek's data shows plenty of unrecognized phonetic features and a lot of mistakes in the transcription of even the basic click influxes). But from a purely lexical point of view, there is no reason to a priori consider the Lone Tree !Xóõ equivalent for a particular meaning as more archaic than the corresponding Kakia or N|u||en equivalent whenever the two (or three) are clearly etymologically different from each other.
Other !Ui languages
Data from the following languages, unquestionably identifiable as separate linguistic units belonging to the !Ui group, may and should be used for etymological purposes (including reconstruction of Proto-!Ui basic lexicon) but are generally unusable for lexicostatistical goals, making a precise identification of their respective position on the !Ui tree somewhat difficult:
||Kxau (small grammatical sketch, a few phrases, and a very short vocabulary in Meinhof 1929; all lexical data reprinted in Bleek 1956);
||Ku||e (a small amount of lexical data collected by D. Bleek and published in Bleek 1956);
“Seroa” and “!Gã!ne”, both represented by short, old, and phonetically dubious collections of lexical data by T. Arbousset, C. F. Wuras, and H. Anders (all data reprinted in Bleek 1956).
Forms from some of these doculects will occasionally be quoted below, specifically as additional etymological support for particular reconstructions, but no systematic conclusions about their historical phonology or classification details shall be drawn.
Methodology
For the sake of this paper, I proceed from the following historical assumptions:
all of the languages listed above are genetically related within a single “Tuu” family;
all of those languages may be definitively and uncontroversially divided into no fewer and no more than three separate clusters -- !Ui (|Xam, N||ng, ||Xegwi); Nossob (|'Auni, |Haasi); and Taa (!Xóõ and all of its dialects as well as Bleek's Kakia and N|u||en), each of these representing the result of divergence from its own intermediate protolanguage.
Convincing evidence for both of these assumptions, including (partial) regular phonetic correspondences and numerous sets of lexical and grammatical isoglosses, has been presented in many sources, from Bleek 1956 and Westphal 1962 to more modern research (e.g. Hastings 2001, Güldemann 2005b, Starostin 2008), and alternate scenarios, such as trying to explain similarities between !Ui and Taa as a result of areal convergence (a possibility not ruled out by such notable “splitters” in the field of Khoisan studies as E. O. J. Westphal), are unlikely and generally unwarranted.
What remains much less clear is the degree of relationship of these three clusters to each other, or even of some of the individual languages within these clusters to each other. While certain elements of consensus between the various classification schemes offered by researchers do emerge, such as, e.g., the understanding that |Xam and N||ng are closer to each other than to ||Xegwi, a particularly tricky issue rests with the |'Auni-|Haasi cluster, commonly referred to today as the “Lower Nossob”, or simply “Nossob”, languages. Here at least three conflicting schemes have been put forward:
E. O. J. Westphal (1971: 381) directly groups this cluster with the Taa languages, using the term “Taa” for the entire agglomeration; furthermore, as has already been mentioned, he has forever remained skeptical about the idea of a genetic connection between Taa and !Ui;
Oswin Köhler (1981: 469) counts the Nossob languages as a part of !Ui, considering them all related to Taa (which he calls “non-!Ui”) on a deeper level; until recently, this classification scheme was generally more popular than Westphal's;
Tom Güldemann (2014) has partially reverted to Westphal's model, arguing for a closer affinity between Nossob and Taa while at the same time not denying that both are ultimately genetically related to !Ui. His arguments are based on a number of lexical and grammatical isoglosses, as well as a strongly supported observation that the similarities between Nossob and !Ui are exaggerated because of extensive areal contact between |'Auni and N||ng (involving elements of bilingualism).
Out of these three, Güldemann is the only author who has actually published detailed linguistic argumentation in favor of his hypothesis, which may be one reason why it is currently accepted as the default phylogenetic scheme for Tuu in Glottolog. Nevertheless, due to the scarcity and sometimes dubious quality of the data, using selective lexical and grammatical arguments in this kind of linguistic investigation (the way it is done in Güldemann 2014) may not be totally free of bias, and it would be reasonable to take a more holistic approach to the matter, if at all possible. This is why an overall lexicostatistical survey, focusing on attested core basic lexicon for all the languages involved, would be a very useful addition to Güldemann's methods of classification; and in the event of it producing different phylogenetic results from Güldemann 2014, analyzing the reasons for such a discrepancy could shed new light on both the historical relations between the various Tuu languages and the methodology of phylogenetic classification as a whole.
The actual results of an initial, preliminary survey based on 100-item Swadesh wordlists for all the languages listed above have already been published in Starostin 2013: 355; they showed that, although cognacy percentages between the Nossob languages and the various !Ui languages sometimes drop to around 46-48%, they are still consistently a little higher than the average percentages between Nossob and Taa, speaking in favor of Köhler's older classification rather than Güldemann's. However, there is a way to both correct and refine those results and make them more visually transparent by shifting from direct comparison of attested languages to comparing reconstructed wordlists -- for Proto-!Ui, Proto-Nossob, and Proto-Taa, respectively. Condensing lexical evidence from a dozen languages into the shape of evidence from just three reconstructed proto-languages would be useful in helping clear away the “chaff” of identifiably recent innovations and borrowings, and would also make it easier to focus on the analysis of specific lexical isoglosses between the three branches in order to figure out which ones may have more weight for phylogenetic classification.
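The cognacy percentages mentioned above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each wordlist maps a Swadesh meaning to a set of numeric cognate-class labels, and that meanings missing from either list are excluded from the count; the function name, the data layout, and the toy values are hypothetical and are not taken from Starostin 2013.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: Swadesh meaning -> set of cognate-class labels.
# None (or an empty set) means the item is not attested in that wordlist.
Wordlist = Dict[str, Optional[Set[int]]]


def cognacy_percentage(a: Wordlist, b: Wordlist) -> float:
    """Percentage of comparable meanings on which two wordlists share
    at least one cognate class. Meanings missing from either list are
    skipped rather than counted as mismatches (an assumption)."""
    compared = 0
    matches = 0
    for meaning, classes_a in a.items():
        classes_b = b.get(meaning)
        if not classes_a or not classes_b:
            continue  # item unattested on one side: not comparable
        compared += 1
        if classes_a & classes_b:
            matches += 1
    if compared == 0:
        raise ValueError("the two wordlists have no comparable items")
    return 100.0 * matches / compared


# Toy data with invented cognate-class numbers.
list_a: Wordlist = {"ashes": {1}, "bark": {2}, "belly": {3, 4}, "big": None}
list_b: Wordlist = {"ashes": {1}, "bark": {5}, "belly": {4}, "big": {6}}
print(round(cognacy_percentage(list_a, list_b), 1))
```

Allowing a set of classes per meaning lets a list carry synonyms, in which case one shared class is enough for a match; other counting conventions are equally possible.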
The general methodology for reconstructing proto-wordlists of the Swadesh type was already described in detail in several of this author's previous publications (Starostin 2013: 153-183, Starostin 2016) and, from a substantive point of view, needs no major modifications when applied to available Tuu material. Most of the specific challenges encountered along the way are of a technical nature -- namely, scarcity of the source data and their phonetic / semantic inaccuracies. These can sometimes be neutralized through careful scrutiny, but on the whole, of course, it should be well understood that the presented results are only as good as the data that currently support them, and are liable to change with each new significant publication of an additional data source (although, unfortunately, this is not likely for most of the languages involved in this study).
An important tripartite distinction could be introduced between reconstructions, pseudo-reconstructions, and zero reconstructions for each of the Swadesh items within each of the three subgroups. For the wordlist appendix below, the following rules are observed.
A reconstruction, marked with an asterisk, is generated when cognates are attested in at least two separate doculects which do not represent close sub-dialects of a single language. In the case of !Ui, this means that the word has to be encountered at least in two out of three main clusters (|Xam; ||Ng!ke - ǂKhomani - N|uu; ||Xegwi), or, failing that, at least in one of them + one or more supporting languages whose data are not eligible for lexicostatistics (e.g. an isogloss between |Xam and ||Ku||e, or between N|uu and ||Kxau). Technically speaking, since |Xam and N||ng are closer to each other than to ||Xegwi, this does not allow us to formally equate a “Proto-|Xam-N||ng” reconstruction with “Proto-!Ui” in the absence of a clear cognate in ||Xegwi; however, considering the scarcity of ||Xegwi data, we do not really have the luxury of downplaying |Xam - N||ng isoglosses, and for the sake of this particular phylogenetic study it seems reasonable to go along with a slightly broader understanding of “Proto-!Ui”.
Accordingly, in the case of Nossob languages “Proto-Nossob” is understood as the common invariant of cognates in |'Auni and |Haasi; in the case of Taa “Proto-Taa” is understood as an isogloss between !Xóõ and either Kakia or N|u||en (or all three).
Pseudo-reconstructions can sometimes be substituted for actual reconstructions for both lexicostatistical and etymological purposes. Thus, if out of all the languages belonging to one of the three main subgroups of Tuu, the Swadesh item in question is only attested in one language, and the form itself is not transparently identifiable as a recent morphological derivation or borrowing, there is a more-than-zero chance that it might actually be a direct reflex of the proto-item (a very common situation for Nossob languages, where available data on |'Auni are much more extensive than data on |Haasi, see ashes, bark, belly etc. below); naturally, this chance is increased even further if the form has credible external cognates in either of the other two branches.
If there are two or more non-cognate forms for the same equivalent in different languages and it is impossible to make a sound judgement on which one is the lexicostatistical archaism and which ones are the innovations, it is permissible to count them all as “technically synonymous” pseudo-reconstructions (see, e.g., belly or big in the !Ui list below), in the sense that each of them has a comparable chance of having expressed the required Swadesh meaning in the proto-language (this is more credible than the idea of “absolute” synonymy in the protolanguage, with each daughter language retaining only one of the several earlier synonyms). Again, discovery of a potential cognate for one of these “technical synonyms” on the external level of comparison drastically increases its chances and almost (but not quite) raises the item's status from pseudo-reconstruction to actual reconstruction.
Finally, zero reconstructions -- implying, among other things, that this particular item has to be excluded from lexicostatistical calculations -- appear whenever the required item is either not found at all in any of the languages, or, if found in any of them, is transparently identifiable as a recent innovation or borrowing. In the list below, there are very few genuine zero reconstructions, since most of the Swadesh items are found to have some sort of equivalent in at least some of the discussed languages; the biggest problem is with a very small group of concepts whose “near-universality” does not properly apply to Tuu realities (e.g. fish, notably absent in the area, or leaf, seemingly a difficult concept for Tuu speakers which is usually expressed by borrowings).
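The three-way distinction between reconstructions, pseudo-reconstructions, and zero reconstructions can be sketched as a small decision procedure for the !Ui branch. This is only an illustration under stated assumptions: the names `Attestation`, `MAIN_CLUSTERS`, and `classify`, the numeric cognate-set IDs, and the `suspect` flag are all hypothetical, and the handling of forms attested only in supporting languages (treated here as zero) is not specified in the text.

```python
from dataclasses import dataclass

# Main !Ui clusters as listed in the text. Doculects of the same cluster count only once.
MAIN_CLUSTERS = {
    "|Xam": "|Xam",
    "||Ng!ke": "N||ng",
    "ǂKhomani": "N||ng",
    "N|uu": "N||ng",
    "||Xegwi": "||Xegwi",
}


@dataclass(frozen=True)
class Attestation:
    language: str          # doculect name
    cognate_set: int       # hypothetical cognate-set ID assigned by the analyst
    suspect: bool = False  # True for a transparent recent derivation or borrowing


def classify(attestations, cognate_set):
    """Return the status of one cognate set for a given Swadesh item in !Ui."""
    usable = [a for a in attestations
              if a.cognate_set == cognate_set and not a.suspect]
    clusters = {MAIN_CLUSTERS[a.language] for a in usable
                if a.language in MAIN_CLUSTERS}
    supporting = {a.language for a in usable
                  if a.language not in MAIN_CLUSTERS}
    # Two main clusters, or one main cluster plus a supporting language.
    if len(clusters) >= 2 or (len(clusters) == 1 and supporting):
        return "reconstruction"
    # Attested in a single main cluster only.
    if len(clusters) == 1:
        return "pseudo-reconstruction"
    # Assumption: no usable evidence from any main cluster counts as zero.
    return "zero"
```

For example, a form shared by ||Ng!ke and N|uu alone yields only a pseudo-reconstruction, since both doculects belong to the same cluster, whereas a |Xam form supported by ||Ku||e yields a reconstruction.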
Regarding the highest level of reconstruction (Proto-Tuu), we consider any Swadesh item to be formally reconstructible for Proto-Tuu if it is reconstructible in the exact same Swadesh meaning for both Proto-!Ui and Proto-Taa. The lower level reconstructions may be pseudo-reconstructions, i.e. an isogloss between |Xam and !Xóõ (or even an isogloss between |Xam and the far less reliably attested Kakia or N|u||en on the other end) may be taken as strong evidence for a Proto-Tuu reconstruction, unless there are additional obstacles to this interpretation (e.g. both forms may be easily interpreted as recent borrowings from a Khoe source). The Nossob languages, with their phylogenetic status not yet clearly resolved, are currently not very telling: it is extremely important to spot all the exclusive !Ui-Nossob and Taa-Nossob isoglosses, yet directly equating them with Proto-Tuu is impossible before final conclusions are reached on the position of Nossob on the genealogical tree of Tuu languages.
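The Proto-Tuu criterion just described reduces to a simple conjunction, which may be sketched as follows. The function name, the status labels, and the `obstacle` flag are hypothetical; the sketch assumes that both statuses refer to the same cognate set in the same Swadesh meaning.

```python
ACCEPTED = {"reconstruction", "pseudo-reconstruction"}


def reconstructible_for_proto_tuu(ui_status, taa_status,
                                  nossob_status="zero", obstacle=False):
    """Check the formal Proto-Tuu criterion for one cognate set.

    ui_status / taa_status: status of the item in Proto-!Ui / Proto-Taa.
    nossob_status: accepted for bookkeeping, but deliberately ignored,
        since the position of Nossob on the tree is still unresolved.
    obstacle: True if, e.g., both forms look like recent Khoe borrowings.
    """
    return ui_status in ACCEPTED and taa_status in ACCEPTED and not obstacle
```

Under this sketch an exclusive !Ui-Nossob or Taa-Nossob isogloss never qualifies by itself, which mirrors the cautious treatment of Nossob above.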
With all possible Proto-!Ui, Proto-Nossob, Proto-Taa, and ultimately Proto-Tuu reconstructions on hand, the natural advantage is that it shall be much easier to not only calculate the distances between the specific branches, but also to analyze the possible classification alternatives in terms of individual shared archaisms and innovations, reducing the overall evidence to a small, but objectively attained, number of truly diagnostic etymologies. These results will be presented in the second part of the paper.
Notes on phonetic reconstruction in Tuu
Considering how much emphasis has been placed (and will continue to be placed) on the word “reconstruction” in this paper, some clarifications must be made about how we actually understand this term when applied to Tuu data. At the present state of our knowledge about Tuu languages as a whole, it is extremely difficult, if not downright impossible, to rigorously apply the classic Neogrammarian methodology in order to elicit fully regular phonetic correspondences between the phonemic systems of these languages -- mainly due to the relative scarceness of data from most of them, and to the generally poor transcription quality of those languages which are indeed represented by relatively large corpora (like |Xam or |'Auni). There is plenty of phonetic similarity between them, and there are enough recurring patterns of correspondences to usually (though not always) recognize etymological cognates, but a highly detailed system of correspondences which would fully cover all the subsystems of the complicated Tuu phonologies (click influxes, click effluxes, non-click consonants, vowels and their secondary features, tones, etc.) and reduce them to a parsimonious and typologically credible Proto-Tuu inventory at best requires a much greater research effort than is currently possible, and at worst may turn out to be objectively unreachable.
Nevertheless, even at this stage it is possible to operate on the level of what might be called “lax” reconstructions, along lines already suggested for Tuu languages in Starostin 2008, 2013. What this means is separating the phonological units of Tuu into two categories: those which are found, based on comparative evidence, to be generally both more stable from a historical perspective and more consistently transcribed from a notational perspective, and those which seem to be more fluid over time, as well as less easily defined by inexperienced field workers. “Lax” reconstructions might then latch on to the more reliably established correspondences for the first category, while offering reasonable approximations (for instance, bluntly based on the majority rule) for the second. Such half-way reconstructions are always amendable if more high-quality data come along or additional recurring patterns are confirmed statistically, but even without this they can still serve as proper historical evidence, provided that at least a certain “sound skeleton” has been recovered for them based on Neogrammarian-type correspondences.
According to my observations, the generally stable parts of phonological inventories in Tuu can be defined as (a) click influxes; (b) non-click consonants, especially in word-initial position; and, to a slightly lesser degree, (c) main root vowels (not including vocalic codas, correspondences between which are often chaotic, possibly because they represent variable morphological add-ons). The least stable parts, in addition to vocalic codas, are tones (if only because prosody is not marked consistently and reliably in any of the older sources) and click effluxes -- which often show tremendous variation not just between different languages, but even between closely related dialects or sub-dialects of the same language. Below I adduce several important notes on each of these subseries, additionally referring the reader to my earlier and more detailed, but also sometimes outdated, observations on the comparative phonology of Tuu as published in Starostin 2008.
Click influxes. Correspondences between these segments are more often than not regular and trivial (one-to-one), but there are some important exceptions. The principal correlations are listed in Table 1; for some extra details (largely irrelevant when applied exclusively to the 100-item wordlist) see Starostin 2008: 365-370.
Table 1. Principal correspondences between click influxes across major Tuu languages (∅ = zero reflex)
Proto-Tuu    |Xam    N||ng    ||Xegwi    |'Auni    !Xóõ
*ʘ           ʘ       ʘ        ʘ          ʘ         ʘ
*|           |       |        |          |         |
*ǂ           !       ǂ        ƛ / s      ǂ         ǂ
*!           !       !        ∅          !         !
*||          ||      ||       ||         ||        ||
*з           !       ǂ        !          ǂ         ǂ
Notes.
Labial click (*ʘ): see Starostin 2008: 366 on several examples where labial clicks in Taa may correspond to lateral clicks in !Ui, perhaps indicating secondary labialization. It is still unclear whether this correspondence is truly regular or whether all the listed examples are just accidental resemblances; in any case, none of them are relevant to the data subset of the 100-item wordlist.
Dental click (*|): see ibid. on such specific correspondences as Taa *|q(')- = !Ui *c(')- and Taa *'|n- = !Ui *d-. Examples for these are somewhat more reliable than those for the labial click correspondence above, but, once again, they are only encountered outside the Swadesh wordlist.
Palatal click (*ǂ). This is the least stable of all click influxes in Tuu, and it deserves more detailed commentary. First, in such extinct languages as |Xam and (maybe) some of the dialects of N||ng, such as the Bleek-transcribed ||Ng!ke, it seems to have merged with the alveolar click (*ǂ → !; see below examples such as dog, ear, egg etc.). The reason why I suspect that this was a real diachronic development rather than a simple transcriptional error is that quite a few entries in |Xam have been transcribed, both by Wilhelm Bleek and Lucy Lloyd, with an initial ǂ- (cf. |Xam ǂ'enn 'to know', ǂa 'to kick', ǂxoa 'elephant' etc.), but many, if not most, of them look like relatively recent borrowings from a Khoe source [footnote: Bonny Sands suggests that the loans may have come specifically from Korana (Sands 2014: 13)]. This would imply that after the original palatal click had shifted to a different manner of articulation (perhaps merging with the alveolar click or becoming so close to it as to become indistinguishable for the early scholars of Khoisan), it may have been reintroduced into the language(s) along with lexical loans from their Khoe neighbors. [Footnote: In this respect, it may be instructive to recall a typological parallel in which the original Ju (North Khoisan) palatal click *ǂ has shifted to a retroflex articulation (!!) in Ekoka !Xun (König & Heine 2001: 22-23), already after the original retroflex click *!! had merged with lateral *|| in that same dialect. Could something of the sort actually have taken place in some of the now-extinct Tuu languages?]
Second, in ||Xegwi the palatal click undergoes a unique development, shifting toward a non-click lateral affricate articulation. The regular development seems to be *ǂ → ƛ (see dog, ear, egg below), but occasionally post-alveolar fricative reflexes (c, s) are observed as well; this seems to happen when the click has a uvular efflux (cf. N|uu ǂqфл 'short' = ||Xegwi-Z cwe id.; N|uu ǂqhoe 'wind' = ||Xegwi-LH swe: id.). Unfortunately, the scarceness of available ||Xegwi data prevents us from fully describing the picture here, which must have been typologically somewhat similar to the well-studied behavior of palatal clicks in Eastern Kalahari Khoe languages (Vossen 1997: 285-288).
Alveolar click and lateral click (*!, *||). Both of these are typically quite stable, but the alveolar click undergoes seemingly regular deletion in ||Xegwi as well (*!ui 'person' → ||Xegwi kwi, etc.), again parallel to similar developments in Kalahari Khoe.
The “sixth click influx” (provisionally marked as *з for lack of a better idea) [footnote: The symbol з is actually borrowed from Clement Doke's ingenious, but forgotten alphabet for click consonants, where it was reserved for the unvoiced alveolar click (now commonly marked as !)]. This reflects the unusual, but seemingly recurrent correspondence “|Xam ! : N||ng ǂ : ||Xegwi ! : (?) Nossob ǂ : Taa ǂ”, established on the data of several basic items on the Swadesh list (bone, one, red, also foot in !Ui) as well as additional basic lexicon (e.g. the root for 'female breast / milk', listed in Starostin 2008: 368). The evidence for this extra influx is not overwhelming, but it is too strong to be brushed away as a mix of accidental lookalikes and incorrect transcriptions; in particular, given the regular deletion of the plain alveolar click in ||Xegwi, it is the only way to account for those cases in which ||Xegwi lexical items still feature the alveolar click (and cannot be explained away as borrowings). Postulation of a phonologically distinct sixth click influx for Proto-!Ui and Proto-Tuu would make these protolanguages typologically unique (no living or attested extinct Khoisan language has more than five), but not theoretically impossible; more work on available material is necessary to understand whether the observed correspondence should truly be traced back to a separate phonological contrast or whether it may be explained by a conditioned split.
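The correspondences in Table 1 and the notes above can also be treated as a lookup structure for checking which proto-influxes are compatible with a given set of attested reflexes. The sketch below follows one reading of the table; the names `TABLE_1` and `compatible_influxes` and the label `*sixth` (standing in for the provisional sixth-click symbol) are hypothetical, and the empty string is used here to encode loss of the click.

```python
LANGS = ("|Xam", "N||ng", "||Xegwi", "|'Auni", "!Xóõ")

# Allowed reflexes per language, in the order of LANGS ("" = click deleted).
TABLE_1 = {
    "*ʘ":     ({"ʘ"}, {"ʘ"}, {"ʘ"}, {"ʘ"}, {"ʘ"}),
    "*|":     ({"|"}, {"|"}, {"|"}, {"|"}, {"|"}),
    "*ǂ":     ({"!"}, {"ǂ"}, {"ƛ", "s"}, {"ǂ"}, {"ǂ"}),
    "*!":     ({"!"}, {"!"}, {""}, {"!"}, {"!"}),
    "*||":    ({"||"}, {"||"}, {"||"}, {"||"}, {"||"}),
    "*sixth": ({"!"}, {"ǂ"}, {"!"}, {"ǂ"}, {"ǂ"}),
}


def compatible_influxes(observed):
    """Return the proto-influxes consistent with the observed reflexes.

    observed maps a language name to its attested initial segment;
    languages with no attestation are simply left out and ignored.
    """
    result = []
    for proto, reflexes in TABLE_1.items():
        if all(observed[lang] in allowed
               for lang, allowed in zip(LANGS, reflexes)
               if lang in observed):
            result.append(proto)
    return result
```

With only |Xam ! and N||ng ǂ attested, both the palatal click and the sixth influx remain possible; adding a ||Xegwi form with a retained alveolar click narrows the choice to the sixth influx, which illustrates why the ||Xegwi evidence is the decisive part of the argument.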
Click effluxes. Very few Tuu languages can be said to have adequate descriptions of their complicated click accompaniment systems. The best ones have arguably been produced by Traill for !Xóõ (up to 19 different effluxes per influx), by Miller et al. for N|uu (up to 10 different effluxes per influx), and by Lanham and Hallowes for ||Xegwi (up to 7 different effluxes per influx). Even these descriptions may not be completely accurate and finalized in terms of recognized contrasts, and observed correspondences between different languages are by no means trivial.
Our current “lax” strategy on the matter is simple: for Proto-!Ui and Proto-Taa, unless there is a very strong individual argument about the secondary nature of these effluxes, we provisionally accept the efflux in N|uu and in !Xóõ (respectively) as representing the protostate -- simply because any discrepancy between these languages and the earlier described ones may be theoretically attributed to incorrect transcriptions in older sources (where the same word may very often be found transcribed in multiple variants with different click effluxes). If this tactical decision somehow contradicts the majority rule, i.e., for instance, the N|uu click efflux is not the same as the efflux in the majority of other !Ui reflexes, such a situation deserves detailed individual analysis. [Footnote: It should be kept in mind that click efflux articulation in Tuu, as well as other Khoisan languages, may occasionally be correlated with secondary features of vowel articulation, such as nasalization, pharyngealization, glottalization, and breathiness -- both “genuinely” (if vocalic articulation exerts assimilative influence on the efflux, or vice versa) and “virtually” (if, in one of the less than accurate sources, a vocalic feature is transcriptionally mistaken for a back closure release, or vice versa). Unfortunately, secondary vocalic features are quite inconsistently marked in older sources.]
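The efflux strategy described above can be sketched as a short procedure: take the efflux of the reference language (N|uu for Proto-!Ui, !Xóõ for Proto-Taa) and flag the item for individual analysis whenever a strict majority of the remaining reflexes disagrees with it. The function name `lax_efflux`, the efflux labels in the usage example, and the fallback to the most frequent efflux when the reference language lacks the item are all assumptions made for illustration.

```python
from collections import Counter


def lax_efflux(reflexes, reference):
    """Propose a proto-efflux and report whether it needs individual review.

    reflexes: dict mapping language name -> transcribed efflux.
    reference: the language whose efflux is accepted by default.
    Returns a pair (proposed_efflux, needs_review).
    """
    others = Counter(e for lang, e in reflexes.items() if lang != reference)
    total = sum(others.values())
    top, count = others.most_common(1)[0] if others else (None, 0)

    if reference not in reflexes:
        # Assumption: with no reference form, fall back to the most
        # frequent efflux and always ask for a closer look.
        return top, True

    proposed = reflexes[reference]
    # Review only if a strict majority of the other reflexes disagrees.
    has_majority = total > 0 and count > total / 2
    return proposed, has_majority and top != proposed
```

For instance, `lax_efflux({"N|uu": "q", "|Xam": "x", "||Ng!ke": "x", "||Xegwi": "x"}, "N|uu")` keeps the N|uu efflux as the provisional reconstruction but marks the item for review, since the other three reflexes agree against it.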
Non-click consonants. A strikingly low percentage of either Proto-!Ui or Proto-Taa Swadesh items are reliably reconstructible with a word-initial non-click consonant (approximately 14-15 items on the Proto-!Ui wordlist and 18-20 items for Proto-Taa), which goes to show how thoroughly integral click phonemes are to these languages (for comparison, the corresponding number for Proto-Ju, even though Ju languages have the second most complex inventory of click phonemes after Tuu, is no fewer than 35 items out of 100). This does not mean that the Proto-Taa system of non-click consonants was necessarily modest -- Traill lists more than 40 such consonants for !Xóõ, of which only very few can be reliably proven as secondary -- but it does mean that the issue of an accurate reconstruction of this sub-system for Proto-Taa is not particularly relevant for our current task.
Phonemes encountered in basic lexicon items include *t- (hear, lie), *k- (all), *s- (bite, come, fat), the ejective velar affricate *kx'- (drink), and the alveolar affricates *j- (fly) and *c- (eye), though for these last two phonemes evidence is more marginal and problematic. Correspondences for the others are largely trivial (arguably the most serious phonetic change is from *t- to palatal *з- in N||ng), though see notes on bite for a possible affrication scenario for *s- in certain contexts. Not a single complex consonantal cluster, such as *th', etc., is reconstructible for this particular subset of the basic lexicon in any of the daughter branches of Tuu.
Vowels and codas. Reconstruction of the Proto-!Ui, Proto-Taa, and especially Proto-Tuu systems of vowels and vocalic/consonantal syllabic codas is extremely difficult due to huge amounts of variation, which should be attributed not only to phonetic change (or pseudo-phonetic change, reflecting inaccurate transcription) but also to morphological variation, as the exact same nominal, adjectival, or verbal root may frequently be encountered in different languages (or even within the same language) in combination with different suffixal components -- noun class markers, agreement morphemes, or various other clitic elements fused with the root and not recognized as separate morphemes.
The main vowels in Tuu languages, as follows from reliable modern data on N|uu and !Xóõ, typically comprise three unrestricted phonemic units (a, o, u), occurring freely and frequently after any consonant, and two highly restricted units (front vowels e, i), whose occurrences after click phonemes are exceedingly rare, but which are met somewhat more frequently after non-click phonemes. The original picture may have been more complicated, as there are numerous cases in which the vowel a in Taa corresponds to either e or o in !Ui languages (see examples in Starostin 2008: 372); it is still unclear if such situations reflect additional original phonemes (such as *Ј and Ъ) or the results of phonetic contraction of different morphological variants (for a good example, see notes on fire below).
The precise inventory of Proto-Tuu codas (i.e. second morae of nominal and verbal word forms, which are often morphologically detachable even on the synchronic level, or may be shown to have been fossilized through external comparison) cannot be determined at the moment; on the whole, relatively few bimoraic sequences may be reliably reconstructed by comparing !Ui, Nossob, and Taa data. Given the fact that only !Xóõ lends itself relatively well to detailed morphophonological analysis (in N|uu, most of the old derivational morphemes seem to have lost their productivity, and data on all other languages are antiquated and unreliable), reconstruction of nominal and verbal morphological elements for Proto-Tuu may turn out to be an even more challenging task than the reconstruction of its click system. Consequently, in the current paper, the emphasis is always on checking whether a bisegmental (initial click or non-click consonant + main vowel) sequence may be identified as the original root morpheme for Proto-!Ui, Proto-Nossob, Proto-Taa, and, ultimately, Proto-Tuu: by default, discrepancies between codas are provisionally written off as reflecting morphological variation, either already present on the Proto-Tuu level or arising independently in one or more branches after the split of the proto-language.