A MODEL OF LANGUAGE EXTINCTION AND FORMATION [1]

Richard Roark

This paper reports an attempt to construct a theoretical model of the process whereby languages, over the passage of time, expand into neighboring geographical areas, replace the languages formerly spoken there, and ultimately split into separate languages. In addition, the data yielded by the model are compared with empirical data derived [2] from Sydney M. Lamb's 1959 classification of North American Indian languages.

Three basic assumptions underlie the construction of this model. The most important is that there is a maximum possible area (MPA) which a language can occupy and still remain one language, though the exact size is not constant for all languages. Various forces of linguistic divergence are continually operating, and if expansion occurs beyond this MPA, contact between the members of the speech community is no longer sufficient to counteract these forces of divergence. Once the boundaries of the MPA have been exceeded, it is but a question of time before the various dialects of the original language become mutually unintelligible. The period necessary for a language which has expanded beyond the MPA to split into daughter languages may be something like six centuries.

Today the MPA of some languages, such as English, is very large, owing to the efficiency of modern means of communication and transportation. This efficiency is, however, a historically recent phenomenon. During most of the past, transportation and communication have presumably been at a low and rather stable level, leading to the conclusion that the MPA must have been of moderate and roughly constant size.

The second assumption is that for an area of continental dimensions the total number of languages has remained roughly constant for long periods of time (perhaps in excess of ten or fifteen thousand years).
Indeed, up to a certain point in the past it may be that the number of languages spoken in a given large area increased. The reason would be that the poorer means of communication between speech communities and within a single speech community would favor linguistic diversification.

Whenever and however language originated, it presumably conferred on its speakers advantages which helped them increase in numbers and occupy larger areas. It would seem likely that once language originated in, or was introduced to, a given continental area, the speakers expanded, exceeding the MPA limits in fairly rapid succession. As they did so, divergence would have set in and produced many daughter languages. Eventually most of the continent would have been occupied, so that expansion could take place only at the expense of the language into whose territory expansion occurred. The expanded language would diversify while the other language would be displaced, or its speakers would learn the new language, or perhaps even be killed by the speakers of the expanded language.

At this point, extinction of a language would normally accompany the formation of each new language, and the total number of languages would remain roughly constant, so long as the MPA remained roughly constant. In quite modern times improved means of communication have increased the limits of the MPA enormously. A few languages have expanded at the expense of many (English at the expense of many Amerindian languages, for example), and one might predict that the next several centuries will witness a sizable decline in the number of languages spoken in the world.

The third assumption is that in any six century period there is some probability that a language will expand beyond the limits of the MPA. The probability that tentative expansion (to be explained in the operating rules) would occur was initially assumed to be 50 percent.
In order to understand the processes of language extinction and formation it may be useful to consider one MPA at two separate points in time.

I. The language initially in the MPA may no longer be spoken.
   A. The speakers may be dead.
   B. The language may have been replaced by another language.
   C. The speakers may have moved elsewhere. (Note that such removal would not alter the total number of languages spoken.)
II. The language may still be one language.
   A. Its area may have contracted.
   B. Its area may have expanded without exceeding the MPA limit.
   C. Its area may have remained constant.
   (None of these possibilities would alter the total.)
III. The initial language may now be more than one language. Such diversification will presumably occur six centuries after the MPA limit is exceeded.

Since the total number of languages is assumed to remain constant, IA + IB = III: the languages lost through IA and IB are balanced by the languages gained through III.

In summary there seem to be two basic processes leading to language formation and two leading to language extinction.

I. New languages may be formed by
   A. Spontaneous generation. A language may appear with no prior glottogene. (By glottogene is meant a language viewed through time, rather than at an instant in time.) This process must have occurred at least once, but has never to this writer's knowledge been observed, and no provision for it has been made in the model.
   B. Diversification of languages which have expanded beyond the MPA limit.
II. Languages may become extinct
   A. Because the speakers learn another language. This process will normally be due to intrusion of a foreign language. (IB)
   B. Because the speakers have all died. This process might or might not be due to intrusion of the speakers of a foreign language.

The model attempts to represent these processes by means of a series of squares, representing the MPA's of an entire continent. Most of these squares will contain a language; a few will be blank.
Surrounding the continent is a row of border squares which limits expansion to the confines of the continent. This border may be thought of as the ocean. Whenever a language expands from one MPA into another it replaces the previous language, and then diversifies to produce one or more new languages six centuries later. Diversification also occurs when a language expands into a blank square, and in addition a language chosen at random becomes extinct, so that the total remains constant.

The model consists of 40 continental squares (an 8 x 5 rectangle) surrounded by a row of border, or ocean, squares. The squares are numbered (n) from 1-70, seven to a row. Initially 32 of the 40 continental squares are assigned a language; the other 8 are left blank. Each square bears a Replacement Number (RN) in the upper left hand corner and a Diversification Number (DN) in the lower right hand corner. The RN of each ocean square is always 33. The RN's of other squares, and all DN's, have two parts (a, b).

Other components of the model are:

1. Expansion Probability Selector: 16 numbers, eight of which are zero. The others are -8, -7, -6, -1, 1, 6, 7, 8 (the differences between the number of any given square and the numbers of the squares contiguous to it). This selector provides an 8 out of 16 chance that tentative expansion will occur into a given square during a six century period.

2. Random Extinction Selector: contains the numbers of all continental squares, allowing a language chosen at random to become extinct when expansion takes place into a blank square (numbers are 9-13, 16-20, 23-27, 30-34, 37-41, 44-48, 51-55, 58-62).

3. Random Extinction File (REF): room for the numbers of languages which become extinct at random (up to a maximum of eight).

4. Don't Use Again File (DUA File): sublists 1-32. Each sublist contains a number (d, u). Initially these numbers are 1, 0; 2, 0; 3, 0 . . . 32, 0. This file ensures that no language can be produced by diversification more than once.

5. First Occurrence File (FOF): room for 32 numbers (F, O). This file keeps track of the first occurrence of a language in each six century period. (The first occurrence is arbitrarily defined as the one in the square with the lowest number.)

The model starts at the beginning of a six century period with one language per MPA. Immediately after the beginning of the period expansion takes place, resulting in some languages occupying more than one MPA. By the end of the six century period divergence results in the formation of new languages (n - 1 for each n MPA's occupied by a single language).

General Expansion and Diversification Rules

1. Expansion takes place from a square containing a language to a contiguous, non-ocean square, subject to certain restrictions.
2. Expansion takes place only once into a given square per six century period.
3. If expansion attempts to take place from square A to square B and also from B to A during the same period, no expansion results.
4. Expansion into an occupied square results in the extinction of the language previously there.
5. Expansion into a blank square is followed by extinction of a language chosen at random. After expansion, languages which occupy more than one MPA diverge for the remainder of the six century period. At the end of this period they become separate languages.
6. When languages occupy more than one MPA the first instance of the language is not altered. Each succeeding instance becomes a new language.
7. Diversification may not result in the production of a language presently or previously in existence.

Operating Procedure

Assign RN of 33 to each ocean square. Assign RN's of 1, 0; 2, 0 . . . 32, 0 at random to 32 of the continental squares. Assign RN of 0, 0 to the other eight continental squares. Assign DN of 0, 0 to all squares. Assign d, u of 1, 0 to sublist 1 of the DUA File; 2, 0 to sublist 2; etc.
Place cards containing the numbers -8, -7, -6, -1, 1, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0 in the Expansion Probability Selector and shuffle. Place 40 cards, each containing the number of a continental square, in the Random Extinction Selector and shuffle. Place 0, 0 in each of the eight consecutive cells of the Random Extinction File. Place 0, 0 in each of the 32 consecutive cells of the First Occurrence File. The model is now ready to operate; a picture of it is shown in the accompanying figure. 0, 0 is indicated by a blank.

[Figure: the model ready to operate, showing squares 1-70 with their RN's (ocean squares marked 33), the REF (cells 1-8), the DUA File (sublists 1-32), and the FOF (cells 1-32).]

START WITH SQUARE n = 9 and ask question ONE.

1. Can expansion tentatively occur into n? To find out, choose one of the cards in the Expansion Probability Selector. This card will have a number, q. Ask: is q zero?
   If yes, there will be no expansion. Increase n by one and ask 1.
   If no, there may perhaps be tentative expansion. Ask
2. Is n ocean? (Is RN of n 33?)
   If yes, increase n by one and ask 1.
   If no, ask
3. Has n been previously stored in the Random Extinction File? (To find out, examine the first number, r, in REF and ask: is r = n?)
   If yes, increase n by one and ask 1.
   If no, increase r by one (that is, examine the next cell of REF) and ask 3. After r = 8 ask
4. Is n plus q ocean?
   If yes, increase n by one and ask 1.
   If no, ask
5. Is n plus q blank? (Is RN 0, 0?)
   If yes, increase n by one and ask 1.
   If no, there will be tentative expansion (subject to later random extinction) from n plus q to n, unless there has already been tentative expansion from n to n plus q. Ask
6. Has there been tentative expansion during this six century period from n to n plus q? (If there has, the DN of n plus q will equal the RN of n.)
   If yes, no tentative expansion takes place. 0, 0 replaces DN of n plus q, nullifying the previous tentative expansion from n to n plus q. Increase n by one and ask 1.
   If no, tentative expansion occurs from n plus q to n. DN of n is replaced by RN of n plus q. Ask
7. Was n blank when expansion took place into it? (If it was, its RN will be 0, 0 at this point.)
   If no, increase n by one and ask 1.
   If yes, choose a number, x, at random from the Random Extinction Selector. Ask
8. Is square x blank?
   If yes, choose a new x and ask 8.
   If no, store x in the first zero cell in the Random Extinction File. 0, 0 replaces RN and DN of x. (The language in square x becomes extinct.) Increase n by one and ask 1.

After n = 62, zero replaces each number in REF. At this point each square into which expansion has occurred will have a DN not equal to 0, 0. Let n = 9 and ask

9. Is DN of n 0, 0?
   If yes, increase n by one and ask 9.
   If no, the new language replaces the old. DN of n replaces RN of n. 0, 0 replaces DN. Increase n by one and ask 9.

After n = 62, print a picture of each continental square. All squares now have DN's of 0, 0. All occupied continental squares have RN's other than 0, 0. The same RN may exist in more than one MPA, indicating that a language has exceeded the MPA limits. The time is now just after the beginning of the six century period. This situation persists until, after six centuries of divergence, those languages which have expanded beyond the MPA have become separate languages. Let n = 9 and ask

10. Is n ocean?
   If yes, increase n by one and ask 10.
   If no, ask
11. Is n blank?
   If yes, increase n by one and ask 10.
   If no, it must be determined whether this is the first occurrence of the language in n. Examine the number (F, O) in the first cell (cell f) of the First Occurrence File and ask
12. Does F, O = 0, 0?
   If yes, this is the first occurrence of the language in n. Store RN of n in cell f of FOF. Increase n by one and ask 10.
   If no, this is perhaps not the first occurrence of the language in n. Ask
13. Does RN of n = F, O? (Note that RN is of the form a, b.)
   If no, increase f by one and ask 12.
   If yes, this is not the first occurrence of the language in n. Replace the number (d, u) in sublist a of the DUA File by d, u plus 1. Replace RN of n by d, u plus 1. Increase n by one and ask 10.

After n = 62, 0, 0 replaces each number in the First Occurrence File. Print the contents of each continental square. The time is now just at the end of a six century period. The b part of each RN has been increased by one for each occurrence of the RN after the first. Thus the a part of each RN indicates genetic relationship and the b part indicates the past history of a group of genetically related languages. For example, an RN of 1, 4 means that this is the fourth language in a language family ultimately derived from language 1, 0.

The accompanying figures illustrate the steps in one six century period. At the end of the period there were still 32 languages, but seven of the original 32 had become extinct, while seven had been produced by divergence from languages which expanded beyond the MPA limits. The 32 languages existing at this point go back to 25 ancestral languages which existed six centuries earlier.

Continuing the process for a total of 60 centuries yields the following data:

  Time (centuries)   Original glottogenes still represented
         0                        32
         6                        25
        12                        17
        18                        14
        24                        14
        30                        13
        36                        11
        42                        11
        48                         9
        54                         9
        60                         8

Thus, viewing the situation from the present, the 32 existing languages are derived from 8 languages which existed 60 centuries ago, 13 languages which existed 30 centuries ago, etc.

The remainder of this paper compares the data yielded by the model with data obtained from Sydney Lamb's classification of American Indian languages, previously referred to. Dr. Lamb's classification of Amerindian languages attempts to set up genetic groups of roughly comparable time depth and roughly comparable internal diversity.
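Before turning to the empirical data, the bookkeeping behind the model's table of glottogenes (questions 10 to 13) can be sketched in a few lines of Python. This is only an illustrative sketch, not the program the paper hopes for: the names `diversify`, `dua` and `glottogenes_represented` are assumptions introduced here, a Python set stands in for the First Occurrence File, a dictionary stands in for the DUA File, and the example row of squares is hypothetical rather than taken from the paper's figures.

```python
def diversify(squares, dua):
    """Relabel repeated RNs at the end of a six century period.

    squares: RNs in square-number order; each is a tuple (a, b), or None
             for a blank square (ocean squares are simply left out here).
    dua:     dict mapping family number a to the highest b issued so far,
             playing the role of the Don't Use Again File (an assumption
             about representation, not the paper's card layout).
    """
    first_seen = set()          # stands in for the First Occurrence File
    result = []
    for rn in squares:
        if rn is None:
            result.append(None)             # question 11: blank square
        elif rn not in first_seen:
            first_seen.add(rn)              # question 12: first occurrence
            result.append(rn)
        else:
            a = rn[0]                       # question 13: later occurrence
            dua[a] += 1                     # d, u becomes d, u plus 1
            result.append((a, dua[a]))      # the square gets the new RN
    return result


def glottogenes_represented(squares):
    """Count distinct a parts, i.e. original languages still represented."""
    return len({rn[0] for rn in squares if rn is not None})


# Hypothetical row of squares just after expansion (not the paper's data).
row = [(4, 0), (4, 0), None, (4, 0), (14, 0), (27, 0), (27, 0)]
dua = {4: 0, 14: 0, 27: 0}
after = diversify(row, dua)
print(after)
print(glottogenes_represented(after))
```

In this hypothetical row the three squares holding 4, 0 come out as 4, 0; 4, 1; 4, 2, so the number of languages grows while the number of original glottogenes represented stays at three. Keeping the highest b per family in `dua` is what prevents a label from being issued twice in later periods.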
Perhaps 300-400 languages were spoken in native North America at white contact; the present writer has chosen the figure 368 for computational purposes. The classification groups these languages into 109 families of time depth 25 to 30 centuries, 61 stocks of time depth 45 to 50 centuries, and 23 orders of time depth 65 to 75 centuries. That is, the 368 languages spoken at white contact go back to 23 glottogenes 65 to 75 centuries ago, or 60 to 70 centuries before 1500 A.D. But these 23 glottogenes did not become separate languages for perhaps six centuries more, so that the 368 languages are descended from 23 languages which existed 54 to 64 centuries ago. Letting x be centuries before 1500 A.D. and y the number of languages from which the final 368 were derived:

     y        x
    368       0
    109     14-19
     61     34-39
     23     54-64
  (see graph one)

An equation was derived to represent this trend by choosing the values of x which led to the most regular curve.

     y        x
    368       0
    109      19
     61      34
     23      64

These values yield a curve, concave to the right, when plotted on semi-logarithmic paper. A straight line on such paper indicates that there is some period, p, each iteration of which will reduce the initial value of y by one half. Such a line represents an exponential equation, and is sometimes referred to as a constant growth law. Radioactive decay is an example: there is a period, 1600 years, which will reduce any amount of radium by half. But the shape of the curve obtained from the linguistic data indicates that the period of time needed to reduce the languages spoken at white contact by half, and this number in turn by half, is itself a slowly increasing variable. Two equations were derived:

$$ y = \frac{368}{2^{\,x / \operatorname{antilog}(.765 + .245 \log x)}} $$

$$ y = \frac{368}{2^{\,\operatorname{antilog}(.763 \log (x / 10.5))}} $$

(10.5 is the number of centuries needed to reduce the original 368 languages to 184; the number of iterations of this period is x divided by 10.5. Thus going back 10.5 centuries in time reduces the 368 original languages to 184, which are part of a different 368 then spoken.
Going back another 10.5 centuries reduces this second 368 to 184, but only reduces the first 184 to 114, not to 92. For further discussion, refer to the mathematics appendix.)

This graph may now be compared with the results of the model. The model contained only 32 languages; that is, it was 11 and one half times smaller than native North America. Hence all the values it produced for number of languages need to be multiplied by 11.5.

  Time   Languages produced   11.5 x languages      Languages read
         by model             produced by model     from graph
    0          32                   368                  368
    6          25                   288                  250
   12          17                   196                  170
   18          14                   161                  115
   24          14                   161                   88
   30          13                   150                   72
   36          11                   127                   58
   42          11                   127                   48
   48           9                   103                   37
   54           9                   103                   32
   60           8                    92                   23
  (see graph two)

There is similarity in shape between the theoretical and empirical curves. The overly high values produced by the model indicate that the tentative expansion probability figure of 50% is too low. Another trial was made with three zeroes in the Expansion Probability Selector, rather than eight. The following results were obtained.

  Time   Languages produced   11.5 x languages
         by model             produced by model
    0          32                   368
    6          22                   253
   12          21                   241
   18          16                   184
   24          15                   172
   30          12                   138
   36          11                   127
   42          10                   115
   48           8                    92
   54           7                    81
   60           6                    69
  (see graphs 3 and 4)

These figures agree more closely with the empirical data. It is to be expected that the same expansion probability figure will yield a smaller number of languages with a larger model, since the proportion of ocean squares to continental squares is less.
That is, a smaller percentage of the total continental squares suffers the disadvantage of having expansion rendered impossible from three or four contiguous squares by the presence of the ocean.

[Figures: the continental squares during one six century period: at time = 0; at time = 0+ (expansion occurs, with random extinctions recorded in the REF); during time = 0 - 6 (expansion complete, slow divergence); and at time = 6 (divergence has resulted in new languages).]

The rules for operating the model have been written as a series of yes or no questions in order to facilitate the eventual writing of a computer program which would enable the IBM 704 to run the model with any desired number of languages. Nothing more complicated than a two way program branch would need to be employed, and it is hoped that ultimately the program can be completed.

ENDNOTES

1. I am greatly indebted to Sydney Lamb for his insight and patience during the preparation of this paper, and to Dell Hymes for his critical review. Any faults remaining should be attributed to my own stubbornness.
2. Sydney M. Lamb, Some Proposals for Linguistic Taxonomy, Anthropological Linguistics 1:33-49 (1959).
3. Sydney M. Lamb, personal communication.
BIBLIOGRAPHY

Lamb, Sydney M.
  1959  Some Proposals for Linguistic Taxonomy. Anthropological Linguistics 1:33-49.

MATHEMATICS APPENDIX

For computational purposes the following values of x (centuries, counting backward from 1500 A.D.) and y (languages) were chosen, as they gave the most regular possible curve on the graph:

  y (languages)   x (centuries)
       368              0
       109             19
        61             34
        23             64
  (see graph 5)

These values plotted on semilog paper yield a curve concave to the right. A straight line would represent a simple exponential equation, in which there is some period, p, which reduces the value of y by 1/2. But the shape of the curve indicates that this period is a variable which increases with time. From the graph interpolations can be made to find the time needed for successive reductions of the initial value of y by 1/2:

  y = 368 / 2^n      x       n
       368           0       0
       184          10.5     1
        92          23.5     2
        46          42.5     3
        23          64.0     4

The average period after n halvings is

$$ p_{avg} = \frac{p_1 + p_2 + \cdots + p_n}{n}, \qquad p_1 = 10.5; \quad p_2 = (23.5 - 10.5) = 13; \quad \text{etc.} $$

Hence

$$ p_{avg} = \frac{x}{n}, \qquad n = \frac{x}{p_{avg}}, \qquad \text{and} \qquad y = \frac{368}{2^{\,x / p_{avg}}} $$

x and p_avg were found to be nearly in a straight line when plotted on logarithmic paper (see graph 6). Hence log p_avg = a + b log x, or p_avg = antilog (a + b log x). Let P = log p_avg and X = log x.

  p_avg      x        P        X        X^2       XP
  10.5      10.5    1.022    1.022    1.045    1.045
  12.75     23.5    1.106    1.371    1.880    1.528
  14.17     42.5    1.151    1.628    2.647    1.872
  16.0      64.0    1.204    1.806    3.259    2.175
  Sum               4.483    5.827    8.831    6.620

  n is four cases.

$$ \Sigma(P) = na + b\,\Sigma(X); \qquad \Sigma(XP) = a\,\Sigma(X) + b\,\Sigma(X^2) $$

hence a = .765; b = .245, and

$$ p_{avg} = \operatorname{antilog}(.765 + .245 \log x), \qquad y = \frac{368}{2^{\,x / \operatorname{antilog}(.765 + .245 \log x)}} $$

This equation represents the trend of the curve moderately well, but is not trustworthy for small values of x; less than five, for example.

It will be noted that each iteration of the period 10.5 will reduce y by a successively smaller percentage. Writing y = 368 / 2^exponent:

  Iterations (x)     y     Exponent of 2
   1  (10.5)        184        1
   2  (21.0)        114        1.69
   3  (31.5)         76        2.31
   4  (42.0)         48        2.91
   5  (52.5)         35        3.38
   6  (63.0)         24        3.94

Plotting the exponent of 2 against iterations of 10.5 yields a straight line on logarithmic paper (see graph 7). Hence log exponent = a + b log iterations. Let E be log exponent and I be log iterations.

    I        E        EI        I^2
   .000     .000     .0        .0
   .301     .229     .0686     .0907
   .477     .363     .1733     .227
   .602     .464     .2795     .363
   .699     .529     .3700     .489
   .778     .596     .4640     .606
  2.857    2.181    1.3554    1.7757   (sums)

  n = 6 cases.

$$ \Sigma(E) = na + b\,\Sigma(I); \qquad \Sigma(EI) = a\,\Sigma(I) + b\,\Sigma(I^2) $$

a = 0; b = .762. Hence exponent = antilog (.762 log iterations), and

$$ y = \frac{368}{2^{\,\operatorname{antilog}(.763 \log (x / 10.5))}} $$

[Graphs 1-7: graphs 1 to 5 plot LANGUAGES against CENTURIES (graph 1, the empirical ranges; graph 2, the first trial of the model against the empirical curve; graphs 3 and 4, the second trial; graph 5, the values chosen for computation, on semilog paper); graph 6 plots p_avg against x on logarithmic paper; graph 7 plots the exponent of 2 against iterations on logarithmic paper.]
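The two fitted equations of the appendix can be checked numerically. The following is a minimal sketch in Python, assuming that "log" and "antilog" are to base 10 and that the first equation has x divided by p_avg in the exponent, as the appendix derivation indicates; the function names `y_variable_period` and `y_iterated` are illustrative labels introduced here, not the author's.

```python
import math

Y0 = 368.0  # languages at white contact, the figure chosen in the paper


def y_variable_period(x):
    """First equation: y = 368 / 2**(x / p_avg),
    with p_avg = antilog(.765 + .245 log x), logs taken to base 10."""
    if x <= 0:
        return Y0                      # no halvings have occurred at x = 0
    p_avg = 10 ** (0.765 + 0.245 * math.log10(x))
    return Y0 / 2 ** (x / p_avg)


def y_iterated(x, period=10.5, b=0.763):
    """Second equation: y = 368 / 2**antilog(b * log(x / period))."""
    if x <= 0:
        return Y0
    exponent = (x / period) ** b       # antilog(b * log t) equals t ** b
    return Y0 / 2 ** exponent


if __name__ == "__main__":
    # Tabulate both curves at successive iterations of 10.5 centuries.
    for x in (10.5, 21.0, 31.5, 42.0, 52.5, 63.0):
        print(f"{x:5.1f}  {y_variable_period(x):7.1f}  {y_iterated(x):7.1f}")
```

Both functions fall from 368 toward roughly 23 or 24 languages in the neighborhood of 63 to 64 centuries, in line with the values read from the graph. As the appendix cautions, the first form should not be trusted for small x.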