This book introduces a mathematically naïve reader to those statistical tools which are applicable in modern quantitative text and language analysis, and does this in terms of simple examples dealing exclusively with language and literature.

Front Matter (pp. i–iv)
Table of Contents (pp. v–viii)
Preface (pp. ix–2)
B. B.
Mathematical preliminaries (pp. 3–17)
The reader is assumed to have already obtained at least a limited mathematical background before he attempts to read this book. This background should consist of a knowledge of elementary theory of sets, some knowledge of the properties of numbers, and an intuitive grasp of the concept of limit. The major definitions and results from these areas, which will be used without mention, are outlined below.
We have attempted to provide a treatment that can be understood without a prior knowledge of the calculus. However, some technical arguments involving the calculus are included, mainly for those who have some knowledge...

1 Reduction of numerical data (pp. 18–51)
The word statistics can be construed in a number of different ways. We shall be concerned here particularly with two of these meanings: (1) the collecting, organizing, summarizing, and analysing of quantitative information; (2) a set of techniques for drawing inferences and generalizations from small collections, called samples, to larger collections, called populations, using the mathematical theory of probability.
Meaning (1) is the subject of this chapter, and (2) will be discussed in some detail in later chapters. The distinction between (1) and (2) can be made manifest by calling the statistics defined in (1) descriptive statistics and that defined...

2 Introduction to probability (pp. 52–85)
The student of language is often concerned with populations made up of stretches of speech, composed of a small inventory of atomic items (phones, morphs, words, etc.), which vary through time and geographical space. The researcher is interested in making inferences about the nature and the distribution of variates defined on these populations, and in answering such questions as: when are the distributions of two such variates essentially the same? with what degree of certainty can we say their distribution is the same or different? Indeed, language researchers, like other social scientists, are faced with the problem of weighing evidence...

3 Random variables (pp. 86–136)
Earlier we encountered the notion of a variate, examples of which were
w = no. of noun-tokens in a paragraph / no. of word-tokens in the paragraph
in section 1.1,
Y = no. of different Chinese characters in 50 characters taken from the Tao Teh Ching
in example 1 of section 1.1, and
Z = no. of articles in a 50-word passage
in example 2 of section 1.6. These and nearly all the other variates we have considered and shall consider in this book have certain properties definable in terms of probability.
Take for example Z, the number of articles in a 50-word passage...
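As a rough illustration of how variates such as Z and Y might be computed from a text, here is a minimal Python sketch. It is not taken from the book: the function names, the English article inventory {a, an, the}, and the whitespace tokenization are all assumptions made for the example.

```python
# Hypothetical helpers for the variates Z and Y described above.
# ARTICLES is an assumed inventory for English; other languages need their own.
ARTICLES = {"a", "an", "the"}


def count_articles(words, size=50):
    """Z: number of articles among the first `size` word-tokens."""
    return sum(1 for word in words[:size] if word.lower() in ARTICLES)


def count_distinct(symbols, size=50):
    """Y: number of different symbols among the first `size` symbols."""
    return len(set(symbols[:size]))


def passages(words, size=50):
    """Split a token list into consecutive, non-overlapping passages
    of exactly `size` tokens; a shorter remainder is discarded."""
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


if __name__ == "__main__":
    # Toy text, tokenized naively on whitespace (an assumption).
    tokens = ("the cat sat on a mat " * 10).split()
    for passage in passages(tokens):
        print(count_articles(passage), count_distinct(passage))
```

Applying count_articles to every 50-word passage of a longer text yields a sample of observed values of Z, which is the kind of data whose distribution the chapter goes on to describe in terms of probability.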

4 Estimation (pp. 137–153)
The notions of sample and population were introduced in section 1.1. You will remember that a population is a set of objects to be observed. It may be a finite set like the set of inhabitants of a certain area, or a countably infinite set like the set of possible natural numbers corresponding to the first success in an infinite sequence of Bernoulli trials, or an uncountably infinite set like the set of points between 0 and 1, the possible values of a point selected at random from the interval [0, 1]. Our general interest has centred on the values...

5 Hypothesis testing (pp. 154–216)
A statistical hypothesis is a statement about the nature of the distribution of a random variable. The purpose of this section is to explain how such hypotheses can be tested.
In example 2 of section 2.4, we have already treated one such statistical hypothesis (due to A.S.C. Ross) and indicated how such a hypothesis might be tested. There we gave Ross’s argument to the effect that the hypothesis that two languages, L_{1} and L_{2}, say, are not genetically related is equivalent to the following hypothesis:
(H) If, in a set of N Indo-European roots, n_{1} have a cognate in L_{1}...

6 Some more extended studies (pp. 217–268)
In this chapter we shall treat some specific problems in more detail than we have done up to the present. This will allow us to synthesize the material developed in previous chapters as well as introduce one or two new topics based upon it.
We shall concentrate on three main areas: variations among pronouns and articles in literary texts, syllable counts in literary texts, and the use of the binomial distribution in lexicostatistical studies.
I should like to begin by making some general remarks on methodology. First, it goes without saying that what we observe and the manner in which...

References (pp. 269–272)
Index (pp. 273–276)