Computing and Language Variation: International Journal of Humanities and Arts Computing Volume 2

David J. Bodenhamer
Paul S. Ell
John Nerbonne
Charlotte Gooskens
Sebastian Kürschner
Renée van Bezooijen
Copyright Date: 2009
Pages: 268
https://www.jstor.org/stable/10.3366/j.ctt1r1z9x
    Book Description:

    Computing and Language Variation explores dialects and social differences in language computationally, examining topics such as how (and how much) linguistic differences impede intelligibility, how national borders accelerate and direct change, how opinion and hearsay shape perceptions of language differences, the role of intonation (melody), the differences between variation in pronunciation and vocabulary, and techniques for recognizing structure in larger collections of linguistic data. The computational investigations engage more traditional work deeply, and a panel discussion focuses on the opportunities and risks of pursuing humanities research using computational science. There is also an extensive introduction which attempts to sketch perspectives from which to approach the individual contributions.

    eISBN: 978-0-7486-4164-2
    Subjects: Language & Literature

Table of Contents

  1. Front Matter
    (pp. i-ii)
  2. Table of Contents
    (pp. iii-iv)
  3. FROM THE EDITORS
    (pp. v-vi)
    DJB and PSE
  4. Notes on Contributors
    (pp. vii-xii)
  5. Introduction: LANGUAGE VARIATION STUDIES AND COMPUTATIONAL HUMANITIES
    (pp. 1-18)
    JOHN NERBONNE, CHARLOTTE GOOSKENS, SEBASTIAN KÜRSCHNER and RENÉE VAN BEZOOIJEN

    The volume we are introducing here contains a selection of the papers presented at a special track on computational techniques for studying language variation held at The Thirteenth International Conference on Methods in Dialectology in Leeds on 4–5 August 2008. We are grateful to the conference organisers, Nigel Armstrong, Joan Beal, Fiona Douglas, Barry Heselwood, Susan Lacey, Ann Thompson, and Clive Upton for their cooperation in our organisation of the event. We likewise owe thanks to the referees of the present volume, who we are pleased to acknowledge explicitly: Agnes de Bie, Roberto Bolognesi, David Britain, Cynthia Clopper, Ken...

  6. PANEL DISCUSSION ON COMPUTING AND THE HUMANITIES
    (pp. 19-38)
    JOHN NERBONNE, PAUL HEGGARTY, ROELAND VAN HOUT and DAVID ROBEY

    This is the report of a panel discussion held in connection with the special session on computational methods in dialectology at Methods XIII: Methods in Dialectology on 5 August, 2008 at the University of Leeds. We scheduled this panel discussion in order to reflect on what the introduction of computational methods has meant to our subfield of linguistics, dialectology (in alternative divisions of linguistic subfields also known as variationist linguistics), and whether the dialectologists’ experience is typical of such introductions in other humanities studies. Let’s emphasise that we approach the question as working scientists and scholars in the humanities rather...

  7. MAKING SENSE OF STRANGE SOUNDS: (MUTUAL) INTELLIGIBILITY OF RELATED LANGUAGE VARIETIES. A REVIEW
    (pp. 39-62)
    VINCENT J. VAN HEUVEN

    In this paper we ask two questions, which superficially seem to ask the same thing but in actual fact do not. First, we ask to what degree two languages (or language varieties) A and B resemble each other. The second question is how well a listener of variety B understands a speaker of variety A.

    When we ask to what degree two language varieties resemble one another, or how different they are (which is basically the same question), it should be clear that the answer cannot be expressed in a single number. Languages differ from each other not in just...

  8. PHONETIC AND LEXICAL PREDICTORS OF INTELLIGIBILITY
    (pp. 63-82)
    CHARLOTTE GOOSKENS, WILBERT HEERINGA and KARIN BEIJERING

    Gooskens (2007) correlated lexical and phonetic distances with mutual intelligibility scores for the Mainland Scandinavian standard languages, Danish, Norwegian and Swedish. Subjects from different places in Denmark, Norway and Sweden listened to the two standard languages spoken in the neighbouring countries and linguistic distances were measured between the language varieties of the listeners and the test languages. In total there were 18 mean intelligibility scores and 18 corresponding linguistic distances. The distances were measured at the two linguistic levels that are generally taken to be most important for mutual intelligibility in Scandinavia, namely the lexical and the phonetic level (Delsing...

  9. LINGUISTIC DETERMINANTS OF THE INTELLIGIBILITY OF SWEDISH WORDS AMONG DANES
    (pp. 83-100)
    SEBASTIAN KÜRSCHNER, CHARLOTTE GOOSKENS and RENÉE VAN BEZOOIJEN

    Danish and Swedish are closely related languages within the North Germanic language branch. The two languages are mutually intelligible to such a high degree that in Danish-Swedish communication speakers mostly use their own mother tongues, a mode of communication termed semi-communication by Haugen (1966). In previous research it was shown that intelligibility scores correlate highly with global phonetic distances between the languages involved (cf. e.g. Beijering, Gooskens and Heeringa, 2008; Gooskens, 2007). Hence, linguistic factors play a major role in determining mutual intelligibility. Additionally, it is often assumed that attitudes and prior exposure to the variety in question are important...

  10. MUTUAL INTELLIGIBILITY OF STANDARD AND REGIONAL DUTCH LANGUAGE VARIETIES
    (pp. 101-118)
    LEEN IMPE, DIRK GEERAERTS and DIRK SPEELMAN

    When speakers of different languages or language varieties communicate with each other, one group (generally the economically and culturally weaker one) often switches to the language or language variety of the other, or both groups of speakers adopt a third, common lingua franca. However, if the languages or language varieties are so much alike that the degree of mutual comprehension is sufficiently high, both groups of speakers might opt for communicating in their own language variety.

    This type of interaction between closely related language varieties, which Haugen (1966) coins semicommunication and Braunmüller and Zeevaert (2001) refer to as receptive multilingualism,...

  11. THE DUTCH–GERMAN BORDER: RELATING LINGUISTIC, GEOGRAPHIC AND SOCIAL DISTANCES
    (pp. 119-134)
    FOLKERT DE VRIEND, CHARLOTTE GIESBERS, ROELAND VAN HOUT and LOUIS TEN BOSCH

    The Dutch-German state border south of the river Rhine was established in 1830. Before that time, the administrative borders in this region frequently changed. The Kleverlandish dialect area, which extends from Duisburg in Germany to Nijmegen in The Netherlands, crosses the state border south of the Rhine. The area is demarcated by the Uerdingen line in the south, the diphthongisation line of the West Germanic ‘i’ in the West, and the border with the Low Saxon dialects of the Achterhoek area in the North-East. The geographic details of the area can be found in Figure 1 (the state border is...

  12. THE SPACE OF TUSCAN DIALECTAL VARIATION: A CORRELATION STUDY
    (pp. 135-152)
    SIMONETTA MONTEMAGNI

    It is a well-known fact that different types of features contribute to the linguistic distance between any two locations, which can differ for instance with respect to the word used to denote the same object or the phonetic realisation of a particular word. Yet, the correlation between different feature types in defining patterns of dialectal variation represents an area of research still unexplored. In traditional dialectology, there is no obvious way to approach this matter beyond fairly superficial and impressionistic observations. The situation changes if the same research question is addressed in the framework of dialectometric studies, where it is...

  13. RECOGNISING GROUPS AMONG DIALECTS
    (pp. 153-172)
    JELENA PROKIĆ and JOHN NERBONNE

    Dialectometry is a multidisciplinary field that uses various quantitative methods in the analysis of dialect data. Very often those techniques include classification algorithms such as hierarchical clustering algorithms used to detect groups within a certain dialect area. Although known for their instability (Jain and Dubes, 1988), clustering algorithms are often applied without evaluation (Goebl, 2007; Nerbonne and Siedle, 2005) or with only partial evaluation (Moisl and Jones, 2005). Very small differences in the input data can produce substantially different groupings of dialects (Nerbonne et al., 2008). Without proper evaluation, it is very hard to determine if the results of the applied clustering...

  14. COMPARISON OF COMPONENT MODELS IN ANALYSING THE DISTRIBUTION OF DIALECTAL FEATURES
    (pp. 173-188)
    ANTTI LEINO and SAARA HYVÖNEN

    Languages are traditionally subdivided into geographically distinct dialects, although any such division is just a coarse approximation of a more fine-grained variation. This underlying variation is usually visualised in the form of maps, where the distribution of various features is shown as isoglosses. It is possible to view dialectal regions, in this paper also called simply dialects, as combinations of the distribution areas of these features, where the features have been weighted in such a way that the differences between the resulting dialects are as sharp as possible. Ideally, dialect borders are drawn where several isoglosses overlap.

    As more and...

  15. FACTOR ANALYSIS OF VOWEL PRONUNCIATION IN SWEDISH DIALECTS
    (pp. 189-204)
    THERESE LEINONEN

    The traditional method of identifying dialect areas has been the so-called isogloss method, where researchers choose some linguistic features that they find representative for the dialect areas and draw lines on maps based on different realisations of these features. One problem with the isogloss method is that isoglosses rarely coincide, and a second is that the choice of linguistic features is subjective and depends on what the researcher chooses to emphasise. Dialectometric research has been trying to avoid these problems by aggregating over large data sets and using more objective data-driven methods when determining dialect areas (Séguy, 1973; Goebl, 1982;...

  16. REPRESENTING TONE IN LEVENSHTEIN DISTANCE
    (pp. 205-220)
    CATHRYN YANG and ANDY CASTRO

    The Levenshtein distance algorithm measures the phonetic distance between closely related language varieties by counting the cost of transforming the phonetic segment string of one cognate into another by means of insertions, deletions and substitutions. After Kessler (1995) first applied the algorithm to dialect data in Irish Gaelic, Heeringa (2004) showed that cluster analysis based on Levenshtein distances agreed remarkably with expert consensus on Dutch dialect groupings. In addition, Gooskens and Heeringa (2004) found a significant correlation between Levenshtein distance and perceived distance among Norwegian listeners (r = .67, p < .001), and Gooskens (2006) found an even stronger correlation with...
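
    The edit-distance computation summarised in this abstract can be illustrated with a minimal sketch. It is not the chapter authors' implementation: it assumes uniform costs of 1 for insertion, deletion and substitution, ignores tone and segment similarity entirely, and the function name `levenshtein` and the example transcriptions are purely illustrative.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Return the minimum total cost of turning `source` into `target`.

    Both arguments are sequences of segments (a string, or a list of
    phonetic symbols). Uniform unit costs are an assumption of this sketch.
    """
    # prev[j] = cost of turning the first i-1 source segments
    # into the first j target segments (row 0: insertions only).
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        # Column 0: delete all of the first i source segments.
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,                        # delete s
                curr[j - 1] + ins_cost,                    # insert t
                prev[j - 1] + (0 if s == t else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical transcriptions of one cognate in two varieties,
# given as lists so that each list item is one phonetic segment.
variety_a = ["m", "ɛ", "l", "k"]
variety_b = ["m", "ɛ", "l", "ə", "k"]
distance = levenshtein(variety_a, variety_b)
```

    Passing lists of segments rather than raw strings keeps multi-character symbols (for example a vowel plus a length mark) together as one unit, which matters once costs are attached to whole segments.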

  17. THE ROLE OF CONCEPT CHARACTERISTICS IN LEXICAL DIALECTOMETRY
    (pp. 221-242)
    DIRK SPEELMAN and DIRK GEERAERTS

    An important assumption underlying most if not all methods of dialectometry is that the automated analysis of the differences in language use between different locations, as they are recorded by dialectologists in large scale surveys, can reveal patterns which directly reflect regional variation. In this paper, in which we focus on lexical variation, we want to address one factor, viz. concept characteristics, which we will claim complicates this picture.

    The argumentation which underlies our claim consists of three consecutive logical steps. As a first step, we analyse data taken from a large lexical database of Limburgish dialects in Belgium and...

  18. WHAT ROLE DOES DIALECT KNOWLEDGE PLAY IN THE PERCEPTION OF LINGUISTIC DISTANCES?
    (pp. 243-260)
    WILBERT HEERINGA, CHARLOTTE GOOSKENS and KOENRAAD DE SMEDT

    To what extent do subjects base their judgment of linguistic distances between dialects on what they really hear, i.e. on the linguistic phenomena available in the speech signal, and to what degree do they generalise from the knowledge that they have from previous confrontations with the dialects? This is the central question of the investigation described in this paper. The answer to this question is important to scholars who want to understand how dialect speakers perceive dialect pronunciation differences and may give more insight into the mechanisms behind the way in which linguistic variation is experienced. Our study is of...

  19. QUANTIFYING DIALECT SIMILARITY BY COMPARISON OF THE LEXICAL DISTRIBUTION OF PHONEMES
    (pp. 261-278)
    WARREN MAGUIRE

    In recent years considerable progress has been made in assessing the relationships between linguistic varieties by measuring the similarity between strictly comparable sets of phonetic data. In particular, measurement of Levenshtein Distance (see, for example, Nerbonne, Heeringa, and Kleiweg, 1999; Nerbonne and Heeringa, 2001; Heeringa, 2004) has proved useful for determining the relationships between closely related varieties, and the ‘Sound Comparisons’ method for assessing the distance between varieties provides a very promising alternative technique for looking into the changing relationships between closely-related and not so closely-related varieties (Heggarty, McMahon and McMahon, 2005; McMahon, Heggarty, McMahon and Maguire, 2007).¹

    Phonetic comparison...

  20. CORPUS-BASED DIALECTOMETRY: AGGREGATE MORPHOSYNTACTIC VARIABILITY IN BRITISH ENGLISH DIALECTS
    (pp. 279-296)
    BENEDIKT SZMRECSANYI

    The overarching aim in this study is to provide a methodological sketch of how to blend philologically responsible corpus-based research with aggregational-dialectometrical analysis techniques. The bulk of previous research in dialectometry has focussed on phonology and lexis (however, for work on Dutch dialect syntax see Spruit 2005, 2006, 2008, Spruit et al. t.a.). Moreover, orthodox dialectometry draws on linguistic atlas classifications as its primary data source. The present study departs from these traditions in several ways. It endeavours, first, to measure aggregate morphosyntactic distances and similarities between traditional dialects in the British Isles. Second, the present study does not rely...