Sequence Alignment: Methods, Models, Concepts, and Strategies

Edited by Michael S. Rosenberg
Copyright Date: 2009
Edition: 1
Pages: 360
    Book Description:

    The sequencing of the human genome involved thousands of scientists but used relatively few tools. Today, obtaining sequences is simpler, but aligning the sequences (making sure that sequences from one source are properly compared to those from other sources) remains a complicated but underappreciated aspect of comparative molecular biology. This volume, the first to focus on this crucial step in analyzing sequence data, is about the practice of alignment, the procedures by which alignments are established, and, more importantly, how the outcomes of any alignment algorithm should be interpreted. Edited by Michael S. Rosenberg with essays by many of the field's leading experts, Sequence Alignment covers molecular causes, computational advances, approaches for assessing alignment quality, and philosophical underpinnings of the algorithms themselves.

    eISBN: 978-0-520-94374-2
    Subjects: Ecology & Evolutionary Biology

Table of Contents

  1. Front Matter
    (pp. i-iv)
  2. Table of Contents
    (pp. v-vi)
  3. Contributors
    (pp. vii-x)
  4. Preface
    (pp. xi-xvi)
    Michael S. Rosenberg
  5. CHAPTER 1 Sequence Alignment: Concepts and History
    (pp. 1-22)

    Sequence alignment is a fundamental procedure (implicitly or explicitly) conducted in any biological study that compares two or more biological sequences (whether DNA, RNA, or protein). It is the procedure by which one attempts to infer which positions (sites) within sequences are homologous, that is, which sites share a common evolutionary history (see the section “Homology” in this chapter for more detail). For the majority of scientists, alignment is a task whose automated solution was solved years ago; the alignment is of little direct interest but is rather a necessary step that allows one to study deeper questions, such as...

  6. CHAPTER 2 Insertion and Deletion Events, Their Molecular Mechanisms, and Their Impact on Sequence Alignments
    (pp. 23-38)

    The alignment of biological sequences allows us to infer the evolutionary relationships between different genes and proteins. Most new genes and proteins will evolve either through insertions or deletions of sets of subsequences, or through point mutations, where one amino acid is replaced with another. Therefore, we can judge the evolutionary distance between related organisms by scoring the differences occurring between their protein and DNA sequences. In this chapter, we will focus on insertion and deletion events and explain how these events affect how we carry out and score sequence alignments.

    We will begin by looking at a sequence alignment...

  7. CHAPTER 3 Local versus Global Alignments
    (pp. 39-54)

    For a given set of input sequences, the overall goal of pairwise and multiple sequence alignment is to identify those parts of the sequences that are related to each other by common structure, function, or evolution. As with other bioinformatics approaches, computational methods for sequence alignment have to make a number of a priori assumptions on the data to be analyzed, either explicitly or implicitly. For example, a basic assumption made by almost all alignment methods is that homologies between the input sequences, if there are any at all, appear in the same relative order in all sequences. Obviously, if...

  8. CHAPTER 4 Computing Multiple Sequence Alignment with Template-Based Methods
    (pp. 55-70)

    An ever increasing number of biological modeling methods depend on the assembly of an accurate multiple sequence alignment (MSA). These include phylogenetic tree reconstruction, hidden Markov modeling (profiles; HMM), secondary or tertiary structure prediction, function prediction, and many minor but useful applications, such as PCR primer design and data validation. Assembling an accurate multiple sequence alignment is not, however, a trivial task, and none of the existing methods have yet managed to overcome the biological and computational hurdles preventing the delivery of biologically perfect MSAs. These limitations combined with a growing reliance of biology on the computation of accurate MSAs...

  9. CHAPTER 5 Sequence Evolution Models for Simultaneous Alignment and Phylogeny Reconstruction
    (pp. 71-94)

    While it has long been recognized that the problems of multiple sequence alignment and phylogeny reconstruction are interdependent, they traditionally are tackled with different methodologies. The principles of statistical inference, be it Bayesian or based on likelihood maximization, form the sound foundation for the most widely used phylogeny estimation methods; yet, for sequence alignment, heuristic optimization of more or less arbitrary scoring schemes is still common. The main reason for this split may have been the lack of insertion-deletion models which are both realistic and computationally tractable. In this chapter, we give an overview of recent advances in modeling insertions...

  10. CHAPTER 6 Phylogenetic Hypotheses and the Utility of Multiple Sequence Alignment
    (pp. 95-104)

    Multiple sequence alignment (MSA) is not a necessary, but rather a potentially useful, technique in phylogenetic analysis. By this we mean that we can construct and evaluate phylogenetic hypotheses without MSA, and it may be productive in terms of time or optimality to do so. In order to evaluate this statement, we must first define phylogenetic hypothesis, define the problem, define the criteria we will use to assay the relative merits of hypotheses, define what we mean by the utility of a technique, and then finally compare the results of alternate techniques.

    In the following sections, each of these terms...

  11. CHAPTER 7 Structural and Evolutionary Considerations for Multiple Sequence Alignment of RNA, and the Challenges for Algorithms That Ignore Them
    (pp. 105-150)

    What Is It You Are Trying to Accomplish with an Alignment? Some of the disagreement over alignment approaches comes from differences in objectives among investigators. Are the data merely meant to distinguish target DNA from contaminants in a BLAST search? Or is there a specific node on a cladogram you wish to test? Are you aligning genomes or genes? Are the data protein-coding, structural RNAs or noncoding sequences? Do you consider phylogenetics to be a process of inference or estimation? Would you rather be more consistent or more accurate? Are you studying the performance of your selected programs or the...

  12. CHAPTER 8 Constructing Alignment Benchmarks
    (pp. 151-178)

    Multiple sequence alignment is one of the most fundamental tools in molecular biology. It is used not only in evolutionary studies to define the phylogenetic relationships between organisms, but also in numerous other tasks ranging from comparative multiple genome analysis to detailed structural analyses of gene products and the characterization of the molecular and cellular functions of the protein. Many of these applications are discussed in detail in Chapter 11. The accuracy and reliability of all of these applications depend critically on the quality of the underlying alignments. Errors in the initial alignment will be propagated and further amplified in...

  13. CHAPTER 9 Simulation Approaches to Evaluating Alignment Error and Methods for Comparing Alternate Alignments
    (pp. 179-208)

    As this book demonstrates, sequence alignment is an important tool for biological research and may be used for a variety of purposes ranging from secondary structure identification (Coventry et al. 2004; Dowell and Eddy 2004; Holmes 2005; Knudsen and Hein 1999), noncoding functional RNA (ncRNA) detection (di Bernardo et al. 2004; Rivas and Eddy 2001), and phylogenetic inference. Generally speaking, the goal of multiple sequence alignment is to hypothesize site homology for a string of characters that represent evidence from data (DNA, amino acids, morphological data, etc.). While the goal is to hypothesize the correct or “true” site...

  14. CHAPTER 10 Robust Inferences from Ambiguous Alignments
    (pp. 209-270)

    Molecular sequence data have become an invaluable source of information for understanding evolutionary processes and for inferring evolutionary relationships between organisms. Molecular sequences provide a large number of separate characters (individual nucleotides, amino acids, or codons) that are easy to identify and distinguish. In addition, probabilistic models of how these molecular characters change over time allow us to estimate evolutionary process parameters and to quantify the evidence for evolutionary hypotheses. These parameters include phylogenetic trees, divergence times, insertion and deletion rates, and substitution rates. Probabilistic models of evolution also enable researchers to locate sequence motifs that are especially conserved or...

  15. CHAPTER 11 Strategies for Efficient Exploitation of the Informational Content of Protein Multiple Alignments
    (pp. 271-296)

    Life can now be considered as a complex system in which molecular agents are interconnected in space and time. The information for life is mainly stored and organized by stretched chains of chemical building blocks: four nucleotides for genes and, after gene translation, 20 amino acids to form proteins. Despite the recent developments highlighting the existence of codes and higher orders of organization at the genome level (Kepes 2003; Segal et al. 2006), proteins still represent the major mediator for life information management. Various levels of organization can be considered. First, the primary structure describes the number and arrangement of...

  16. References
    (pp. 297-332)
  17. Index
    (pp. 333-338)
  18. Back Matter
    (pp. 339-340)