We also examined the possibility that specific categories of genes could support conflicting phylogenetic hypotheses, but we found little evidence for a relationship between gene ontology and species topology. Finally, we examined the support for several competing topologies pertaining to the position of the ctenophores by estimating and comparing their marginal likelihoods using stepping stone integration and Bayes factor analysis [ 23 , 24 ]. In summary, we report analysis of the largest number of characters to be applied to metazoan phylogeny to date. We recover a phylogeny that is broadly consistent with the recent view of metazoan phylogeny [ 12 ].
All of the concatenated analyses and locus-selection experiments reported here support the hypothesis of the Ctenophora as sister to the other metazoan species. While support for this node varies depending on the subset of data analyzed, it is consistent across analyses and is strongly supported by the Bayesian test of topological hypotheses.
Our results strongly reject the Coelenterata hypothesis that places cnidarians and ctenophores in a monophyletic group, or an arrangement placing sponges and ctenophores in a monophyletic group. Our study illustrates an optimized workflow for future analyses of hundreds or thousands of taxa represented by whole genome data and our user-friendly source code is freely available.
We retained individual alignments of putative orthologs following orthology prediction, removal of spurious sequences, alignment, and trimming (see Methods). In total, our data partitions are enriched for gene ontology (GO) terms across the molecular function, cellular component, and biological process categories relative to a reference genome Fig.
Visualization of gene ontology (GO) term enrichment across the total dataset. In total, GO terms were significantly enriched in the total datasets compared to an outgroup reference annotation. In each, area subtended by a given GO term represents its frequency among significantly enriched GO terms.
We first inferred the topology from the Total matrix under maximum likelihood (ML) [ 25 , 26 ] using best-fitting empirical models of protein evolution [ 27 , 28 ] for each partition (Additional file 2 : Figure S1). The topology of this tree reflects the emerging [ 1 , 4 , 5 , 9 , 10 ] but still controversial [ 6 – 8 , 13 ] view of the ctenophores (Mnemiopsis) as the sister lineage to all other metazoans, including sponges (Amphimedon).
This topology also recovers all major metazoan clades and many widely recognized relationships. The positions of several long-branch taxa, which include the nematodes Brugia and Caenorhabditis , the larvacean Oikopleura (by far the longest branch in our whole genome metazoan dataset), and the spider mite Tetranychus , are each as expected based on previously published studies [ 29 – 31 ]. The Total dataset is too large to analyze under more appropriate, but computationally expensive, site-heterogeneous models [ 18 ]. Because of this, we first analyzed each partition separately in order to derive data on (1) information content [ 32 , 33 ], (2) taxon occupancy, (3) saturation [ 34 ], (4) long-branch score [ 35 ], and (5) rate of evolution.
We also assessed the influence of each of these criteria on the phylogeny, as each has been proposed to negatively impact phylogenetic inference [ 13 , 35 , 36 ] (Additional file 3 : Figure S2 and Additional file 4 : Figure S3). Heat maps depicting long-branch scores among partitions for both the Total and Best matrices and the taxon occupancy for each matrix are shown in Fig.
As with the Total matrix, we performed partitioned ML inference under best-fitting empirical models of protein evolution on the Best matrix. Distributions of long-branch scores and gene occupancy for Total and Best matrices. In long-branch score heat maps, the scores were Z-scaled across columns to highlight among-taxon variability. Red indicates high long-branch scores relative to other taxa and blue denotes low scores.
White in gene occupancy plots corresponds to missing data. The cladograms illustrate results of similarity by hierarchical clustering. Note that in the Total dataset, Amphimedon , Mnemiopsis and Tetranychus cluster with other long-branched taxa that include the outgroups, the nematodes, and Oikopleura.
However, in the Best matrix these taxa cluster with the main group, leaving only the outgroups, nematodes and Oikopleura in the long-branch cluster. In both, the ctenophore Mnemiopsis is the sister to all other Metazoa with maximum bootstrap support and the centipede Strigamia is the sister to the chelicerates Ixodes and Tetranychus.
We note that recent studies [ 38 ] have demonstrated that this topology (Paradoxopoda) can result from model inadequacies in phylogenetic reconstruction under ML (see Discussion). Bayesian analyses of the Best matrix under the CAT-GTR model produced a topology similar to the ML analysis of the same dataset, with the exception that the position of the centipede Strigamia is now resolved with maximum support as the sister to Pancrustacea, reflecting the Mandibulata hypothesis [ 38 , 39 ].
This finding presumably reflects the more accurate fit of the model to the data, compared to ML analyses. Summary of phylogenetic results. Unannotated nodes have maximum support for all measures. Scale bar in substitutions per site. For image attributions see Additional file 8.
Similarly, all analyses conducted under CAT-GTR recovered the ctenophore as sister to the remaining Metazoa, but with varying degrees of support depending on the choice of dataset. The CAT-GTR model accounts for differences in the substitution process across sites in a data set, but it does not account for compositional heterogeneity across branches.
This among-branch heterogeneity is present in metazoan alignments from phylogenomic data [ 40 ] and may also negatively impact phylogeny estimation [ 41 , 42 ]. Current implementations of models combining site- and branch-heterogeneity of substitution process are difficult to apply to large data sets [ 40 , 43 ]. We therefore used an alternative approach that has been shown to be successful in reducing the effects of across-taxon heterogeneity [ 40 ] and recoded the amino acids in our Best matrix into six, four, and two categories and analyzed these recoded Best datasets under Bayesian CAT-GTR.
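As an illustration of the recoding step, the sketch below maps the 20 amino acids onto six categories. The grouping shown is the commonly used Dayhoff six-state scheme, which is an assumption here, since the text does not state which six-, four-, or two-state groupings were applied. The function name `recode` and the output symbols are hypothetical.

```python
# Dayhoff six-state grouping (assumed; the scheme actually used is not stated in the text).
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]

# Build a residue -> category-symbol lookup ("0" to "5").
RECODE_TABLE = {
    residue: str(index)
    for index, group in enumerate(DAYHOFF6_GROUPS)
    for residue in group
}


def recode(sequence):
    """Recode an aligned protein sequence.

    Gaps are kept as '-'; any symbol outside the 20 standard
    amino acids (for example 'X') becomes '?'.
    """
    recoded = []
    for residue in sequence.upper():
        if residue == "-":
            recoded.append("-")
        else:
            recoded.append(RECODE_TABLE.get(residue, "?"))
    return "".join(recoded)


example = recode("MKC-X")
```

Collapsing residues into fewer states reduces compositional differences among taxa, but it also merges substitutions within a category, which is one way to see why signal is lost.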
Unfortunately, recoding data into fewer than the original 20 categories results in significant loss of signal in the alignments. The topologies resulting from these analyses were highly inconsistent, placing Trichoplax as the sister to the remaining Metazoa and, in some cases, failing to recover a monophyletic Deuterostomia (Additional file 4 : Figure S3). The position of ctenophores as sister to the remaining Metazoa was recovered in most analyses above, but some workers have suggested that this topology can be explained by long-branch attraction (LBA), a phenomenon that causes long-branched taxa to group together artifactually in a phylogeny, often with strong support [ 44 ].
Several studies have indicated that LBA is a potential problem for reconstructing deep animal phylogeny [ 6 – 8 , 13 , 45 ]. In order to address the potential for LBA to bias our results, we explored various strategies to detect it [ 44 ]. If the outgroups were to influence the branching order of non-bilaterians, we would expect the internal topology, or the support therein, to be impacted in an analysis excluding the outgroup taxa. Without the outgroups we lose the ability to reliably root the tree, but it is still possible to explore alternative rooting scenarios and to ask if the topology of the ingroup tree is different from those recovered in outgroup-rooted analyses.
For example, these analyses allow for the examination of possible rooting scenarios where ctenophores are sister to cnidarians, the so-called Coelenterata hypothesis [ 6 , 8 ]. The ingroup-only topology derived from partitioned ML analysis allows for no possible rooting that would place ctenophores and cnidarians together in a monophyletic group Fig. If rooted with Mnemiopsis , the topology of this tree would be identical to the tree resulting from the ML analysis of a matrix that included outgroups (Additional file 2 : Figure S1).
If the position of ctenophores were affected by long-branch attraction, we would expect that the removal of outgroup taxa would alter the branching order or lessen support for non-bilaterian relationships [ 6 ].
Neither of these possibilities is evident in this analysis. We also examined topologies from partitioned ML analyses in which either the sponge Amphimedon Fig. A rooting where Mnemiopsis forms a clade with cnidarians, thus supporting the Coelenterata hypothesis, is not possible in any of these analyses. Summary of tests for Long Branch Attraction. Unrooted trees from analyses excluding putative long-branch taxa are shown. All analyses were conducted under maximum likelihood, partitioned empirical models.
Non-bilaterian metazoan taxa are highlighted. Other studies focusing on metazoan phylogeny have suggested that the phylogenetic signal needed to resolve deep relationships is confined to slowly-evolving loci and that specific classes of genes may introduce noise that could mislead analyses [ 8 , 31 ]. In order to explore the influence of rate of evolution of partitions on the support for metazoan relationships, we ranked all loci according to their rate of evolution, approximated by the average branch length of the ML tree inferred for each locus.
We then performed a series of unpartitioned ML analyses on matrices that we generated of varying lengths, from few to all loci, beginning with the slowest evolving partitions then progressively adding faster and faster evolving partitions. Unpartitioned ML analysis was conducted for each iteration and support for topologies was assessed using bootstrap replicates. Results from this progressive concatenation approach are detailed in Fig.
Sensitivity analyses using progressive concatenation and rate binning. The x-axis represents number of loci concatenated in order of rate of evolution, from 5 of the most slowly evolving at left to all loci at right. The y-axis indicates bootstrap support. Red circles in cladograms above corresponding plots denote the node for which bootstrap support was assessed.
The x-axis represents bin number and the y-axis indicates bootstrap support. Bin number 1 contains slowest evolving loci in the data set and bin number 10 contains fastest evolving loci. We evaluated support for several possible hypotheses on the position of the ctenophores in metazoan phylogeny including:. Next we explored phylogenetic signal in non-overlapping bins of concatenated data, also of increasing rates. We note that the three bins containing the most slowly evolving loci support the hypothesis that the ctenophore is the sister to other Metazoa.
The most prevalent competing topology places the sponge Amphimedon as sister to all other metazoans, with ctenophores branching second Ctenophore Placozoa, Eumetazoa. None of the analyses showed consistent support for the Coelenterata hypothesis. Our selection of a bin size of loci per bin permitted statistical analyses of GO term enrichment on a bin-by-bin basis.
However, these analyses did not reveal a single instance of GO term enrichment in any of the bins compared to the GO terms present in the total dataset. While individual bins may differ in their rates of evolution and the topologies they support, their composition is not significantly different from the total matrix as measured by GO term enrichment analyses. To further explore the effect of GO category on phylogenetic signal, we prepared datasets for phylogenetic analysis from the only two GO categories from our initial gene dataset Fig.
Concatenated analyses of these datasets under ML produced similar results; however, the tree estimated for the mitochondrial cellular component dataset was generally poorly supported (Additional file 5 : Figure S4). Both ML partitioned analyses under the best-fitting models and Bayesian analyses under CAT-GTR supported the hypothesis of ctenophores as sister to the remaining metazoans.
Next we sought to understand the relative degree of support for this hypothesis compared to other alternatives. Bayesian tests of topological hypotheses are a powerful means of estimating the relative support for conflicting topologies [ 23 ]. We estimated the marginal likelihoods of three possible hypotheses of monophyly that relate to the position of the ctenophores in our dataset using stepping stone integration [ 24 ] including:
Our results indicated very strong support for proposal 1 above, which represents the hypothesis of ctenophores as the sister to remaining Metazoa. Large data sets are often insufficient to resolve recalcitrant nodes in the animal tree of life and it has long been recognized that simply increasing the amount of data can exacerbate systematic bias in phylogeny estimation [ 13 , 45 , 46 ].
Because of this, two approaches to improving phylogenomic inference have been proposed. The other is to employ more realistic models of sequence evolution that account for various systematic biases [ 13 , 15 , 45 ]. Here we leverage both approaches and, due to the large size of our initial data matrix, we are able to minimize the impact of various sources of non-phylogenetic signal while retaining a large number of characters for analysis. Our Best dataset represents a refinement of the Total dataset as shown in Fig.
In both datasets, hierarchical clustering sorts a subset of taxa into a long-branch group of sequences. In the Total matrix, this long-branch cluster includes eight taxa including Mnemiopsis and Amphimedon. In the Best matrix, the long-branch cluster is reduced to five taxa and only includes those taxa that reside in non-controversial positions e. In addition, taxon occupancy is enhanced in the Best dataset over the Total dataset, while the rates of evolution are lower and the potential for saturation is minimized.
For these reasons, we expect that the reduced dataset should contain less phylogenetic noise than the Total dataset.
Our results are congruent with several recent studies [ 1 , 4 , 9 , 10 ] that depict the ctenophores as the sister lineage to all other metazoans. This hypothesis receives maximum support in all of our ML analyses Fig. Further, our additional analyses suggest that long-branch attraction artifacts do not drive this result Fig. Perhaps most compelling are our tests of competing hypotheses for the position of the ctenophores using Bayes factors.
This approach to topology comparison is more robust to statistical error than common ML procedures, and the analyses presented here were done using stepping stone integration, which is the most accurate method of estimating the marginal likelihoods of competing hypotheses currently available [ 24 ]. Our results are consistent with the Parahoxozoa hypothesis, which postulates a single origin of Hox genes in the clade comprised of Bilateria, Cnidaria and Placozoa, to the exclusion of Porifera and Ctenophora [ 1 , 10 , 47 ].
None of our analyses support the Coelenterata hypothesis uniting Cnidaria and Ctenophora, a clade that has been recovered in some morphological and phylogenomic analyses [ 6 , 8 , 48 , 49 ]. Our results relating to the position of ctenophores are consistent across the majority of analyses, but one taxon, the sole representative myriapod Strigamia , is decidedly the most labile across analyses. The instability of Strigamia is further demonstrated in progressive concatenation analyses (Additional file 6 : Figure S5).
Our findings are consistent with previous studies that demonstrate the importance of model selection and the potential for LBA artifacts in the placement of the myriapod lineage [ 38 ].
In contrast to the ctenophores, whose position is invariable across the models of molecular evolution employed, the position of the myriapods appears to be sensitive to model selection. Our study addresses the problem of basal metazoan relationships using a large dataset drawn exclusively from whole genome sequences.
By applying stringent filtering procedures on a very large initial dataset, we were able to obtain reduced datasets that are still much larger than those of previous analyses, but are composed exclusively of partitions with high taxon occupancy and low potential for non-phylogenetic signal. Ctenophores are strongly supported as the sister to the remaining Metazoa and support for Parahoxozoa is overwhelming in our analyses, arguing against the traditional grouping of ctenophores and cnidarians into Coelenterata.
While consistently referring to a group that includes Cnidaria and Ctenophora, various workers have also included echinoderms, bryozoans, tunicates and sponges in different formulations of Coelenterata (reviewed in Hyman [ 50 ]). Our results are consistent with several recent studies that strongly reject the systematic utility of the term, finding coelenterates (animals with a central, fluid-filled cavity) to be a polyphyletic assemblage.
One obvious drawback of exclusively relying on taxa with whole genome sequences for metazoan phylogeny reconstruction is that taxon sampling is necessarily low compared to other studies that have analyzed transcriptome-based datasets. While numerous workers have emphasized the importance of taxon sampling [ 4 , 13 ], others have emphasized the importance of data matrix size [ 51 ].
Ideally, both parameters would be maximized while maintaining the computational tractability of matrices under the most appropriate models for molecular evolution. Future studies of metazoan phylogeny will benefit from ongoing efforts to sequence the genomes of additional invertebrate taxa that will inform our view of the relationships between the major lineages of animals [ 52 ]. This is true especially of sponges, where branches subtending this group could be dramatically shortened [ 1 , 6 , 9 ] with additional sampling.
Taxon sampling aimed to maximize the phylogenetic breadth of species that can inform metazoan relationships, while relying exclusively on species with whole genome sequences. Long-branch attraction (LBA) has been suspected of contributing to the placement of the ctenophores in metazoan phylogeny [ 7 ]. We specifically included other known long-branched taxa such as the nematodes Brugia and Caenorhabditis , the tunicate Oikopleura , and the spider mite Tetranychus so that we could monitor the potential for LBA in our dataset.
Additional file 1 : Table S1 lists these species and the genome databases from which they were obtained. Gene orthology analysis was performed using a pre-release version 2. This version of OrthologID uses the MCL algorithm [ 54 , 55 ] for improved clustering and includes automated extraction of orthologs from gene trees into a partitioned matrix.
Amino acid sequences of 1,, gene models from the complete gene sets of all 36 species were used as input to OrthologID, which produced 26, orthologous groups with at least 4 species represented. We then selected partitions that included 27 taxa or more for inclusion in our analyses, resulting in a total of orthologous groups OGs.
We then conducted maximum likelihood (ML) tree estimation on each locus (see below). We identified potentially spurious sequences as those with terminal branches more than five times longer than the average for the tree, and discarded them using this arbitrary cut-off. One randomly chosen gene from each of OGs was subjected to blast, annotation and mapping using Blast2GO [ 58 ]. Gene Ontology identification numbers (GO IDs) for each metazoan partition were extracted from this analysis and tested for enrichment against GO IDs from the genome of Arabidopsis thaliana , a taxon outside the phylogenetic scope of the focal taxa.
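The five-fold branch-length cut-off for spurious sequences can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes a Newick string with branch lengths, takes "the average for the tree" to mean the mean over all branches (the text does not say whether internal branches are included), and uses a hypothetical function name.

```python
import re

# A non-negative decimal number, optionally in scientific notation.
NUMBER = r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
# Terminal branches: a leaf name directly follows "(" or ",".
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+):(" + NUMBER + ")")
# Every branch length in the tree, internal and terminal.
ANY_BRANCH = re.compile(r":(" + NUMBER + ")")


def flag_long_terminals(newick, factor=5.0):
    """Return leaf names whose terminal branch exceeds factor x the mean branch length."""
    all_lengths = [float(value) for value in ANY_BRANCH.findall(newick)]
    mean_length = sum(all_lengths) / len(all_lengths)
    return [
        name
        for name, length in LEAF.findall(newick)
        if float(length) > factor * mean_length
    ]


# Hypothetical gene tree in which sequence D sits on a very long branch.
tree = "((A:0.1,B:0.1):0.05,(C:0.1,D:2.5):0.05,E:0.1);"
flagged = flag_long_terminals(tree)
```

Because a single extreme branch also inflates the mean, a cut-off of this kind is conservative and flags only pronounced outliers.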
In order to examine the individual topologies of partitions, we estimated a tree for each of the alignments using the best-fitting empirical model under maximum likelihood (ML) in RAxML [ 26 ]. We also performed bootstrap replicates for each gene tree. The alignment and corresponding single-gene tree characteristics (see below) served as a basis for several alternative locus selection strategies. ML analyses of each concatenated dataset are reported in Additional file 3 : Figure S2.
We assembled two matrices selecting for information content. We conserved all 36 taxa and used alpha setting of 3. We evaluated saturation in each locus by performing simple linear regression on uncorrected p-distances against inferred distances for each locus [ 34 ]. In the absence of sequence saturation, the expectation is that these distances would show a perfect fit to simple linear regression.
When correction for multiple substitutions is needed, however, the curve will depart from linearity. We used the slope and R² of the regression to assess fit in each locus. The long-branch (LB) score is a taxon-specific measure defined as the mean pairwise distance of a terminal to all other terminals, relative to the average pairwise distance across all taxa. Because of its taxon-specificity, direct comparisons are not possible among loci, and Struck [ 35 ] suggested the standard deviation of LB scores as a measure by which loci can be compared.
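The two per-locus diagnostics described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the scripts used in the study: the regression takes uncorrected p-distances as the response and inferred distances as the predictor, and the LB score is expressed as a percentage deviation from the overall mean pairwise distance, which is how we understand the formulation in [ 35 ]. Function names and example values are hypothetical.

```python
def saturation_fit(p_distances, inferred_distances):
    """Least-squares regression of p-distances on inferred distances.

    Returns (slope, r_squared). A slope and R^2 near 1 suggest little saturation.
    """
    n = len(p_distances)
    mean_x = sum(inferred_distances) / n
    mean_y = sum(p_distances) / n
    sxx = sum((x - mean_x) ** 2 for x in inferred_distances)
    syy = sum((y - mean_y) ** 2 for y in p_distances)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(inferred_distances, p_distances)
    )
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def lb_scores(dist):
    """LB score per taxon from a symmetric nested dict of pairwise distances."""
    taxa = sorted(dist)
    pairs = [(a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]
    mean_all = sum(dist[a][b] for a, b in pairs) / len(pairs)
    scores = {}
    for taxon in taxa:
        to_others = [dist[taxon][other] for other in taxa if other != taxon]
        mean_taxon = sum(to_others) / len(to_others)
        # Percentage deviation from the mean pairwise distance (assumed scaling).
        scores[taxon] = (mean_taxon / mean_all - 1.0) * 100.0
    return scores


# Hypothetical locus: p-distances level off as inferred distances grow.
slope, r_squared = saturation_fit([0.10, 0.18, 0.22, 0.24], [0.1, 0.2, 0.3, 0.4])

# Hypothetical three-taxon distance matrix in which C is the long branch.
distances = {
    "A": {"B": 0.2, "C": 1.0},
    "B": {"A": 0.2, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
}
scores = lb_scores(distances)
```

In the hypothetical locus the slope falls well below 1, the pattern expected when multiple substitutions are hidden, and taxon C receives a positive LB score while A and B receive negative ones.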
However, we observed that alignments with a low standard deviation of LB scores had a high proportion of missing data for long-branched taxa. Because of this we implemented an alternative approach, focusing on the LB scores of the long-branched Amphimedon , Mnemiopsis , and the outgroups, Monosiga and Salpingoeca. We first identified the mode of the LB score density distribution for each taxon, calculated from the Total data set. We then ranked all loci by the number (zero to four) of these focal taxa falling under the mode in each locus. We used the average branch length of a tree as an approximation of the rate of evolution.
The trees were derived from an ML analysis of each of the loci under the best-fitting empirical model of sequence evolution (see above). The average branch length was calculated by dividing the total tree length by the total number of edges (internal and terminal branches) in the tree. While this measure does not account for the differences in taxon sampling among the alignments, we found that it provides a useful estimate of relative rates among loci in this data set.
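The average branch length calculation can be sketched as below. The study's own statistics were computed in R; this Python version is only an illustration that assumes a Newick string in which every edge carries a branch length, and the function name is hypothetical.

```python
import re

# Matches the branch length that follows each ":" in a Newick string.
BRANCH_LENGTH = re.compile(r":(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def average_branch_length(newick):
    """Total tree length divided by the number of edges (internal and terminal)."""
    lengths = [float(value) for value in BRANCH_LENGTH.findall(newick)]
    if not lengths:
        raise ValueError("tree has no branch lengths")
    return sum(lengths) / len(lengths)


# Hypothetical four-taxon gene tree with five edges and a total length of 1.0.
rate_proxy = average_branch_length("((A:0.1,B:0.2):0.05,C:0.3,D:0.35);")
```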
A list of loci ranked by average branch length served as a basis for progressive concatenation and binned analyses. We used R packages seqinr [ 63 ] and ape [ 64 ] to compute these statistics, and our R script can be found in the Dryad repository and on GitHub see Availability of supporting data below. We used PartitionFinderProtein [ 65 ] to find optimal partitioning schemes and models for all concatenated matrices.
Because PartitionFinder by default uses Neighbor Joining to estimate guide trees, we first inferred maximum likelihood trees for each unpartitioned matrix using RAxML and used these as user-supplied guide trees for PartitionFinder. We then used RAxML standard versions 8. This program prompts for user input and allows for easy creation of locus-jackknife alignments with other data sets. Maximum likelihood trees were estimated for each unpartitioned matrix under the best empirical model selection scheme in RAxML.
Analyses with recoded amino acids were performed using PhyloBayes 3. Two independent Monte Carlo Markov chains were produced for every matrix. The resulting tree for each matrix is the majority-rule consensus of all trees pooled across both chains sampled at equilibrium. Trace plots were generated using the mcmcplots package [ 68 ] in R.
To assess the effect that partitions with high rates of evolution have on the inference, we also incrementally concatenated loci evolving at increasing rates. We sorted the gene partitions by their rates of evolution, and created ten matrices by concatenating 5, 10, 15, 20, 30, 50, , , , and slowest evolving loci. We ran a bootstrap replicate, unpartitioned RAxML search on all these matrices and the all-inclusive matrix of loci. We also performed binned analyses where loci were concatenated into ten gene non-overlapping matrices and subjected to a RAxML search as the above.
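The locus selection behind progressive concatenation and binning can be sketched as follows. This illustrates only the bookkeeping (which loci enter which matrix), not the concatenation or the RAxML searches; the function names, locus names, rates, subset sizes, and bin count in the example are all hypothetical.

```python
def progressive_subsets(rates, sizes):
    """For each size k, return the k slowest-evolving loci (cumulative subsets)."""
    ranked = sorted(rates, key=rates.get)  # slowest first
    return [ranked[:k] for k in sizes if k <= len(ranked)]


def rate_bins(rates, n_bins):
    """Split loci ranked by rate into n_bins contiguous, non-overlapping bins."""
    ranked = sorted(rates, key=rates.get)
    n = len(ranked)
    return [ranked[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


# Hypothetical average branch lengths for six loci.
locus_rates = {
    "locus1": 0.30, "locus2": 0.05, "locus3": 0.12,
    "locus4": 0.80, "locus5": 0.20, "locus6": 0.45,
}
subsets = progressive_subsets(locus_rates, sizes=[2, 4])
bins = rate_bins(locus_rates, n_bins=3)
```

Progressive subsets are nested, so each larger matrix contains all slower loci plus faster ones, whereas bins share no loci and therefore isolate the signal in each rate class.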
We then mapped bootstrap support for nodes in alternative topologies using RAxML for all progressively concatenated matrices and bins. The trees and support from these experiments can be found in the Dryad repository associated with this article. Tests of topological hypotheses were conducted in MrBayes 3. Briefly, we sampled from 50 steps with generations each. One step was discarded as burnin. Marginal likelihoods were estimated from , generations and interpreted as per [ 23 ]. Control files and output from stepping stone runs are included in the Dryad repository associated with this article.
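Once marginal likelihoods are available for two competing topological hypotheses, the Bayes factor comparison reduces to a difference of log values. The sketch below uses the common 2 ln BF scale with the widely cited Kass and Raftery evidence categories; that these are the thresholds referred to in [ 23 ] is an assumption, and the log marginal likelihoods in the example are invented for illustration.

```python
def two_ln_bayes_factor(ln_ml_h1, ln_ml_h2):
    """Return 2 ln BF comparing hypothesis 1 against hypothesis 2."""
    return 2.0 * (ln_ml_h1 - ln_ml_h2)


def interpret(two_ln_bf):
    """Verbal category for evidence in favour of hypothesis 1 (assumed thresholds)."""
    if two_ln_bf < 0:
        return "favours hypothesis 2"
    if two_ln_bf <= 2:
        return "not worth more than a bare mention"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods from two stepping stone runs.
statistic = two_ln_bayes_factor(-1000.0, -1012.5)
verdict = interpret(statistic)
```

Working on the log scale avoids numerical underflow, since marginal likelihoods for phylogenomic matrices are far too small to represent directly.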
These scripts, written in R and Python languages, have been well-annotated and allow for customized input. All sequence datasets, alignments, spreadsheets, annotation files, output files and lists of gene ontology terms for analysis are available at the Dryad link associated with this study. The ctenophore genome and the evolutionary origins of neural systems.
Bosch TC. Cnidarian-microbe interactions and the origin of innate immunity in metazoans. Annu Rev Microbiol. Arendt D. The evolution of cell types in animals: emerging principles from molecular studies. Nat Rev Genet. Broad phylogenomic sampling improves resolution of the animal tree of life. Assessing the root of bilaterian animals with scalable phylogenomic methods. Proc Royal Soc B. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships.
Mol Biol Evol. Deep metazoan phylogeny: When different genes tell different stories. Mol Phyl Evol. Error, signal, and the placement of Ctenophora sister to all other animals. Proc Natl Acad Sci. The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Ryan JF. Did the ctenophore nervous system evolve independently? Animal phylogeny and its evolutionary implications.
Annu Rev Ecol Evol Syst. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol.