Geometric constraints dominate the antigenic evolution of influenza H3N2 hemagglutinin

Austin G Meyer, Claus O Wilke

We have carried out a comprehensive analysis of the determinants of human influenza A H3 hemagglutinin evolution, considering three distinct predictors of evolutionary variation at individual sites: solvent accessibility (as a proxy for protein fold stability and/or conservation), experimental epitope sites (as a proxy for host immune bias), and proximity to the receptor-binding region (as a proxy for protein function). We have found that these three predictors individually explain approximately 15% of the variation in site-wise dN/dS. However, the solvent accessibility and proximity predictors seem largely independent of each other, while the epitope sites are not. In combination, solvent accessibility and proximity explain 32% of the variation in dN/dS. Incorporating experimental epitope sites into the model adds only an additional 2 percentage points. We have also found that the historical H3 epitope sites, which date back to the 1980s and 1990s, show only weak overlap with the latest experimental epitope data, and we have defined a novel set of four epitope groups which are experimentally supported and cluster in 3D space. Finally, sites with dN/dS > 1, i.e., the sites most likely driving seasonal immune escape, are not correctly predicted by either historical or experimental epitope sites, but only by proximity to the receptor-binding region. In summary, proximity to the receptor-binding region, rather than host immune bias, seems to be the primary determinant of H3 immune-escape evolution.
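The variance-partitioning argument above (two largely independent predictors whose explained variance roughly adds, and a third that contributes little once the others are included) can be illustrated with a small sketch. This is not the authors' analysis: the data are synthetic, the variable names (`rsa`, `proximity`, `epitope`) and effect sizes are assumptions chosen for illustration, and ordinary least squares stands in for whatever model the paper actually fitted.

```python
import numpy as np

def r_squared(X, y):
    """Fraction of variance in y explained by an OLS fit on X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

# Synthetic stand-ins for per-site predictors (assumed, not real data).
rng = np.random.default_rng(0)
n_sites = 500
rsa = rng.normal(size=n_sites)        # hypothetical solvent accessibility
proximity = rng.normal(size=n_sites)  # hypothetical proximity to binding region
# Epitope membership is made to depend on proximity, so it is not independent.
epitope = (proximity + rng.normal(size=n_sites) > 1.0).astype(float)
# Synthetic response driven by the two independent predictors plus noise.
dnds = 0.5 * rsa + 0.5 * proximity + rng.normal(size=n_sites)

r2_rsa = r_squared(rsa, dnds)
r2_prox = r_squared(proximity, dnds)
r2_both = r_squared(np.column_stack([rsa, proximity]), dnds)
r2_all = r_squared(np.column_stack([rsa, proximity, epitope]), dnds)

print(f"rsa only:        {r2_rsa:.3f}")
print(f"proximity only:  {r2_prox:.3f}")
print(f"rsa + proximity: {r2_both:.3f}")
print(f"all three:       {r2_all:.3f}")
```

Because the models are nested, the combined R² can never fall below that of any single predictor; how much each added predictor contributes depends on how correlated it is with those already in the model.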

Rates of karyotypic evolution in Estrildid finches differ between island and continental clades

Daniel M Hooper, Trevor D Price

Reasons why chromosomal rearrangements spread to fixation and frequently distinguish related taxa remain poorly understood. We used cytological descriptions of karyotype to identify large pericentric inversions between species of Estrildid finches (family Estrildidae) and a time-dated phylogeny to assess the genomic, geographic, and phylogenetic context of karyotype evolution in this group. Inversions between finch species fixed at an average rate of one every 2.26 My. Inversions were twice as likely to fix on the sex chromosomes compared to the autosomes, possibly a result of their repeat density, and inversion fixation rate for all chromosomes scales with range size. Alternative mutagenic input explanations are not supported, as the number of inversions on a chromosome does not correlate with its length or map size. Inversions have fixed 3.3× faster in three continental clades than in two island chain clades, and fixation rate correlates with both range size and the number of sympatric species pairs. These results point to adaptation as the dominant mechanism driving fixation and suggest a role for gene flow in karyotype divergence. A review shows that the rapid karyotype evolution observed in the Estrildid finches appears to be more general across birds, and by implication other understudied taxa.

Ecological patterns of genome size variation in salamanders

Bianca Sclavi, John Herrick
Comments: 19 Pages, 4 figures, 1 supplementary figure
Subjects: Genomics (q-bio.GN); Populations and Evolution (q-bio.PE)

Salamanders (Urodela) have among the largest vertebrate genomes, ranging in size from 10 to over 80 pg. The Urodela are divided into ten extant families, each with a characteristic range in genome size. Although changes in genome size often occur randomly and in the absence of selection pressure, non-random patterns of genome size variation are evident among specific vertebrate lineages. Here we report that genome size in salamander families varies inversely with species richness and other ecological factors: clades that began radiating earlier (older crown age) tend to have smaller genomes, higher levels of diversity and larger geographical ranges. These observations support the hypothesis that urodele families with larger genomes either have a lower propensity to diversify or are more vulnerable to extinction than families with smaller genomes.

Reconstructing gene content in the last common ancestor of cellular life: is it possible, should it be done, and are we making any progress?

Arcady Mushegian

I review recent literature on the reconstruction of gene repertoire of the Last Universal Common Ancestor of cellular life (LUCA). The form of the phylogenetic record of cellular life on Earth is important to know in order to reconstruct any ancestral state; therefore I also discuss the emerging understanding that this record does not take the form of a tree. I argue that despite this, “tree-thinking” remains an essential component in evolutionary thinking and that “pattern pluralism” in evolutionary biology can be only epistemological, but not ontological.

A FISH-based chromosome map for the European corn borers yields insights into ancient chromosomal fusions in the silkworm.

Yuji Yasukochi, Mizuki Ohno, Fukashi Shibata, Akiya Jouraku, Ryo Nakano, Yukio Ishikawa, Ken Sahara

A significant feature of the genomes of Lepidoptera, butterflies and moths, is the high conservation of chromosome organization. Recent remarkable progress in genome sequencing of Lepidoptera has revealed that syntenic gene order is extensively conserved across phylogenetically distant species. The ancestral karyotype of Lepidoptera is thought to be n = 31; however, that of the most well studied moth, Bombyx mori, is n = 28, suggesting that three chromosomal fusion events occurred in this lineage. To identify the boundaries between predicted ancient fusions involving B. mori chromosomes 11, 23 and 24, we constructed FISH-based chromosome maps of the European corn borer, Ostrinia nubilalis (n = 31). We first determined 511 Mb of genomic sequence of the Asian corn borer, Ostrinia furnacalis, a congener of O. nubilalis, and used these sequences to isolate BAC and fosmid clones that were expected to localize in candidate regions for the boundaries. Combined with FISH and genetic analysis, we narrowed down the candidate regions to 40 kb to 1.5 Mb, in strong agreement with a previous estimate based on the genome of a butterfly, Melitaea cinxia. The significant difference in the lengths of the candidate regions where no functional genes were observed may reflect the evolutionary time after fusion events.

Expansion of the HSFY gene family in pig lineages

Benjamin M Skinner, Kim Lachani, Carole A Sargent, Fengtang Yang, Peter JI Ellis, Toby Hunt, Beiyuan Fu, Sandra Louzada, Carol Churcher, Chris Tyler-Smith, Nabeel A Affara

Amplified gene families on sex chromosomes can harbour genes with important biological functions, especially relating to fertility. The HSFY family has amplified on the Y chromosome of the domestic pig (Sus scrofa), in an apparently independent event to an HSFY expansion on the Y chromosome of cattle (Bos taurus). Although the biological functions of HSFY genes are poorly understood, they appear to be involved in gametogenesis in a number of mammalian species, and, in cattle, HSFY gene copy number correlates with levels of fertility. We have investigated the HSFY family in domestic pigs, and other suid species including warthogs, bushpigs, babirusas and peccaries. The domestic pig contains at least two amplified variants of HSFY, distinguished predominantly by presence or absence of a SINE within the intron. Both these variants are expressed in testis, and both are present in approximately 50 copies each in a single cluster on the short arm of the Y. The longer form has multiple nonsense mutations rendering it likely non-functional, but many of the shorter forms still have coding potential. Other suid species also have these two variants of HSFY, and estimates of copy number suggest the HSFY family may have amplified independently twice during suid evolution. Given the association of HSFY gene copy number with fertility in cattle, HSFY is likely to play an important role in spermatogenesis in pigs also.

The pig X and Y chromosomes: structure, sequence and evolution

Benjamin M Skinner, Carole A Sargent, Carol Churcher, Toby Hunt, Javier Herrero, Jane Loveland, Matt Dunn, Sandra Louzada, Beiyuan Fu, William Chow, James Gilbert, Siobhan Austin-Guest, Kathryn Beal, Denise Carvalho-Silva, William Cheng, Daria Gordon, Darren Grafham, Matt Hardy, Jo Harley, Heidi Hauser, Philip Howden, Kerstin Howe, Kim Lachani, Peter JI Ellis, Daniel Kelly, Giselle Kerry, James Kerwin, Bee Ling Ng, Glen Threadgold, Thomas Wileman, Jonathan MD Wood, Fengtang Yang, Jen Harrow, Nabeel A Affara, Chris Tyler-Smith

We have generated an improved assembly and gene annotation of the pig X chromosome, and a first draft assembly of the pig Y chromosome, by sequencing BAC and fosmid clones, and incorporating information from optical mapping and fibre-FISH. The X chromosome carries 1,014 annotated genes, 689 of which are protein-coding. Gene order closely matches that found in Primates (including humans) and Carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X chromosome were absent from the pig (e.g. the cancer/testis antigen family) or inactive (e.g. AWAT1), and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y chromosome assembly focussed on two clusters of male-specific low-copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. The long arm of the chromosome is almost entirely repetitive, containing previously characterised sequences. Many of the ancestral X-related genes previously reported in at least one mammalian Y chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes – both single copy and amplified – on the pig Y, to compare the pig X and Y chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y chromosome evolution.

Recent Y chromosome divergence despite ancient origin of dioecy in poplars (Populus)

Armando Geraldes, Charles A Hefer, Arnaud Capron, Natalia Kolosova, Felix Martinez-Nuñez, Raju Y Soolanayakanahally, Brian Stanton, Robert D Guy, Shawn D Mansfield, Carl J Douglas, Quentin C B Cronk

All species of the genus Populus (poplar, aspen) are dioecious, suggesting an ancient origin of this trait. Theory suggests that non-recombining sex-linked regions should quickly spread, eventually becoming heteromorphic chromosomes. In contrast, we show using whole genome scans that the sex-associated region in P. trichocarpa is small and much younger than the age of the genus. This indicates that sex-determination is highly labile in poplar, consistent with recent evidence of “turnover” of sex determination regions in animals. We performed whole genome resequencing of 52 Populus trichocarpa (black cottonwood) and 34 P. balsamifera (balsam poplar) individuals of known sex. Genome-wide association studies (GWAS) in these unstructured populations identified 650 SNPs significantly associated with sex. We estimate the size of the sex-linked region to be ∼100 Kbp. All significant SNPs were in strong linkage disequilibrium despite the fact that they were mapped to six different chromosomes (plus 3 unmapped scaffolds) in version 2.2 of the reference genome. We show that this is likely due to genome misassembly. The segregation pattern of sex associated SNPs revealed this to be an XY sex determining system. Estimated divergence times of X and Y haplotype sequences (6-7 MYA) are much more recent than the divergence of P. trichocarpa (poplar) and P. tremuloides (aspen). Consistent with this, in P. tremuloides we found no XY haplotype divergence within the P. trichocarpa sex-determining region. These two species therefore have a different genomic architecture of sex, suggestive of at least one turnover event in the recent past.

Triticeae resources in Ensembl Plants

Dan M Bolser, Arnaud Kerhornou, Brandon Walts, Paul Kersey

Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last two years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison to other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants is an integrative resource organising, analysing and visualising genome-scale information for important crop and model plants. Available data includes reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information, are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualisation and analysis. Driven by the use case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment.

The genetic architecture of local adaptation I: The genomic landscape of foxtail pine (Pinus balfouriana Grev. & Balf.) as revealed from a high-density linkage map

Christopher J Friedline, Brandon M Lind, Erin M Hobson, Douglas E Harwood, Annette Delfino-Mix, Patricia E Maloney, Andrew J Eckert

Explaining the origin and evolutionary dynamics of the genetic architecture of adaptation is a major research goal of evolutionary genetics. Despite controversy surrounding success of the attempts to accomplish this goal, a full understanding of adaptive genetic variation necessitates knowledge about the genomic location and patterns of dispersion for the genetic components affecting fitness-related phenotypic traits. Even with advances in next generation sequencing technologies, the production of full genome sequences for non-model species is often cost prohibitive, especially for tree species such as pines where genome size often exceeds 20 to 30 Gbp. We address this need by constructing a dense linkage map for foxtail pine (Pinus balfouriana Grev. & Balf.), with the ultimate goal of uncovering and explaining the origin and evolutionary dynamics of adaptive genetic variation in natural populations of this forest tree species. We utilized megagametophyte arrays (n = 76–95 megagametophytes/tree) from four maternal trees in combination with double-digestion restriction site associated DNA sequencing (ddRADseq) to produce a consensus linkage map covering 98.58% of the foxtail pine genome, which was estimated to be 1276 cM in length (95% CI: 1174 cM to 1378 cM). A novel bioinformatic approach using iterative rounds of marker ordering and imputation was employed to produce single-tree linkage maps (507–17066 contigs/map; lengths: 1037.40–1572.80 cM). These linkage maps were collinear across maternal trees, with highly correlated marker orderings (Spearman’s ρ > 0.95). A consensus linkage map derived from these single-tree linkage maps contained 12 linkage groups along which 20 655 contigs were non-randomly distributed across 901 unique positions (n = 23 contigs/position), with an average spacing of 1.34 cM between adjacent positions.
Of the 20 655 contigs positioned on the consensus linkage map, 5627 had enough sequence similarity to contigs contained within the most recent build of the loblolly pine (P. taeda L.) genome to identify them as putative homologs containing both genic and non-genic loci. Importantly, all 901 unique positions on the consensus linkage map had at least one contig with putative homology to loblolly pine. When combined with the other biological signals that predominate in our data (e.g., correlations of recombination fractions across single trees), we show that dense linkage maps for non-model forest tree species can be efficiently constructed using next generation sequencing technologies. We subsequently discuss the usefulness of these maps as community-wide resources and as tools with which to test hypotheses about the genetic architecture of adaptation.
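The collinearity check described in this abstract (Spearman's ρ between marker orderings from different maternal trees) can be sketched as follows. This is only an illustration under stated assumptions: the function name `spearman_rho` and the contig labels are hypothetical, the two maps are assumed to share exactly the same markers, and each marker is assumed to occupy a distinct rank (no ties), which the real maps with many contigs per position would not satisfy.

```python
def spearman_rho(order_a, order_b):
    """Spearman's rank correlation between two orderings of the same markers.

    Assumes both sequences contain the same unique markers, so ranks have no
    ties and the closed form 1 - 6*sum(d^2) / (n*(n^2 - 1)) applies.
    """
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("orderings must contain the same unique markers")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two markers")
    rank_in_b = {marker: rank for rank, marker in enumerate(order_b)}
    # Squared rank difference of each marker between the two maps.
    d_squared = sum((rank - rank_in_b[marker]) ** 2
                    for rank, marker in enumerate(order_a))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Hypothetical marker orders on one linkage group from two maternal trees.
tree_1 = ["c1", "c2", "c3", "c4", "c5"]
tree_2 = ["c1", "c3", "c2", "c4", "c5"]  # one adjacent pair swapped

print(spearman_rho(tree_1, tree_2))
```

Identical orderings give ρ = 1 and fully reversed orderings give ρ = −1; a few local swaps, as in the example, leave ρ close to 1. Handling ties would call for a rank-averaging implementation instead of this closed form.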