Expansion of the HSFY gene family in pig lineages

Benjamin M Skinner, Kim Lachani, Carole A Sargent, Fengtang Yang, Peter JI Ellis, Toby Hunt, Beiyuan Fu, Sandra Louzada, Carol Churcher, Chris Tyler-Smith, Nabeel A Affara
doi: http://dx.doi.org/10.1101/012906

Amplified gene families on sex chromosomes can harbour genes with important biological functions, especially relating to fertility. The HSFY family has amplified on the Y chromosome of the domestic pig (Sus scrofa), apparently independently of an HSFY expansion on the Y chromosome of cattle (Bos taurus). Although the biological functions of HSFY genes are poorly understood, they appear to be involved in gametogenesis in a number of mammalian species, and, in cattle, HSFY gene copy number correlates with levels of fertility. We have investigated the HSFY family in domestic pigs and in other suid species, including warthogs, bushpigs, babirusas and peccaries. The domestic pig contains at least two amplified variants of HSFY, distinguished predominantly by the presence or absence of a SINE within the intron. Both variants are expressed in testis, and each is present in approximately 50 copies in a single cluster on the short arm of the Y. The longer form has multiple nonsense mutations that likely render it non-functional, but many of the shorter forms still have coding potential. Other suid species also have these two variants of HSFY, and estimates of copy number suggest the HSFY family may have amplified independently twice during suid evolution. Given the association of HSFY gene copy number with fertility in cattle, HSFY is likely to play an important role in pig spermatogenesis as well.

Stationary solutions for metapopulation Moran models with mutation and selection

George W. A. Constable, Alan J. McKane
(Submitted on 19 Dec 2014)

We construct an individual-based metapopulation model of population genetics featuring migration, mutation, selection and genetic drift. In the case of a single 'island', the model reduces to the Moran model. Using the diffusion approximation and timescale separation arguments, an effective one-variable description of the model is developed. The effective description bears similarities to the well-mixed Moran model, but with effective parameters that depend on the network structure and island sizes, and it is amenable to analysis. Predictions from the reduced theory match the results from stochastic simulations across a range of parameters. The nature of the fast-variable elimination technique we adopt is further studied by applying it to a linear system, where it provides a precise description of the slow dynamics in the limit of large timescale separation.
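Since the single-island case reduces to the Moran model, a minimal simulation of a well-mixed, two-allele Moran model with mutation and selection may help make the starting point concrete. This is an illustrative sketch only, not the authors' metapopulation model or their diffusion reduction. The parameterization (type A with relative fitness 1 + s, mutation probabilities u and v applied to offspring at birth) and the function names are assumptions chosen for illustration.

```python
import random
from typing import List


def moran_step(n_a: int, N: int, s: float, u: float, v: float,
               rng: random.Random) -> int:
    """Apply one birth-death event to a well-mixed two-allele Moran population.

    n_a: current number of type-A individuals (0 <= n_a <= N).
    s:   selection coefficient; A has relative fitness 1 + s, B has 1 (requires s > -1).
    u:   probability that an A parent produces a B offspring (mutation A -> B).
    v:   probability that a B parent produces an A offspring (mutation B -> A).
    """
    # Birth: pick a parent with probability proportional to fitness.
    w_a = n_a * (1.0 + s)
    w_b = float(N - n_a)
    parent_is_a = rng.random() < w_a / (w_a + w_b)
    # Mutation acts on the offspring at birth.
    child_is_a = (rng.random() >= u) if parent_is_a else (rng.random() < v)
    # Death: one individual (possibly the parent) is removed uniformly at random.
    dead_is_a = rng.random() < n_a / N
    return n_a + int(child_is_a) - int(dead_is_a)


def stationary_histogram(N: int, s: float, u: float, v: float,
                         steps: int, burn_in: int, seed: int = 0) -> List[float]:
    """Estimate the stationary distribution of n_a by time-averaging one long run."""
    rng = random.Random(seed)
    n_a = N // 2
    counts = [0] * (N + 1)
    for t in range(steps):
        n_a = moran_step(n_a, N, s, u, v, rng)
        if t >= burn_in:  # discard the transient before recording
            counts[n_a] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

For example, `stationary_histogram(50, 0.01, 1e-3, 1e-3, 200_000, 10_000)` returns an empirical estimate of the stationary distribution of type-A counts; a reduced one-variable theory of the kind described above would be checked against histograms like this one. With u = v = 0 the states 0 and N are absorbing, so mutation is what makes a non-trivial stationary distribution possible.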

The pig X and Y chromosomes: structure, sequence and evolution

Benjamin M Skinner, Carole A Sargent, Carol Churcher, Toby Hunt, Javier Herrero, Jane Loveland, Matt Dunn, Sandra Louzada, Beiyuan Fu, William Chow, James Gilbert, Siobhan Austin-Guest, Kathryn Beal, Denise Carvalho-Silva, William Cheng, Daria Gordon, Darren Grafham, Matt Hardy, Jo Harley, Heidi Hauser, Philip Howden, Kerstin Howe, Kim Lachani, Peter JI Ellis, Daniel Kelly, Giselle Kerry, James Kerwin, Bee Ling Ng, Glen Threadgold, Thomas Wileman, Jonathan MD Wood, Fengtang Yang, Jen Harrow, Nabeel A Affara, Chris Tyler-Smith
doi: http://dx.doi.org/10.1101/012914

We have generated an improved assembly and gene annotation of the pig X chromosome, and a first draft assembly of the pig Y chromosome, by sequencing BAC and fosmid clones, and incorporating information from optical mapping and fibre-FISH. The X chromosome carries 1,014 annotated genes, 689 of which are protein-coding. Gene order closely matches that found in Primates (including humans) and Carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X chromosome were absent from the pig (e.g. the cancer/testis antigen family) or inactive (e.g. AWAT1), and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y chromosome assembly focussed on two clusters of male-specific, low-copy-number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. The long arm of the chromosome is almost entirely repetitive, containing previously characterised sequences. Many of the ancestral X-related genes previously reported in at least one mammalian Y chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes – both single-copy and amplified – on the pig Y, to compare the pig X and Y chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y chromosome evolution.

FORGE: A tool to discover cell specific enrichments of GWAS associated SNPs in regulatory regions.

Ian Dunham, Eugene Kulesha, Valentina Iotchkova, Sandro Morganella, Ewan Birney
doi: http://dx.doi.org/10.1101/013045

Genome-wide association studies (GWAS) provide an unbiased discovery mechanism for numerous human diseases. However, a frustration in the analysis of GWAS is that the majority of variants discovered do not directly alter protein-coding genes. We have developed a simple analysis approach that detects the tissue-specific regulatory component of a set of GWAS SNPs by identifying enrichment of overlap with DNase I hotspots from diverse tissue samples. Functional element Overlap analysis of the Results of GWAS Experiments (FORGE) is available as a web tool and as standalone software, and provides tabular and graphical summaries of the enrichments. Conducting FORGE analysis on SNP sets for 260 phenotypes available from the GWAS catalogue reveals numerous overlap enrichments with tissue-specific components that reflect the known aetiology of the phenotypes, as well as other unforeseen tissue involvements that may lead to mechanistic insights into disease.
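The core idea of an overlap-enrichment test can be sketched as follows: count how many test SNPs fall inside each sample's DNase I hotspots, then compare that count with the counts obtained for background SNP sets. This is a hedged illustration, not FORGE's implementation. The function names, the single-chromosome coordinates, the use of a z-score, and the assumption that suitable background sets are supplied by the caller (how they are chosen is left open) are all assumptions of this sketch.

```python
import bisect
import math
import statistics
from typing import Dict, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def count_overlaps(snps: Sequence[int], hotspots: Sequence[Interval]) -> int:
    """Count SNP positions that fall inside any hotspot.

    Assumes hotspots are sorted by start and do not overlap each other.
    """
    starts = [start for start, _ in hotspots]
    hits = 0
    for pos in snps:
        i = bisect.bisect_right(starts, pos) - 1  # last hotspot starting at or before pos
        if i >= 0 and hotspots[i][0] <= pos < hotspots[i][1]:
            hits += 1
    return hits


def overlap_enrichment(test_snps: Sequence[int],
                       background_sets: Sequence[Sequence[int]],
                       hotspots_by_sample: Dict[str, Sequence[Interval]]) -> Dict[str, float]:
    """Z-score of the observed overlap count against background SNP sets, per sample."""
    z_scores: Dict[str, float] = {}
    for sample, hotspots in hotspots_by_sample.items():
        observed = count_overlaps(test_snps, hotspots)
        background = [count_overlaps(bg, hotspots) for bg in background_sets]
        mean = statistics.mean(background)
        sd = statistics.pstdev(background)
        # NaN when the background shows no variation, so no z-score is defined.
        z_scores[sample] = (observed - mean) / sd if sd > 0 else math.nan
    return z_scores
```

Repeating the comparison across many tissue samples yields a per-sample profile; a tissue whose hotspots are unusually enriched for the test SNPs relative to background would stand out in that profile.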

Genetic Analysis of Substrain Divergence in NOD Mice

Petr Simecek, Gary A Churchill, Hyuna Yang, Lucy B Rowe, Lieselotte Herberg, David V Serreze, Edward H Leiter
doi: http://dx.doi.org/10.1101/013037

The NOD mouse is a polygenic model for type 1 diabetes that is characterized by insulitis, a leukocytic infiltration of the pancreatic islets. In the ~35 years since the original inbred strain was developed in Japan, NOD substrains have been established at different laboratories around the world. Although environmental differences among NOD colonies capable of impacting diabetes incidence have been recognized, differences arising from genetic divergence have not previously been analyzed. We illustrate the importance of intersubstrain genetic differences by showing a difference in diabetes incidence between two substrains (NOD/ShiLtJ and NOD/Bom) maintained in a common environment. We use both the Mouse Diversity Array and whole-exome capture sequencing platforms to identify genetic differences distinguishing 5 NOD substrains. We describe 64 SNPs and 2 short indels that differ in coding regions among the 5 NOD substrains. A 100 kb deletion on Chromosome 3 distinguishes NOD/ShiLtJ and NOD/ShiLtDvs from the 3 other substrains, while a 111 kb deletion in the Icam2 gene on Chromosome 11 is unique to the NOD/ShiLtDvs genome. The extent of genetic divergence among NOD substrains is compared with that reported in similar studies of C57BL/6 and BALB/c substrains. As mutations are fixed to homozygosity by continued inbreeding, significant differences in substrain phenotypes are to be expected. These results emphasize the importance of using embryo freezing methods to minimize genetic drift within substrains.
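Once genotypes have been called, finding the sites that distinguish substrains amounts to flagging positions where the calls are not identical, and tallying such positions for each pair of substrains. The toy sketch below shows that bookkeeping step only. It is not the authors' pipeline, and the data layout (site ID mapped to per-substrain genotype strings, with `None` for a missing call) and the substrain labels in any example are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotype = Optional[str]  # e.g. "A/A"; None marks a missing call
Calls = Dict[str, Dict[str, Genotype]]  # site ID -> {substrain: genotype}


def differing_sites(calls: Calls) -> List[str]:
    """Sites where the non-missing genotype calls are not identical across substrains."""
    return [site for site, by_strain in calls.items()
            if len({g for g in by_strain.values() if g is not None}) > 1]


def pairwise_differences(calls: Calls) -> Dict[Tuple[str, str], int]:
    """For each pair of substrains, count sites where both are called and differ."""
    strains = sorted({s for by_strain in calls.values() for s in by_strain})
    counts: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(strains, 2):
        counts[(a, b)] = sum(
            1 for by_strain in calls.values()
            if by_strain.get(a) is not None and by_strain.get(b) is not None
            and by_strain[a] != by_strain[b]
        )
    return counts
```

Because inbred substrains are expected to be homozygous at nearly all sites, a simple string comparison of calls is a reasonable first pass; missing calls are excluded rather than counted as differences.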

Y Chromosome of Aisin Gioro, the Imperial House of Qing Dynasty

Shi Yan, Harumasa Tachibana, Lan-Hai Wei, Ge Yu, Shao-Qing Wen, Chuan-Chao Wang
(Submitted on 19 Dec 2014)

The House of Aisin Gioro was the imperial family of the Qing Dynasty (1644–1911), the last dynasty in Chinese history. The Aisin Gioro family originated from the Jurchen tribes and developed the Manchu people before conquering China. By investigating the Y-chromosomal short tandem repeats (STRs) of 7 modern men who claim to belong to the Aisin Gioro family (3 of whom have full pedigree records), we found that 3 of them (including 2 with full pedigrees, whose most recent common ancestor is Nurgaci) are very closely related (differing by 1–2 steps across 17 STRs) and share a rare haplotype. We therefore conclude that this haplotype is the Y chromosome of the House of Aisin Gioro. Further tests of single nucleotide polymorphisms (SNPs) indicate that they belong to haplogroup C3b2b1*-M401(xF5483), although their Y-STR profiles are distant from the “star cluster”, which also belongs to the same haplogroup. This study lays the foundation for genetic research into the pedigree of the imperial family of the Qing Dynasty.
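The "1–2 steps of difference in 17 STRs" comparison can be illustrated by summing repeat-count differences across loci, treating each repeat gained or lost as one mutational step. This is a hedged sketch: the function names are hypothetical, haplotypes are assumed to be lists of repeat counts typed at the same loci in the same order, and real Y-STR panels contain multi-copy and nested loci that need special handling before such a sum is meaningful.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def step_distance(h1: Sequence[int], h2: Sequence[int]) -> int:
    """Total repeat-count difference across loci, one step per repeat unit."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be typed at the same loci, in the same order")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def close_pairs(haplotypes: Dict[str, Sequence[int]],
                max_steps: int = 2) -> List[Tuple[str, str, int]]:
    """All pairs of individuals whose haplotypes differ by at most max_steps."""
    pairs = []
    for a, b in combinations(sorted(haplotypes), 2):
        d = step_distance(haplotypes[a], haplotypes[b])
        if d <= max_steps:
            pairs.append((a, b, d))
    return pairs
```

Applied to a set of typed individuals, `close_pairs` lists the tightly related pairs that would suggest a shared recent paternal ancestor; assigning a haplogroup still requires SNP testing, as the abstract describes.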

Using Bayesian multilevel whole-genome regression models for partial pooling of estimation sets in genomic prediction

Frank Technow, L. Radu Totir
doi: http://dx.doi.org/10.1101/012971

Estimation set size is an important determinant of genomic prediction accuracy. Plant breeding programs are characterized by a high degree of structuring, particularly into populations. This hampers the establishment of large estimation sets for each population. Pooling populations increases estimation set size but ignores the unique genetic characteristics of each. A possible solution is partial pooling with multilevel models, which allows estimating population-specific marker effects while still leveraging information across populations. We developed a Bayesian multilevel whole-genome regression model and compared its performance to that of the popular BayesA model applied to each population separately (no pooling) and to the joined data set (complete pooling). As an example, we analyzed a wide array of traits from the nested association mapping maize population. There, for small population sizes (e.g., < 50), partial pooling increased prediction accuracy over no or complete pooling for populations represented in the estimation set. However, no pooling was superior when populations were large. In another example data set of interconnected biparental maize populations, either partial or complete pooling was superior, depending on the trait. A simulation showed that no pooling is superior when differences in genetic effects among populations are large, and partial pooling when they are intermediate. With small differences, partial and complete pooling achieved equally high accuracy. For prediction of new populations, partial and complete pooling had very similar accuracy in all cases. We conclude that partial pooling with multilevel models can maximize the potential of pooling by making optimal use of information in pooled estimation sets.
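The intuition behind partial pooling can be seen in the simplest normal hierarchical model, where each population's estimate is a precision-weighted average of its own mean and a common mean. This sketch is a deliberate simplification and not the authors' multilevel whole-genome regression: it models a single effect per population rather than population-specific marker effects, uses normal rather than BayesA-type priors, treats the variances `sigma2` and `tau2` as known, and plugs in the size-weighted grand mean for the common mean (an empirical-Bayes shortcut). All names are hypothetical.

```python
from typing import List, Sequence


def partial_pool(pop_means: Sequence[float], pop_sizes: Sequence[int],
                 sigma2: float, tau2: float) -> List[float]:
    """Shrink per-population estimates toward a common mean.

    Model (normal-normal, one effect per population):
        y_ij ~ N(theta_j, sigma2),  theta_j ~ N(mu, tau2).
    mu is plugged in as the size-weighted grand mean.
    """
    total = sum(pop_sizes)
    mu = sum(m * n for m, n in zip(pop_means, pop_sizes)) / total
    shrunk = []
    for m, n in zip(pop_means, pop_sizes):
        data_precision = n / sigma2   # information from population j's own records
        prior_precision = 1.0 / tau2  # strength of the pull toward the common mean
        w = data_precision / (data_precision + prior_precision)
        shrunk.append(w * m + (1.0 - w) * mu)
    return shrunk
```

The weight `w` mirrors the pattern reported above: it approaches 1 as a population grows (behaving like no pooling), approaches 0 as `tau2` shrinks toward zero (complete pooling), and sits in between for small populations with intermediate between-population differences.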