Methods for Joint Imaging and RNA-seq Data Analysis

Junhai Jiang, Nan Lin, Shicheng Guo, Jinyun Chen, Momiao Xiong
(Submitted on 13 Sep 2014)

Integrative analysis of genomic and anatomical imaging data is an emerging field that has not yet been well developed. It provides invaluable information for the holistic discovery of the genomic structure of disease, and it has the potential to open a new avenue for discovering novel disease susceptibility genes that cannot be identified when the two data types are analyzed separately. A key issue for the success of joint imaging and genomic data analysis is how to reduce their dimensions. Most previous methods for imaging information extraction and RNA-seq data reduction do not exploit imaging spatial information and often ignore gene expression variation at the genomic positional level. To overcome these limitations, we extend functional principal component analysis from one dimension to two dimensions (2DFPCA) to represent imaging data. We also develop a multiple functional linear model (MFLM) in which the functional principal component scores of images are taken as multiple quantitative traits and the RNA-seq profile across a gene is taken as a functional predictor for assessing the association of gene expression with images. The developed method has been applied to image and RNA-seq data from ovarian cancer and KIRC (kidney renal clear cell carcinoma) studies. We identified 24 and 84 genes whose expression was associated with imaging variation in the ovarian cancer and KIRC studies, respectively. Our results showed that many genes significantly associated with images were not differentially expressed, but revealed their morphological and metabolic functions. The results also demonstrated that the peaks of the estimated regression coefficient function in the MFLM often allowed the discovery of splicing sites and multiple isoforms of gene expression.
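As a rough illustration of the dimension-reduction step, the discrete analogue of 2DFPCA is ordinary PCA on images that share a common pixel grid. You flatten each image, centre every pixel across images, and take an SVD. This gives "eigenimages" and per-image scores, and scores of this kind are what the MFLM would take as multiple quantitative traits. The sketch rests on that simplification: it does no basis-function smoothing and is not the authors' functional implementation. The helper name `image_pc_scores` is hypothetical.

```python
import numpy as np


def image_pc_scores(images, n_components=3):
    """Discrete stand-in for 2D functional PCA (hypothetical helper).

    images: array of shape (n_images, height, width), all sampled on the same grid.
    Returns (scores, eigenimages, explained_variance), with components ordered
    from largest to smallest variance.
    """
    n, h, w = images.shape
    X = images.reshape(n, h * w).astype(float)  # one row of pixel values per image (a copy)
    X -= X.mean(axis=0)                          # centre every pixel across images
    # Thin SVD of the centred data: rows of Vt are the principal directions.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, len(s))
    scores = U[:, :k] * s[:k]                # per-image principal component scores
    eigenimages = Vt[:k].reshape(k, h, w)    # principal directions mapped back onto the grid
    explained_variance = s[:k] ** 2 / (n - 1)
    return scores, eigenimages, explained_variance


# Usage on simulated 8x8 "images" (random data, for illustration only)
rng = np.random.default_rng(0)
imgs = rng.normal(size=(20, 8, 8))
scores, eigenimages, variance = image_pc_scores(imgs, n_components=3)
print(scores.shape)  # (20, 3)
```

In a functional version, each eigenimage would instead be a smooth two-dimensional function. The scores would play the same role.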

Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

Matthew D MacManes, Michael B Eisen
doi: http://dx.doi.org/10.1101/009134

As a direct result of intense heat and aridity, deserts are thought to be among the harshest environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwestern United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes) from a single individual and supplement this with population-level renal transcriptome sequencing from 15 additional animals. Based on estimates of Tajima’s D, we identified sets of transcripts undergoing purifying selection and balancing selection. In addition, we used the branch-site test to identify a transcript, Slc2a9, that is likely related to desert osmoregulation and is undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.
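Tajima's D compares two estimates of genetic diversity: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a1. Negative values indicate an excess of rare variants, which is consistent with purifying selection or population expansion. Positive values indicate an excess of intermediate-frequency variants, which is consistent with balancing selection or population contraction. Below is a sketch of Tajima's standard formula. The function name and inputs are illustrative, and this is not the pipeline used in the preprint.

```python
import math


def tajimas_d(n, segregating_sites, pi):
    """Tajima's D for a locus.

    n: number of sequences sampled (needs n >= 4 for a defined variance)
    segregating_sites: number of segregating sites S in the locus
    pi: mean number of pairwise differences across the locus (not per site)
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if segregating_sites == 0:
        raise ValueError("Tajima's D is undefined when S = 0")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    S = segregating_sites
    # Difference between the two diversity estimators, scaled by its standard deviation
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

For example, `tajimas_d(32, 20, 2.0)` is negative. With 32 sequences, S/a1 is about 4.97, and the observed π of 2 falls well below it.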

Author post: Generation of a Panel of Induced Pluripotent Stem Cells From Chimpanzees: a Resource for Comparative Functional Genomics

This guest post is by Irene Gallego Romero (@ee_reh_neh) on her paper Gallego Romero et al., “Generation of a Panel of Induced Pluripotent Stem Cells From Chimpanzees: a Resource for Comparative Functional Genomics”, posted on bioRxiv.

Genetic divergence in protein coding regions between humans and chimpanzees cannot explain phenotypic differences between the two species, or, more broadly, between other closely related groups. Although we have known this since the early days of genetic sequencing, it has been very hard to formally test the hypothesis that follows logically – that it may be changes in gene expression and regulation that underlie the divergence in phenotypes. This is especially true in the great apes, where there are plenty of ethical and practical impediments to experimentation. For instance, our ability to carry out functional studies and really decode cellular mechanisms is restricted to tissues that can be sampled non-invasively. To date, this has mostly meant fibroblasts and immortalised lymphoblastoid cell lines. The rest of comparative work in primates tends to be done in tissue samples collected post-mortem, where experimental manipulation is not a possibility.

Together, these limitations provided the impetus for us to develop a panel of high-quality induced pluripotent stem cell (iPSC) lines from chimpanzees. The promise of this panel lies, of course, not just in insights into the pluripotent state in chimpanzees (although that is certainly a worthy subject) but in how it opens the door to a tantalizing number of previously inaccessible questions, when we combine it with any of the many protocols available for differentiating iPSCs into particular somatic cell types that have remained out of reach until now.

The amount of work that went into developing an effective reprogramming protocol is not readily apparent in our preprint, but it was exhaustive, and exhausting! We began by using retroviral vectors to deliver the four factors commonly used to reprogram somatic cells to pluripotency, but soon encountered two fairly sizable problems with that approach. First, these viral vectors integrate into the host genome during reprogramming, and one never knows what they are going to disrupt. This is an issue that everyone using retro- or lentiviral vectors has to contend with. When we began working on the project three and a half years ago, integrating vectors were the most reliable and established reprogramming method around, so we were prepared to take our chances and scan the resulting lines to determine insertion sites. Regardless, the thought of random insertions of pluripotency genes set us somewhat on edge!

However, for reasons that we never fully understood, those chimpanzee lines had a lot of trouble silencing the retroviral vectors and maintaining pluripotency solely through endogenous mechanisms, as we show in one of our supplemental figures. At the time, we were making human iPSC lines in tandem using exactly the same vector stocks. The human lines would lose most exogenous vector expression after 12 to 15 passages. In chimpanzee iPSCs of the same age, by contrast, we would generally find that expression of at least one exogenous gene, if not more, was as high as it had been on day one. This did not bode well for the lines, or for our ability to do interesting things with them! So we scrapped the integrating approach and began optimizing protocols all over again. Fortunately for us, Shinya Yamanaka’s group had just published a very thorough protocol on reprogramming cells using non-integrating episomal vectors, which ended up laying the foundations of the one we present in our preprint.

The lines we have generated with it are of fantastic quality, and they have passed every test we have thrown at them with flying colours. Pluripotency is maintained endogenously and the lines are karyotypically normal. They also differentiate into all three germ layers, both spontaneously (as embryoid bodies, and as teratomas when injected into mice) and when we use directed protocols to push them towards a particular fate.

We were very interested in quantifying how human and chimpanzee iPSC lines differ from each other. To this end, we collected RNA-sequencing and methylation data from the chimpanzee iPSCs and the fibroblast lines they were generated from. We did the same for seven human iPSC lines of various ethnic and cellular origins and their precursors, and then compared all of these samples to one another. We find large numbers of inter-species differences both before and after reprogramming, but, crucially, most of them are not the same differences. Of all the genes with strong evidence for differential expression between species at the iPSC stage, only 38% are also differentially expressed before reprogramming, and the situation is quite similar with regard to methylation.

Another thing we have found very striking in the data is the very clear increase in homogeneity within (and possibly between, although our design makes that harder to effectively quantify) species at the iPSC level relative to the precursor cells, both in gene expression levels and in DNA methylation. This finding will be very interesting to keep in mind as we go forward and differentiate the iPSCs into a suite of somatic cell types and see how these measures fluctuate through differentiation.

Ultimately, however, the biggest significance of this work for us lies in the fact that the lines are not just for our own use. They are available to other researchers, and we have had this in mind from the earliest stages of the work. There is no way for our lab to even begin to tackle all the questions that these lines can be used to answer. So if you want to work with our chimpanzee iPSC lines, get in touch.

Population genomic analysis uncovers African and European admixture in Drosophila melanogaster populations from the southeastern United States and Caribbean Islands

Joyce Y Kao, Asif Zubair, Matthew P Salomon, Sergey V Nuzhdin, Daniel Campo
doi: http://dx.doi.org/10.1101/009092

Genome sequences from North American Drosophila melanogaster populations have become available to the scientific community. Deciphering the population structure underlying these resources is crucial to making the most of them. Accepted models of North American colonization generally hold that several hundred years ago, flies from Africa and Europe were transported to the east coast of the United States and the Caribbean Islands. Under these models, current east coast US and Caribbean populations are an admixture of African and European ancestry. These models have been constructed from phenotypes and limited genetic data. In our study, we sequenced individual whole genomes of flies from populations in the southeast US and the Caribbean Islands. We examined these populations together with population sequences from Winters, CA (USA); Raleigh, NC (USA); Cameroon (Africa); and Montpellier (France) to uncover the population structure of North American populations. We find that west coast US populations are most similar to European populations, likely reflecting a rapid westward expansion after the first settlements in North America. We also find genomic evidence of African and European admixture in east coast US and Caribbean populations. The proportion of African ancestry decreases clinally with increasing latitude, which further supports the proposed demographic model in which Caribbean flies were established by African ancestors. Our genomic analysis of Caribbean flies is the first study to expose the source of the novel African alleles previously reported in east coast US populations.
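A simple way to put a number on two-way admixture is to treat allele frequencies in the admixed population as a mixture of the two source populations. The mixing proportion can then be solved for by least squares across SNPs. The sketch below applies this textbook estimator to simulated frequencies. It is an illustrative assumption, not the ancestry method used in the preprint. The function name is hypothetical. Repeating the estimate population by population is one way to examine a latitudinal cline in ancestry.

```python
import numpy as np


def admixture_fraction(p_admixed, p_source1, p_source2):
    """Least-squares alpha in p_admixed ~= alpha * p_source1 + (1 - alpha) * p_source2.

    Rearranged as (p_admixed - p_source2) ~= alpha * (p_source1 - p_source2), a
    regression through the origin across SNPs; the result is clipped to [0, 1].
    """
    p_admixed, p_source1, p_source2 = (
        np.asarray(p, dtype=float) for p in (p_admixed, p_source1, p_source2)
    )
    d = p_source1 - p_source2
    y = p_admixed - p_source2
    alpha = float(np.dot(y, d) / np.dot(d, d))
    return min(max(alpha, 0.0), 1.0)


# Simulated allele frequencies (hypothetical, not data from the study)
rng = np.random.default_rng(42)
p_afr = rng.uniform(0.05, 0.95, size=1000)
p_eur = rng.uniform(0.05, 0.95, size=1000)
p_mix = 0.3 * p_afr + 0.7 * p_eur  # noiseless 30% "African" ancestry
print(round(admixture_fraction(p_mix, p_afr, p_eur), 6))  # 0.3
```

Real data would add sampling noise and drift, so the estimate would no longer be exact.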

Secondary contact and local adaptation contribute to genome-wide patterns of clinal variation in Drosophila melanogaster

Alan O. Bergland, Ray Tobler, Josefa Gonzalez, Paul Schmidt, Dmitri Petrov
doi: http://dx.doi.org/10.1101/009084

Populations arrayed along broad latitudinal gradients often show patterns of clinal variation in phenotype and genotype. Such population differentiation can be generated and maintained by a combination of demographic events and adaptive evolutionary processes. Here, we investigate the evolutionary forces that generated and maintain clinal variation genome-wide among populations of Drosophila melanogaster sampled in North America and Australia. We contrast patterns of clinal variation on these continents with patterns of differentiation among ancestral European and African populations. We show that the recently derived North American and Australian populations were likely founded by both European and African lineages, and that this admixture event generated genome-wide patterns of parallel clinal variation. Because the effects of admixture were so pervasive, only a handful of loci could be attributed to spatially varying selection using an FST outlier approach. Our results provide novel insight into a well-studied system of clinal differentiation and provide a context for future studies seeking to identify loci contributing to local adaptation in D. melanogaster.
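An FST outlier scan ranks SNPs by how differentiated they are between populations and flags the extreme tail as candidates for spatially varying selection. The sketch below uses Nei's heterozygosity-based FST, (H_T - H_S) / H_T, with an empirical 99th-percentile cutoff. Both the statistic and the threshold are illustrative assumptions rather than the choices made in the preprint, and the function names are hypothetical.

```python
import numpy as np


def nei_fst(p1, p2):
    """Per-SNP F_ST = (H_T - H_S) / H_T for two populations with allele frequencies p1, p2."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                            # total expected heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population heterozygosity
    with np.errstate(divide="ignore", invalid="ignore"):
        fst = np.where(h_t > 0, (h_t - h_s) / h_t, np.nan)        # monomorphic SNPs -> NaN
    return fst


def fst_outliers(p1, p2, quantile=0.99):
    """Boolean mask of SNPs whose F_ST exceeds the empirical genome-wide quantile."""
    fst = nei_fst(p1, p2)
    threshold = np.nanquantile(fst, quantile)
    return fst > threshold  # NaN entries compare as False


print(nei_fst([0.0, 0.5], [1.0, 0.5]))  # [1. 0.]
```

An empirical cutoff like this always flags a fixed fraction of SNPs. Deciding which of those reflect selection rather than demography, such as admixture, needs the kind of contrasts described in the abstract.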

Average genome size estimation enables accurate quantification of gene family abundance and sheds light on the functional ecology of the human microbiome

Stephen Nayfach, Katherine S Pollard
doi: http://dx.doi.org/10.1101/009001

Average genome size (AGS) is an important, yet often overlooked property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate AGS from short-read metagenomics data and applied our tool to over 1,300 human microbiome samples. We found that AGS differs significantly within and between body sites and tracks with major functional and taxonomic differences. For example, in the gut, AGS ranges from 2.5 to 5.8 megabases and is positively correlated with the abundance of Bacteroides and polysaccharide metabolism. Furthermore, we found that AGS variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.
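A little arithmetic shows why AGS matters for gene family abundance. The sketch assumes two definitions. First, genome equivalents are total sequenced bases divided by AGS. Second, abundance is reads per kilobase of gene per genome equivalent (RPKG). The function names are hypothetical, and this is not MicrobeCensus's own code.

```python
def genome_equivalents(total_bases, average_genome_size):
    """Genome copies sampled: total sequenced bases / AGS (both in base pairs)."""
    return total_bases / average_genome_size


def rpkg(read_count, gene_length_bp, total_bases, average_genome_size):
    """Reads per kilobase of gene per genome equivalent (hypothetical helper)."""
    kilobases = gene_length_bp / 1000.0
    return read_count / (kilobases * genome_equivalents(total_bases, average_genome_size))


# Two hypothetical samples: same depth (1 Gb) and same raw count (400 reads on a 2 kb gene),
# but different average genome sizes.
print(rpkg(400, 2000, 1e9, 2.5e6))  # 0.5  (400 genome equivalents sampled)
print(rpkg(400, 2000, 1e9, 5.0e6))  # 1.0  (only 200 genome equivalents sampled)
```

Identical raw counts translate into a twofold difference in per-genome abundance. This is the kind of bias that, according to the abstract, normalization by AGS corrects.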

Increasing evolvability of local adaptation during range expansion

Marleen M. P. Cobben, Alexander Kubisch
doi: http://dx.doi.org/10.1101/008979

Increasing dispersal during range expansion increases invasion speed, which implies that a species needs to adapt more rapidly to newly experienced local conditions. However, because of iterated founder effects, local genetic diversity during range expansion is low. Evolvability (here, the evolution of mutation rates) has been reported to be potentially an adaptive trait in itself. We therefore expect that increased dispersal during range expansion may raise the evolvability of local adaptation and so increase the survival of expanding populations. We studied this phenomenon with a spatially explicit, individual-based metapopulation model of a sexually reproducing species with discrete generations, expanding along an elevational gradient. Our results show that evolvability is likely to evolve as a result of the spatial variation experienced during range expansion. In addition, we show that different spatial phenomena associated with range expansion, in this case spatial sorting/kin selection and priority effects, can reinforce each other.
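The key ingredient of such a model is a mutation rate that is itself inherited and can mutate, and that part fits in a few lines. This toy sketch rests on several assumed choices:

- haploid offspring that take each value from a randomly chosen parent;
- Gaussian fitness around a local optimum;
- a log-normal perturbation of the mutation rate.

The names and parameters are hypothetical. It is not the authors' model, which also includes dispersal, an elevational gradient, and discrete-generation metapopulation dynamics.

```python
import math
import random


def gaussian_fitness(trait, local_optimum, width=1.0):
    """Fitness declines with distance between an individual's trait and the local optimum."""
    return math.exp(-((trait - local_optimum) ** 2) / (2.0 * width ** 2))


def make_offspring(mother, father, rng, rate_sd=0.1, effect_sd=0.2):
    """Create one offspring from two parents, each a (trait, mutation_rate) tuple.

    Each value is inherited from a randomly chosen parent (free recombination).
    The mutation rate is perturbed on the log scale, so it stays non-negative and
    is itself heritable; the trait then mutates with probability equal to that rate.
    """
    trait = rng.choice((mother, father))[0]
    rate = rng.choice((mother, father))[1]
    rate = min(1.0, rate * math.exp(rng.gauss(0.0, rate_sd)))  # evolvable mutation rate
    if rng.random() < rate:
        trait += rng.gauss(0.0, effect_sd)  # mutation of the locally adapted trait
    return trait, rate


# Usage: one offspring and its fitness at a hypothetical local optimum of 1.0
rng = random.Random(1)
child = make_offspring((0.0, 0.01), (0.5, 0.05), rng)
print(child, gaussian_fitness(child[0], local_optimum=1.0))
```

Selection on the trait then acts indirectly on the mutation rate. Lineages whose higher rate happens to produce better-adapted traits leave more offspring, which is the sense in which evolvability can itself evolve.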