Filter by type:
The haploid mammalian Y chromosome is usually under-represented in genome assemblies due to high repeat content and low depth due to its haploid nature. One strategy to ameliorate the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. As the enrichment process is imperfect, algorithms are needed to identify putative Y-specific reads prior to downstream assembly. A strategy that uses k-mer abundances to identify such reads was used to assemble the gorilla Y. However, the strategy required the manual setting of key parameters, a time-consuming process leading to sub-optimal assemblies.
We develop a method, RecoverY, that selects Y-specific reads by automatically choosing the abundance level at which a k-mer is deemed to originate from the Y. This algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternate strategies of filtering the reads or contigs. Compared to the preliminary strategy used by Tomaszkiewicz et al., we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection.
Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY.
Supplementary data are available at Bioinformatics online.
Maternal breast milk (MBM) is enriched in microRNAs, factors that regulate protein translation throughout the human body. MBM from mothers of term and preterm infants differs in nutrient, hormone, and bioactive-factor composition, but the microRNA differences between these groups have not been compared. We hypothesized that gestational age at delivery influences microRNA in MBM, particularly microRNAs involved in immunologic and metabolic regulation.
MBM from mothers of premature infants (pMBM) obtained 3-4 weeks post delivery was compared with MBM from mothers of term infants obtained at birth (tColostrum) and 3-4 weeks post delivery (tMBM). The microRNA profile in lipid and skim fractions of each sample was evaluated with high-throughput sequencing.
The expression profiles of nine microRNAs in lipid and skim pMBM differed from those in tMBM. Gene targets of these microRNAs were functionally related to elemental metabolism and lipid biosynthesis. The microRNA profile of tColostrum was also distinct from that of pMBM, but it clustered closely with tMBM. Twenty-one microRNAs correlated with gestational age demonstrated limited relationships with method of delivery, but not other maternal-infant factors.
Premature delivery results in a unique MBM microRNA profile with metabolic targets. This suggests that preterm milk may have adaptive functions for growth in premature infants.
Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3” half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology.
Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy.
Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
Because early life growth has long-lasting metabolic and behavioral consequences, intervention during this period of developmental plasticity may alter long-term obesity risk. While modifiable factors during infancy have been identified, until recently, preventive interventions had not been tested. The Intervention Nurses Starting Infants Growing on Healthy Trajectories (INSIGHT). Study is a longitudinal, randomized, controlled trial evaluating a responsive parenting intervention designed for the primary prevention of obesity. This “parenting” intervention is being compared with a home safety control among first-born infants and their parents. INSIGHT’s central hypothesis is that responsive parenting and specifically responsive feeding promotes self-regulation and shared parent-child responsibility for feeding, reducing subsequent risk for overeating and overweight.
316 first-time mothers and their full-term newborns were enrolled from one maternity ward. Two weeks following delivery, dyads were randomly assigned to the “parenting” or “safety” groups. Subsequently, research nurses conduct study visits for both groups consisting of home visits at infant age 3-4, 16, 28, and 40 weeks, followed by annual clinic-based visits at 1, 2, and 3 years. Both groups receive intervention components framed around four behavior states: Sleeping, Fussy, Alert and Calm, and Drowsy. The main study outcome is BMI z-score at age 3 years; additional outcomes include those related to patterns of infant weight gain, infant sleep hygiene and duration, maternal responsiveness and soothing strategies for infant/toddler distress and fussiness, maternal feeding style and infant dietary content and physical activity. Maternal outcomes related to weight status, diet, mental health, and parenting sense of competence are being collected. Infant temperament will be explored as a moderator of parenting effects, and blood is collected to obtain genetic predictors of weight status. Finally, second-born siblings of INSIGHT participants will be enrolled in an observation-only study to explore parenting differences between siblings, their effect on weight outcomes, and carryover effects of INSIGHT interventions to subsequent siblings.
With increasing evidence suggesting the importance of early life experiences on long-term health trajectories, the INSIGHT trial has the ability to inform future obesity prevention efforts in clinical settings.
NCT01167270. Registered 21 July 2010.
Originally believed to be a rare phenomenon, heteroplasmy - the presence of more than one mitochondrial DNA (mtDNA) variant within a cell, tissue, or individual - is emerging as an important component of eukaryotic genetic diversity. Heteroplasmies can be used as genetic markers in applications ranging from forensics to cancer diagnostics. Yet the frequency of heteroplasmic alleles may vary from generation to generation due to the bottleneck occurring during oogenesis. Therefore, to understand the alterations in allele frequencies at heteroplasmic sites, it is of critical importance to investigate the dynamics of maternal mtDNA transmission.
Here we sequenced, at high coverage, mtDNA from blood and buccal tissues of nine individuals from three families with a total of six maternal transmission events. Using simulations and re-sequencing of clonal DNA, we devised a set of criteria for detecting polymorphic sites in heterogeneous genetic samples that is resistant to the noise originating from massively parallel sequencing technologies. Application of these criteria to nine human mtDNA samples revealed four heteroplasmic sites.
Our results suggest that the incidence of heteroplasmy may be lower than estimated in some other recent re-sequencing studies, and that mtDNA allelic frequencies differ significantly both between tissues of the same individual and between a mother and her offspring. We designed our study in such a way that the complete analysis described here can be repeated by anyone either at our site or directly on the Amazon Cloud. Our computational pipeline can be easily modified to accommodate other applications, such as viral re-sequencing.
While the abundance of available sequenced genomes has led to many studies of regional heterogeneity in mutation rates, the co-variation among rates of different mutation types remains largely unexplored, hindering a deeper understanding of mutagenesis and genome dynamics. Here, utilizing primate and rodent genomic alignments, we apply two multivariate analysis techniques (principal components and canonical correlations) to investigate the structure of rate co-variation for four mutation types and simultaneously explore the associations with multiple genomic features at different genomic scales and phylogenetic distances.
We observe a consistent, largely linear co-variation among rates of nucleotide substitutions, small insertions and small deletions, with some non-linear associations detected among these rates on chromosome X and near autosomal telomeres. This co-variation appears to be shaped by a common set of genomic features, some previously investigated and some novel to this study (nuclear lamina binding sites, methylated non-CpG sites and nucleosome-free regions). Strong non-linear relationships are also detected among genomic features near the centromeres of large chromosomes. Microsatellite mutability co-varies with other mutation rates at finer scales, but not at 1 Mb, and shows varying degrees of association with genomic features at different scales.
Our results allow us to speculate about the role of different molecular mechanisms, such as replication, recombination, repair and local chromatin environment, in mutagenesis. The software tools developed for our analyses are available through Galaxy, an open-source genomics portal, to facilitate the use of multivariate techniques in future large-scale genomics studies.
Feline immunodeficiency virus (FIV) and human immunodeficiency virus (HIV) are recently identified lentiviruses that cause progressive immune decline and ultimately death in infected cats and humans. It is of great interest to understand how to prevent immune system collapse caused by these lentiviruses. We recently described that disease caused by a virulent FIV strain in cats can be attenuated if animals are first infected with a feline immunodeficiency virus derived from a wild cougar. The detailed temporal tracking of cat immunological parameters in response to two viral infections resulted in high-dimensional datasets containing variables that exhibit strong co-variation. Initial analyses of these complex data using univariate statistical techniques did not account for interactions among immunological response variables and therefore potentially obscured significant effects between infection state and immunological parameters.
Here, we apply a suite of multivariate statistical tools, including Principal Component Analysis, MANOVA and Linear Discriminant Analysis, to temporal immunological data resulting from FIV superinfection in domestic cats. We investigated the co-variation among immunological responses, the differences in immune parameters among four groups of five cats each (uninfected, single and dual infected animals), and the “immune profiles” that discriminate among them over the first four weeks following superinfection. Dual infected cats mount an immune response by 24 days post superinfection that is characterized by elevated levels of CD8 and CD25 cells and increased expression of IL4 and IFNγ, and FAS. This profile discriminates dual infected cats from cats infected with FIV alone, which show high IL-10 and lower numbers of CD8 and CD25 cells.
Multivariate statistical analyses demonstrate both the dynamic nature of the immune response to FIV single and dual infection and the development of a unique immunological profile in dual infected cats, which are protected from immune decline.
The evolutionary distance between human and macaque is particularly attractive for investigating local variation in neutral substitution rates, because substitutions can be inferred more reliably than in comparisons with rodents and are less influenced by the effects of current and ancient diversity than in comparisons with closer primates. Here we investigate the human-macaque neutral substitution rate as a function of a number of genomic parameters.
Using regression analyses we find that male mutation bias, male (but not female) recombination rate, distance to telomeres and substitution rates computed from orthologous regions in mouse-rat and dog-cow comparisons are prominent predictors of the neutral rate. Additionally, we demonstrate that the previously observed biphasic relationship between neutral rate and GC content can be accounted for by properly combining rates at CpG and non-CpG sites. Finally, we find the neutral rate to be negatively correlated with the densities of several classes of computationally predicted functional elements, and less so with the densities of certain classes of experimentally verified functional elements.
Our results suggest that while female recombination may be mainly responsible for driving evolution in GC content, male recombination may be mutagenic, and that other mutagenic mechanisms acting near telomeres, and mechanisms whose effects are shared across mammalian genomes, play significant roles. We also have evidence that the nonlinear increase in rates at high GC levels may be largely due to hyper-mutability of CpG dinucleotides. Finally, our results suggest that the performance of conservation-based prediction methods can be improved by accounting for neutral rates.