Genomics@Wayne 2016 Talk Abstracts
Epigenetic alterations at intergenic regions associated with progression in a subset of IDH mutant gliomas
Abstract: Malignant gliomas are heterogeneous brain tumors characterized by progression and recurrence with variable patterns. Generally, IDH mutant gliomas are associated with favorable clinical outcome and higher DNA methylation at CpG islands (G-CIMP). However, our group reported a subclassification for G-CIMP (Ceccarelli et al. 2016). We termed these novel subtypes ‘G-CIMP-low’, with a lower DNA methylation profile and significantly worse survival, and ‘G-CIMP-high’, with higher levels of DNA methylation and better clinical outcome. Also, in matched primary and recurrent glioma cases, potential progression from G-CIMP-high to G-CIMP-low was observed. We aimed to understand the potential epigenomic mechanisms leading to progression. To this end, we evaluated publicly available whole-genome bisulfite sequencing (WGBS) and RNA-sequencing/microarray data for G-CIMP-low (n=1), G-CIMP-high (n=2) and non-tumor brain (n=2) samples. We assessed DNA methylation levels at the single-CpG level (~25 million) and integrated them with chromHMM, epigenomic histone modification, and transcriptome data from other datasets. We observed high concordance between Infinium-CpG-microarray and WGBS data (n=25,516 CpG, p-value<0.05, cor=0.95), suggesting our sample size is sufficient for exploratory analysis. At known intergenic regions and potential regulatory elements, G-CIMP-high and non-tumor brain samples show similar DNA methylation profiles, whereas G-CIMP-low presents with lower levels of DNA methylation. A significant number of CTCF insulator binding sites (n=24, p-value<0.05) showed DNA methylation differences between G-CIMP-low and G-CIMP-high, and deregulated genes near these sites were found to be associated with cell cycle pathways (n=23). Also, there were DNA methylation changes within long nuclear lamina domains, suggesting that long-range chromatin accessibility may be affected in G-CIMP-low.
Combined, these results suggest that most of the epigenetic alterations in G-CIMP-low are at intergenic regions, especially candidate functional regulatory elements, which may alter the 3D structure of chromatin domains in G-CIMP-low, allowing specific tumor suppressor genes to be deregulated and thereby leading to more aggressive phenotypes for G-CIMP-low.
Epigenomic profiling at high resolution reveals genetic regulatory signatures underlying islet gene expression and type 2 diabetes
Abstract: GWAS have identified >100 SNPs associated with type 2 diabetes (T2D) and related trait susceptibility. However, the pathogenic mechanisms for most of these SNPs remain elusive. Here, we examined genomic, epigenomic, and transcriptomic profiles in disease-relevant human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D. We integrated genome and transcriptome (RNA-seq) variation across 112 islet samples to produce dense cis-eQTL maps. Further integration with chromatin state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers. High-depth (>1.4B reads) ATAC-seq chromatin profiling in two islet samples enabled us to identify specific transcription factor (TF) footprints in active regulatory elements, which are highly enriched for islet cis-eQTLs. Aggregate allelic bias signatures in TF footprints enabled us to reconstruct TF binding affinities genetically de novo, which supports the high quality of the TF footprints. Interestingly, we found that T2D GWAS loci were specifically and significantly enriched (P = 4.4×10⁻⁷, fold enrichment = 30.1) in islet Regulatory Factor X (RFX) footprints. Remarkably, across independent T2D GWAS loci, risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high information content positions. Among RFX TFs, RFX6 is expressed in islets with high specificity, is involved in maintaining beta cell functional identity, and controls glucose homeostasis. Rare recessive mutations in the DNA binding domain of RFX6 result in Mitchell-Riley syndrome, which is characterized by neonatal diabetes. Our findings may represent a novel connection between rare coding variation in the islet master TF RFX6 and common non-coding variation in multiple target sites for this TF.
Together, these results suggest that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.
Are small RNAs the missing link to X recognition?
Abstract: Eukaryotic genomes are organized into large domains of coordinated regulation. The role of small RNAs in formation of these domains is largely unexplored. An extraordinary example of domain-wide regulation is X chromosome compensation in Drosophila melanogaster males. Males have one X chromosome while females have two. To adjust X chromosome gene expression to maintain the X to autosome expression ratio, D. melanogaster males double transcription of X-linked genes. This is accomplished by the Male Specific Lethal (MSL) complex, which binds the X, modifies chromatin and increases expression. It is unclear how the MSL complex selectively identifies the X chromosome, but the siRNA pathway contributes to X-localization. However, no components of the RNAi pathway interact directly with the MSL complex, suggesting a novel, indirect mechanism. For example, an Ago2-containing complex could bind nascent RNAs from the X chromosome and recruit activities that alter epigenetic marks or chromatin architecture. This might facilitate MSL recruitment and spreading along the X chromosome. To test this model, we used publicly available databases to assemble an Ago2 protein interaction network. This formed the basis of a targeted screen that identified several Ago2-interacting proteins that contribute to dosage compensation, including some that modify chromatin. I then used Chromatin ImmunoPrecipitation (ChIP) to demonstrate that these proteins do indeed maintain chromatin marks on the X chromosome. This analysis will be extended to determine if these proteins themselves localize to X chromatin. Understanding the basis of whole chromosome recognition will enhance our understanding of genome regulation in eukaryotes.
Di-butyl phthalate exposure shifts the RNA profiles of human sperm
Abstract: Endocrine disruptors, chemicals that perturb hormonal function, are suspected of affecting reproductive function. Low-dose exposures to endocrine disruptors such as phthalates, widely used as plasticizers, are commonplace. Environmental stimuli, such as paternal stress or high fat diet, are known to alter the abundance of RNA species in the male germline. We evaluated the relationship between di-butyl phthalate (DBP) exposure and sperm RNA profiles among men with Inflammatory Bowel Disease (IBD). Men with mild IBD on mesalamine were recruited. Mesalamine is formulated as Asacol using a DBP-containing coating or as Pentasa/Lialda without DBP in the coating. Patients who entered the study on Asacol switched to Pentasa for 4 months (Crossover), then back to Asacol for 4 months (Crossback). Those entering the study on Pentasa/Lialda were switched to Asacol for 4 months and then crossed back to their original medication for 4 months. This study structure provides longitudinal samples from patients across both time periods and medications, with semen samples obtained at baseline, at the 4-month Crossover and at the 4-month Crossback. RNA was isolated, and RNA-seq libraries were generated and sequenced. The relative contribution of other cell/tissue types to the sperm samples was inferred using GTEx tissue expression data (http://www.gtexportal.org/home/). To assess sample homogeneity, the most abundant transcripts in each GTEx tissue were correlated with their abundance in each of the sperm samples used in this study. Using mixed linear models, differences in sperm RNA species abundance between phthalate-exposed and unexposed sperm were assessed. Altered levels of transcripts were associated with GO categories highlighting testis and spermatozoa, indicative of DBP’s effect on the male reproductive system. Even in the absence of changes to typical semen parameters, RNA-seq revealed a series of subtle changes.
This highlights the use of sperm RNAs as biomarkers to describe the effects of environmentally relevant DBP exposure.
Genome-Wide Studies Reveal Novel Biological Processes Regulated by SIN3 Isoforms
Abstract: The multisubunit SIN3 complex is a global transcriptional regulator. In Drosophila, a single Sin3A gene encodes different isoforms of SIN3, of which SIN3 187 and SIN3 220 are the major isoforms. Previous studies have demonstrated functional non-redundancy of SIN3 isoforms. The role of SIN3 isoforms in regulating distinct biological processes, however, is not well characterized. We established a Drosophila S2 cell culture model system in which cells predominantly express either SIN3 187 or SIN3 220. To identify genomic targets of SIN3 isoforms, we performed chromatin immunoprecipitation followed by deep sequencing. Our data demonstrate that upon overexpression of SIN3 187, the level of SIN3 220 decreased and the large majority of genomic sites bound by SIN3 220 were instead bound by SIN3 187. We used RNA-seq to identify genes regulated by the expression of one isoform or the other. In S2 cells, which predominantly express SIN3 220, we found that SIN3 220 directly regulates genes involved in metabolism and cell proliferation. We also determined that SIN3 187 regulates a unique set of genes and likely modulates expression of many genes also regulated by SIN3 220. Interestingly, biological pathways enriched for genes specifically regulated by SIN3 187 strongly suggest that this isoform plays an important role during the transition from the embryonic to the larval stage of development. These data establish the role of SIN3 isoforms in regulating distinct biological processes. This study substantially contributes to our understanding of the complexity of gene regulation by SIN3.
Highly sensitive fetal fraction determination allows for noninvasive diagnosis of monogenic diseases in utero
Abstract: The discovery of fetal DNA in the mother’s blood opened the door to a new era of prenatal diagnostics; however, it was the ability to detect fetal-specific variations against an overwhelming background of maternal DNA that has propelled noninvasive prenatal testing (NIPT) to where it is today. The amount of fetal DNA present in the mother’s blood is highly variable, dependent on factors such as gestational age, maternal BMI and smoking, and represents only 2-20% of the total DNA. While the ability to detect large scale copy number variations, such as changes in chromosome ploidy and sub-chromosomal deletions, has been well established using a variety of methods, the limited number of fetal genome equivalents present in the maternal blood can result in asymmetric allele ratios and difficulty in detecting fetal variants at a finer scale.
The ability to detect fetal variants on a finer scale, i.e. disease causing SNPs, could allow for diagnosis of monogenic disorders early in gestation using noninvasive methodologies. By incorporating an accurate fetal fraction into noninvasive prenatal methodologies the sensitivity of chromosome ploidy and microdeletion detection is significantly increased and we are able to detect single base variants with a minimized impact of asymmetric allele ratios.
Our highly accurate method for fetal fraction estimation is based on the presence of paternally inherited alleles at specially selected polymorphic sites in the genome. This method focuses on error reduction through INDEL targeting, PCR-duplicate removal and an alignment-free analysis. As a proof of concept we have screened pregnant mothers for a common benign SNP within CFTR using droplet digital PCR (ddPCR) to assess the maternal and fetal genotypes. Overall, our method of fetal fraction estimation combined with ddPCR for disease allele genotyping provides an attractive and promising strategy for noninvasive diagnosis of monogenic diseases in utero.
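The principle of estimating fetal fraction from paternally inherited alleles can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that every site used is homozygous in the mother and heterozygous in the fetus, so that the paternal allele accounts for roughly half of the fetal DNA at that site. The function name and read counts are hypothetical, and the sketch does not reproduce the INDEL targeting, duplicate removal or alignment-free steps described above.

```python
from statistics import median

def fetal_fraction(sites):
    """Estimate fetal fraction from (paternal_allele_reads, total_reads) pairs.

    Assumption: each site is maternal-homozygous and fetal-heterozygous, so
    the paternal-allele read fraction is about half the fetal fraction.
    """
    estimates = [2 * paternal / total for paternal, total in sites if total > 0]
    if not estimates:
        raise ValueError("no informative sites")
    # The median limits the influence of a single noisy site.
    return median(estimates)

# Hypothetical read counts at three informative sites.
example_sites = [(50, 1000), (40, 1000), (60, 1000)]
print(fetal_fraction(example_sites))
```

Taking the median across sites rather than pooling reads is a design choice for this sketch; a production method would also need to model sequencing error and site selection.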
2-Scale KNN Classification of Mass Spectra Data
Abstract: Diverse algorithms and methods are needed to answer the ever increasing need to adequately harness mass spectrometer generated data. The unique nature and structure of mass spectra data usually requires a high level of expertise and rigorous algorithms. This study's methodology covers feature selection based on direct observations of variables and their inter-relationships, the jackknife technique for data sampling, and matrix-to-vector decomposition. It successfully classifies subjects into three groups: age-matched controls without any evidence of dementia, patients with mild cognitive impairment, and patients with clinical symptoms of Alzheimer's disease (AD). The classification uses a 2-scale principle of the K-nearest neighbor (KNN) algorithm on SELDI data, without corroborating clinical records. To date, there exists no clinical diagnostic tool for AD; instead, patients' cognitive abilities are followed clinically over a period of time (possibly months) to make a diagnosis. This practice usually leads to inconclusive diagnoses, and the results obtained from it are not generalizable. Our model provides an immediate classification and correctly classifies test data sets with 82% confidence. It can also identify traces of positive/negative change within and across data sets with regard to severity of the disease over time.
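For readers unfamiliar with the building blocks named above, the following is a minimal sketch of plain K-nearest-neighbor classification evaluated by leave-one-out (jackknife-style) resampling. It is not the 2-scale variant of the abstract, whose details are not given here; the function names, the two-feature toy data and the group labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k=3):
    """Hold out each sample once, predict it from the rest, report accuracy."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

# Hypothetical two-feature toy data for the three groups.
toy = [
    ((0.0, 0.1), "control"), ((0.1, 0.0), "control"), ((0.1, 0.2), "control"),
    ((1.0, 1.1), "MCI"), ((1.1, 1.0), "MCI"), ((0.9, 1.0), "MCI"),
    ((2.0, 2.1), "AD"), ((2.1, 2.0), "AD"), ((1.9, 2.0), "AD"),
]
print(leave_one_out_accuracy(toy, k=1))
```

Real SELDI spectra would first need the feature selection and vectorization steps described in the abstract before any distance can be computed.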
Disease-Causing Mutation Screening in miR‐183/96/182 in Patients with Inherited Retinal Dystrophy
Abstract: Purpose: microRNAs (miRNAs) are small, non-coding RNAs and represent a newly recognized, important mechanism of gene-expression regulation. However, their roles in inherited retinal dystrophy (IRD) are still largely unknown. Previously, we identified a sensory organ-specific miRNA cluster, the miR-183/96/182 cluster (miR-183/96/182), located at Chr6qA3.3 in mouse and Chr7q32.2 in human. Inactivation of miR-183/96/182 in mice resulted in congenital syndromic IRD. We hypothesize that mutations in miR-183/96/182 in human may cause IRD. We will test this hypothesis in this project.
Results: We have received more than 1100 patient samples. Samples with X-linked IRD or known disease-causing mutations are excluded from the screening. Among 188 patient samples screened so far, three sequence variants were found in four patient samples, two in pre-miR-182 and one in pre-miR-96. Of these, the two sequence variants in pre-miR-182 are known single nucleotide polymorphisms (SNPs). However, the variant in pre-miR-96 is novel. Whether the new variant affects miRNA biogenesis and contributes to the disease is under investigation.
Conclusions: We established a methodology for robust amplification and sequencing of miR-183/96/182 and efficient identification of sequence variants. Further characterization of the new variant and screening in other patient samples may identify mutations in miR-183/96/182 responsible for IRD. This research will uncover a new molecular mechanism of IRD and provide a new target for genetic diagnosis and gene therapy.
High-throughput allele-specific expression across 250 environmental conditions
Abstract: Gene-by-environment interactions (GxE) determine common disease risk factors and biomedically relevant complex traits. However, quantifying how the environment modulates genetic effects on human quantitative phenotypes presents unique challenges. Environmental covariates are complex and difficult to measure and control at the organismal level, as found in GWAS and epidemiological studies. An alternative approach focuses on the cellular environment using in vitro treatments as a proxy for the organismal environment. These cellular environments simplify the organism-level environmental exposures to provide a tractable influence on sub-cellular phenotypes, such as gene expression. Expression quantitative trait loci (eQTL) mapping studies identified GxE in response to drug treatment and pathogen exposure. However, eQTL mapping approaches are infeasible for large scale analysis of multiple cellular environments. Recently, allele-specific expression (ASE) analysis emerged as a powerful tool to identify GxE in gene expression patterns by exploiting naturally-occurring environmental exposures. Here we characterized genetic effects on the transcriptional response to 50 treatments in 5 cell types. We discovered 1,455 genes with allele-specific expression (ASE) (FDR<10%) and 215 genes with GxE. We demonstrated a major role for GxE in complex traits. Genes with a transcriptional response to environmental perturbations showed 7-fold higher odds of being found in GWAS. Additionally, 105 genes that indicated GxE (49%) were identified by GWAS as associated with complex traits. Examples include GIPR-caffeine interaction and obesity, and LAMP3-selenium interaction and Parkinson disease. Our results demonstrate that comprehensive catalogs of GxE interactions are indispensable to thoroughly annotate genes and bridge epidemiological and genome-wide association studies.
Prevalence of Antibiotic-resistant Soil Bacteria and Antibiotic Resistance Genes in the Urban Agricultural Environment
Abstract: Environmental reservoirs, in particular soils, are widely believed to constitute an important source of antibiotic resistance. However, the nature and extent of this antibiotic resistance reservoir is yet to be determined, especially that associated with urban agriculture. The aim of this study was to isolate antibiotic-resistant soil bacteria and determine the distribution of common antibiotic resistance genes in soil associated with urban agriculture. A total of 41 soil samples were collected from an urban garden in Detroit in the summer of 2015. Soil bacteria were isolated and identified by 16S rRNA gene sequencing. Total metagenomic DNA was extracted from 21 soil samples, followed by deep Illumina HiSeq sequencing. A total of 207 soil bacteria were isolated, predominated by Chryseobacterium sp. (33.33%), Stenotrophomonas sp. (18.10%) and Sphingobacterium sp. (12.38%). Gram-negative bacteria showed the highest resistance to ampicillin (94.2%), followed by chloramphenicol (80.0%), cefoxitin (79.5%), gentamicin (78.4%) and ceftriaxone (71.1%). All Gram-positive bacteria (100%) were resistant to three antibiotics: penicillin, gentamicin and kanamycin. The metagenomic analysis showed a high prevalence of β-lactam (37.3%), macrolide (36.0%) and tetracycline (18.4%) resistance genes. Soil bacterial isolation and phenotypic determination of antibiotic resistance, together with soil metagenomics, provide a valuable tool to study the nature and extent of antibiotic resistance in the environment.
Genomics@Wayne 2016 Poster Abstracts
Genetic and transcriptional analysis of human host response to healthy gut microbiota
Abstract: Over 1,000 species of bacteria live in the human gut. This gut microbiota has been shown to play a role in both healthy and diseased states. However, establishing the causality of host-microbiota interactions in humans is still challenging. We have developed a novel experimental system to define the transcriptional response induced by the microbiota in human cells and shed light on the molecular mechanisms underlying host-gut microbiota interactions. We cultured primary human colonic epithelial cells in low oxygen to recapitulate the gut environment for 4 and 6 hours under three conditions: with high (100:1) and low (10:1) concentrations of bacteria and alone, as controls. We sequenced the RNA from each sample and identified over 6,000 host genes that change expression following co-culturing as compared to colonocytes cultured alone. The differentially expressed genes are enriched for genes associated with several microbiota-related diseases, such as obesity and colorectal cancer. In addition, we identified 87 host SNPs that show allele-specific expression in 69 genes. For 12 SNPs in 12 different genes, allele-specific expression is conditional on the exposure to the microbiota. Of these 12 genes, eight have been associated with diseases linked to the gut microbiota, specifically colorectal cancer, obesity and type 2 diabetes, suggesting that these genes may link the microbiota mechanistically to these diseases. For example, LASP1, a gene involved in the cytoskeleton and cell migration, shows an increase in total expression as well as allele-specific expression only following co-culturing with the microbiota. Furthermore, LASP1 expression is increased in colorectal cancer suggesting that the microbiota may influence colorectal cancer through modulation of LASP1 expression in a genotype-specific manner. 
Our study demonstrates a scalable approach to study host-gut microbiota interactions and can be used to identify putative mechanisms for the interplay between host genetics and microbiota in health and disease.
Evolutionary Fates and Dynamic Functionalization of Young Duplicate Genes in Arabidopsis Genomes
Abstract: Gene duplication is one of the major sources of raw genomic material, facilitating the evolution of genomes and organisms. Particularly in plants, a high abundance of duplicate genes has been maintained for long periods of evolutionary time; duplicates constitute 65% of Arabidopsis thaliana genes. Although several evolutionary processes (i.e. conservation, subfunctionalization, neofunctionalization, and specialization) for gene duplication have been studied individually, the exact contributions of these evolutionary processes in plant genomes are still largely unknown. To determine the mechanisms potentially driving the evolution of young duplicate genes in plants, we performed a comparative analysis of genome-wide expression profiles of five tissues from Arabidopsis thaliana and two other closely related species. Conservation, neofunctionalization, and specialization are found to be the three main evolutionary processes for young plant paralogs. Upon origination, duplicate genes tend to maintain their ancestral functions, but novel functions are more likely to be gained as they survive longer. Further analyses reveal that the evolutionary fates of duplicate genes are associated with their ancestral functions. Our findings also support the “out of pollen” hypothesis, under which duplicate genes tend to be initially expressed in pollen and then ultimately gain specific expression spectra. This study illustrates the important roles of specialization and neofunctionalization in the retention of young duplicates and the evolutionary fates of gene duplication in Arabidopsis genomes.
Developing new genomic concepts: ‘chromosomal coding’ and ‘fuzzy inheritance’
Abstract: Understanding the genomic basis of disease has now become much more complicated as it is reflective of huge genomic heterogeneity. Despite various large-scale genomic studies that have challenged the theoretical bases of inheritance, little effort has been devoted to searching for new conceptual frameworks that can better explain the genomic reality of heterogeneity as well as missing heritability. Considering heterogeneity, we have proposed the genome theory, which considers genome-level alterations, rather than individual gene mutations, as representing an information package under somatic cell selection, which contributes to diseases. In this presentation, three critical concepts will be discussed: 1. Chromosomal coding; 2. The relationship between gene coding and chromosomal coding; 3. Fuzzy inheritance-defined heterogeneity as a main mechanism for disease. These concepts will be articulated through the theoretical synthesis of genomic heterogeneity and experimental results within the context of somatic cell evolution. The ensuing analyses lead to the conclusions that 1. The chromosome codes for the physical topology of the gene interaction map, specifically by coding the gene address within the nucleus; 2. Multiple levels of fuzzy inheritance are responsible for the high degree of heterogeneity detected in both individuals and populations; and 3. Cellular heterogeneity is essential for both adaptations and diseases.
Heng HH (2016). Debating Cancer: The Paradox in Cancer Research (book).
Heng HH et al. (2016). Genotype, environment, and evolutionary mechanism of diseases. Environ Dis 1:14-23.
Heng HH et al. (2016). Heterogeneity mediated system complexity: The ultimate challenge for studying common and complex diseases. In: The Value of Systems and Complexity Sciences for Healthcare, ed. Sturmberg JP, pp. 107-120. Springer.
Horne SD et al. (2015). Chromosomal instability (CIN) in cancer. eLS, 1-9.
Heng HH et al. (2011). Decoding the genome beyond sequencing: The new phase of genomic research. Genomics 98(4):242-252.
centiSNPedia: A universal annotation for eQTL enrichment in any tissue
Abstract: A large fraction of the genetic loci identified by genome wide association studies (GWAS) are located in non-coding regions of the genome and likely act by affecting gene regulation. Expression quantitative trait loci (eQTLs) analysis has emerged as a powerful tool to link genetic variants with changes in gene expression, but as in GWAS, eQTL resolution is limited by linkage disequilibrium. High resolution annotations of genetic variants are useful in locating and predicting the effect of the causal variant in transcription factor (TF) binding sites, but the available data are often on specific cell-types and may not exactly match the eQTL tissue of interest. Here, we propose to use a new annotation (centiSNPedia) that aggregates CENTIPEDE footprints across hundreds of cell-types and retains the TF identity as a categorical variable (1117 factors) to be used for enrichment analysis. This is a useful simplification that assumes that when a TF is active it may bind to similar locations across tissues. Using centiSNPedia, we analyzed GTEx eQTLs in 44 different tissues with a deterministic approximation of posteriors (DAP) approach we recently developed. By combining multiple cell-types, we obtain a more complete genome-wide annotation for each TF, with tighter estimates for the enrichment parameters. For homogeneous tissues with obvious matching cell-types, we obtain a similar set of enriched TFs (e.g. HNF in liver), as if we were using a cell-type specific annotation. The signature of TFs enriched is highly representative of the underlying eQTL tissue, even if the relevant cell types were not available in building centiSNPedia. Additionally, for many tissues (e.g. fat), we discover that the most enriched categorical annotations correspond to TFs that can be modulated by the environment (e.g. glucocorticoid receptor). These may represent latent GxE interactions in the GTEx individuals exposed to that environment.
A novel allele-specific data analysis method for high-throughput reporter assays
Abstract: The majority of the human genome is composed of non-coding regions filled with regulatory elements such as enhancers, which are crucial to controlling gene expression. Many GWAS signals are in these regions and may disrupt gene regulatory sequences, so it is important not only to identify true enhancers but also to test whether a variant within an enhancer affects gene regulation. Recently, high-throughput assays such as Massively Parallel Reporter Assays (MPRA) have been used to fine-map variants associated with gene expression in lymphoblastoid cell lines (LCLs) and HepG2. However, we are still missing high-quality data analysis tools to analyze these datasets and identify variants showing differential expression between alleles. We have further developed QuASAR (quantitative allele-specific analysis of reads), our method for allele-specific analysis, to analyze allele-specific barcode read count data from MPRAs. Using this approach, we can take into account the uncertainty in the original plasmid proportions and sequencing errors. The provided allelic skew estimate and its standard error also simplify meta-analysis of replicate experiments. We also show that a beta-binomial distribution better models the variability present in the allelic imbalance of these synthetic reporters and results in a test that is statistically calibrated. Applying this approach to the data of Tewhey et al. (2016), we find 612 SNPs with significant (FDR 10%) allele-specific regulatory function in LCLs. Our study shows that with appropriate data analysis tools, we can greatly improve our power to detect allelic effects in synthetic massively parallel reporter assays.
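To make the beta-binomial idea concrete, here is a minimal sketch of a two-sided test for allelic imbalance in which the null reference proportion can be set to the plasmid proportion rather than 0.5. This is an illustration only and not the QuASAR implementation or API; the function names, the overdispersion parameter rho and its default value are assumptions.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials and shapes (alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def allelic_imbalance_pvalue(ref_reads, alt_reads, null_ref_prop=0.5, rho=0.01):
    """Two-sided p-value: total probability of counts no more likely than observed.

    null_ref_prop: expected reference proportion under no allelic effect
                   (for example, the proportion in the plasmid library).
    rho: assumed overdispersion in (0, 1); smaller values approach a binomial.
    """
    n = ref_reads + alt_reads
    alpha = null_ref_prop * (1 - rho) / rho
    beta = (1 - null_ref_prop) * (1 - rho) / rho
    pmf = [betabinom_pmf(k, n, alpha, beta) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small tolerance so ties with the observed count are included.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

print(allelic_imbalance_pvalue(90, 10))
```

With larger rho the same read counts give a larger p-value, which is the sense in which modeling overdispersion keeps the test calibrated.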
Whole Exome Sequencing Identifies BRCA2 Truncating Variant K3326* as a Possible Modifier of Penetrance in Hereditary Ovarian Cancer
Abstract: Ovarian cancer (OVCA) is the most lethal gynecological malignancy due to a lack of early detection methods. Since an estimated 25% of OVCA cases are suspected to be inherited, genetic testing can be utilized to identify those at elevated risk. Unfortunately, the majority of the genetic predisposition underlying familial ovarian cancer remains unaccounted for, limiting the impact of clinical genetic counseling. To identify novel risk variants associated with OVCA, we performed whole exome sequencing on 48 OVCA patients suspected to have an inherited risk factor who had previously undergone clinical testing and were found to be negative for known pathogenic variants in either BRCA1 or BRCA2. We employed in silico SNP analysis to identify suspect variants, followed by validation using Sanger DNA sequencing.
The BRCA2 variant p.K3326*, resulting in a 93 amino acid truncation, was overrepresented in our cohort (OR = 4.95, p = 0.01). Originally this variant was considered benign, and therefore was not identified in the initial BRCA1/BRCA2 screening. However, it has since been established to be a risk factor for lung, oral and pancreatic cancers and associates with invasive serous OVCA [p = 7.11×10⁻⁸, OR (95% CI) = 1.57 (1.44-1.70)]. Two of the four carriers also had a second known pathogenic variant of low to moderate penetrance, one in ATM and another in RAD51D. Other rare, predicted-damaging variants of unknown significance in RECQL, ERCC6, TP53BP1, BUB1, AXIN1 and HMMR were observed in p.K3326* carriers. Additionally, we found three BRCA2 p.K3326* carriers in The Cancer Genome Atlas database, all of whom had OVCA and a second pathogenic variant (NBN = 1, BRCA1 = 2). These findings, as well as preliminary experimental data, suggest that the BRCA2 p.K3326* variant may act as a modifier of penetrance to other pathogenic variants in the DNA repair pathway.
Systems framework for multi-dimensional redox system regulations in vascular dysfunction
Abstract: Introduction: Endothelial dysfunction, characterized by reduced nitric oxide (NO) bioavailability and increased reactive oxygen species (ROS) production, underlies the vascular complications in diabetes. Under physiological conditions the ROS are neutralized by antioxidant enzymes such as superoxide dismutase (SOD), catalase and glutathione peroxidase. However, there is a lack of understanding about the individual and collective effects of oxidative stress and antioxidant system on the endothelial cellular redox state.
Materials and Methods: A kinetic model of the endothelial cell involving eNOS catalysis, oxidative stress and antioxidant enzymes was developed. The effects of SOD concentration on the eNOS NO production rates and concentration profiles of NO, O₂•‾ and other ROS including hydrogen peroxide (H₂O₂) under oxidative stress conditions were investigated. In addition, we measured O₂•‾ and NO levels experimentally using dihydroethidium (DHE) and DAF-FM diacetate, during superoxide dismutase (SOD) and catalase treatments in a hyperglycemia-induced oxidative stress environment in HUVECs.
Results and Discussion: The model results predicted that the NO production rate decreased as cellular oxidative stress increased. Supplementation of SOD improved the NO production rate, thus keeping eNOS in a coupled state (Figure 1). SOD also increased H₂O₂ concentration under oxidative stress conditions. In experimental studies, hyperglycemia significantly increased O₂•‾ and decreased NO levels in HUVECs. Treatment with exogenous SOD and catalase decreased O₂•‾ levels and improved NO levels in hyperglycemic HUVECs.
Conclusions: The production of NO, O₂•‾ and H₂O₂ is tightly regulated in endothelial dysfunction. Thus, in vascular diseases, regulation of H₂O₂ levels, along with O₂•‾ and NO, serves as a viable therapeutic alternative for improving the redox state. Indirect manipulation of the fate of O₂•‾ may hold a key to reversing and managing endothelial dysfunction. Combining modeling and experimental studies can further outline the dynamics of the cellular redox system under oxidative stress conditions.
Potential of Blood RNA-Seq Profiles to Predict Post Concussive Syndrome
Abstract: Following a blow to the head, concussion symptoms (headache, sleep disturbances, mood changes, etc.) may persist beyond a period of days or weeks, a condition referred to as post-concussive syndrome (PCS). A major obstacle in diagnosing PCS, and concussion in general, is the lack of objective methods for verifying injury occurrence. Here we use whole transcriptome analysis to identify patients with PCS. Whole blood was collected from 40 volunteers (33 males and 7 females; 24 previously concussed and experiencing long-term symptoms as identified by neuropsychological assessment, and 16 controls with no reported concussion) at the Dwight D. Eisenhower Army Medical Center in Fort Gordon, GA. The number of concussions ranged from 1 to 16 and the time since last concussion from 12 to 144 months. Differential exon expression was determined via RNA-Seq on a SOLiD 5500xl DNA sequencer. rRNA-depleted samples were run at an average depth of 25 M. DNA reads were aligned to the hg19 reference genome using LifeScope software, and annotated using the RefSeq annotation guide (2015-11-03). Data were filtered to include genes with greater than 10 reads in 25% of all samples. Data were then quantile normalized followed by log2 transformation with an offset of 1. Gene expression values (RPKM) were subjected to ANOVA to determine differential expression between control and concussion patients. A total of 71 genes showed significant differential expression between concussed patients and controls (unadjusted p<0.01), and these were used with support vector machine modeling to create several prediction models (normalized accuracy rate ≥ 90%). Our preliminary results demonstrate the potential utility of gene expression analysis as the foundation for an objective method of diagnosing PCS in males.
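The filtering and normalization steps described in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in the study: the function names and the genes-by-samples dictionary layout are assumptions made here, "in 25% of all samples" is read as "in at least 25% of samples", and ties in quantile normalization are handled naively.

```python
import math


def filter_genes(counts, min_reads=10, min_fraction=0.25):
    """Keep genes with more than `min_reads` reads in at least
    `min_fraction` of samples (assumed reading of the abstract's filter)."""
    kept = {}
    for gene, row in counts.items():
        n_pass = sum(1 for c in row if c > min_reads)
        if n_pass >= min_fraction * len(row):
            kept[gene] = list(row)
    return kept


def quantile_normalize(counts):
    """Give every sample (column) the same distribution: the mean of the
    sorted columns. Ties are broken by gene order (a simplification)."""
    genes = list(counts)
    if not genes:
        return {}
    n_samples = len(counts[genes[0]])
    columns = [[counts[g][j] for g in genes] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean value at each rank across all samples.
    rank_means = [
        sum(col[i] for col in sorted_cols) / n_samples for i in range(len(genes))
    ]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = rank_means[rank]
    return normalized


def log2_offset(values, offset=1.0):
    """Apply log2(x + offset) to every entry."""
    return {g: [math.log2(v + offset) for v in row] for g, row in values.items()}


# Hypothetical toy matrix: gene -> read counts in four samples.
toy_counts = {
    "geneA": [50, 60, 70, 80],
    "geneB": [0, 2, 11, 3],
    "geneC": [5, 0, 1, 2],
    "geneD": [20, 30, 15, 25],
}
processed = log2_offset(quantile_normalize(filter_genes(toy_counts)))
```

In this toy matrix, `geneC` never exceeds 10 reads and is dropped, while `geneB` passes because one of its four samples does.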
Transcription factor footprints predict the gene expression response to environmental stimuli
Abstract: Transcription factors are major determinants of gene expression changes in response to environmental stimuli. As such, the identity of transcription factors that have binding sites (TFBS) near a gene should be informative as to how that gene responds to environmental perturbations. However, the diversity of TFBS and their combinatorial nature, combined with extensive cell type-specific and distance-dependent effects, makes predicting transcriptional responses from TFBS challenging. Here we used a generalized linear model with elastic net regularization to select TFBS derived from DNase-seq footprints that discriminate between up- and downregulated genes in response to 29 environmental stimuli in five cell types. We started building the model using TFBS derived from cell type-specific footprints for lymphoblastoid cells (LCLs) and human umbilical vein endothelial cells (HUVECs). While baseline gene expression and H3K27ac histone marks were predictive of gene expression changes, the addition of TFBS led to significant increases in our predictive ability. To predict gene expression response in cell types that do not have specific TFBS data available, we combined annotations from all ENCODE and Roadmap Epigenomics DNase-seq datasets. Gene expression response was most predictable in LCLs, where we correctly classified a gene as being up- or downregulated 80% of the time. Response to certain treatments, such as vitamin D, was highly predictable across multiple cell types. For a number of well-studied treatments such as dexamethasone and retinoic acid, we identified factors that have been shown experimentally to be involved in cellular response to that treatment, such as the glucocorticoid receptor for dexamethasone. In addition, we identified a number of TFBS that were predictive across a variety of treatments and cell types, such as SREBP-1.
These results indicate that the compendium and pre-treatment landscape of bound TFs can be used to predict cellular responses to environmental stimuli and to identify regulators of gene expression.
Monitoring HDAC class IIa activity in the brain after single prolonged stress using noninvasive PET imaging with 18F-TFAHA
Abstract: Posttraumatic stress disorder (PTSD) is an incapacitating psychiatric disorder that follows exposure to a traumatic event. About 9% of people in the general population who experience a traumatic event develop PTSD, and about 11-20% of military and veteran populations are diagnosed with PTSD. Emotional and psychological strain, even without brain injury, often leads to PTSD-like conditions such as hyperarousal, anxiety, and disrupted cognition, memory, and mood. Trauma- and stress-induced changes in the acetylation of histones involved in the mechanism of PTSD are due to the up-regulation of expression-activity of histone deacetylases (HDACs). Therefore, several general and isotype-specific HDAC inhibitors are undergoing preclinical and clinical studies for the treatment of PTSD. However, little is known about the temporal dynamics of isotype-specific HDAC activity in the various brain structures that are involved in the mechanism of PTSD. Our previous studies demonstrated that PET imaging with 18F-TFAHA allows for quantitative visualization of the expression-activity of class IIa HDAC enzymes in the brain (predominantly HDACs 4 and 5). Upon intravenous administration, 6-([18F]trifluoroacetamido)-1-hexanoicanilide (18F-TFAHA) accumulates specifically in the n. accumbens, periaqueductal grey, hippocampus and cerebellum, as the result of increased HDACs 4 and 5 expression-activity in these brain structures. In the current study, we aimed to visualize the spatial and temporal dynamics of class IIa HDAC activity in the rat brain using a model of single prolonged stress (SPS). PET/CT imaging was performed before the SPS and on different days thereafter (days 1 and 7). Logan graphical analysis was implemented to assess changes in the magnitude of 18F-TFAHA accumulation in different brain regions.
Consistently increased levels of HDACs 4 and 5 were observed early after SPS, so additional studies will be conducted to assess the efficacy of HDAC inhibitors (i.e., vorinostat, valproic acid) for therapy and/or prevention of SPS.
Monitoring HDAC class IIa activity in the brain after traumatic brain injury using noninvasive PET imaging with 18F-TFAHA
Abstract: Traumatic brain injury (TBI) is a severe, complex injury and a growing public health concern around the world. In the U.S., over 150,000 military personnel are diagnosed with a form of mild traumatic brain injury, resulting in a wide range of neurological and psychological symptoms followed by long-term cognitive disabilities, including neurodegenerative disorders such as Alzheimer’s disease. The fundamental role of epigenetic regulatory mechanisms involved in neuroplasticity and adaptive responses to TBI is gaining recognition. Previous studies indicated that trauma-induced neurodegeneration is associated with several epigenetic changes in histones, including aberrant acetylation, methylation, and phosphorylation. However, the spatial and temporal dynamics of expression-activity of different HDAC classes and isotypes in the brain at baseline, after TBI, and after therapy with HDAC inhibitors are largely unknown due to the lack of studies using non-invasive molecular imaging. Previously, we developed the HDAC class IIa specific substrate-type radiotracer 6-([18F]trifluoroacetamido)-1-hexanoicanilide (18F-TFAHA), with high substrate affinity and specificity for HDACs 4 and 5. Upon intravenous administration, 18F-TFAHA accumulates specifically in the n. accumbens, periaqueductal grey, and hippocampus, as the result of increased HDACs 4 and 5 expression-activity in these brain structures. In the current study, we assessed the spatial and temporal dynamics of class IIa HDAC activity in the rat brain using PET/CT with 18F-TFAHA in a model of diffuse TBI (Marmarou model). PET/CT imaging with 18F-TFAHA was performed before TBI, and at 24-48 hours and 7 days post TBI. A significant decrease in 18F-TFAHA accumulation was observed at 24-48 hours post TBI in the hippocampus, n. accumbens and periaqueductal grey, as compared to baseline levels. At day 7 post TBI, the levels of 18F-TFAHA accumulation in the hippocampus, n. accumbens and periaqueductal grey were similar to those at baseline. The results of non-invasive imaging were validated by immunohistochemical analyses of rat brains.
Epigenomic stemness signature associated with glioma molecular subtypes
Abstract: Despite significant past progress, glioma diagnostics still face suboptimal disease classification, which impacts patient management and treatment. Recent studies of the genomic and epigenomic landscape of gliomas revealed a distinct subtype of IDH-mutant glioma, named G-CIMP-low, with loss of DNA methylation at SOX-family binding sites in intergenic regions and overexpression of cell cycle genes, suggesting a stem cell phenotype. This subgroup was associated with poor survival. In addition, a stem cell-like phenotype in various cancers is directly correlated with increased aggressiveness. We hypothesize that a stem cell signature can identify subtypes of glioma associated with poor outcome and improve our comprehension of glioma biology and classification. We analysed molecular profiles from 686 TCGA gliomas (532 LGG; 154 GBM) in order to score the features of tissue de-differentiation. We derived metrics to measure undifferentiated states in tumor samples to identify glioma samples with cancer stem cell phenotypes. We found that gliomas with higher de-differentiation scores were associated with higher tumor purity (p<0.001) and grade (p<0.001). Interestingly, by sorting tumors by their cancer stem cell phenotype we detected a strong association with the recently described glioma molecular subtypes (p<0.001). These subtypes are, in turn, associated with clinical outcomes (e.g. progression-free and overall survival). The G-CIMP-low, Classical-like, Mesenchymal-like and LGm6-GBM samples, which have the worst outcome, presented the highest cancer stem cell scores. We defined a molecular signature of cancer stem cells that enabled classification of gliomas and identification of tumors with stem cell- and progenitor-like phenotypes, associated with poor clinical outcome. Identifying the undifferentiated phenotype of glioma cells can improve our understanding of tumor biology and may pave the way for novel diagnostics and therapies for cancer patients.
Lipidome profiles are related to preterm birth in African American women
Abstract: African American women are more likely to have preterm birth (fewer than 37 completed weeks of gestation) compared with non-Hispanic white women. Depressive symptoms have been related to preterm birth. Limited data also suggest that lipidome profiles [e.g., omega(n)-3 PUFA, n-6 PUFA] are altered for pregnant women with higher levels of depressive symptoms and women with preterm birth. The goal of this study was to explore whether a lipidome profile is related to depressive symptoms and preterm birth. A sample of 38 pregnant African American women participated in this pilot study. Women completed questionnaires and had blood drawn in the 2nd trimester of pregnancy. Plasma lipidome profiles were determined by “shotgun” high resolution/accurate mass spectrometry. Birth data were collected from medical records. Compared with women with term birth, women with preterm birth had lower levels of GPSer-44:09 (glycerophosphoserine) and DG (40:4) (diacylglycerols) when controlling for maternal age and medical history (e.g., hypertensive disorders). Depressive symptoms were not related to lipidome profiles. The high preterm birth rate for African American women has persisted for decades, pointing to the need for new approaches. Lipidomics is a tool for discovery of potential novel biomarkers for preterm birth research.
Sublethal TCDD Exposure During Zebrafish Development Produces Multigenerational Testicular Abnormalities in Histology and Gene Expression
Abstract: The industrial by-product TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) is a potent environmental toxin and endocrine-disrupting chemical (EDC) with known multigenerational teratogenic effects on humans, rodents and fish. A developmental basis of adult-onset diseases has been implicated following exposure to some EDCs. Zebrafish (Danio rerio) are an effective model system for investigating developmental and multigenerational toxic effects of EDCs, due to their short generation time, transparency in early development, and ease of developmental exposure. Previous work in this lab has shown that both structural and reproductive abnormalities (spinal deformities, sex ratio skewed toward female fish, body plan/gonad mismatch, and decreased fertility) were observed in young zebrafish exposed to TCDD. Reproductive abnormalities observed in subsequent unexposed generations (F1 and F2) were male-mediated, suggesting transmission through the male germline. We analyzed the testicular tissue of TCDD-exposed male zebrafish from all three generations, looking for changes in histology and gene expression that could account for decreased reproductive capacity. For histological analysis, spermatogenic cells were categorized by differentiation stage and quantified within seminiferous tubules. Statistical analysis demonstrated significant differences in certain spermatogenic cell types between exposed and control groups in the F0 and F1 generations, indicating delayed spermiation in directly exposed males and their unexposed offspring. Gene expression analysis of testis samples revealed altered expression of genes involved in spermatogenesis, steroidogenesis, aryl hydrocarbon receptor (AhR) xenobiotic response pathways, and lipid metabolism in exposed fish in all three generations.
Overall, differential expression of reproductive genes and reduced capacity of sperm cells to reach maturity could account for the multigenerational reproductive defects previously seen in TCDD-exposed male zebrafish and their offspring.
Epigenomic Landscape of Adult Lower-Grade Recurrent Gliomas
Abstract: Loss of DNA methylation is a molecular process that can distinguish a subset of primary IDH mutant non-codel gliomas with the worst clinical outcome, defined as G-CIMP-low (Ceccarelli et al., 2016). Here, we have evaluated the methylome signature retention associated with glioma recurrence by using a machine learning computational method known as random forest (RF). Using 163 G-CIMP specific CpG sites as defined by our previous work (Ceccarelli et al., 2016), the RF model’s sensitivity and specificity were greater than 90%. We applied the model across 75 publicly available matched primary and recurrent glioma cases (total of 183 fragments) from The Cancer Genome Atlas, University of California San Francisco (UCSF), and Yale University (Ceccarelli et al., 2016; Mazor et al., 2015; Bai et al., 2016). The majority of the matched glioma samples retained their primary-state epigenetic signature at recurrence. Interestingly, 21 recurrent cases (28%) shifted their epigenomic subtype classification. Of these 21 cases, 19 shifted from a G-CIMP-high phenotype at primary to either an intermediate state at first recurrence or a bona fide G-CIMP-low state at second recurrence. These cases show a dramatic DNA demethylation profile that resembles an aggressive GBM IDH wild-type molecular phenotype. A de-differentiation signature/model derived from publicly available epigenetic data of both stem and differentiated cells demonstrates a significantly higher stem-like score in mostly G-CIMP-low recurrent gliomas. Whole-genome bisulfite sequencing of primary gliomas suggests structural chromatin reorganization at CTCF binding motifs in G-CIMP-low samples compared to other known IDH mutant (G-CIMP-high) and IDH wild-type (Mesenchymal-like and LGm6-GBM) subgroups of gliomas as well as non-tumor brain samples.
In conclusion, our data suggest a new understanding of recurrent glioma progression via a remodeling of the epigenomic landscape and the knowledge gained may offer potential novel clinical biomarkers for prognosis.
A versatile new method to map uracil rich regions in the genome
Abstract: Uracil, a nucleotide base normally found in RNA, can also occur in DNA. One mechanism by which this happens is cytosine deamination by activation-induced DNA cytosine deaminase (AID) during the antibody maturation process in germinal center B cells. However, AID off-target activity can result in mutations in non-immunoglobulin targets, leading to B cell lymphoma. Here we introduce a novel method, based on a biotin linker (SS-ARP), that can be used effectively to enrich the uracil-containing regions of the genome.
In this study, using the bacterial strain CJ236, we have optimized the conditions for enrichment of uracil-containing genomic DNA and for Illumina library preparation with the enriched DNA.
This method provides a direct and convenient approach to mapping uracil-rich genomic regions, and we believe that it will help in identifying AID off-target activity in a genome-wide fashion in B cell lymphoma.
Aneuploidy Testing in Pregnancy Before and After the Availability of Non-Invasive Prenatal Screening (NIPS) In A Predominantly Inner City Population: Is It Different?
Abstract: Objective: Our objective in this study was to investigate choices in aneuploidy testing among women referred for genetic counseling, before and after the availability of NIPS, in a single tertiary care center setting with a predominantly inner city, low socioeconomic status (SES) population, in contrast to most previous cohorts.
Methods: We reviewed results of clinical decision making of women referred for genetic counseling at < 24 weeks in a tertiary care center during 2011, prior to the availability of NIPS, and 2013-2015, after NIPS availability. Women seen in January-April each year were included in the sample.
Results: Comparing 197 women referred before and 357 after the availability of NIPS (total n=554), there were no significant differences in maternal age, gestational age, gravidity, living children, race or insurance type. There was no change in the proportion of women who proceeded with invasive testing before and after the availability of NIPS (19.3 vs 23.2%, p>0.05). After availability, 45.7% (163) of women had NIPS. The proportion of patients whose final test was maternal serum screening was reduced by almost two thirds (72.1 vs 27.5%, p<0.05). There was no difference in the proportion of women who declined all testing (8.6 vs 6.4%, p>0.05).
Conclusion: In our low SES environment, NIPS was a popular choice by women after positive maternal serum screening. Despite this choice we did not see a decrease in the uptake of invasive testing, a result differing from several other US studies, in higher SES samples. Counselor and demographic factors could potentially play a role in this difference and warrant further study. With its excellent sensitivity and specificity in high-risk populations, the increase in uptake of NIPS could arguably increase the detection rate for tested aneuploidies. NIPS provided further genetic workup for women who previously declined invasive testing and had no other options.
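For readers who want to see how before/after proportions such as those in the Results can be compared, here is a minimal sketch of a pooled two-proportion z-test in plain Python. The abstract does not state which test the authors used, so the choice of test is an assumption. The counts below are also assumptions: they are reconstructed by rounding the reported percentages against the group sizes (197 and 357) and are not taken from the study data.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    x1/n1 and x2/n2 are the event counts and group sizes.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts reconstructed from reported percentages (assumption, see lead-in):
# invasive testing, about 19.3% of 197 vs about 23.2% of 357.
z_invasive, p_invasive = two_proportion_z_test(38, 197, 83, 357)
# Final test was maternal serum screening, about 72.1% of 197 vs 27.5% of 357.
z_serum, p_serum = two_proportion_z_test(142, 197, 98, 357)
```

With these reconstructed counts, the invasive-testing comparison falls above the 0.05 threshold and the serum-screening comparison falls below it, which matches the direction of the reported results.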
Factors Influencing The Decision to Proceed with Prenatal Invasive Diagnostic Testing: Seeing Is Believing
Abstract: Objective: Our objective in this study was to examine factors that predict the decision to proceed with prenatal invasive diagnostic testing following genetic counseling in a single tertiary care setting with a predominantly inner city low socioeconomic status population.
Methods: We reviewed results of clinical decision-making, along with demographic and socioeconomic data, of women referred for genetic counseling at less than 24 weeks gestation in a core city tertiary care center in 2011 and 2013-2015. Women seen from January to April of each year were included. To evaluate possible factors related to choice of invasive testing, including advanced maternal age, abnormal ultrasound, previous pregnancy outcomes, and family history, while controlling for availability of non-invasive prenatal screening and for characteristics of race, age, number of living children, number of fetuses, religion, and city of residence, we employed likelihood ratio forward stepwise logistic regression, calculating adjusted odds ratios.
Results: For the 554 women included, only abnormal ultrasound (OR 3.62, 95% CI 2.33-5.62) and private insurance (OR 1.68, 95% CI 1.08-2.06) were significant predictors of invasive diagnostic testing. The two significant factors were jointly additive (OR 5.10, 95% CI 2.70-9.67).
Conclusions: In this study, we found that regardless of baseline characteristics, patients who were referred for an abnormal ultrasound finding were more likely to choose invasive diagnostic testing. As in prenatal maternal bonding, sonographic visualization has a great impact on patients’ decision-making surrounding invasive diagnostic testing. Despite equal coverage of invasive testing by private and public insurance, people with private insurance were more likely to proceed with invasive testing. It appears that insurance status may be a surrogate of other measures, like education or perceived access. This finding would require further study.
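As background for the odds ratios reported above, the sketch below shows how an unadjusted odds ratio and its Wald 95% confidence interval are computed from a 2x2 table. It is only an illustration: the study reports adjusted odds ratios from stepwise logistic regression, which this code does not reproduce, and the counts in the example are entirely hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    All four cells must be non-zero."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not from the study).
or_, lo, hi = odds_ratio_with_ci(20, 10, 10, 20)
```

Because the interval is built on the log scale, the point estimate sits at the geometric mean of the two bounds, and an interval that excludes 1 corresponds to significance at the chosen level.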
RNA Sequencing Reveals That Sustained Inflammation In Sarcoidosis Is Linked To Dysregulated Phagocytosis, Oxidative Phosphorylation And Proteasome Pathways In Peripheral Monocytes.
Abstract: Sarcoidosis is a systemic granulomatous disease that results in significant morbidity and mortality, primarily from respiratory failure. The mechanism underlying the immuno-pathogenesis of sarcoidosis is not well understood. Exposure to unknown antigens in a susceptible host results in inflammation, characterized by activation of macrophages and monocytes leading to subsequent T cell activation. In the present study we identified the genes and pathways that may be dysregulated in sarcoidosis via RNA-sequencing. RNA-seq libraries were prepared from mRNA isolated from the peripheral monocytes of 10 healthy controls and 10 sarcoidosis patients and sequenced on the Illumina NextSeq500. We found that 2446 genes are differentially expressed (DFE) in sarcoidosis monocytes as compared to healthy control monocytes (log2-fold change of 0.6 and adjusted p<0.05). DFE genes are enriched for phagocytosis, lysosome, proteasome, and oxidative phosphorylation pathways. Genes involved in lysosome, phagosome, and oxidative phosphorylation pathways, such as TCIRG1 and LAMP2, are upregulated in sarcoidosis monocytes. We also found that genes involved in monocyte activation and leukocyte migration pathways are upregulated in sarcoidosis monocytes as compared to control monocytes. However, genes involved in the proteasome pathway, such as PSME3 and PSMD14, are downregulated in sarcoidosis monocytes as compared to control monocytes. Thus, RNA-sequencing results demonstrate that sarcoidosis monocytes exhibit increased phagocytosis and lysosomal gene expression, which may result in increased uptake of extracellular particles and lysosomal degradation. However, proteasome degradation is downregulated in sarcoidosis monocytes, which may result in the accumulation of intracellular proteins and could be the cause of the persistent inflammatory response in sarcoidosis patients.
In the future we will investigate the specific genes involved in different pathways and their functional mechanisms in relation to sarcoidosis. This dataset will be useful for further investigation of potential targeted drug treatments.
Enzyme Free DNA Digital Circuit Design: A Majority Logic Based Approach
Abstract: With recent developments in electronics and nanotechnology, it is possible to design implantable integrated circuits for the treatment of different diseases such as sleep apnea, Parkinson's disease, epilepsy, and gastrointestinal disorders, and for providing correct dosage and timely delivery of drugs. The circuits used in these implantable devices are made up of silicon transistors. Recently, a great deal of interest has been shown in molecular circuits, especially nucleic acid based circuits, as a replacement for silicon transistor based circuits in implantable medical devices. Different logic operations can be performed through the use of DNA strand displacement operations. In this poster, we discuss a majority logic operation using DNA strands attached to an origami substrate. The previous designs available in the literature use the diffusion of DNA strands to perform different logic operations. Such non-localized designs lack scalability and have a limited speed of operation. We present simulations of three-input and five-input majority gates in Visual DSD and also discuss the generalization of the majority gate for future expansions. Scaling up of the circuit is also possible with a dual-rail design. We propose a new AND-OR-Majority dual-rail logic for localized DNA circuits. This is a new field, and we hope this work will motivate researchers to think of more exciting Boolean circuit models and synthesis tools that could be used to build enzyme-free DNA circuits and ultimately design computation and decision-making nano-circuits inside living organisms.
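The Boolean behavior of the majority gates mentioned above can be illustrated with a short Python sketch. This models only the logic, not the DNA strand displacement chemistry or the Visual DSD simulations, and every function name here is an assumption introduced for illustration. It relies on two standard facts: a three-input majority gate with one input fixed to 0 or 1 acts as AND or OR, and majority is self-dual, so a dual-rail majority gate can apply the same function to the "true" rails and to the "false" rails.

```python
def majority(*inputs):
    """Return 1 if more than half of the binary inputs are 1.
    An odd number of inputs is required so that ties cannot occur."""
    if len(inputs) % 2 == 0:
        raise ValueError("majority gate needs an odd number of inputs")
    return int(sum(inputs) > len(inputs) // 2)


def and_gate(a, b):
    """AND built from a 3-input majority gate with one input tied to 0."""
    return majority(a, b, 0)


def or_gate(a, b):
    """OR built from a 3-input majority gate with one input tied to 1."""
    return majority(a, b, 1)


def to_dual_rail(bit):
    """Encode a bit as (true_rail, false_rail): 1 -> (1, 0), 0 -> (0, 1)."""
    return (bit, 1 - bit)


def dual_rail_not(signal):
    """Negation in dual-rail logic is just swapping the two rails."""
    return (signal[1], signal[0])


def dual_rail_majority(*signals):
    """Dual-rail majority: majority of the true rails and of the false rails."""
    true_rail = majority(*(s[0] for s in signals))
    false_rail = majority(*(s[1] for s in signals))
    return (true_rail, false_rail)


# Three-input and five-input examples.
three_input = majority(1, 0, 1)
five_input = majority(1, 0, 0, 0, 1)
```

One appeal of the dual-rail form is that negation requires no gate at all, only a relabeling of rails, so a circuit can be built from monotone gates such as AND, OR, and majority.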
Identifying Signals of Local Adaptation in Transcription Factor Binding Sites
Abstract: Following dispersion out of Africa, humans inhabited a wide variety of environmental regions, varying in climate and available resources. Ensuing success, partly due to phenotypic changes, provides evidence of genetic adaptation to environment. Populations can adapt efficiently to a new environment through selection on standing neutral variation. In particular, gene regulatory regions may adapt most quickly in these situations. This study looks for evidence of adaptation to certain environmental selective pressures in regulatory regions. Specifically, we integrated data on SNPs with signals of adaptation to local environments, defined across 21 environmental variables (EV), with data on SNPs predicted to have an effect on transcription factor (TF) binding (effect SNPs) from DNase-seq data across 153 tissues. We found that binding sites for 67 TFs are enriched (p-value <0.05, after Bonferroni correction) for effect SNPs with selection signals for at least one EV, totaling 296 TF/EV combinations (~1%). Of the significantly enriched TF/EV combinations, the EVs Humid Tropical Ecoregion, Foraging Subsistence, and Winter Radiation Flux were significantly overrepresented. Four main clusters encompass 35.79% of TFs enriched for effect SNPs with selection signals, with the CREB1 and ETS1/ELK1-4/ETV1-3/ERG clusters most represented. For 9 EVs, TFs enriched for effect SNPs with selection signals were also predictive of response to environmental perturbations in cellular assays. For example, TFs with selection signals for Summer Radiation Flux (Nhp6a, E2L1) are predictive of cell gene expression response to Vitamin D. Additionally, 19 TFs enriched for effect SNPs with selection signals were also enriched for GWAS signals. Interestingly, binding sites for some TFs (ATF, ATF2, STB4) are enriched both for selection signals related to plant- and cereal-based diets and for SNPs associated with blood cholesterol levels.
These results provide evidence that adaptations occurred at TF binding sites in response to environmental selective pressures, causing changes of gene expression and possibly organismal responses.
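The Bonferroni correction mentioned in this abstract can be sketched in a few lines of Python. The TF/EV labels and p-values below are hypothetical placeholders, not results from the study, and the function name is an assumption introduced here.

```python
def bonferroni(p_values, alpha=0.05, n_tests=None):
    """Bonferroni-adjust a dict of {label: raw p-value}.

    n_tests defaults to the number of p-values supplied; pass it explicitly
    when more tests were run than are listed. Returns the adjusted p-values
    and the labels that remain significant at `alpha`."""
    m = len(p_values) if n_tests is None else n_tests
    adjusted = {label: min(1.0, p * m) for label, p in p_values.items()}
    significant = [label for label, p in adjusted.items() if p < alpha]
    return adjusted, significant


# Hypothetical enrichment p-values for three TF/EV combinations.
raw = {"TF_A/EV_1": 0.00001, "TF_B/EV_2": 0.02, "TF_C/EV_3": 0.4}
adjusted, significant = bonferroni(raw)
```

Here `TF_B/EV_2` would pass an uncorrected 0.05 threshold but not the corrected one, which is the behavior the correction is meant to enforce when many TF/EV combinations are tested.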
Cataloguing exonic benign and pathogenic mutations by in situ saturation mutagenesis in mammalian cells
Abstract: Clinical whole genome and exome sequencing are discovering a vast number of genetic variants among human populations each day; however, our ability to interpret these variants is limited. Only a small number of causal variants that lead to Mendelian diseases are well characterized. To address current limitations, we aim to develop a reliable yet affordable system that can functionally characterize all possible single nucleotide variants (SNVs) of a given gene with a single massively parallel assay. Here we show that we have successfully generated targeted ssODN allelic series by amplifying from array-synthesized oligonucleotide libraries via in vitro transcription and reverse transcription (ivT-RT). Using the CRISPR-Cas9 genome editing system, we were able to perform in situ saturation mutagenesis in native genomic contexts with high efficiency. Targeted deep sequencing and data analyses demonstrated that we could separate pathogenic from benign variants by functional selection. Future work includes continuing to optimize the assay, testing scalability, constructing a bioinformatic analysis pipeline, and extending the application to regulatory regions.