Bold Predictions for Human Genomics by 2030: Session 3

Good afternoon, and welcome to the NHGRI seminar series on Bold Predictions for Human Genomics by 2030. You can find more information about this seminar series at this website, genome.gov slash bold predictions. Okay, let me move forward. Last year we published the 2020 strategic vision. This document was completed after extensive consultation and discussion with many people in the field, including the two speakers today. Overall, the vision is for improving human health at the forefront of genomics. It is not supposed to encompass the entire human genetics and genomics field, but mostly focuses on what NHGRI will be doing. As you know, predicting the future is risky. As a matter of fact, most of the impressive genomic achievements in history, when viewed in retrospect, could hardly have been imagined 10 years earlier. But it's still fun to make predictions, so in this document we had 10 bold predictions for what human genomics will be in 2030.
Most of these probably will not be fully attained, but this is supposed to be an inspirational and aspirational document, to inspire people to strive for something that's not possible today and also to provoke discussion on what might be possible at the forefront of genomics. To unpack and expand on those one-sentence bold predictions, and also to start discussions, a seminar series was designed, mostly to the credit of Chris Gunter, who is here with us. It started in February, and today is the third installment. The series will run through June 10th, 2022, and again, you can find all the information about these seminars on the website.

The format for each seminar is two speakers, each giving a 25-minute talk, followed by a moderated discussion and then questions and answers from the audience. Please feel free to submit your questions through the question-and-answer button; please don't use the chat button. These questions will be answered at the end of the talks, but you don't have to wait until the end to submit them.

Today's talk concerns the third bold prediction: the general features of the epigenetic landscape and transcriptional output will be routinely incorporated into predictive models of the effect of genotype on phenotype. We have two fantastic speakers, Tom Gingeras and Tuuli Lappalainen.

Dr. Gingeras is professor, head of functional genomics, and cancer center member at Cold Spring Harbor Laboratory. He received his PhD from New York University, followed by postdoctoral research and a staff scientist appointment at Cold Spring Harbor. He then moved to the West Coast, initially with a position at the Salk Institute, and then went to biotech companies; before returning to Cold Spring Harbor in 2008, he was the vice president of biological sciences at Affymetrix. His current group studies where and how functional information is stored and regulated in genomes, and these efforts help explain the biological and clinical effects of
disease-causing gene mutations in humans and other organisms. He has been a leader in the ENCODE, mouse ENCODE, and modENCODE projects of NHGRI.

Dr. Lappalainen is an associate professor at Columbia University and has also been a core faculty member at the New York Genome Center since 2014. She received her PhD from the University of Helsinki, Finland, followed by postdoctoral research at the University of Geneva, Switzerland, and at Stanford University. She has pioneered the integration of large-scale genome and transcriptome sequencing data to understand how genetic variation affects gene expression, providing insight into the cellular mechanisms underlying genetic risk for disease. Her research focuses on functional genetic variation in human populations and its contribution to traits and diseases. Dr. Lappalainen has made important contributions to several international research consortia in human genetics, including the 1000 Genomes Project and the GTEx project. As a matter of fact, next month Dr. Lappalainen will be beginning a new role as the director of SciLifeLab's National Genomics Infrastructure, as well as full professor in genomics at the KTH Royal Institute of Technology in Sweden.

I believe Tom will be the first speaker. The podium is yours. Let me stop sharing here so you can start your slides.

Okay, thank you, Paul. Thanks for the introduction and the opportunity to take part in this program of bold predictions. I'd like to say at the outset that a relatively high bar has been set by the previous four speakers, and I hope to be able to match that. So let's begin. Bold prediction number three, as read by Paul, basically states that the general features of the epigenetic landscape and transcriptional output will routinely be incorporated into predictive models of the impact of genotype on phenotype. As you look at the details of this prediction, it's clear that it's bold, but it's also clear that it's somewhat daunting.
Specifically, it's composed of three interdependent components, each of which has several areas of challenge, and it is my intention and goal for this presentation to focus on these challenges as a means to move beyond this particular bold prediction into others.

The first of the components of this prediction is the collection and analysis of personal genomes, by which I mean the generation of phased diploid genome sequences. It also includes the collection of relevant transcriptional and epigenetic profiles, ideally using long-read sequencing analysis, to gather both sequence and modification data at the same time. The second interdependent component is the use of predictive modeling that takes this data set, integrates it, and begins to look for relationships with known pathways, such that the outcome is a proposed phenotype. The third component of this bold prediction, as mentioned by a variety of our previous speakers, is that the phenotypes that are going to be detected or predicted will in fact occur at many biological levels, which we'll discuss in a few minutes.

But what exactly do we want the outcome of this bold prediction to look like? What is the goal of this prediction in a substantive way? In the ideal situation, a sample from a symptomatic or asymptomatic individual is obtained, either from an anatomical source, namely one of the organs, or from a source that's easily accessible, which will then serve as a surrogate for the affected organ or tissue. This sample will be used to obtain the genome sequence of the provider, and also to provide information on transcriptional and epigenomic profiles. These data will then be analyzed using computational aggregate algorithms to determine how the sum total of these data
point to one or more genomic variants as the cause of the anomalous transcriptional or epigenetic phenotypes that may be contributing to the complex phenotype.

This set of goals really has several unresolved and unappreciated challenges for both precision medicine and precision genomics, and they will also lead, I think, to additional opportunities for bold predictions. The unresolved and unappreciated challenges concern the collection of transcriptional and epigenetic data by many consortia. These consortia have had a long-term interest in collecting basic data on the functional areas of the genome and how they're regulated; they include ENCODE, GTEx, Roadmap, and EN-TEx. Consortium efforts have also been in the business of looking for causative genetic mutations. The union of these two kinds of efforts is currently underway, and this bold prediction really serves as a way to bring together these two very important efforts.

In light of these efforts, the resources that have been collected over the years, and the goal laid out by prediction number three, many challenges have emerged that need to be addressed if this prediction is to be realized. What I'd like to go over is a brief summary of the challenges that have emerged upon thinking about what is entailed in this bold prediction.

First of all, the identification of genetic variants giving rise to phenotypic results, as measured by changes in the level of expression and epigenetic modifications, is a real challenge and has been for a very long time. The second challenge is that many genes have multiple functions, and they also have multiple isoforms, some of which
are responsible for the different functions of that gene. The third challenge is that there is a need for increased availability of donor normal tissue for analysis and study. This includes the brain, heart, and kidney, tissues that are not easily accessible in the normal individual, in order to study a baseline of profiles that will constitute what is normal. The sheer definition of what is normal is also important, because it is against that state, which is likely to be a range of states, that we're going to evaluate the data we collect from both genomic and epigenomic states. The fifth challenge is the differences in the transcriptional and epigenetic profiles that exist between samples from living and postmortem individuals. There is a considerable amount of data, particularly for hard-to-access tissue, that comes from postmortem studies, and it's this challenge I'd also like to talk about. And finally, the environmental influences on somatic epigenetic changes, and the pathways that lead to those changes as caused by the environment, are quite important; while this has been a subject of considerable interest for a long time, the processes involved are still largely unknown.

So let's walk through the challenges I just enumerated and talk a little about each one, in order to get some clarity as to what is meant. The first of these challenges was the identification of the variants that give rise to phenotypic variation. The depiction on the slide here is of two genic regions where mutations have been identified; they have been identified as present in the gene, and also identified as a site where epigenetic modifications are
important in the expression of that gene. In the two genes that we have here, there are two points I'd like to highlight. First, understanding the phenotypic effect of genomic variation depends on knowing where to look beyond RNA expression and epigenetic modifications, that is, at what biological level the phenotype is likely to be exhibited. If that's not known, then the interpretation of the variation that we see depends on predictions alone, rather than on physical results to fall back on. The second point this slide is intended to make is that complex phenotypes are often caused by multiple genotypic changes. Although in these examples only one site is cited, most of the challenge facing the future accomplishment of this bold prediction is to be able to identify all the related changes that contribute to the phenotype of interest.

The second challenge is the multifunctional role of genes, which is complicated, and the determinative factor is that some of these multifunctional elements are affected while others are not. One feature of this challenge is the presence of different expression levels for different isoforms in different cell types. The same gene can obviously have different isoforms, but those isoforms can vary in their expression level depending on the cell type investigated. Most novel isoforms are expressed at very low levels, somewhere between ten- and a thousand-fold lower than the major isoform. But the fact of the matter is that for roughly 43 percent of genes that have multiple expressed isoforms, these lower-expressed isoforms are in fact the major expressed isoform in many other cell types. So it makes it somewhat
arbitrary to say that there is a predominant isoform, because it's very much tissue dependent, and it may be that those isoforms lead to different phenotypes.

I now want to address the issue of what is normal, because several of the challenges we'll discuss later depend upon getting a sense of what is normal in each cell type or organ that we're investigating. Phenotypes can occur at any of these biological levels, from the level of the protein being made up to the level of subpopulations, where environmental influences have effects on the overall expression levels and the phenotypes that are present. So if the individual phenotypes will be different at each of these biological levels, then we have a task in trying to understand which of these levels we're going to use to identify the effects of major mutations. Finally, you can determine the range of expression and the loci of epigenetic modifications in genes of interest as part of this baseline, as part of this normalcy. That's going to be important, because in many of these instances it will be alterations in the levels of expression and in the position and presence of modifications that matter.

The idea of normalcy also comes into play when asking where you are going to find normal samples. It is true that the NIH and many other funding agencies have made great progress in providing samples for a variety of different types of studies. But the fact of the matter is that many of these studies and sample collections deal primarily with specific disease states, and normal controls for these disease states are at best only part of the collections being brought together. In addition, there are many surgical centers in the United States; most fairly
large hospitals have such surgical centers, and they routinely operate on individuals in whom the resection of normal tissue is part of the normal procedure. But those normal tissues are removed and usually just discarded. This leaves us a resource that is really untapped. One thought is that many surgical centers could, if incentivized by NIH, keep these normal tissues and make them available, either to a central repository or locally to those who request them. The bottom line is that these are valuable resources for providing a baseline understanding of how each gene and each regulatory element in that genome is operating. You could think of this as "leaving no tissue behind."

The next challenge that we're talking about today is the differences in the transcriptional and epigenetic profiles obtained from living and postmortem individuals. In a set of studies that we have recently published, we looked at the expression levels of all of the genes of the entire genome in several individuals, about two dozen. Of these two dozen individuals, half were patients undergoing surgery for epilepsy treatment, in which normal tissue is unavoidably removed as part of the treatment. That represented an opportunity to look at gene expression in those tissues and compare it to individuals who had died and had donated their tissues for analysis. All of these individuals were then examined at both the RNA expression level and the epigenetic level. You can see from this slide that the expression levels of these genes look very similar in the top four samples, where there are quite fresh
samples. Half of the samples in each of these panels came from individuals who were deceased, which are in red, and half came from living donors. In the case of housekeeping genes, which are in the upper panels, A to D, expression is almost identical in living and deceased individuals. In the postmortem samples, however, there were about 2,000 genes that were affected and differed between the two states. This is also true when you look at RNA editing at the 3' and 5' UTRs: there is an appreciable difference in the genes that are not housekeeping genes. The variations that we see are not only a loss of expression, presumably due to degradation of the RNA, but also some genes that in fact remarkably increase their expression while the cells remain quite viable. It's these genes that will often affect the outcome of understanding the behavior of certain genes in the brain. So it is important, I think, that we understand that the tissues selected should not only be normal but also should not come from a state, such as the postmortem state, that is very challenging to a very large number of genes.

Finally, the environmental influences on somatic epigenetic changes are a well-studied area, but the signals and pathways leading to genomic specificity, as to where these modifications occur after exposure to environmental conditions, are really quite challenging. The mechanisms, pathways, and agents responsible for identifying the locations and the type of modification that goes there are still an understudied and very valuable area of study. This is a challenge that will require a variety of approaches to solve.

Now, having gone through these challenges, I'd like to suggest
that these offer opportunities to see additional progress come forward, and I'd like to go through that for some of these challenges. For example, things that could perhaps be approachable by 2030 include the identification and engineering of isoform-specific regions of genes in a cell-type-specific manner, with these genes being selected for being clinically important during development or during a disease state. This could then provide a fundamental understanding of how different isoforms function and how they operate in a normal metabolic state or in a diseased state.

Another prediction is that, in light of the need for greater access to tissues that are nominally normal, one could suggest that NIH-funded tissue collections be mandated to request that all participating medical centers contribute normal tissue that is consented for genome, RNA, and epigenetic sequencing, and to use these data to better define what normal is. The last prediction is that you could identify the genes, across all cell types and organs, whose expression profiles are affected by postmortem conditions, so that the use of these data can be corrected. Doing so would provide a better way to understand how these genes operate in very specific types of tissues.

The bolder versions of predictions come in the form of two: the identification of all causative genetic variants giving rise to changes in levels of expression and epigenetic marks, by identifying populations within the normal expression profiles, and the location, that is to say the cell types and organs, of all expressed coding and non-coding genes. To do this, the prediction is that one could take advantage of
the ongoing and developing work on in vivo sequencing and ChIP analysis, which could provide a level of detail as to where these variations are occurring and where these phenotypes can be seen. Finally, the prediction that involves the environmental influences on somatic epigenetic changes could come in the form of the identification of cellular signals and pathways leading to specificity, that is to say, which modifications are occurring at which sites. That goal could be approachable through the development of markers, that is to say RNAs, proteins, and lipids, from easily obtainable biological samples rather than samples that would require surgical intervention, and using them as surrogates for markers that you would like to study in less attainable organs or tissues.

These predictions and challenges, I think, offer an opportunity to think ahead and to think of ways in which we could move the field forward if we could achieve some of these predictions. It's important to note that not all of the bold predictions at the end of this presentation require novel technology or novel inventions. Many of them require a decision and a commitment to provide resources that would be helpful in solving some of the challenges that were discussed.

I'd like to end by acknowledging my colleagues at Cold Spring Harbor, Harvard, and Yale, who have contributed to the data used and generated in the studies I mentioned, and mostly for the ideas that have often been traded among all of us as we think about the massive sets of data that we collected in these different consortia. Thank you for your attention, and I look forward to answering any questions that the audience might have.

Thanks a lot,
Tom. I will hold the question-and-answer session until later, so maybe you can stop sharing the screen. Tuuli is the next speaker.

All right. Hi everyone, good afternoon, and thanks for having me here as a part of this very exciting seminar series. Just to get right into this: when I saw this prediction and was asked to talk about it, I first started thinking that there are multiple interesting premises baked into this statement, which I'm not going to read because it's long, that I wanted to dissect today, discussing whether these are true, what we know about them, and how we actually make this prediction a reality.

The first thing that is implicit in this prediction is that it talks about predictive models of the impact of genotype on phenotype, but also suggests that we will need epigenetic and transcriptional data to make this work. That's basically implying that genotype data alone will not be sufficient to predict physiological disease phenotypes in humans, and that's an interesting proposal that I'll inspect a little later. Second, if we are saying that we need other layers of biological data, not just genotype and phenotype, then the premise is that transcriptome and epigenome data would be informative and useful data types here, implying that they are at the very least correlated with genetic variants and physiological traits, and potentially even mechanistically mediating those genetic effects on disease traits. The third aspect is that saying this will be routinely incorporated into these predictive models implies that we would be able, or maybe we are already able, but at least in the future would be able, to measure these molecular phenotypes at sufficiently high scale and
precision for these data to be actually useful. So I'll be discussing what the current data supporting these premises are, what some of the other key insights we have learned are, how we make this prediction into reality, and what the other components around this topic are where we need to push as a community to make this happen. Tom already touched on some of these points, but I hope to expand on some of the aspects.

When it comes to the prediction of phenotype from genotype, the complex trait space especially has many fundamental challenges that we are now very well aware of as a field. After some 15 years of GWAS, we know that the heritability of complex traits is distributed in teeny-tiny genetic effects across the genome, and that these variants account for just a fraction of the phenotypic variance in complex traits. This is an NHGRI seminar series, I'm a geneticist, and we love to think about genetic variants, but they are not all that matters in complex traits. We also know that most GWAS heritability is in non-coding regions of the genome with likely regulatory functions, and the interpretation of these variants has been quite complicated. In fact, if we wanted the perfect in silico prediction of the functional and phenotypic effects of these non-coding variants, that would require pretty much perfect knowledge of cellular molecular biology and genome function, and we are very far from this. Think about seeing a variant and being able to say: this affects the binding of this transcription factor and the enhancer activity in this way, and leads to this fold-change effect in expression of a nearby gene, and perturbs
this pathway, which then changes the cellular function, which leads to some physiological function. We are extremely far from this, and we're not going to get there by 2030. So a black-box prediction of just taking genotype alone and getting to phenotype, I don't think it's going to work. We do need the additional data sets and insights that I'll talk about today.

Why do we care about these predictions anyway? Of course there is the big goal of pretty much all biomedical research: being able to provide better diagnosis and treatment to individuals who suffer from some disease. In the traditional medicine paradigm, you basically have the phenotype data and then make some inference about the appropriate diagnosis and treatment. The precision medicine paradigm adds genotype and environmental data to this, to hopefully provide better diagnosis and treatment. And then something that I guess we could call precision molecular medicine also incorporates gene regulatory readouts, such as chromatin state or RNA sequencing gene expression, to have even better insight into what is going on and what we can do about it.

So what is the status quo here, and does this kind of precision molecular medicine framework actually work? In the rare disease space, we are in a situation where the glass is half full when it comes to genotype-to-phenotype predictions. Exome or whole genome sequencing can now lead to diagnosis in about half of rare, severe, Mendelian-disease-type cases. That's fantastic: this is an extraordinary success and has absolutely changed the lives, and saved the lives, of many, many people. But of course fifty percent is not a hundred percent. The glass is still half empty, for various reasons, such as detecting
some more complex structural variants and having more complex genetic architectures, but also identifying variants as disrupting gene function or dosage. Not every disease-causing variant is a premature stop codon variant that we can annotate quite easily; it can be quite complex. Here there is pretty decent data that RNA sequencing can help.

The basic problem is that to really do genetic diagnosis in a rare disease situation, we need two things. The situation that works quite easily nowadays is that you can identify, just based on the good old genetic code, the gene-disrupting variant in the coding region, and you can also put that in the spectrum of population variation in that gene. Tom referred many times to the need to understand the normal to be able to understand disease, and that is absolutely the basis of these rare disease studies. That then helps you to say that this patient, having a variant that is an outlier in the population, likely or at least potentially has a variant that contributes to disease. But when it comes to variants that affect gene expression or other traits related to gene regulation, first of all it's difficult to identify those variants, and it's also difficult to have a sophisticated framework for the spectrum of normal variation in, say, gene expression.

Splicing analysis has been one of the early cases of success here. In RNA sequencing data it can be relatively straightforward to see an aberrant splicing pattern in a patient that is absent in a number of controls, and this may help to identify, for example, intronic variants that would be quite obscure based on genetic data alone.
We and others have also pushed this further in terms of identifying variants that may affect gene expression levels. We used healthy-population RNA sequencing data from GTEx and allele-specific expression analysis to draw, for every gene, the spectrum of how much that gene's expression varies in the normal population for genetic reasons. One can then go to a patient, place the patient in the spectrum of that normal population distribution, and identify outliers. We've shown that this can have high specificity and sensitivity in muscle dystrophy and myopathy patients, and now we're working on applications in congenital heart disease and ALS as well. This framework, together with others, has been incorporated into analyses that try to use many different types of transcriptome readouts to better interpret rare variants. We were part of an analysis using the most recent GTEx data set, looking at healthy-population RNA sequencing and whole genome sequencing data from multiple tissues, and at the different types of effects that genetic variants can have on transcriptome traits. And in a very interesting preprint that came out, I think, a week ago, it was shown that transcriptome data can give you a 16 percent boost in your diagnosis rate over whole genome sequencing, detecting many different types of perturbations. I think these examples and insights show that transcriptomic data is already useful in clinical genomics, and that this bold prediction is already becoming true in that space.

However, in complex disease prediction, unsurprisingly, the situation is more complex. When we think about disease or phenotype prediction in complex disease, the main method now being used or studied is polygenic scores, which have a lot of
promise, but there are still a lot of question marks in terms of their clinical use: do they work, when do they work, and when is it good enough to be actually medically meaningful? One very major problem is the various biases that these scores can have in terms of their transferability, for example across ancestries and also other groups. There has been some exciting new research showing that some of these biases can potentially be overcome by overlaying genetic associations with regulatory elements, thus getting better insight into the causal variants and avoiding some of the biases caused by linkage disequilibrium. I think that there is a lot of potential there, and that going forward, incorporating tissue- and cell-type-specific functional information into polygenic scores could potentially help to partition complex disease risk into distinct components. Think about most complex traits: take, let's say, type 2 diabetes, which can be caused by dysregulation or malfunction of many different organ systems. Being able to partition different individuals' disease risk, in terms of "you have a problem with your lipid metabolism" or "you have a problem with your insulin metabolism", etc., could have a lot of potential. We're not exactly there yet, but let's see, maybe by 2030.

There is also a very exciting area, at least in my opinion, in using these molecular phenotypes to incorporate genetic and environmental risk. As I mentioned a few slides ago, heritability of complex traits is far from 100%; there are major environmental effects in complex disease, and if we are actually able to incorporate those risk factors into the same framework as genetic factors, this can be very powerful. I mean, the whole idea of using genetic data to develop drug targets is based on the same paradigm
that genetic and environmental risk factors are partially mediated by the same molecular pathways. And here transcriptional and epigenomic readouts can really help, because they should, or could, capture both types of effects, unlike genetic data alone, and this could be one of those things that actually makes the prediction that we're talking about reality. There are some interesting early studies that have some promise, showing that RNA sequencing data can inform on an upcoming flare in rheumatoid arthritis, actually driven by a specific cell type, and also looking at case-control differential expression in GWAS genes, where a large part of the differential expression that is seen is much too big to be driven by the genetic variants; there are some other factors as well that drive this. So there are attempts, there are interesting frameworks that are being developed, but much more data is needed. I think there's also major potential in leveraging the ability that genetic data has in pinpointing causal disease mechanisms, then thinking about the environmental component, which is a modifiable component of disease risk, and thus being able to potentially develop better interventions.

Tom talked about this at length, so I'm not going to go super deep into this, but I think it should be clear to all of us that to understand disease we really need data on what is the normal, and the kinds of resources that the genetics and genomics community has been building have enabled a vast amount of the studies that we now often take for granted. There would be no GWAS without HapMap, and no whole genome sequencing studies without 1000 Genomes, with ExAC empowering rare disease studies, and GTEx, Human Cell Atlas, etc. really building that foundational understanding of the regulatory genome. And there's
much work that needs to be done in this space to create more sophisticated data on the various types of molecular functions that vary in human populations, and thus empower specific disease communities to use this data to explore specific questions. However, there are some major issues that we really have to address as a community if we want to use these resources to their maximum ability. One of them is that the population diversity captured by these resources is very limited at the moment. 1000 Genomes has actually explored global populations, a couple of handfuls of them, and GTEx captures something like the average American diversity, but we are far from being able to really characterize population diversity in functional genomics data. As the GWAS and genetics communities are now very fast building bigger and bigger resources on genetic variation across the globe and its contribution to diseases, we need to make sure that functional genomics data sets are also there to help interpret and analyze those data sets. A related question is that data availability, dissemination, integration and visualization is a seriously difficult problem for functional genomics data, because these are messy and hazy data sets in a way that germline genetic variation is not, with major batch effects and other integration issues. But unless we are actually able to bring these data sets together, disseminate them into the community and make them available across different consortia, we are really shooting ourselves in the foot and not being able to leverage the power of these resources. So this is also a major area where we must invest as a community. Another area that Tom also referred to is that we just simply need
more data. We need to scale up the sample sizes of multi-omic data sets, especially in the complex disease and trait space, but I'm thinking also of normal populations. If we have learned anything from the history of human genetics over the past, let's say, 25 years, it is that in the early days of GWAS etc. the discoveries were a little bit so-so, with a lot of struggles, but when the sample size became sufficient and there was actually good statistical power, amazing discoveries started to emerge. When we think about molecular data sets at a population scale, studies like GTEx, while they are big, are far from the kinds of sample sizes of tens of thousands or even hundreds of thousands of individuals needed to make really well-powered inference. There are some attempts to fix this: TOPMed is producing a lot of RNA sequencing data, mostly from blood samples. In my lab we have been working on this, in an interesting project that we're now wrapping up, where we have tested four types of non-invasive samples for RNA sequencing using a low-cost Smart-seq2 protocol that was initially developed for, or is much used in, the single-cell space. Here we've collected hair follicles, saliva, buccal swabs and urine from a number of donors and done RNA sequencing on these. The exciting thing is that, compared to standard, top-notch RNA from a HEK cell line, we can actually get almost comparable data from hair follicles and from urine despite the very small numbers of cells. Especially in urine the starting material is low, but with these modern library methods we can get excellent quality RNA sequencing data from these samples, which capture cell types that the blood samples that are typically collected cannot capture. We have shown that the hair follicle cell types and that data
is very closely related to skin, and in urine and buccal swabs we get mucosal tissues, which makes sense; there's also some kidney signal in urine. I think that there's a lot of potential in these sample types, but also obvious technical challenges: buccal swabs sometimes work great and sometimes not, and saliva is absolutely terrible. This is just an example that this is not always something that works easily. Do not try saliva RNA-seq at home.

One of the challenges, thinking even further ahead, is that eventually we may need to push these types of data sets to single-cell resolution and think about how to do single-cell RNA sequencing in thousands and thousands of samples from disease- or trait-relevant, informative biospecimens. The non-invasive swab-and-poke type of samples that we have been taking thus far are not going to give us brain molecular phenotypes etc., but this is an area where we clearly need to invest as a community.

Just to switch gears a little bit for the last couple of slides, I want to make the point that prediction is really not enough. Even if we had the perfect black box to predict variant to phenotype, we would still want to understand the mechanisms. I think I have made a relatively compelling argument that this kind of black box prediction of variant to phenotype is simply not going to work, especially in complex diseases; we will not be able to build that box. Also, we are scientists: we should be interested in mechanisms and want to understand how and why certain genetic variants affect molecular and cellular functions in a way that contributes to disease phenotypes. And if we want to actually develop interventions, drugs and other types of interventions, to do something about this, then we need to understand those mechanisms. Luckily, we
have a very rich data set to pursue different layers of mechanistic questions: what are the causal variants; how do they affect, let's say, transcription factor binding or enhancer effects; what are the target genes in cis; what are the target genes, pathways and networks in trans; what are the relevant cell types and states; and then even further, towards physiological phenotypes and cellular functions. I think it's going to be an extremely exciting time for us, using these different types of approaches: using the kinds of large-scale multi-omic data sets that we have been building, continuing on that, but then also combining that with experimental perturbations of the genome and its function with tools like CRISPR. I really strongly believe that no approach is going to be a silver bullet. All of these approaches have their own unique advantages and disadvantages, and it's only with integrated approaches that we can really build a good understanding of genome function.

I want to mention a quick example of this type of work, from a very recent preprint from a collaboration with Neville Sanjana's lab, led by our postdoc John Morris. We took blood trait GWAS, integrated that with ENCODE etc. data on potential regulatory elements and fine mapping, and then did CRISPRi inhibition of those putative regulatory elements, followed by single-cell RNA sequencing to see which genes are affected by silencing of the genetic elements where potentially causal GWAS variants reside. In terms of identifying the cis target genes in these loci, I was personally surprised by how well this worked: for 42% of the loci, or the variants, that we tested, we discovered a significant gene in cis, and the vast majority of these loci lacked an eQTL signal, showing that
we're really discovering something that is complementary to the data that we have had before. A particularly exciting example for us was to see that, in addition to capturing those target genes in cis, which has been a major challenge for GWAS, we can also get to the more complex question of affected pathways. For this very interesting locus, where we have the GFI1B transcription factor, we actually had two GWAS loci, one in an intronic and one in a downstream enhancer, that both affected the expression of GFI1B in cis. They both also had a major effect on gene expression across the genome, with the stronger enhancer having hundreds of significant gene targets across the genome. Those target genes were organized in a network that had three specific clusters that seem to represent different functional components, with one of the clusters representing more the direct targets of this transcription factor, while another cluster, cluster C here, seems to have something to do with heme biosynthesis, which is consistent with this transcription factor having a major role in blood traits, and with us specifically studying blood trait GWAS. An exciting thing was that we had very specific enrichment of GWAS hits, so independent GWAS hits across the genome, in the target genes of this GFI1B GWAS locus. This suggests that there may be this kind of convergence of independent GWAS effects for the same traits in specific cellular pathways, which may then be particularly interesting in terms of the cellular biology behind the traits that are being studied.

So to wrap up, what does the future look like? I believe that with these kinds of approaches, and by addressing the challenges that both Tom and I have talked about, we can
really incorporate molecular traits as a part of precision medicine and improve our understanding, and also treatment, of personalized disease risk that is driven by genetic and environmental factors. We will be developing deep insights into the molecular and cellular etiology of human traits, using both observational population studies and perturbation studies and experimental tools. All of this requires that we build a sophisticated toolkit for highly informative in silico inference, so that we can have as accurate priors and predictions as possible when you observe a variant of interest, or you have a GWAS study and your GWAS loci, and can make good predictions of what the functional mechanisms being perturbed could be, or the functional effects of these variants. Then you also need a sophisticated toolkit for experimental follow-up of these discoveries. With that, my small addition to the prediction would be that, in addition to using the features of the epigenetic landscape and transcriptional output to understand or predict genetic effects on phenotype, I would also want to understand genetic and environmental effects on phenotype, and thus build a holistic understanding of the diversity of human traits and the underlying genetic and molecular processes. One of the venues or organizations trying to do this is the International Common Disease Alliance, which was launched somewhat recently, where we, especially in the mechanisms working group, are asking the same types of questions that we have been talking about today. With that, I'd like to thank many people in my lab, current and former members, my collaborators in various consortium projects, other collaborators and ICDA colleagues, and sources of funding. Thanks very much.

Thank you very much, Tuuli, and
yeah, both of your presentations are very impressive. As Tom mentioned, my impression is that it's quite challenging, even daunting, to fulfill this bold prediction, and it seems like it requires a multi-disciplinary approach rather than just sequencing the genome, which is one-dimensional: that's a DNA sequence. But here you're talking about so many different levels. So my quick question would be: do we need a new technology, a new approach that's several orders of magnitude faster and more comprehensive? Because for each person you cannot just do one sequencing; you have so many different tissues, so many cell types, and so many levels of sequencing to do, bisulfite sequencing and so on. So I'm just wondering, using the current technology, can we get it done in 10 years, or even 50 years?

Well, if I start: I think that in terms of assays to analyze molecular phenotypes, whether it's the transcriptome or epigenomic features, there are obviously going to be advances; let's say long-read RNA sequencing and direct RNA sequencing are going to be very important. But I don't really see those as the major bottleneck. I think something that would really change the game is if we had good, cheap, fast, practically feasible ways to differentiate cells, to basically take cell samples from an individual and obtain other cell types from that, because things like brain biopsies are just never going to be a popular thing to do, and for several different diseases and phenotypes we just don't really have accessible cell types to analyze.

I would add to that, which I think is really quite correct, two things. One is that I think we're moving in this direction with the understanding that a human or an animal is quite a complicated machine, and it's more than the sum of its parts, and
therefore I think there is going to be an effort to do two things. One is to get as much information as we can in situ, in the organism itself, because that's where the interactions with other cell types and the effect of the environment are seen. In addition to that, I think we're moving in that direction with the development of organoid systems, where cells are placed into an environment with other types of cells and allowed to architecturally re-form the structures that they normally form in vivo; we are just beginning to approach a situation where the complexity of an organism begins to reveal itself. So again, looking at these different methods, sequencing, RNA, methylation determination, protein analysis, as much as possible should be done in situ, and in a situation that is as naturally interactive as possible. I think that's where our largest progress is going to be made.

Thank you. Another question I have is about the so-called normal. I'm quite impressed that both of you mentioned that it's a key component: moving forward, if you cannot define normal, then how do you correlate disease states? But here the normal is probably a very large range. I'm thinking about physiological terms, blood pressure for example, or blood cell counts; for all of these there will be a range. And again, thinking about different populations, there will be variations. So it will not be a single normal, maybe not a hundred, maybe not a thousand. So we have to consider all these factors.

Right, I think that's absolutely correct. I mean, it isn't as if you fall off a cliff at some point, right? Even if you're dealing with a spectrum of values, it isn't as if, as in the case of blood pressure or something, once you get
above this number you're not normal anymore, because we get these broad differences in physiology depending on what's going on with the organism itself. So I do believe that's right. But I also think that the normal behavior or normal activity of genes and modifications has to go back to this idea of interactions: the context in which the sample being evaluated sits is going to be everything, because that's why you have this range; the cell or the organ or the tissue is responding to the conditions it can sense. So again, I think the complexity is made even somewhat more daunting by the fact that we are doing these experiments on individual cells, or a collection of cells in a tissue, or an even more heterogeneous collection in an organ. That's going to be, I think, somewhat more problematic, because we're going to have to rethink or reconsider all that data when we have systems that are more lifelike.

Yeah, I agree with all of that. I would add that when it comes to the definition of the normal as it relates to different populations, different environments and all of human diversity, it is certainly important to characterize and understand that diversity, and not to think that some very specific population somehow represents all of humanity. However, it also happens quite easily that people focus on the differences and not on the similarities. When it comes to the molecular function of human cells, or human physiology, a lot of it is shared across all humans, and there are just a lot of shared components that we must also keep in mind, so as not to overemphasize those differences that we still should learn about and appreciate.

Yeah. In
terms of accessing large databases, have you thought about collaborating with All of Us, for example?

Yes. I think there's an interesting RFA out looking at diet in that population. Thousands or tens of thousands of people have volunteered to be part of this diet study, to see what a precision diet looks like for an individual. Those studies, I wouldn't call them the wave of the future, but they are realistically trying to deal with numbers that are statistically significant and with the variation of individuals in the population. I think that is really a step forward.

Thank you. I think there are quite a number of questions, so we should give them a chance, and if I have time at the end I can ask you additional questions. So Chris, would you please read the questions?

Yes, definitely. So, a lot of appreciation for your talks, so thank you again to Tom and Tuuli for talking to us today. You've got a number of questions about what normal is, so we're going to come back to that even though you've addressed some of them. But I want to start with one of our most recent questions, which is: how do you think that clinical researchers can specifically contribute to achieving this bold prediction?

Well, I think the first thing that comes to mind for a biologist is by giving us samples, but obviously it is a much more nuanced and complex question than that, although the sample access question is important, and that is something where we absolutely must work together. I don't have an MD; I have no access to actually go and poke at living individuals. Well, except for our non-invasive study: because it's not invasive, we were actually able to collect those samples without medical involvement, with the
blessing of an IRB. But I think that when it comes to moving more towards the precision medicine space, that is, thinking about implementations and the kinds of effect sizes that are not just p-values but where you're actually talking about biological and medical importance, and, let's say for PRS, at which point some difference in risk becomes medically meaningful, there needs to be very serious dialogue between basic researchers, genomicists, biologists and medical practitioners.

Yeah, I would like to add to that. I've had a really remarkably lucky set of interactions with many clinicians, and the thing that has affected me the most is the amount of, not data, but behaviors that they use in their clinical diagnosis: things that seemingly are not important when we talk about molecular biology, but that make sense once we understand the molecular biology of what's going on in a particular condition. So, for example, people who are ill with a particular disease become hard of hearing, and a symptom like that may not make any sense to a molecular biologist: I'm interested in cancer, why is that important? These kinds of clinical observations, which are usually acquired in the course of rounds, in the course of dealing with patients, and also passed down from one generation of doctors to another, these pieces of information are invaluable in some ways, because they offer the opportunity to form a model of what that phenotype is compared to the molecular processes that are going on. I think that is one way in which the interaction between clinicians and people who work at the bench has been, and could continue to be, very valuable.

And that's an area where patient groups contribute as well,
right, by helping you define and determine the symptoms and everything. Yeah, I think those are great answers. So, getting back to the question of normal: I think a lot of the questions that you're getting are along the lines of what Tuuli said earlier, that people focus a lot on the differences, and they're asking questions about that. But we did have Mallory, who asked: what can, or must, we do as a community to ensure that data are collected along multiple axes of diversity? So how can we do better to be more inclusive of individuals? Since you're both involved in big projects, maybe you could talk about that.

Yeah, again, it is one of those situations where the limitation is often based on access. If you're part of a large project, the ability to recruit basically defines what kinds of diversity you can bring into that situation. That's why I think the example of All of Us is very valuable, because it starts with the premise of needing to have very large populations. When you start with that premise, it offers the opportunity to say: all right, we're not limited by the number of people, that's not the concern. The question is, if we need a hundred thousand people, how do we afford that, how do we organize that, how do we have the right controls, and those kinds of things. That becomes the more relevant question in the end, because you start with the premise of needing to have as large and diverse a group as possible.

Yeah, and I could add to that in terms of also thinking about a global perspective in building these big projects. One of the things that ICDA is working on is at least somewhat unified consent, recruitment, biospecimen collection and other types of protocols, to make it easier for investigators in
different countries to collect data that is then interoperable and integratable, to be able to use this data in bigger and bigger studies. I think that in addition to, let's say, doing better outreach and incentivizing minority populations in the US to participate in medical research, we also need to lower the barriers for investigators in developing countries to engage in these kinds of studies, whether it is resources, protocols, or access to, let's say, the inner circles of where science happens.

Yeah, I agree with you that those are such important points. So, another question about normal. We got two related questions, so I'm going to put them together. The first one was from Laura, who's asking: what is the relevance of age in defining a normal phenotype? Should age be stratified to control for expression changes that might occur over your lifespan? And then Mark also asked, for Tom: in your talk you used "normal" in particular; because brains at advanced ages are rarely truly free of pathology, should we really be thinking of normal not as a dichotomy but more as a quantitative trait? So can you specifically address age, and then, it sounds like, also with some focus on the brain?

Well, I guess I could say a couple of things about age, since we've been studying it quite recently in one of the TOPMed cohorts, the MESA cohort, where we have longitudinal samples that are 10 years apart. We've been looking at age-interacting regulatory variants and at how gene expression and methylation change with age, and it's complicated. It's also probably one of those areas where it's not going to be enough to just have molecular phenotypes from a complex tissue sample, in this case blood cells, because cell type composition varies with age, and it also varies between phenotypes, sexes, etc., and that explains a major component of the
differences. I think that this is one of those areas where insights into cell type composition will be really crucial to actually understand what is going on, and even the most sophisticated molecular assays reading those molecular phenotypes are going to easily lead you astray unless you understand the cell type context.

I think the question is a good one. In the case of age, we're all living much longer, and we have to think about this, as I think Mark said, as a quantitative continuum, where our comparisons are within a stratified group. It goes back to the question: in that stratified group, what is normal, what is operatively normal? It may not be normal in any other stratification, but in that group it's normal. Maybe the functionality of that group is not the same as others: you don't remember as well, or you're not as rapid a thinker as you get older, but in that stratified group that's normal, that's not different. Therefore, as was suggested, there is a continuum, and it very much depends on age, and it also depends on the phenotype, the manifestation of whatever phenotype we're talking about. So I think that is important to understand. And in terms of what Tuuli said earlier, it's the similarities that will also mark how much deviation is going on within a stratification. If in that stratification there are X number of biochemical and molecular processes being monitored, and most of them are similar and others are not, then we know where to look to see whether that constitutes not being normal, because we now have a place to look in a larger population.

Yeah, that's great, thank you. So again we have two questions which are
related to each other; they're both about identical twins. The first one is: what's the level of correlation of gene expression in identical twins versus unrelated individuals? And the second is: could the order of exposure to environmental factors in identical twins affect their phenotype?

I'm reminded of these papers, which must be about 15 years old now or so, when the Spanish groups were studying identical twins. It was remarkable, because it was very clear that identical twins had very similar DNA modification and expression profiles at a very young age, but as they got older, and particularly if they were separated, that similarity broke down quite considerably. In some ways it seemed to be related to the fact that different environments, different habits and behaviors, have their obvious effects on the physiology and the genomics of the individuals. So in large measure, if the individuals are in the same environment and have very similar kinds of exposures to things that carry risk, then they will have very similar kinds of reactions, because basically the groundwork has been set up for the reactions to happen. If they see a different set of environments, and that's not only the external environments but internal environments, the foods they eat, the cleanliness of the areas and so the infections they acquire, and so forth, all of that will lead to variation even among identical twins. So in large measure it is a system that was used very much to emphasize the importance of environmental change on the overall molecular behavior of the individual.

Yeah, I don't think I have much to add to that. What was the second question, Chris?

So the first one was about how much correlation there actually is in
twins versus unrelated individuals, and the second one was, could you concede that, and I think Tom addressed this, the order of environmental exposures would affect their phenotype?

Yeah, no, but again here I want to emphasize the importance of cell type composition. So if you take blood samples from one twin and the other, and the other one had a cold a couple of weeks ago, there are going to be differences in cell type proportions that will manifest as differences in gene expression levels. And we know, for example from GTEx, that the major source of gene expression variation is differences in cell type composition. So if you had a very specific cell type extracted from both twins, I think that those numbers would go up even further from what we know from most studies thus far.

So specificity would be the answer, although you'd want to have some information about what percentage of cell type that was at the time, to be able to answer that.

So maybe one more question before I turn it back over to Paul. This one's from Dina: closing the genotype-phenotype gap requires integration of functional data to recapitulate real-life disease pathology. How can we feasibly achieve this, especially for complex diseases? Ending with the easy question there. Go ahead, Tuuli.

Complex diseases, yeah. Well, the table that I had, with these kinds of five-times-six entries, only got to the cellular level, and I didn't even address things like cellular function, like insulin secretion, and physiological phenotypes. So yeah, no, I think we just need to do a ton of work, using all kinds of approaches that will be complementary, and also including some of the future kinds of approaches that we've discussed today. We don't have the pipeline, the toolkit, the methods built that will get us there, and I think that also,
there is a bunch of approaches, but what exactly is the best in different types of settings is not entirely clear. This is also something that ICDS is working on: taking a bunch of flagship diseases, like example diseases, and trying to take those apart, and then seeing if we can actually use that to develop generalizable lessons on how to do this across a very diverse set of traits and diseases. And as I said, I think this will include both the observational, population type of studies, where you collect cells from actual individuals, and also in vitro work with different types of model systems.

Yeah, thank you. And Paul, I'll turn it back to you, unless Tom wants to add to that.

No, I'm fine, thank you.

Sure. So just maybe a final question or comment. You know, again, we talk about the differences versus commonality among populations, diversity and things like that. My question for this bold prediction is that there seems to be a dichotomy, or some kind of tension, between the two. On one side, I agree: for big data, you look for population trends and the normal range, all that stuff, so you want to look at what's in common, and at how you define a disease stage versus a normal stage. But when you apply this to precision medicine, for example, then you really want to look at each patient as a unique person, with a unique genotype and a unique set of environmental factors. So how do you do this efficiently for each patient, especially with so much data, and how do you apply it to an individual from a clinician's point of view?

Paul, I need a little help on this. So is the question that the data types are large and diverse, and how does a clinician...

How do you have efficient data collection from each patient, and also how do you apply the knowledge from a population point of view to a single patient?

Yes, yes. Right. Yeah, you know,
again, I think clinicians have a lot to teach us in this, because they do start with the supposition that each individual is unique, and that what they learned in medical school could be contradicted by this individual in some very tangible way. I think that approach is probably a lesson that molecular biology should pay attention to. That is to say, we tend to treat things in a much more communal sense, because we're looking for bottom-line answers or bottom-line explanations. And when we go back and look at the individual, the things that we would look for to explain a particular clinical state or whatever would start with this sort of general bottom-line summaries. Those are usually the first things I look for. But then what molecular biology can offer is variations that seem to be unique in terms of the frequencies seen and where they're occurring in the genome, as alternatives to be added to the body of these bottom-line summaries. And I think that's an approach which requires a path, a process, by which we as molecular biologists can interact in a way which provides information about the behavior, the phenotype-genotype relationships, that happen in addition to the ones that have been well characterized. And I think that kind of information is probably the one way in which to translate or synthesize information which is very complex and very numerous.

Thank you very much, both, for your wonderful presentations and insightful comments and discussions. And I'd like to thank the audience, and thank you, Chris, for designing this whole process and the seminar series, and Susan for the support and, you know, admin
support, and Gerald, William and Alvaro for IT support. And thank you very much. Have a good day. Thank you very much.
Info
Channel: National Human Genome Research Institute
Views: 1,502
Rating: 4.9 out of 5
Id: AP9pWqUXlrM
Length: 89min 27sec (5367 seconds)
Published: Fri Apr 16 2021