GENE ONTOLOGY using TOPPGENE - Free tutorial

Video Statistics and Information

Captions
[Music] So here I'm going to tell you about how to analyze the gene ontology of a list of differentially expressed genes using the ToppGene Suite. ToppGene is run by the Cincinnati Children's Hospital Medical Center, and I like this tool because it's quite intuitive, it's easy to use, and the output data is straightforward to understand. The tool that we're going to use is ToppFun, the first link at the top, and you can see the first thing you need to do is to enter your list of differentially expressed genes. I have my genes in a spreadsheet in Excel, and these are the Ensembl IDs. You could input all your differentially expressed genes, the upregulated and downregulated ones, but I find that the data from gene ontology is much easier to interpret if I split the list up into upregulated genes separate from downregulated. Now within Excel that's easy enough to do. This column here is my change in gene expression, log2 fold change, and I've set up a filter on this file. I can go to Number Filters, Greater Than, and because it's a log scale, anything that's greater than zero is upregulated. So I type in zero, click OK, and now all the lines show genes that have a log2 fold change greater than zero, so they've been upregulated. I can copy the Ensembl IDs and paste them into a new worksheet, and that's what we have here. Then I can copy that column and pop it into the ToppGene web tool. Now these are Ensembl gene IDs, so I need to change the entry type at the top. Don't worry about the fact it says "training gene set" here; that's not relevant for the use that we're making of it today. So we just click Submit. Once you've uploaded your list of genes you get a page that looks like this, and what this is saying is that out of, in this case, 1,247 genes that were uploaded, it recognizes 1,224 of them: it knows what they are, what their names are, what their gene IDs are. There are always some where it doesn't recognize what they are,
so it doesn't take them any further in this gene ontology analysis. You also have the opportunity to alter some of the settings for the statistical calculations, and once you're happy with all of that you can click Start.

This is what the results page looks like. We can see that it has analyzed each of those genes and looked at the associated GO terms, and any GO terms that are statistically enriched in your list of genes are given out here in tables. It's divided them into the molecular function GO terms, the biological process terms and the cellular component terms. We can click on this link here, and that shows all of the statistically enriched GO terms in this list. So here we have the GO ID and the name of that GO term, and then we have the p-value and different ways of calculating the false discovery rate and corrections for multiple testing. The numbers here, "genes in annotation", are the number of genes in the whole database that have that GO term, so there are up to 705 genes that would match here, and in our input list there were 81 genes that included that GO term. This table is ranked by Bonferroni, with the most significant being at the top and decreasing as you go down. We can see here we're looking at molecular function, and so the sorts of terms we have here tell you about the types of function of a protein: we've got channel activity, cyclase activity, heparin binding, protein kinase activity. These can be useful categories, but they don't really tell you much about what that protein is doing within the cell. For that we can go down to the biological process category. Again I'll expand those, and now we can see that a lot of those proteins are involved in something to do with neurogenesis: generation of neurons, neuron differentiation, central nervous system development. Quite a lot of these GO terms are related to that general topic, so we get a better idea of what's happening inside
the cell: these cells look like they're becoming a bit more neuronal. We can then go down and look at the cellular component, so this is telling us where these proteins are found within the cell, and again that's consistent in this case with more neuronal cells: we've got GO terms to do with synapse and neuron projection and synaptic membrane. So you can see these different categories of GO terms are telling us different sorts of information, and you may well find that one type of category is more useful for the question that you're asking compared to others.

We can get some graphical information from this. If we click on Display Chart we essentially see a graph of all those GO terms plotted out. So here we have that top GO term, neurogenesis, and if you hover over one of the bars on the chart you get more information. The red bars are the numbers of genes in the whole annotation, so here 1,867, and the blue bars are the numbers of genes in that category from your differentially expressed list. The key thing to note here is the different scales: total terms in the category is on a much larger, much wider scale than the genes in your set that you've uploaded. So although the blue and the red look similar sizes, they're actually plotted on different scales. The other thing we have on here is that those false discovery rates are plotted, and if we scroll down a wee bit further, here we go, they start to increase. So here we have one of those: this is the Bonferroni, you can see if we hover it tells us it's Bonferroni, and that Bonferroni score is increasing as we go down. Once it crosses this cutoff, this red one, then that is no longer significant, if you're taking the Bonferroni scores into consideration. As we go further down we can see the other statistical tests starting to increase: we've got the B&Y FDR, and then further down we have the green one, which is the B&H FDR. So it gives you essentially a visual representation of what we've seen on that previous
page, so I'm going to flip back there. ToppGene also gives us lots of other categories which it has analyzed. If we scroll down we have whether your genes seem to be related to known phenotypes in mouse or human, whether they're enriched for particular sorts of protein folds, and one which I find quite useful is pathways. If we expand on this, there are a few different systems that have categorized proteins into different pathways: there's KEGG, Reactome, BioCarta. Again, what this algorithm has done is it said, OK, are there particular pathways that appear to be enriched in your list of differentially expressed genes? And so we've got two for axon guidance at the top there, the KEGG one and the Reactome axon guidance, and you can see the Reactome pathway includes many more genes for axon guidance than the KEGG one does. If you click on any of these gene numbers it takes you to a list of all those genes that were in that category, so here was your initial Ensembl ID and then we have the name of the gene and the symbol and the Entrez Gene ID. So you can easily get a list of any genes in any of these categories that you can download.

I just wanted to add a few caveats about looking at this sort of gene ontology analysis. When we do this sort of omics approach we are taking an unbiased approach: we want to find out everything we can, we're not looking specifically at a particular gene or a particular pathway. However, when we start to look at this sort of output from GO analysis it can sometimes feel a bit overwhelming, and at this point your own biases can come in. So you could look at this list of pathways and think, "Oh well, I don't know much about axons, and I don't know much about extracellular matrix. Oh, Ras, I know something about Ras, that's good, let's focus on the Ras signaling pathway and that will be the answer to my questions." But you've really got to be careful that your own bias from your
background or your assumptions is influencing what you do next with this gene ontology data. Now you probably will have to make some decisions about what to follow up, but you need to be very clear and open about how you're making those decisions. If you choose to follow up on the Ras pathway, have you got a good rationale for doing that? Is there biology behind that? Does it fit with other information you have? So be very clear about the choices that you're making.

The other thing to be aware of, particularly when you're looking at this sort of pathway data, is that many pathways involve post-translational modifications. So MAP kinase signaling, Ras signaling, they all involve phosphorylation. For MAP kinase, you have a growth factor that binds to a receptor on the cell membrane, we then have phosphorylation of the receptor, activation of Ras and Raf, and we have phosphorylation of the MAP kinase cascade. So simply changing the expression level of a protein within that signaling pathway doesn't necessarily mean that there's more signaling through that pathway. This analysis, whether you're looking at RNA or whether you're looking at protein levels, is not necessarily telling you about phosphorylation; it's not telling you about activity through that pathway. So you've really got to be very careful about how you interpret this data, particularly if you're looking at pathway data.

The other thing to remember: I suggested that when I do this I find it easier to separate my genes into those that have been upregulated and those that have been downregulated. So if we're looking at biological processes and we can see, yes, we have lots of neuron-related activities going on, it's reasonable to say, OK, yes, there's probably more neurons in that dish, or more neuron-related cells in that dish. But particularly if we're thinking about pathways, pathways have positive regulators and negative regulators, so just because particular proteins seem to be more highly
expressed doesn't mean they are activating that pathway; they might be inhibiting or downregulating that pathway, and they might be negative regulators. So again, just be very careful about how you interpret this sort of data.

Other things that we have in this list: we've got links to PubMed, so this is where individual papers have uploaded lists of genes that they've analyzed, and the software has related your genes to those lists, and you can see, oh, maybe my genes are similar to a list of genes other people have found. We also have cytogenetics, transcription factor binding sites and coexpression atlases. If you want to download this data you can click on this button here, Download All, and that downloads a text file which you can then open in Excel, which will look something like this. So you can see here are all the GO terms, what they are, the different p-values, how many of your genes fell into that category, and then here is a list of all the gene names from your list that fall into that GO category. All the data from that page is there, we've got the molecular function, the biological process, and it's probably easiest if you use the filter functions within Excel to help to organize this data so that you can interact with it usefully. [Music] [Music]
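For anyone who would rather script two of the steps described in the captions than do them by hand in Excel, here is a minimal Python sketch. It assumes a table with columns named `ensembl_id` and `log2_fold_change` (hypothetical names, not from the video), uses made-up placeholder IDs and p-values, and implements textbook Bonferroni and Benjamini-Hochberg adjustments. ToppGene's own calculations may differ in detail, so treat this as an illustration of the idea rather than a reproduction of the tool.

```python
import csv
import io


def split_by_direction(rows, id_col="ensembl_id", lfc_col="log2_fold_change"):
    """Split gene IDs into upregulated (log2FC > 0) and downregulated (log2FC < 0)."""
    up, down = [], []
    for row in rows:
        lfc = float(row[lfc_col])
        if lfc > 0:
            up.append(row[id_col])
        elif lfc < 0:
            down.append(row[id_col])
        # Genes with a log2 fold change of exactly zero go in neither list.
    return up, down


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the one ranked after it.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up example table; the IDs are placeholders, not real Ensembl genes.
example_csv = """ensembl_id,log2_fold_change
GENE_A,1.8
GENE_B,-0.7
GENE_C,0.3
GENE_D,-2.1
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
up_ids, down_ids = split_by_direction(rows)

# One ID per line, ready to paste into the ToppGene input box.
print("\n".join(up_ids))

# Made-up p-values for four terms, purely to show the two corrections side by side.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example_pvalues))
print(benjamini_hochberg(example_pvalues))
```

Running the split once for `up_ids` and once for `down_ids` gives the two separate lists recommended in the video. The adjustment functions show why the Bonferroni values on the chart cross the cutoff before the FDR values do: Bonferroni multiplies every p-value by the full number of tests, while Benjamini-Hochberg scales each one by its rank.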
Info
Channel: Genomics Gurus
Views: 6,002
Rating: 5 out of 5
Keywords: Genetics, Genes, Transcription, Epigenetics, Teaching, University, College, Public Engagement, Science, Lecture, Tutorial, Glasgow, Scotland, UK, University of Glasgow, Glasgow University, Katherine West, Gene Therapy, Precision medicine, gene expression, gene ontology, toppgene, gene transcription, RNA-seq, science videos, gene ontology enrichment analysis, gene ontology analysis, gene ontology tutorial, gene ontology how to use, gene ontology annotation, bioinformatics, pathway analysis
Id: 67LSI8ZA-nY
Length: 14min 2sec (842 seconds)
Published: Thu Apr 23 2020