How to analyze RNA-Seq data? Find differentially expressed genes in your research.

Reddit Comments

IPA is proprietary software; there are many other alternatives for this kind of analysis. This is a solid tutorial. Maybe a bit outdated, since the analysis tools change or get updated every 6 months. Very good for beginners to get a clear idea of the whole process. Thank you.

10 upvotes | u/AbyssDataWatcher | Jul 17 2018

I thought I'd share this video on RNA-Seq data analysis that I just watched. It was helpful to me so I thought it might be of interest to others as well. The audio volume is a bit low unfortunately. If you don't wish to watch the whole video, the speaker gives some great resources (MOOCs, publications, tutorial sites, etc.) at the end of the video if you wish to learn about RNA-Seq from other sources.

EDIT: The video description also provides links to resources.

3 upvotes | u/GillesXD | Jul 17 2018

I have seen this video by Candice Chun before, it's great! Highly recommend.

3 upvotes | u/icodescience | Jul 18 2018

I have a comprehensive open source R package with great documentation under review right now, as soon as it's accepted I'll tell you.

3 upvotes | u/price0416 | Jul 18 2018
Captions
So, I'm happy to be here. I'm Candice Chu, a third-year PhD student in veterinary pathobiology, and I'm here to talk about how to analyze RNA-Seq data. I've talked about this topic once before, in a summer seminar, and for that seminar I put a lot of effort into detailed step-by-step instructions. But I feel that very easily makes people bored, so I took out all those details and I will emphasize these four points.

First, you have to know why we are doing RNA-Seq instead of using other methods. Then you have to know how to design a proper experiment for RNA-Seq. The main focus today will be how to analyze your data. You have to be familiar with the file formats, because there are so many of them: you need to know what they mean, what the input of the tools you want to use is going to be, and what the output file is going to be. For the platform, I mainly use Linux and R, but there is a user-friendly interface called Galaxy now. It's a website: you open it in your browser and it has all the tools, you just drag them from the left column to the middle window and do the analysis there, and that's much easier. However, I'm not familiar with it, so I'm not going to talk about it today. For the tools, I think the most important thing is that you have to know which tools you're going to use, and then you have to be able to read through the manuals and know how to use them: understand what the input requirements are and what the output is.

The other main point of today's lecture is the resources, because once you type "RNA-Seq" into Google there are thousands of results, and which one are you going to look at, especially as a beginner? So I will share some useful resources that hopefully can save you time.

So let's start with the purpose: what is RNA-Seq? In the central
dogma of biology, you know that DNA is transcribed into pre-mRNA, the pre-mRNA is processed into mature mRNA, and that is then translated into protein. With RNA-Seq, we isolate the RNA from our tissue (people mostly focus on mRNA), and we try to take a snapshot of the tissue at that point. We all know that every cell in our body has the same DNA, but they eventually become different cells because they express different genes. So I want to take a snapshot of the tissue and know which genes are being expressed at this point.

Maybe at this point you are wondering: then why don't we just do protein profiling instead of RNA-Seq? So my question is, why don't you just isolate the protein and profile the protein instead of profiling the RNA? Yeah, that's it: because of the limits of current technology, we cannot profile all the proteins in a tissue. So instead we profile the RNA, and that achieves a more complete, global profile of the tissue.

My other question would be: why don't you do a microarray, and why do you want to do RNA-Seq? [audience response, partly inaudible] Yeah, it has something to do with the probes. With a microarray, you have to pre-select the probes you want to use, so for the transcripts you can detect, you have a fixed set of targets. With RNA sequencing there is no pre-selection, so you just openly look at what is in the cells, and you can discover novel transcripts, which you cannot do with a microarray. The other thing is that RNA sequencing has a broader dynamic range of detection. For example, in my sample there are probably some genes that have no expression, but the same genes in other samples could have something like 6 million reads mapped to them. So you have a broader dynamic range of detection.
RNA-Seq also has higher specificity and higher sensitivity.

Next, experimental design. I'm going to take my own research as an example. My research title is very long, there are 80 words, but don't worry. I'm studying a genetic disease of dogs called X-linked hereditary nephropathy. It affects the glomeruli in the kidney. Here is the glomerulus, and the glomerulus has three layers. The disease I study is caused by a mutation affecting the middle layer, the glomerular basement membrane. The mutation causes the protein in the blood to leak into the urine, and that causes inflammation of the kidney, which leads to juvenile-onset renal failure. These dogs usually die within one year.

Why is it important to study a genetic disease in dogs, when you could simply stop breeding them and that would end the problem? Because in humans there is a disease called Alport syndrome that has the same mutation as what we see in dogs. That makes the question very important, since we can understand the human disease by studying our dogs.

Looking at all the affected dogs, we noticed an interesting phenomenon: some dogs progress faster and some dogs progress slower, even though they have the same mutation. So we want to use RNA sequencing, which is RNA-Seq, to find out which genes are differentially expressed between these two groups of dogs, and which pathways those genes are involved in.

Here's the overview of the experiment. You start with the experimental design. You take out the kidney tissue, isolate the RNA, and check the RNA quality and quantity. You do the library preparation yourself, or you can send the RNA to the sequencing facility. Then they sequence your sample on this piece of glass, which is called a flow cell. Then you get the results as your raw data, and your analysis begins after you get the raw data. So, talking about the experimental
design, this is my design. In the optimal situation you want at least three biological replicates for each group. Unfortunately, one of my controls did not have enough RNA, so I only have two control dogs, and I have three in the rapid group and three in the slow group. I took serial kidney biopsies at different time points, T1, T2, and T3, so in total I have 24 samples. This experimental design allows me to compare different groups at a specific time point (T1, T2, or T3), or to compare the same group at different time points.

After I identify the samples I want to use, I isolate the RNA and have to check the RNA quality and quantity. For quality, if you are isolating RNA from tissue, you want to make sure that you have intact RNA, meaning it is not fragmented. You check this with a Bioanalyzer, which has an RNA chip, or you can run the RNA on a gel. Usually the Bioanalyzer gives you an RNA integrity number, RIN. I think the highest number is 10, right? Yeah. The higher the number, the better your quality. If you're doing small RNA sequencing, you can use the Agilent small RNA analysis kit, and hopefully it will give you a result like this. As you can see, here are the microRNAs: you can see a teeny tiny hump here, which tells you that at least you have small RNA.

For quantity, it depends on which library preparation kit you are using; each has different requirements. But usually you have to have 1 microgram of total RNA, or 10 to 15 nanograms of purified small RNA, and you have to make sure that it really is purified small RNA. One way to measure it is with a fluorometric method like RiboGreen, and I know there is a machine called Qubit that we can use. Or you can have it measured in the core facility; they have a Fragment Analyzer to check it. Based on my personal experience, I measured it with both
machines, and I think the Fragment Analyzer gives a better result, one that correlates with our success rate in making the library. And definitely do not try to use the NanoDrop to measure it, because usually the concentration is too low for the NanoDrop. Are you isolating RNA from tissue? OK, then it depends on the concentration; if it is higher, it's fine. I think it's 50, yeah. Just don't go below 50 nanograms. I say this because, not for this experiment but for my ongoing research, I'm measuring RNA isolated from biofluids, so the concentration is usually super low.

For library preparation, there are several library preparation kits you can choose from, depending on your RNA quality. If you have high-quality samples, you can use the mRNA library preparation kit. If you have low-quality samples, Illumina suggests on its official website that you use the total RNA library preparation kit. So for my biofluid samples I use this kit, the total RNA library preparation kit with Ribo-Zero Gold. One advantage of this kit is that it helps you remove the cytoplasmic and mitochondrial ribosomal RNA. Usually about 90 percent of the RNA is rRNA, so you want to make sure you are not sequencing that RNA.

In addition to library preparation, you also need to consider the sequencing depth: how deep you are going to sequence. This is a quote from our core facility; you can check it on their website. One thing to consider is that the two different platforms tell you how many reads you can get from a single lane. In the first case, you can get 220 to 230 million reads per lane. So you have to do a calculation: how many samples you have and how many lanes you are going to run, because this price is per lane. One lane costs you $1,000, so it depends on your budget and how many reads you want to get. For example, in my experiment I have 24 samples, right?
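The per-lane arithmetic is easy to check before committing to a run. Below is a minimal Python sketch, assuming the samples are pooled evenly across every lane; the function name `reads_per_sample` is made up for illustration, and 225 million is simply the midpoint of the quoted 220 to 230 million reads per lane.

```python
# Rough per-sample depth estimate. The lane yield and sample count come from
# the talk; the function name and the even-split assumption are illustrative.


def reads_per_sample(reads_per_lane: float, n_samples: int, n_lanes: int = 1) -> float:
    """Expected reads per sample when all samples are pooled evenly across every lane."""
    if n_samples <= 0 or n_lanes <= 0:
        raise ValueError("n_samples and n_lanes must be positive")
    return reads_per_lane * n_lanes / n_samples


if __name__ == "__main__":
    lane_yield = 225e6  # midpoint of the quoted 220-230 million reads per lane
    samples = 24
    for lanes in (1, 2, 3):
        depth = reads_per_sample(lane_yield, samples, lanes)
        print(f"{lanes} lane(s): about {depth / 1e6:.1f} million reads per sample")
```

With these numbers, one lane works out to a little under 10 million reads per sample, and three lanes to roughly three times that. Real yields vary from run to run, so treat the result as a planning estimate.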
If I run them all in one lane, I will have about 10 million reads per sample. In our case we thought that was not enough, so we spread our samples across three lanes. But it's not one set of samples per lane: all 24 samples are run in every lane, spread evenly across all three lanes. After we get the results, we combine each sample's reads from the three lanes and do the analysis, so that I can decrease the bias between different lanes.

So it depends on what you are going to do. If you are looking for differentially expressed genes, you probably won't need too much depth, but it depends on your application. One way to figure it out is to go through the literature, see how many reads other people got from their sequencing, and see how their results turned out.

Besides that, you also have to consider whether you are going to do single-end (SE) or paired-end (PE) sequencing. You can see the price: there is a $7,000 difference between them. That also depends on your application. If you are doing small RNA sequencing, small RNAs are super short, so it doesn't matter whether you do single-end or paired-end; usually single-end is enough. But if you're doing mRNA sequencing, as in this research, paired-end is much better than single-end. Imagine that you have a single-end read that is 50 base pairs long and you try to map it to your genome: it may map to multiple places. With paired-end reads, both ends have to map to the genome for the alignment to be accepted. So using paired-end reads can substantially decrease your false positives, because a single-end read can map to multiple places, and some of those places may be wrong: they just happen to match one end of your fragment, not both ends.

You also have to consider the read length. You see you have the option of 50 here and the option of 125 here, and the longer the reads, the more expensive it is. So how do you know how
much is enough? There's an online tool called Scotty that can help you calculate the statistical power you need before submitting your grant. You just put in your budget, how much a lane costs, and how many biological replicates you have; put all the information you have on that website, and it will calculate it for you.

Then, talking about the Illumina sequencing technology: I was thinking about playing this five-minute YouTube video in class, but I decided to save your time. Go home and Google "Illumina sequencing technology", and it will show you how it works: after the library preparation, after you have the cDNA, how the machine runs it on the flow cell and generates all the different reads.

All right, here's the analysis part. For finding differentially expressed genes, there are three questions we are going to ask. The first question is: after we get all these reads, which genes do these reads belong to? In addition to the first question, we also want to know how many reads map to each specific gene. And after we have all the numbers, we want to know whether the different sample groups express genes differentially.

If you have read any RNA-Seq paper, you will be familiar with the Tuxedo suite. It's a very popular pipeline: you use TopHat to do the alignment, Cufflinks to do the transcript assembly, and Cuffdiff to do the differential expression. It's very popular, and people are still using it to publish their data. However, there is an updated version of the Tuxedo suite: you can use HISAT2 to do the alignment, StringTie to do the transcript assembly, and Ballgown to do the differential expression. I think the advantage of this type of tool set is that the tools are built together, so it is easier to take one tool's output, feed it into the next tool as its input, and get your result back. But the downside is that you
don't have much flexibility in doing your experiments. So after digging a little deeper, a few months ago I decided to take my own approach. My approach is to keep HISAT2 from the Tuxedo suite as my alignment tool, then use HTSeq to assign those aligned reads to specific genes, and then use DESeq2 to do the differential expression. Notice that everything up to and including HISAT2 and HTSeq I did on the supercomputer, or on whatever workstation you're going to work on. Then I export the CSV file to my own computer and run DESeq2 in R or RStudio.

So let's take a look at the raw data, the FASTQ data. If you do your sequencing with our facility, you will receive the results as a link. You just click on it and it brings you to the website, and you can see that all my 24 samples are here. Remember I mentioned that my samples are distributed across three lanes; here we are looking at lane 2, and if you look at lane 1, all the information would be the same. Also notice how big my data is: it's about 40 or 50 gigabytes for one lane, so with three lanes it's about 150 gigabytes, and there is no way I can store that on my Mac. So I'm going to use the supercomputer to store the data for me. This is a picture from an IBM supercomputer center; I don't know what ours looks like.

So I just open the terminal on my Mac and log in. After I log in, I just go to the folder where I saved my FASTQ files. When I open one, the FASTQ file looks like this. It's a little bit scary, but if you pay attention you can see that it has a basic unit of four lines. So take a look at the FASTQ format. First you have the header, and then you have the actual sequence, whose length depends on the read length you chose. We chose 125, so you see there are 125 nucleotides. Then there is a plus sign; just a plus sign, I don't know what it means. And then you have the base qualities. I'm going to talk a little bit about base
qualities. When calling each base as A, T, C, or G, the computer has a confidence level: how confident am I in calling this an A? The higher the confidence, the lower the error, and the higher the quality score. The score is denoted by different symbols here. As you can see, this base has pretty low quality: it's a pound sign, which is a score below 10.

If you're a visual learner and you're doing your sequencing with our core facility, they will also send you the FastQC report website. You can just click on it and see. Ideally everything should be above 30, in the green zone, like here; I would be satisfied with this. But if your reads have a lot of nucleotides in the yellow zone or the red zone, you should probably trim them off before you do any alignment.

Assuming that everything is fine, you have the FASTQ files, and now you have to download the FASTA file and the GTF file from the web. I'm talking about this because I think most of us are working with species like human, mouse, rat, or dog, species that have a reference genome. If you have a reference genome, you download it from the website and map your reads to that genome. If you don't, you have to take another route, and that's beyond the scope of my lecture, but I will provide some resources that you can look up yourself.

There are several sites that provide genome files, but my personal preference is the Ensembl dog genome that you can see here. I know there is UCSC as well. However, the Ensembl gene ID is widely used in many downstream applications, so if you use Ensembl from the start, you don't have to convert your IDs to another system, and that is much easier for you. What I did is type "Ensembl dog genome" and click on the FTP download link. It takes you to the download page, where you just look for Dog.
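To make the FASTQ layout and the quality symbols described earlier concrete, here is a small Python sketch that reads four-line records and turns the quality characters into numbers. It assumes the common Phred+33 encoding, in which a character's ASCII code minus 33 is its score; the helper names and the example read are invented for illustration, and real pipelines usually rely on an established parser plus a report such as FastQC instead.

```python
# Minimal FASTQ reader and Phred+33 decoder (standard library only).
# Assumes 4-line records and the Phred+33 offset; the sample record is made up.
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality_string) for each 4-line record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line (may optionally repeat the header)
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGTN\n+\nIIF5#\n")
    for name, seq, qual in read_fastq(example):
        scores = phred_scores(qual)
        low = [base for base, q in zip(seq, scores) if q < 30]
        print(name, scores, "bases below Q30:", low)
```

Under this encoding the pound sign decodes to a score of 2, which is why it signals a very low-quality base, and a score of 30 corresponds to an error probability of 1 in 1,000.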
You need the DNA, which is a FASTA file, so just click on it. You also need a gene annotation file, because the FASTA file is just the DNA sequence, like ATCG, on and on. From it you will never know whether a region is a gene, a protein-coding sequence, an exon, or an intron. The annotation file, the GTF file, will tell you that on chromosome X, from position 1 to, say, position 108, there is an exon or a protein-coding sequence. So download that file from the website too. If you scroll down on the same website, it tells you what the file formats are: what a FASTA file is, what a GTF file is, what a GFF file is. If you have any questions, just go to the website and it will teach you how to read the file formats.

OK, so I have the files on the supercomputer, right here. I'm going to take a look. Here is my GTF file. Once you take a closer look, you can see that the first number is the chromosome number, then it tells you this is an Ensembl gene, from what position to what position it runs, whether it's on the plus strand or the minus strand, and what its gene ID is. This is the Ensembl ID I was talking about. For the details of the GTF format, please look online yourself.

OK, so now you have the FASTQ files, the FASTA file, and the GTF file, and you can start to map your reads to the genome. At this point the question we are asking is: which genes do these reads belong to? Every time you want to use a tool, either on the supercomputer or on your future workstation, just type its name and hit Enter. The computer will warn you that there is no index, blah blah blah, and that no output file is specified, but it will kindly tell you how to use the tool. In this case, you type the tool's name first, then a space, followed by the options: -x for the index file, -1 and -2 if you are using paired-end reads, -U if you are using single-end reads, and -S to specify the output. If you don't do so, it will write to the default, standard
output. Some people may be thinking, "I don't even know how to use iOS 10, how can I know how to use HISAT2?" Then you have to Google "HISAT2", go to its official website, and take your time going through its hopefully well-written manual. Mm-hmm, yeah, it's just a pain in the ass. Those people are so smart, and they spend so much time writing good programs, but they just don't know how to write a self-explanatory manual. Oh, really? Oh yeah, that's very helpful. That's why there are so many tutorial websites online. I know it's not the proper approach, but sometimes when I want to use a new tool I just type "HISAT2 tutorial", and some people have made a tutorial website for you, and you just follow their instructions.

OK, going back to my actual command. I have the name, as specified in the usage, and I have all the options. The only one I can remember now is that I put -p 20, which means I want 20 cores of the supercomputer to do my job at the same time, so it will be faster. Then I specify the index file, then, as you see here, the first paired read file and the second paired read file, and then I specify the output name; I want to call it aligned.sam.

Remember that the output file is either a SAM file or a BAM file. The BAM file is basically the same as the SAM file, but it's a binary file, so it's in computer language: once you open it you won't be able to read it, but you can read the SAM file. So take a look at the SAM file here; let me make it bigger. First you have the ID, then a bunch of numbers, your mapping information, the sequence of your mapped read here, a quality score here, and a bunch of other fields. If you'd like to know how to read it, just Google "SAM format" and somebody will teach you. Besides the SAM file, the tool also tells you how many reads mapped to your genome. In my research, roughly 90% of my reads mapped to the
genome. All the orange and blue bars are the reads that mapped to the genome, and you can see I got roughly 30 million reads from each sample. The blue ones are the uniquely mapped reads, meaning they map to only one specific location in the genome, not multiple locations.

All right, so now you have that information. The next step is to assign those aligned reads to specific genes, because you want to know how many reads align to each gene. The goal of the tool is to get a table like this, called a count table. You have genes in the rows, all written as Ensembl IDs, and you have all your samples in the columns. So you know that in the first column, the first sample has 700 reads mapped to that specific gene. As you can see, some of the genes don't have many reads mapped to them, maybe 1, 2, or 3, and some are highly expressed, with hundreds of reads mapped to them.

How are we going to do it? Again, just type the name of the tool and the computer will tell you how to use it, and you get a whole bunch of options. Maybe at this point you are thinking, "Why do I have to care about the options? I just want to leave them blank and provide only what I have to provide." But the truth is that different options have different meanings. In the example of HTSeq, the default mode is union. Whenever a read maps to a specific gene, gene A, and there is only gene A, it will be called gene A. But if the read maps mostly to gene A and a little bit to gene B, the default will call it ambiguous and will not count it. If you want to keep reads like this, you had better change the option to intersection-strict; then it will be counted as gene A. So that depends on your application, and you still have to read through the manual and pick the mode that is best for you.

So again, the command has the name of the tool, all the options, the alignment file, and the GFF file, which is the same as the GTF file.
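The difference between the two counting modes is easier to see on a toy example. The Python sketch below is a simplified re-implementation of the idea, not HTSeq's own code: for each base of a read it collects the genes covering that base, then takes either the union or the intersection of those sets. The gene names, coordinates, and function names are all hypothetical.

```python
# Toy version of overlap resolution, NOT HTSeq's implementation.
# Genes are half-open intervals [start, end) on one chromosome; names and
# coordinates are invented for illustration.
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]


def features_at(pos: int, genes: Dict[str, Interval]) -> Set[str]:
    """Names of all genes covering a single base position."""
    return {name for name, (start, end) in genes.items() if start <= pos < end}


def assign_read(read: Interval, genes: Dict[str, Interval], mode: str = "union") -> str:
    """Return a gene name, 'no_feature', or 'ambiguous' for one aligned read."""
    if read[1] <= read[0]:
        raise ValueError("read must cover at least one base")
    per_base: List[Set[str]] = [features_at(p, genes) for p in range(*read)]
    if mode == "union":
        hits = set().union(*per_base)
    elif mode == "intersection-strict":
        hits = set.intersection(*per_base)
    else:
        raise ValueError(f"unsupported mode: {mode}")
    if not hits:
        return "no_feature"
    if len(hits) > 1:
        return "ambiguous"
    return next(iter(hits))


if __name__ == "__main__":
    genes = {"geneA": (100, 200), "geneB": (180, 300)}
    read = (150, 190)  # lies inside geneA, clips the start of geneB
    for mode in ("union", "intersection-strict"):
        print(mode, "->", assign_read(read, genes, mode))
```

Here the read at 150 to 190 lies entirely inside geneA and only clips geneB, so union reports it as ambiguous while intersection-strict assigns it to geneA. Note the flip side in this sketch: a read that hangs off the edge of a gene still counts under union but becomes no_feature under intersection-strict.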
I make the output a .txt file here. Back on the supercomputer, I have the .txt files for all 24 samples, and I compile them all together into a gene CSV file. In this file I have the Ensembl identifiers in the rows and my 24 samples in 24 columns. After you have this, you can export the file to your own computer. Then you can run DESeq2 in R and try to answer the question: do different sample groups express genes differentially?

So how do you use R? Google "R", click on the first link, and download it. Here is what R looks like. I think it's pretty ugly and I don't like to use it, so I usually use RStudio instead. Google "RStudio", click on the first option, download it, and open it. It's a little bit more complicated, but I like it better. You have a window for plots: if you generate any plots, they show up here. You have a window for the console, so you know what the computer is doing right now. You have a window for your environment. And this is the most important part: here is your R script. You put everything you want to do here, select it, hit Enter, and the computer will run it.

All right. In addition to using R itself, I think the most important part of using R is that there are a lot of open-source packages you can use. There is a website called Bioconductor, and on its front page there is a link called "Explore packages". Click on it and you can see all the packages you can use. They're all free, and some are super, super awesome tools. You just type "RNA-Seq", and you can see it tells you there are 130 tools for you to use. For example, here is Ballgown, from the updated Tuxedo suite. Going down, you can see DESeq2, which is what I'm using right now, and edgeR, the other popular tool for RNA sequencing analysis. People wonder whether to use DESeq2 or edgeR, and there's a paper: How many
biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? The conclusion of that paper is that if you have fewer than 12 biological replicates, you should probably use edgeR, and if you have more than 12, you should probably use DESeq2. The good news, however, is that they don't differ a lot, so either one would be good to use.

Here is a demonstration of how I use DESeq2 in R. First you type library and put DESeq2 in it; that's like opening a package in R. Once you run it, you can see here that the computer is working. So you just select the thing you want to execute and hit Enter, and the computer will run it and make the graph for you. Here I'm making a PCA plot. If you feel this is too complicated, just focus on the time points: you can see T1, T2, and T3. All my T1 samples cluster together, the T2 samples cluster together, and the T3 samples cluster together, except for these two. If you want a different graph, you can make a heatmap as well; it's pretty easy. You can also make a Venn diagram. These are all pretty high-resolution pictures that you can use directly in your paper. That's one of the reasons I love to use R, and I encourage you to learn R as well; it will definitely help you a lot. I mean, this is modern scientific research: please don't use graphs made in Excel, and don't try to make Venn diagrams in PowerPoint. I don't know how to explain it; whenever somebody uses PowerPoint to make a Venn diagram, they draw the circles and make them overlap each other by hand. Just don't do it, please.

So here's the result of my research. Looking at the heatmap, you can see that all the T2 and T3 samples of the affected dogs, slow and rapid, cluster together. So this will
suggest to me that I should focus on T1, because at the later time points their expression may be too similar to be separated easily. So I will focus my research on T1. If you look at the PCA plot, you can see that at T1 almost everybody clusters together, except for one control dog and one rapid dog, which are here. At T2, the rapid dogs and control dogs cluster together, but there is one slow dog here. At T3, the control dogs clearly cluster together, and the rapid and slow dogs are pretty similar. So looking at the earlier time point may be more valuable in my research. Both the heatmap and the PCA are just exploratory examinations of your data, so do them before any other analysis to get a basic idea of what your data looks like.

Then you can get a list of differentially expressed genes. Here I not only have the identifiers, I'm also able to convert them into the common gene names that we are more familiar with. Now look at the Venn diagram. If I compare the rapid group with the slow group at the early time point, unfortunately there are only two differentially expressed genes. At T2, however, there are 65 differentially expressed genes, and at T3 they all behave very similarly, so there are no differentially expressed genes. If I look at the rapid group versus the control group, at T1 there is not much difference, at T2 the number of genes increases, and at T3 there are many more genes. It's the same for the slow group. So this is a phenomenon that I will put into my discussion, to talk about how it happens. But that's enough about that.

So what can you do with your RNA-Seq analysis? You get a list of differentially expressed genes, and you can put that list into gene ontology and pathway analysis, or use QIAGEN Ingenuity Pathway Analysis as well. It's pretty simple to do the gene ontology analysis: just go to the pantherdb.org website. It's super easy.
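Pathway and gene ontology tools of this kind typically ask a simple question: given how many genes belong to a pathway, is the number of them in your list larger than chance would predict? One common way to put a number on that is the hypergeometric upper tail, sketched below in Python. This is a generic illustration, not necessarily the exact statistic that PANTHER or IPA computes, and every count in the example is made up.

```python
# Hypergeometric upper tail as a generic over-representation statistic.
# Not tied to any particular web tool; all counts below are invented.
from math import comb


def overrepresentation_p(n_background: int, n_in_pathway: int,
                         n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_pathway belong to the pathway."""
    if not (0 <= n_in_pathway <= n_background and 0 <= n_selected <= n_background):
        raise ValueError("counts must fit inside the background")
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(max(n_overlap, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 150-gene pathway,
    # 1,100 differentially expressed genes, 20 of them in the pathway.
    p = overrepresentation_p(20_000, 150, 1_100, 20)
    expected = 1_100 * 150 / 20_000
    print(f"expected overlap by chance: {expected:.2f}, p-value: {p:.3g}")
```

A small p-value says the overlap is larger than random draws would usually give. When many pathways are tested at once, the p-values also need a multiple-testing correction, which this sketch leaves out.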
If you are using Ensembl gene IDs, you just put your Ensembl IDs in the middle box and select your species; mine is dog, so I select dog here. Then select your analysis; I use the statistical overrepresentation test. Then just click to run it. As you can see, here is my analysis type, the overrepresentation test, and as you can notice, it was recently updated, so it's very new. Then you can select what type of thing you are looking at right now; if you're looking at pathways, just select pathway and it gives you the result. So you put 1,106 genes into the system, and based on those roughly 1,100 genes, these are the pathways that are over-represented in your list of differentially expressed genes. You can put this in your paper and discuss why these pathways are over-represented and how that relates to your phenotype. Very easy.

The other approach is IPA. You can download it onto your Mac and run it on your computer. For IPA, fortunately, we have purchased the license. Just type "TAMU Ingenuity Pathway Analysis" and click on the software license; it's super easy to get a license. Once you have the license, just download IPA onto your computer. It's easy to use: just upload your differentially expressed genes, and it gives you the top canonical pathways, the top upstream regulators, and the top diseases and bio functions. One limitation of IPA is that it only has data for human, mouse, and rat. So if you're doing research in other species, like I do with dogs, you have to adopt both approaches. I use both gene ontology and IPA, with IPA providing additional information. If you don't know how to use IPA, they provide free webinars, about one hour long, that teach you how to use it. Every month they rotate through the webinars. We have missed the first one, in October, but we still have the second part, which tells you
how to format and upload your RNA-Seq data into IPA and you can just click on it and you can do your analysis and write your paper so pretty simple now here comes the most important part the resources I'm going to cover on-site courses online courses especially I like to take courses on Coursera because I feel there are a lot of useful courses there and I like to watch YouTube videos as well and some tutorial websites forums and most basically the papers and remember Google is your friend just Google all the time once you have a problem just Google it and I bet that like five years ago somebody already asked your question and somebody already provided the answer just Google it so one thing that is good about our supercomputer center is that you just Google TAMU supercomputer and you can go to this website and here it has training if you click on it it will give you regular workshops and special workshops it has a CDN bioinformatics workshop sometime in 2016 just click on it and you can download all the slides and for the regular workshops I checked it for you guys there is a next-generation sequencing workshop on November 2nd hope it won't be too late for your project you can go to the workshop and are in funded alright so for online courses my favorite one is this one Command Line Tools for Genomic Data Science it teaches you how to do analysis in the supercomputer setting in the Linux or UNIX system and it will use the most commonly used tools the Tuxedo suite to guide you through the whole process even though it's old I think a lot of people are still using it you can still use it or once you have the basic concept of how to use it you can apply it to other tools so you can enroll right now and it will start on October 24th alright so here's a slide that I got from that course so obviously you can learn the RNA-Seq analysis workflow and you would actually do the analysis by yourself and it also teaches you how specific tools work and different file
formats FASTQ format the quality score BED format GTF format so I really love this course and the other resource is the YouTube videos so there's an organization called bioinformatics.ca it's a Canadian organization they frequently have on-site bioinformatics courses but mostly in Vancouver so if you don't want to go there you can just go to YouTube and watch their videos and it's pretty new it was uploaded in August and what is nice about this is they also run the tutorial website www.rnaseq.wiki it's very nice and so you can watch the video and do the tutorial at the same time and what is better is that those people even published a paper so you can read the paper watch the video and do the tutorial website all three at the same time so it's very helpful and there is another paper that I really like A survey of best practices for RNA-seq data analysis this is a really good paper and it goes into detail about different steps like how to design an experiment and how to do the analysis I think it is really good and if you want to learn how to use DESeq2 in R this is a very detailed paper that goes through it step by step it has all the scripts in the paper just copy and paste and you will know how to analyze your data so remember this guy this is the author of DESeq2 Michael Love I think this guy has no life because every time I went to the Bioconductor forum to ask questions he would reply in like one hour every time so that's one of the reasons why I love to use DESeq2 because every time you have a problem just ask online and he will answer it for you yeah nice and then I like to read articles on RNA-Seq Blog it posts new papers new research results like several per day so you can look at different papers and there are many many tools being created like every day so it's a good place to look at and what is nice about it is it has Twitter and Facebook accounts so you can like it on Facebook
and the first thing you will see when you open Facebook every day is scientific stuff which makes you feel guilty about opening Facebook and this one you must have an account on this forum because it's the biggest RNA-Seq analysis forum Biostars if you're doing any RNA-Seq stuff you really really really have to have an account on that website people ask questions on that website and there are a lot of experts who will answer your questions within one day so it's very useful and I personally created a post on that website so if you go to biostars.org/p/ this number you can find my post I have a collection of up-to-date RNA-Seq analysis training courses and papers and I constantly update them if I have nothing to do so I have on-site courses here I have links to all the YouTube videos and I also have tutorial websites online courses my suggestions and different kinds of courses and articles experiment design papers about workflows about pipelines or papers about the general concept of RNA sequencing so for RNA-Seq beginners you can start from reading those papers they will give you a basic concept and then you can go into the detailed workflow and everything and I don't know if you are going to use it to analyze your RNA-Seq data but I know that Randolph has a training website it has training tools and it has training data and once you follow everything you can get a list of differentially expressed genes and I don't know if you're using the same website as I did but if you're using that for your final project just feel free to go on that website because I've been through the whole process and I feel like the instruction that they have on the chop King's webpage is like way too simple so I put a lot of details on it so you can follow my instructions to finish that training and I also have a GitHub page that I've been working on I try to teach people how to do RNA-Seq analysis using my pipeline
HISAT2 HTSeq DESeq2 I'm still working on it and hopefully I'll have time to update it after my statistics exam so once I have updated it you should be able to follow all the instructions to do the analysis by yourself and finally I want to talk about this book so being involved in RNA-Seq analysis led me into a different world than doing the conventional biology scientific stuff and I feel like the people in the computational biology world you know they are computer users they are nerdy people so they use computers a lot and they post questions and post answers everything online and for the tools they created they make them open source so everybody can download them for free and everybody can make them better so I think that influenced me a lot and I started to see a different world than the conventional biological experiment so I highly recommend you download this book it's free but if you want to donate money to Dr. Jeff Leek you can pay on that website he's an associate professor at Johns Hopkins and this book talks about how to review papers how to read papers how to plan your career and how to give talks and I think he talks about other stuff like how to write a scientific blog or how to promote yourself in the scientific world so as I mentioned this is a modern time so stop doing things in the stupid and inefficient way just try to make yourself smarter and whether you're going to stay in academia or you're going to go to industry I think this book would be beneficial for everybody so that's it my name is Candice Chu and my office is VMA 223 I have a personal website Candice Chu DVM dot com and I have an email account I'm putting up this picture because my cat died from renal failure last week so I want to honor him by giving this talk and I've been thinking about what's the meaning of everything and I encourage everybody to think about like what's the meaning of your life and how
can you make the world a better place so
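Editor's sketch: the statistical overrepresentation test mentioned in the talk for the gene ontology and pathway step can be illustrated roughly as follows. This is a minimal Python example, chosen because the transcript itself contains no code. It assumes a one-sided hypergeometric test; the exact statistic and multiple-testing correction that PANTHER applies may differ. The function name and all gene counts below are hypothetical placeholders, not results from the study.

```python
from math import comb


def over_representation_p(hits, list_size, pathway_size, background_size):
    """One-sided hypergeometric p-value: P(X >= hits).

    hits            -- genes in the uploaded list that belong to the pathway
    list_size       -- number of genes in the uploaded list
    pathway_size    -- number of background genes annotated to the pathway
    background_size -- number of genes in the reference (background) set
    """
    max_hits = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    # Sum the probability of seeing `hits` or more pathway genes by chance.
    tail = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 1100 listed genes, a pathway with 150 genes,
    # 25 of them in the list, against a background of 20000 genes.
    p = over_representation_p(25, 1100, 150, 20000)
    print(f"p-value for the hypothetical pathway: {p:.3g}")
```

A small p-value suggests the pathway appears in the gene list more often than expected by chance. In practice each pathway is tested separately, so the p-values should also be corrected for multiple testing.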
Info
Channel: Candice Chu, DVM, PhD
Views: 138,517
Rating: 4.9282098 out of 5
Keywords: RNA-seq, differentially expressed genes, differential expression, RNA sequencing, HISAT2, R Studio, HTSEQ, DESEQ2, DEG
Id: xh_wpWj0AzM
Length: 57min 35sec (3455 seconds)
Published: Thu Oct 06 2016