How to Make an R Heatmap with Annotations and Legend

Video Statistics and Information

Captions
Hello. In this video we're going to be learning how to make an advanced heat map using R. A heat map can look like this. Sometimes, when your samples have multiple types of information, it helps to have a concise way to represent what kind of variables you've got. So for example, we have pink for treatment and gray for control. If your treatment has a time course, it also helps to have colors for the different time periods in the treatment. So let's get started.

First we need a couple of packages. We have gplots, something I described in a previous video on how to make a simple heat map. We're also going to use heatmap.plus, a similar version of the heatmap function, and we're going to use RColorBrewer so we can have more color options.

So I'm going to upload some sample data. I'm going to use my example script here; I'm going to share this in the comment section. OK, so this is my sample data. It's a CSV file and it looks like this, where we have our mysterious entities and these row names, which represent my samples. We have control data and treatment data. If you have a tab-delimited file you can use read.table, where you denote the separator as a tab. In both functions you can use row.names = 1 to denote your first column as your row names. Anytime you're interested in how a function works, just put a question mark before the function name, and here we have some documentation on how read.csv works.

So sometimes it helps to have an efficient way of picking out what kind of sample you have just based on the name, rather than memorizing or figuring out what kind of index you have in your data. The way I like to do that is using the grepl function. Say we only want to pick out the samples with the name control in them. The first parameter of the grepl function is the expression we're looking for; the second parameter is what we're scanning. Here we're scanning the row names of my sample data. So let's see if this works, and here you go: we have samples that are only control, only the control
data. Now say you want only treatment data at two weeks. Well, we don't have two weeks, so let's try three weeks. OK, so it's similar, except you change your expression to treatment three weeks. OK, let's see, and here we go: we only have treatment data for three weeks.

Now say you want to pick out only your samples with a GSM number between 100 and 120. To do that, it's very similar: we use grepl on the same data. What you can do is define what digits you're interested in. We're using regular expressions here: zero to two for the second digit, and the third digit is left open, which is implied. So let's see the data here. We have only data from 100 to 129. So that's just an example of how this works.

Now we're going to put it into action. We're going to create a list of colors for our data. I have the function already written out here. I'm going to create a list called condition colors, and it creates a list of colors based on what sample is in that position in the data. So anytime my function sees the word treatment, it's going to use this color's code. Now these six-character color codes, I got them from a website called RapidTables.com. Here you can see what code represents each color. We also have a chart here of predefined colors, and these are the ones that R uses. I'm using just pink and gray. With this function you should be creating a list that's the same length as the number of samples you have. As expected, I have a hundred samples and a hundred color codes. If you want to see how this looks, I'll bring it up here. As you remember, the control data is at the bottom of my table, so that's what you see in here.

So now we're ready to make our first heat map test. My data puts the samples in the rows and the variables in the columns, and I actually prefer the opposite way for my table: I like to have the samples as the columns. So I'm going to take the transpose of my data. And this needs to be a matrix, so I'm going to use as.matrix. OK, and here's my heat map function, heat
map.2. Now, before we use this function we need to load all the packages into our library, so I'm going to copy and paste the library calls here. As you can see, it adds gplots to the library, and this is something you need to do each time you restart R. Installing packages is something you only have to do once, unless the package updates, but loading the library is something you need to do every time.

So let's see: we have condition colors listed as our column side colors; that's what ColSideColors means. And let's see, here we go. Here it is: our samples are here and our variables are here. As you can see, the control data clusters together. I'm using the average clustering method. It's up to you what kind of method you want: there's median, there's average, there's Ward clustering. You can just use the default if you don't know what kind of clustering you want to use. So you'll see it's a different pattern.

So here we only have one row of annotations. Let's say you want to have more information; you want to have another row to show how long the treatment has gone on for each sample. That's when I would use the heatmap.plus function. Before we do that, we need to create the list of colors for our treatment times. So let's list out what treatment times we have: zero weeks, one week, three, 8, and 24. OK. Instead of manually picking out the color codes, I want to use RColorBrewer to automatically generate a list of five colors for my five time periods. brewer.pal is the function; I put five, for five treatment times, and then the name of the color palette I'm drawing from. Here we go. OK, so let's see how this looks. Yeah, here are the five color codes we're going to use.

OK, so now I have a function that automatically generates a list of color codes. Here I paste together a string, basically concatenating the word weeks to each of my treatment numbers so I can find it. I
like to specify weeks because, as you know, if you have 22 weeks and two weeks and you just say 2, grepl is going to pick out both 22 and 2. If you only want two weeks, it's good to have a character in front and characters in back to define what you want. If you do this, there's no mistaking 22 weeks for two weeks. OK, so here we are. Let's take a look. There you are.

OK, so because I have two color annotation boxes, I need to bind them together, so I'm going to use cbind. Now take a look: it binds them together, so each sample has these two colors assigned to it. I want to give them column names; you remember my first column is condition and my second column is treatment.

Now I'm ready to use the heatmap.plus function. OK, so here it is. I like to have my heat maps in blue and red; you're free to change those colors. Then I have my margins defined, and my title; that's what main is. OK, so here it is, and it looks great.

So now you have to add the legend. That's done using the legend function. You can define where you want your legend to sit on your heat map. You can give the location based on a keyword; here are your keywords: bottomright, bottom, any of those. I like to have it on the top right, personally. So we have two legends: one for treatment time and one for condition. I chose my location here; these are the coordinates on the heat map, 0.8 on the x-axis and 1 on the y-axis. And I have my treatments. For my treatments I want to put the word weeks, and my current list has just the numbers, so that's why I do concatenation here. Then fill defines what those boxes are going to be filled with: the colors that we already defined in this list, the treatment color options. And then cex is the font size of your legend. So add the
legend, and there you go, there's your legend. Then we want to add a second legend for treatment. Here you can define the vector of names, and here are the colors, correspondingly. It looks kind of weird here, but I swear when you export it to PDF it looks better, so that's what I'm going to do. So, pdf: you always call dev.off after you call one of these file format functions. So here you are. OK, it's going to look like this, pretty much. If you want, you can adjust the location; try that. Yeah, so you can play around with that.

I wanted to cover one more thing. As I said, you can define how you cluster; we have that here, where we define our average clustering. You can also define what distance metric you want to use for clustering, so you create a distance function. OK, so let's see. This is how it looks with my defined distance function, and this is how it looks without. You can see it changes how the clustering works. It's up to you how you like it, or you can always use the default.

So thank you for listening to this video. If you have any questions, please feel free to post in the comments below.
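The name-matching and color-annotation steps described in the captions can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the sample names, the time points, and the variable names are hypothetical (the video's actual script and data are not shown here), and rainbow() stands in for the RColorBrewer palette used in the video so that the sketch runs without extra packages.

```r
# Hypothetical sample names; the video's real data set is not reproduced here.
sample_names <- c("GSM101_control_0weeks", "GSM115_treatment_2weeks",
                  "GSM128_treatment_22weeks", "GSM134_treatment_8weeks",
                  "GSM142_control_0weeks")

# Logical mask of the samples whose name contains "control".
is_control <- grepl("control", sample_names)

# Regular expression for GSM numbers 100 to 129:
# second digit restricted to 0-2, third digit left open.
in_range <- grepl("GSM1[0-2][0-9]", sample_names)

# Pitfall from the captions: "2weeks" also matches "22weeks".
# Adding a character in front ("_") removes the ambiguity.
loose_match  <- grepl("2weeks", sample_names)
strict_match <- grepl("_2weeks", sample_names)

# One color per sample for condition: gray for control, pink for treatment.
condition_colors <- ifelse(is_control, "#808080", "#FFC0CB")

# One color per sample for treatment time.
# rainbow() is an assumed stand-in for RColorBrewer's brewer.pal().
weeks <- c(0, 2, 8, 22)
time_palette <- rainbow(length(weeks))
time_colors <- rep(NA_character_, length(sample_names))
for (i in seq_along(weeks)) {
  hit <- grepl(paste0("_", weeks[i], "weeks"), sample_names)
  time_colors[hit] <- time_palette[i]
}

# Bind the two annotation vectors into one matrix, one row per sample.
annotations <- cbind(condition_colors, time_colors)
colnames(annotations) <- c("Condition", "Treatment time")
```

In the video, a matrix like annotations is what gets passed as the column side colors to heatmap.plus, while a single vector such as condition_colors is enough for heatmap.2.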
Info
Channel: HowToDataViz
Views: 64,003
Rating: 4.8696537 out of 5
Keywords: R Studio, R language, Heatmap, Legend, Heatmap Legend, Statistics, Bioinformatics, Annotations, How to Make a Heatmap, Clustering, Color Brewer, grep, regular expressions in R, regular expressions, subsetting samples, subset automatically, color bar
Id: T7_j444LMZs
Length: 15min 46sec (946 seconds)
Published: Sun Jun 11 2017