1. Phylogenetic analysis of pathogens (lecture - part 1)

Captions
Right, so what is phylogenetics? It is examining the evolutionary relationships of organisms, in very simple terms. Here we have a sequence alignment for genes which are found in a related set of viruses. We need to do the alignment because you need to identify what the sequence variation is between those different viruses in order to be able to estimate what the relationship is for that group of viruses. You can see that sometimes it's complicated by the fact that there might be missing sequence. In general you will identify what the variations are in the conserved sites across a particular gene sequence, but the algorithms can also take into account the fact that there might be missing data at various points in the alignment. This is usually DNA sequence based, but it could also be based on amino acid sequence or other phenotypic traits which differ between a set of organisms. We'll be focusing largely on sequence data.

So this is a phylogenetic tree based on the virus sequence alignment that we saw in the last slide. What can we get from the phylogenetic tree? We see these branches: the length of these branches represents genetic distance. Going horizontally, we can see that these viruses here, viruses 9 and 10, are more distantly related to all of the other viruses. It's the length of the horizontal lines which is important in inferring genetic distance. This down here is the key: 0.07 refers to nucleotide substitutions per site in the alignment, so that gives a measure of the scale of the genetic distance between each of the virus groups. Feel free to interrupt at any stage if I'm not being clear or if you want to ask any questions.

Roughly speaking, this is also a measure of the time since these viruses diverged: the greater the length of the branches, the greater the likelihood that a longer period of time has passed since divergence from the most closely related sequences. But in order to calibrate the tree and put a time frame on it, we would need to have a sense of the rate of mutation that had occurred across the phylogenetic tree, so we need an estimate of the mutation rate. Overall, though, the length of the branches is approximately proportional to the length of time since divergence.

This is the same tree, and we've indicated here in green what are called the tips of the branches. Some people would also refer to these as the leaves on the branches; lots of tree analogies here. These, of course, are what we've actually sampled, so these are the virus sequences. We quite often know more about these viruses: we might know what kind of host they were isolated from, when they were isolated, and what type of infection they were isolated from. So we would potentially have a lot of metadata about these, and they are kind of snapshots in time of that virus isolated from an infection.

What the tree allows us to do, essentially, is to go back in time and predict when common ancestors existed, so what the ancestral states are for these viruses. These ones are called nodes in the tree, so these are internal nodes. Sometimes the tips or leaves are also called external nodes, but I prefer to keep the term nodes for those which are internal to the tree. Take these nodes, which are labeled here A, B and C. Because genetic distance is approximately proportional to time, we can say that the ancestral virus indicated at node A existed prior to the virus predicted at B, and again prior to that at C. Virus C here is ancestral to all of the viruses represented in this clade, or this group, within the phylogenetic tree.

The other thing that's important to have is a sense of the level of statistical confidence that we have in the shape of the tree, and particularly in the positioning of the nodes, which of course contributes to the shape, or topology, of the tree. We can do this in a number of ways: bootstrapping is one way, and another is by Bayesian posterior values. Bootstrapping essentially allows us to get a prediction of the level of confidence. So one is a very high level of confidence in the positioning of that node and the topology of the tree. This is a slightly lower level of confidence associated with this one, and this is even lower again: this is more likely to have happened by chance than to reflect the real topology of the tree and the real relationship of the virus sequences. So essentially, if you see a bootstrap value of one, that's a very high level of statistical support for the positioning of that node and the branching pattern in the tree.

What I've shown you is kind of a standard rectangular format of the tree. You can get other formats; this is exactly the same set of sequences, so these are other formats of phylogenetic trees. I'll point you to this website, which my colleague Professor Andrew Rambaut, who's based at the King's Buildings, has put together, and which very nicely goes through how to interpret a phylogenetic tree.
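Editorial sketch (not part of the lecture): the three ideas in the talk, genetic distance from an alignment with missing data, a rough time calibration from a mutation rate, and bootstrap support, can be illustrated in a few lines of Python. Everything named here is an assumption made for illustration: the function names, the set of characters treated as missing, the toy sequences, the example rate, and the use of a simple uncorrected p-distance with a strict molecular clock. Real tree-building software typically uses substitution models and bootstraps whole trees rather than a single three-sequence comparison.

```python
import random

# Characters treated as missing data; this set is an assumption of the sketch.
MISSING = {"-", "N", "?"}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring sites missing in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # skip sites where either sequence has no data
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no sites with data in both sequences")
    return differing / compared


def divergence_time(distance, rate):
    """Rough time since two tips shared an ancestor, assuming a strict clock.

    `rate` is in substitutions per site per unit time. Change accumulates
    along both lineages since the split, hence the factor of 2.
    """
    return distance / (2.0 * rate)


def bootstrap_support(alignment, pair, outgroup, replicates=200, seed=1):
    """Fraction of column-resampled alignments in which the two sequences in
    `pair` are closer to each other than either is to `outgroup`."""
    rng = random.Random(seed)
    x, y = pair
    n_sites = len(alignment[x])
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement (same columns for all rows).
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        sample = {name: "".join(seq[c] for c in cols)
                  for name, seq in alignment.items()}
        d_xy = p_distance(sample[x], sample[y])
        if (d_xy < p_distance(sample[x], sample[outgroup])
                and d_xy < p_distance(sample[y], sample[outgroup])):
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    # Hypothetical toy alignment; names and sequences are invented.
    toy = {
        "virus1": "ACGTACGTACGTACGTACGT",
        "virus2": "ACGTACGTACGAACGTAC-T",
        "virus9": "ATGTTCGTACCTACGAACGA",
    }
    d = p_distance(toy["virus1"], toy["virus9"])
    print("p-distance virus1 vs virus9:", d)
    # Hypothetical rate of 1e-3 substitutions per site per year.
    print("rough divergence time (years):", divergence_time(d, 1e-3))
    print("support for (virus1, virus2):",
          bootstrap_support(toy, ("virus1", "virus2"), "virus9"))
```

The support value returned is a proportion between 0 and 1, matching the scale used in the lecture, where a value of one indicates that the grouping was recovered in every resampled alignment.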
Info
Channel: The Roslin Institute - Training
Views: 139,806
Keywords: Phylogenetics (Field Of Study), Biology (Media Genre), bioinformatics, phylogenetic analysis, data analysis, The Roslin Institute
Id: t1vAhQvukRY
Length: 7min 17sec (437 seconds)
Published: Fri May 29 2015