Protocol 6 - DNA Sequence Analysis Part 1

Video Statistics and Information

Captions
Before you begin, download the sequence viewing software appropriate for your computing system. In this example I am using a Mac, so we have recommended a Mac-based program called 4Peaks. Check the protocol for our suggestions for PCs, tablets, and netbooks, and consult your IT department for your institution's installation procedures.

Next, you need to download the sequence data. We will distribute your class's data using the Canvas platform. Navigate to the Modules tab and scroll to the laboratory module. Within the module there will be a page called Sequencing Results. Download your class data by clicking on the link. The data is downloaded as a compressed .zip file. The Mac automatically uncompresses the file, but PC users will need to right-click and uncompress the file in order to access the data. Keep in mind that the data is normally held in the Downloads folder unless you specifically save it elsewhere.

The data files have long, complicated names, but the designation you should pay attention to comes after the 10-digit barcode. You can find your sample by finding the sample number or the sample name, followed by the target gene and the direction of the read. For this analysis we are specifically interested in .ab1 files, which can be read by the downloaded sequencing software. There are several ways to open the file: you can double-click it if it has been linked to the software, open the file directly from the software, or drag and drop the file into the software.

Let's look at some data. When you open the file, there's a lot going on visually. Remember, this data represents the detection of fragments differing by one base pair in size, produced during a PCR reaction that includes dideoxynucleotides. The line traces represent fluorescence levels of the tagged dideoxynucleotides as they migrate through the sequencing machine. The length of time that it takes to migrate through the capillary gel matrix corresponds with the length of the amplified fragment. The gray bars behind the line traces represent the quality of the base call at that location. These quality bars may be represented differently in other programs, but most will have them. If the call is of good quality, the bar should fill the height of the window. At the top of the window, letters represent the base called by the software at each nucleotide location.

So what is good quality and what is poor quality? Good-quality sequence reads have distinct, steep, and smooth single-line peaks, representing well-separated fragments as they pass by the fluorescence detector in the sequencing machine. If peaks overlap, are jagged, or contain broad shoulders, then the software's base calls are no longer reliable. In this example the entire file is of poor quality: you can see overlapping lines, jagged lines, and broad lines. This could be due to contamination of the sample, multiple PCR products, or improper priming or processing of the sequencing reaction. Either way, this data is completely unusable.

Upon opening another file, we can see what good-quality traces are supposed to look like: distinct, steep, and smooth single-line peaks. The base call at the top of the window should reflect the color of the trace below it. However, there is almost always poor-quality sequence at the beginning and end of even good-quality reads. Highlight and delete the areas of low-quality calls. This does not eliminate the raw trace data but simply removes the base calls made by the software.

Before moving on, scan the sequence by eye to ensure that the software is making appropriate calls. One thing to look out for is artifacts. These aberrant traces will reduce call quality and may tempt you to trash the file. However, if you looked at several data files, you would see that this trace exists in every single file, indicating that it is not a poor-quality sample but an error in the technical process of sequencing. What can be seen is that behind this shouldered artifact are strong, good-quality peaks. In these artifact regions it is important to determine base calls manually. As we visually inspect through the artifact, we can check the calls made by the software above: G, G, T, G, A, G, and here is an error. This T call by the software is really supposed to be an A. Double-click on the base call above the error in order to correct it. Save the file to keep your changes.
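
The end-trimming step described above (deleting the low-quality base calls at the beginning and end of a read) can be sketched in code. This is only an illustration, not what the sequence viewer does internally: it assumes the per-base quality values shown as gray bars are available as a list of integers on the Phred scale, and the function name `trim_low_quality_ends` and the cutoff of 20 are hypothetical choices made for this example.

```python
def trim_low_quality_ends(bases, qualities, min_quality=20):
    """Trim low-quality base calls from both ends of a read.

    bases       -- string of base calls, e.g. "NNACGTNN"
    qualities   -- one integer quality score per base (assumed Phred-like)
    min_quality -- hypothetical cutoff; 20 is an assumption, not a protocol value

    Returns (trimmed_bases, start, end), where bases[start:end] is the kept span.
    Low-quality calls in the middle of the read are left in place, because
    those should be inspected by eye rather than deleted.
    """
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")

    # Positions whose quality meets the cutoff.
    good = [i for i, q in enumerate(qualities) if q >= min_quality]
    if not good:
        return "", 0, 0  # nothing usable, like the fully poor-quality file

    start, end = good[0], good[-1] + 1
    return bases[start:end], start, end


# Made-up example read: poor calls at both ends, good calls in the middle.
example = trim_low_quality_ends("NNACGTNN", [3, 5, 30, 40, 35, 32, 6, 2])
print(example)  # ('ACGT', 2, 6)
```

As in the viewer, only the base calls are trimmed here; the underlying trace data is untouched, and the returned `start` and `end` positions record which part of the original read was kept.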
Info
Channel: The Jackson Laboratory
Views: 42,200
Rating: 4.8588233 out of 5
Id: iqAmkNSu3oI
Length: 9min 33sec (573 seconds)
Published: Tue Aug 25 2015