Are Small DNA cM Segments Valid for Genetic Genealogy Research?

Captions
If there's one debate that seems to get different genetic genealogists riled up, it is about small segments and whether or not we should even be looking at them. So let me try to tackle a couple of issues with small segments today.

Howdy, welcome to Family History Fanatics, where we love helping you climb your family tree and have fun along the way.

Small segments happen in DNA, and if you go on to Facebook in any of the genetic genealogy groups, you're going to find debates going on between people about whether or not they should be looking at small segments, and whether or not these small segments can actually be used for genetic genealogy.

So let me start with a basic question: what is a small segment to begin with? How is it really defined? With any new field, in a lot of cases these things are somewhat fluid, and there may not be an exact definition. So I'm going to start with a definition of, hey, a small segment is less than about seven centimorgans, which seems a little bit mealy-mouthed because I'm using "less than" and "about" in the same sentence. Some people may say it's less than ten, some people may say it's less than six, but somewhere around there, I think we all agree, is where the small segments start. The exact point where they start, we don't agree on, but somewhere between five and ten centimorgans, small segments start. So I'm just going to use less than about seven centimorgans.

The other thing about small segments is that this is where the probability of a false match begins to be larger than the probability of a real match. Now, I'm not going to go into everything about a false match today. One thing to look for with a false match is this: if you have that match and neither one of your parents has that match, then it's a false match, and that usually happens in these really small centimorgan ranges.

With that definition, let's take a look at some data. Now, I went through and I found four data sets. Not all of them are, in fact none of them are, using the same
method to come up with their information. They have different algorithms to calculate this, some of them are using different data sets in order to accomplish this, and some of them define things in slightly different ways. What I've tried to do is basically simplify it down into calling a false segment a false segment, and then use the information that they have to identify what percentage of segments they consider to be false.

So let me start with a table that you can actually find on the ISOGG wiki. This was from John Walden, some research that he did that was reported to Tim Janzen, so we'll just call this John Walden and Tim Janzen's data. What they found is that at around 13 centimorgans there are really no false segments, but as you go progressively lower than that, false segments start to increase. At 10 centimorgans it's only about 4 percent, which is still really low, but by the time you get to seven centimorgans, almost a third, or really a third, of all of the segments are false segments. It gets progressively worse as you go down from there, to the point where at three centimorgans you're looking at 89 percent of those segments being false segments. So that's our first data set.

Now, FamilyTreeDNA just recently came out with a white paper that explains some of their methodology, and they had a similar table with, would you guess, somewhat similar, although slightly varied, results. Again, at 13 centimorgans they find about 1 percent are false segments, which is very comparable to what John Walden and Tim Janzen found, and that goes up to about two percent at ten. At seven it's only ten percent, as opposed to the thirty-three percent, but you can see the trend is still there: it gets progressively worse. In fact, their four centimorgans and three centimorgans show a lot more false segments than what John Walden and Tim Janzen found. In fact, at three centimorgans it's 97 percent. I mean, that is almost all of the segments you're looking at that are false. Only
one out of 33 segments are going to be true segments. That's a lot of matches to go through to be able to find different things that are going to be helpful for you.

Now, with AncestryDNA, they didn't have a nice paper like FamilyTreeDNA or a nice table like John Walden and Tim Janzen, so I had to extract some of this from some different graphs they had with the explanations. So some of this may not be exactly the same as what we were looking at before, but I think it's close enough. What I found is that obviously the trend here is the same: centimorgans are seeing about three percent false segments, going down to six centimorgans, where it's about 50 percent false segments.

And then we have 23andMe. They also looked at this, again in a different way, and I had to extract some of this data from just tables that they had. Starting at six centimorgans they're finding about fifteen percent false segments, all the way down to three centimorgans, where they're seeing as much as seventy percent false segments.

It's a lot of data, and it's good to actually look at it all together. Now, there's obviously variability between each one of these groups because of how they looked at it, what algorithms they're using, and what data they're using to come up with this information. Overall, though, one of the things that we can see is that there is a definite trend: once you hit seven centimorgans, the number of false segments goes up rapidly. Above seven centimorgans, and particularly above ten centimorgans, there are very few false segments that anybody is finding. So I think we can agree from these four different data sets that as we go down in size, we're going to see more and more false segments.

If you want to know why false segments might be bad, then you can watch a video that we have on this channel about false segments. Now, previously I've covered the extra effort that having to sort through these false segments adds to your genealogy work:
you're going to be running down a lot of rabbit holes. Now, Blaine Bettinger calls these false segments poison. I wouldn't go that far; I would call them a total waste of time. When you think of some of these at three centimorgans, if 97 percent is the correct number, that means you're having to go and look through 33 of these segments to find one true segment. If you just assume that you're spending an hour on each match, you spent 33 hours and only one of those hours was actually fruitful. The other 32 hours were really totally wasted.

I don't want to go into that anymore here. What I want to look at is something else, statistically. So if we have this small segment, we'll just say at seven centimorgans, what can possibly happen with it? Well, there are three things. One, this small segment might not be passed on to the children. Two, this small segment could be passed on intact. And three, this small segment could be passed on recombined; in other words, it becomes an even smaller segment. Instead of a seven centimorgan segment, now it's maybe a six or a five or a four or a three centimorgan segment.

Now, looking at this, you see that there are three possibilities, and you might be thinking, okay, well, there's a 33 percent chance of any one of these happening. And you'd be wrong. It's not 33 percent.

So let's look at this a different way. Here, green represents that segment. The first block that you see, the one that's totally green, means the segment is passed on totally intact. The next block is white; that means the segment is not passed on at all. And then the other 12 are how that segment might be divided up if we're just going by whole centimorgans. So we could have the first little centimorgan of it passed on and the rest not, all the way up to all of the first part not being passed on but the last centimorgan being passed on. So here are all the possibilities of what could be passed on without dividing up centimorgans into half centimorgans or quarter centimorgans, and in
this case we end up with 14 possibilities. So remember, I said before there were three possibilities, but if we really expand these out, there are 14 possibilities. And the simple statistics of saying, well, 100 percent divided by 14 possibilities would be seven percent for any one of these happening, that would be wrong as well. It's not that simple.

So let's go back to what a centimorgan is. The definition of a centimorgan is a little confusing. I have made a video about what a centimorgan is, but in essence a centimorgan is a one percent probability of a recombination happening. Another way of looking at that is that a centimorgan is a 99 percent probability of no recombination happening. So in our hypothetical seven centimorgan segment, what we have is 99 percent times 99 percent times 99 percent, seven times over, as the probability of no recombination happening. If we do the math here, what that means is there's really a 93 percent probability of no recombination happening. In other words, for all of these recombined segments, these much smaller segments being passed on, there's only a seven percent probability of any of those happening. Not seven percent each, but seven percent total for that entire group.

So 12 of the 14 possibilities only have a seven percent chance of happening, which leaves 93 percent between the other two. And this is where we can actually do some simple math and just divide by two. I've rounded, so each one has a 42 percent probability of happening. There's a 42 percent probability of this seven centimorgan segment being passed on intact, and there's also a 42 percent probability of none of that segment being passed on.

So, interestingly enough, if we take a look at a great-grandparent, they have a 42 percent probability of passing on this seven centimorgan segment to a grandparent, and that grandparent also has a 42 percent probability of passing on that seven centimorgan segment to the parent, and that parent has a 42 percent probability of passing on that seven centimorgan segment to the child, to you. In
essence, if we actually multiply all of those together, what we get is 7.4 percent. In other words, with a seven centimorgan segment, it is more likely that you received that exact segment intact for three generations, so that it's the exact same seven centimorgan segment that your great-grandparent had, than it is that you pass on a recombined version of that segment, in other words a shortened version of that segment, to your child.

This is one of the reasons why, when we look at these small segments, we can actually see that they get passed down from generation to generation to generation. At that point, once you're down to around seven centimorgans, it's much more likely that the segment is either going to be passed on intact or not going to be passed on at all than it is that it's going to be divided up. So that's one of the things about these small segments: being divided up into an even smaller segment is very unlikely. Starting roughly at about seven centimorgans, you're going to have more of a chance of actually finding these passed down multiple generations intact than you are of finding that segment being divided up in succeeding generations.

If you still have questions about DNA, then consider joining us for our live streams, where we answer your DNA questions, or become an FHF Extra member, where we have even more training and live streams to answer your questions. Now, if you'd like to learn how DNA is passed down through generations, this is actually an older video I made; you can watch this video up here. But if you want to learn something else about DNA, then why don't you watch this video down below.

"87 is the correct..." Sorry. "...is passed..." Or, sorry. Okay, my nose is itching.
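
The arithmetic in the captions above can be checked with a short Python sketch. It is only an illustration under the talk's own simplification, namely that each centimorgan is an independent 1 percent chance of recombination and that the no-recombination probability is split evenly between "passed on intact" and "not passed on". The function names are hypothetical and do not come from the video. Note that halving the roughly 93 percent figure gives about 46.6 percent rather than the 42 percent quoted, so the sketch evaluates both values; in either case the three-generation figure stays above the roughly 7 percent chance of a recombined segment.

```python
def no_recombination_probability(segment_cm: int) -> float:
    """Chance that no recombination falls inside the segment, treating each
    centimorgan as an independent 1 percent chance (the talk's simplification)."""
    return 0.99 ** segment_cm


def intact_over_generations(per_generation: float, generations: int) -> float:
    """Chance the segment is passed on intact in every one of the generations."""
    return per_generation ** generations


def matches_per_true_segment(false_rate: float) -> float:
    """Average number of segments to review in order to find one true segment."""
    return 1.0 / (1.0 - false_rate)


p_none = no_recombination_probability(7)  # about 0.93
p_recombined = 1.0 - p_none               # about 0.07, shared by all 12 partial cases
p_intact_halved = p_none / 2              # even split assumed, as in the talk

print(f"no recombination in 7 cM:        {p_none:.3f}")
print(f"recombined (all partial cases):  {p_recombined:.3f}")
print(f"intact, halved estimate:         {p_intact_halved:.3f}")
print(f"3 generations at quoted 42%:     {intact_over_generations(0.42, 3):.3f}")
print(f"3 generations at halved figure:  {intact_over_generations(p_intact_halved, 3):.3f}")
print(f"segments per true one at 97%:    {matches_per_true_segment(0.97):.1f}")
```

The last line reproduces the "one out of 33" figure used for three centimorgan segments at a 97 percent false rate.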
Info
Channel: Family History Fanatics
Views: 3,018
Rating: 4.939394 out of 5
Keywords: How many small DNA segments are legitimate, The Small Segment Debate Is Over, Are Small DNA Segments Always False Positives, Small Centimorgans, Small cMs, False DNA matches, Small Matching segments, shared centimograns, family history fanatics dna, genetic genealogy explained, genetic genealogy, centimorgan explained, centimorgans, dna research, what is a centimorgan
Id: aFSNXlYFnVI
Length: 13min 52sec (832 seconds)
Published: Wed Sep 29 2021