Explaining Election Fraud (as a Data Scientist)

Video Statistics and Information

Captions
So last week I was minding my own business, just reading Twitter, some really wholesome late-night stuff, when I thought about how election fraud actually worked, and I wrote this on LinkedIn: "Republicans are fighting to prevent voter fraud, optimizing for precision. Democrats are fighting to prevent voter suppression, optimizing for recall. Are we actually going to get to a proper F1 score?" I thought that this was smart, but then this guy commented on my post, and I realized that I didn't properly understand precision and recall, because I wasn't sure if he was right or not. I just knew that they were trade-offs. I had an intuitive idea that applied to the election, and I knew that somehow voter fraud and voter suppression are directly related to each other. So today let's dive into exactly how they relate, and then how the different partisan sides are drawn depending on what you believe about the election.

If you haven't already, please like and subscribe on my channel. It really helps with boosting the YouTube algorithm towards my content, and really helps me make more videos.

First, let's draw out all the different ways that election fraud can happen. Here I'm drawing a confusion matrix to understand each scenario. The human in this box represents a legitimate voter that ends up voting in the election. This alien represents an illegal voter that gets caught voting in the election, or does not actually vote; in this case the alien is being suppressed. This box in the upper right is a legitimate voter that can't vote in the election. This would be voter suppression, and we know this happens in the United States. And then in the bottom left we find an illegal voter that votes in the election, which is essentially voter fraud. We assume that this occurs as well.

Okay, so given these scenarios, we know that everyone wants legitimate voters to vote in the election and illegitimate voters to not vote in the election. So exactly where do the partisan lines actually get drawn? The difference between Republicans and Democrats is what either side believes to be the bigger problem. Let's say, for example, that we have 100 votes total in the entire country, to make this example simpler. Republicans believe that there will be a ton of votes that are fraudulent. For simplicity's sake, let's say that they believe that half of all the votes being cast are illegal and half are real, legitimate votes. Democrats believe that there will be a ton of votes that are being suppressed. For simplicity, let's say that they believe that all votes are real, but half of the votes are being suppressed by voter restrictions and the other half of the people can vote fine.

Now we can actually see where data science is applied. We now have a confusion matrix where we have every single scenario of how votes can be cast, suppressed, and fraudulent in the election. We can apply the definitions of precision and recall to our confusion matrix. We have true positives, true negatives, false positives, and false negatives. The way I try to keep this simple is that the second part, the positive or negative, describes what happened (whether the vote was cast or not), and the first part, the true or false, says whether that was the correct outcome. So for our example, if a legitimate voter votes in an election, it is a true positive. But if a legitimate voter does not vote in an election, then it's a false negative, where the negative represents that the voter did not get out to vote and the false indicates that the vote was suppressed. On the other hand, a false positive means an illegal voter successfully votes in an election, where the positive means that they successfully voted and the false indicates that the vote was not legitimate.

Let's go back to the equations of precision and recall. How do they actually relate to what the Democrats and Republicans want to optimize for? Recall is the number of true positives divided by the sum of false negatives plus true positives. Precision, we know, is the number of true positives divided by the sum of true positives plus false positives.

This is why Republicans are interested in optimizing on precision. Republicans are interested in minimizing false positives, which is voter fraud, and bringing it down to zero. Because precision and recall trade off against each other, this causes more voter suppression, which is the false negatives. While they are, ideally, not specifically interested in suppressing votes, depending on who you ask, if you optimize on only getting legitimate voters, you're going to make voters jump through hoops: they need to grab their IDs, maybe drive an hour away to a ballot drop-off, to prove that they are legitimate.

Democrats, on the other hand, are interested in preventing voter suppression, which means that they're trying to minimize the number of false negatives. This in turn will increase the number of false positives, which results in more voter fraud. Because they're trying to make it easier for anyone to vote, lowering the restrictions on voting will bring voter fraud up.

The only way to actually balance false positives against false negatives is what we know in data science as the F1 score, which is the harmonic mean of precision and recall, a compromise between both sides.

So this is where politics actually comes in and data science goes away. Depending on each individual's beliefs, you'll believe that either voter fraud or voter suppression is the bigger issue. If you believe there are more cases of voter suppression than voter fraud, you're likely siding with liberal values. If you believe there are more cases of voter fraud than voter suppression, then you are likely siding with conservative values. Why is this actually the case? That needs a deeper dive into political science, which is outside of my domain knowledge, but it's up to you to do the research and understand what the actual big issues are.

This election is really important, so I'd like to end this explanation here. I hope that it really helped with your understanding of how data science is applied to this issue of election fraud, and I hope that you'll do more research and actually go out and vote in the November 2020 election. Thanks for watching, and please like and subscribe for more videos on data science from me. Bye.
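The 100-vote example from the captions can be worked through in a few lines of Python. This is only a sketch under stated assumptions: the class name `VoteCounts` and both scenario variables are hypothetical. The counts are one possible reading of the two beliefs described in the video (50 legitimate plus 50 illegal votes cast, versus 100 legitimate voters of whom 50 are blocked). They are not real election data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteCounts:
    """Counts for one hypothetical belief scenario (illustrative only)."""
    true_positives: int   # legitimate voters who vote
    false_positives: int  # illegal votes that get cast (voter fraud)
    false_negatives: int  # legitimate voters who cannot vote (suppression)

    def precision(self) -> float:
        # TP / (TP + FP): share of cast votes that are legitimate
        return self.true_positives / (self.true_positives + self.false_positives)

    def recall(self) -> float:
        # TP / (TP + FN): share of legitimate voters who get to vote
        return self.true_positives / (self.true_positives + self.false_negatives)

    def f1(self) -> float:
        # Harmonic mean of precision and recall
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Assumed reading of the "fraud" belief: 100 votes cast, half of them illegal.
fraud_belief = VoteCounts(true_positives=50, false_positives=50, false_negatives=0)

# Assumed reading of the "suppression" belief: 100 legitimate voters, half blocked.
suppression_belief = VoteCounts(true_positives=50, false_positives=0, false_negatives=50)

for name, counts in [("fraud", fraud_belief), ("suppression", suppression_belief)]:
    print(f"{name}: precision={counts.precision():.2f} "
          f"recall={counts.recall():.2f} f1={counts.f1():.2f}")
```

Under these assumed counts, the fraud belief has low precision and perfect recall, the suppression belief has the reverse, and both end up with the same F1 score. The sketch does not guard against a zero denominator, which would occur if there were no true positives at all.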
Info
Channel: Jay Feng
Views: 7,092
Keywords: election fraud, election 2020, data science politics, political data science, data science for beginners, precision vs recall, true positives, f1 score, data science, data science interview questions, fivethirtyeight
Id: AGyFUOEhSP4
Length: 6min 25sec (385 seconds)
Published: Wed Oct 14 2020