We're All Data Scientists | Rebecca Nugent | TEDxCMU

Captions
So this is my mother, the one on the right. My mother graduated from the University of Texas with a degree in English. She then received her master's in library and information science and was a librarian for several years. After taking a break to raise small children, she returned to the classroom and was a middle school teacher in English literature, language arts, reading, and Latin for over 20 years. She was a longtime coach of championship Future Problem Solving teams, an award-winning educator to hundreds of students, and can spot a grammatical error on a street sign from about 300 yards. She's an extremely smart woman. She's also absolutely convinced that she has no idea what I do for a living.

So the word "illiterate," or "illiteracy," can be used generally, but it's most commonly associated with the inability to read or write at an appropriate level. There are millions of people who struggle with illiteracy. There are programs all over the country that offer assistance and training. To have difficulties with illiteracy is a source of discomfort, for many a source of shame. It's not information that people volunteer.

Innumeracy, on the other hand, is a different story. Today I'm applying innumeracy more broadly, to difficulties with mathematics or statistics or any of these related topics. That's not a word that is commonly used, but it's more readily self-identified. We hear, "Oh, I am not a math person."

Paul Ernest from Exeter University is a philosopher of mathematics who focuses on social constructivism, which is the theory of how human development and knowledge acquisition come from social interaction with others. He has the following quote: A widespread public image of mathematics is that it is difficult, cold, abstract, theoretical, ultra-rational, but important and largely masculine. It also has the image of being remote and inaccessible to all but a few super-intelligent beings with mathematical minds. In contrast to the shame associated with illiteracy,
innumeracy is almost a matter of pride among educated persons in Western Anglophone countries.

And then, from my personal experience, it seems like every person I sit next to on an airplane has a story to tell about how terrible they are at statistics and how they cheated on all of their exams in college. So to be illiterate is a problem, and to be innumerate is what is cool.

Well, is innumeracy actually a problem? Yes. The answer is yes. Every few years the results of the latest national adult literacy and numeracy survey get released, and there's a frenzy of articles trumpeting and sounding the call to devote more substantial resources to innumeracy, which includes educating people about how terrible this is and the importance of this issue.

Yeah, this is not so much helping the cause. Can you imagine Mattel creating a Barbie that giggled as she said, "I can't read"? Right, that's just not going to happen. But this is somehow okay.

All right, oddly, this is not really helping our cause either, in a different way. The characters from the popular TV show The Big Bang Theory are gifted in mathematics and physics, and they're portrayed as kind of these lovable, quirky nerds. We laugh at their interactions and how they solve problems, but for many people in the country this is what a mathematician looks like. So some of us are laughing with them, but many people are laughing at them. So this is not helping either.

Our other option is this. We see the computer genius, creepily surrounded by computers, staring at all of the data, searching, searching, possibly stalking, searching for information until they find that one significant piece. That's the character that often saves the day, but that is not a man you would take home to dinner.

So what I'm going to push back on a little bit today, and claim, is that our issue is really about the self-perception of innumeracy: how do you feel about your ability to do math and statistics? Because research has shown that there is no
type of math person, and that when we self-assess our skills, it's often done in relative comparison to other skills. For example, if I'm strong in rhetoric or language acquisition, I automatically downgrade my skills in math, regardless of what they are. And vice versa: if I'm good at mathematics and statistics, then I might assess my writing skills as being less than they really are.

So who are you? Are you Teen Talk Barbie? Are you a lovable physics nerd? Or are you this guy? That's what everybody is always self-assessing: who they are. And in actuality, you're not who you think. We all have slightly incorrect perceptions of our skills.

So let's get back to my mom here for a second. While I would be a little cautious in saying this directly to her face, my mother is wrong. My mother understands mathematics. She understands probability. She understands statistics. She understands all of these things. She does them every single day as she's making decisions throughout her life.

For example, when she goes to cross the street, she is standing there, she looks both ways, and perhaps she sees a car coming. As that car is coming, she's building a probabilistic model in her head of "can I walk into the street right now?" As the car changes direction, slows down, speeds up, makes different decisions, she is constantly updating the probability of what's going to happen next. At some point she optimizes: she makes a decision, she steps into the street, and she doesn't get hit by the car. It's probability and statistics at its best.

Another thing that we commonly do all the time: if we're walking down the hall, or walking in a crowded area, we continually assess the space around us. We predict whether or not we're going to run into somebody, and then we make kind of a probabilistic decision: is that person going to move this way? I'll move this way. You're constantly adjusting as you move. And when two people both move to the same spot, there's kind of an "oh, sorry" and kind of a laugh, because
you know you've guessed wrong somehow. You update your model, and then you fix it and you move back to the other side. These are complicated statistical processes. We also make decisions like: do we need a coat today? Do we think it's going to rain? These are non-stop probability and statistics modeling decisions we are making all the time.

So if I'm having this conversation with my mom, let's look at something that might be more in her wheelhouse. This is a poem, "The Road Not Taken" by Robert Frost, and this is an example of a type of annotation assignment that you might see in an English class or a writing class. This is the stuff my mom excels at.

There are a couple of ways that we could think about this assignment or this piece of analysis. The first is that I could just think about analyzing the theme, right, the words in the actual poem. When I'm looking at it I might think: all right, I see the word "yellow." Does that mean that it's autumn? I'm going to start looking for other words that match that potential theme. I start looking for things like: do I see more positive words? Do I see more negative words? Do I see questioning? Do I see repetition of words? All of this is data that is being processed and helping you build a model toward what the actual theme is here. What is happening inside this poem? How am I going to interpret it? These are all variables that you have been keeping track of the entire time.

That's one way to look at this assignment. Another way might be how to actually teach students how to annotate, so how to find strategies that contribute to better learning outcomes. And I mean things as simple as: where did you write on the page? Did you write close to the words? Did you write far away from the words? Did you underline everything? Was everything important? Which words did you circle? What time of day or night did you actually do these annotations? All of this is information that contributes to how a student is learning this poem, or learning
the skills that we're trying to teach with this poem.

How I might handle that problem: all of this data that's being collected, I might input as variables into a spreadsheet. I could build a kind of complicated prediction model using some statistical software package. It might take me weeks to do correctly. My mom, the English teacher, does this in her head. She just reads what the students are writing, thinks about all of those data and the variables, and creates a prediction model for how the student is learning and which features might be more important for her to teach the next day in order to improve their chances of learning. She's doing this all of the time. She's not thinking of it as doing statistics or data analysis, but she's doing this all the time, as are the rest of us.

The amount of data that we process, analyze, model, collect, etc., every day is staggering. By one set of estimates from UC San Diego, it's up around something like 16 hard drives' worth of information for each person each day. That's unbelievable. So how can people who process, analyze, model, collect, etc., that much data every single day, and also make fairly rational choices all day long based on probabilities, possibly think of themselves as not math people?

So what is going on here, and why am I choosing to talk about this right now? Why do I think this is more important to be considering right now? The explosion of data science.

Data science as a field is ubiquitous now. There's constant press coverage and interest from students, from industries, from parents who incessantly ask me, "Will my children have a job when they graduate?" Data science is one of the hot buzzwords right now.

Let's take a look at what's happening from an education standpoint, with programs. We have 40 bachelor's degrees right now. That includes minors, so that's majors and minors. We have 93 certificates, and a certificate could be viewed as kind of a supplementary thing, so you're majoring in one and
then you have a supplementary certificate. There are 19 doctoral degrees, and there are almost 400 master's degrees in data science. These are in departments like statistics, information systems, business, machine learning, and computer science. Many of these data science degrees are interdisciplinary across several departments. This is amazing. This is in, like, the last 10 to 15 years. This is an incredible number of degrees and programs. And with respect to the data scientist, it is a sought-after job. It is considered a top job to have in America, and there are lots of opportunities.

But what actually is data science? Here's one proposed picture of data science. We have math, stats, and algorithms on the top, we have software engineering on the left, and data communication on the right. This is a Venn diagram, so where the circles overlap, those areas are people who have all the skills from the overlapping circles. You notice that "data scientist" is in the middle, so a data scientist is someone who theoretically has math, stats, software engineering, and data communication. You might notice that the intersection between software engineering and data communication is empty, and you can make of that what you will. I'm not sure what that means.

I personally like this graphic a little bit better. In this graphic we have computer science, we have math and statistics, and importantly, we have subject matter expertise. I'll come back to that. And having all of those is the mythical unicorn.

What I take from this graphic is that it's very complicated to think of what a perfect data scientist is, so what somebody who has all of these skills is. In some sense it's very flexible and interdisciplinary, and to be able to find someone who has all of these skills and call them a data scientist is probably unreasonable.

My concern is that, as data science is exploding, in order to keep up or compete, the number of programs has been proliferating. So universities are
starting data science programs to attract students, to help give them the skills they're going to need for industry. But at a national level, the conversation is centering more on what the foundations of data science are: how do we understand and define data science as a discipline? For example, the National Science Foundation and the National Academy of Sciences have recently devoted a non-trivial amount of financial and manpower resources to this topic.

And as a result, we can build the programs, but will the people actually come to them? Who's going to join these programs? Who's going to think, "Yes, that's for me," and who is going to think, "Oh, I'm going to opt out of that"? My concern is that this is going to lead to a stronger bifurcation of programs and people: people who do data science and people who do not do data science. And that decision will be made based on some self-perception of skills that's inaccurate.

For example, everybody crosses the street. Everybody builds a probabilistic model in getting through their day, in some sense. So everybody is able to communicate and talk to people working in data science. They do this all of the time.

Now, lest you think that I am letting the people who think they're good at math and statistics, etc., off the hook: no. If you do not understand how people think and write and create and behave, then you are also part of the problem. It is going to be necessary for these disciplines to communicate with each other.

So I'm not advocating, for example, that everyone take a data science course or that everyone be in data science. I'm advocating for collaboration and communication across the disciplines. I'm advocating for everyone in this entire room to think about your skill sets and what kind of person you think you are, and really take a hard look at that, because you probably know more than you think you do, and you can communicate with these areas in a stronger way than
you probably believe you can. And when you come to several roads diverging, we want you to be able to see yourself on several paths and not be forced to take one of them.

So, coming back to my mom. The woman on the left: she has an undergraduate degree in math and statistics, she has multiple graduate degrees in statistics, she works in statistics, she spends a lot of time on a computer. She is a data scientist. The woman on the right: she has degrees in English literature, speaks Latin, she analyzes texts, teaches people how to write, teaches people how to read critically. She is also a data scientist.

You are all data scientists. And imagine what we could accomplish together. Thank you.

[Applause]
Info
Channel: TEDx Talks
Views: 41,710
Rating: 4.9190478 out of 5
Keywords: TEDxTalks, English, United States, Education, Data, Higher education
Id: YMnqPTLoj7o
Length: 16min 24sec (984 seconds)
Published: Fri May 19 2017