Code Interpreter for Data Science Is Here & What It Means For Junior Data Scientists

Video Statistics and Information

Captions
As a junior data scientist, I have a vested interest in all these new AI tools. A few months ago Code Interpreter was released in alpha, and now it's finally available in beta to all GPT-4 users, and I am very excited to see if it's as cool as everybody has promised. I figured that a lot of you guys would want to do some initial poking and prodding around with me. And then, because I know this will be a comment I get at least once down below, I will be talking about what I think the impact will be on junior data scientists and prospective data scientists. So let's take this journey together.

Okay, so what is Code Interpreter? Basically, Code Interpreter is a version of ChatGPT that can handle code and file uploads, and can also produce images and, I think, audio as well, so it's much more multimodal. And because it can handle code, there's a lot more data analysis and data-sciencey stuff you can do directly in ChatGPT without having to then take it into Python. So let's walk through a single Kaggle project and see just exactly how powerful this tool is.

To activate this, it's pretty simple. You need to be a ChatGPT Plus subscriber. You literally just go to GPT-4, and you can see there's nothing over there, but when you come down here to your profile picture, Settings, Beta features, just flick that on, and now when you click on the drop-down you have the option for Code Interpreter. You can see that before there was no option for an upload, but as soon as I do this I get this little plus symbol.

So let's get a data set. Where better to do so than Kaggle? Just in case you are a beginner and you don't know what Kaggle is, it's basically a website where you can get free data sets on almost anything. Like, it's ridiculous that this resource is free. Actually, let me not say that, because with Twitter and all these people, every time you compliment the fact that it's free they start charging for it. So I expect it to be free. But anyway, you can get data sets, so let's go with the sales data
set, because that's just like a solid start. Okay, let's go with this one. You literally just click on that and then you can download here, after, you know, you've made your profile and what have you.

Okay, so now it's downloaded, and actually what I would like to do is not give it any context at all. Let me just give it the CSV and see what it comes back with. Okay, that's pretty cool. You can see that it has basically just read in the data frame, read in all the columns, and told us what exactly they are. I'm curious where it got the definitions for some of these, or is it just using common knowledge to know what these things mean? Let me take a look if there's any detail within the data set. That's interesting. It looks like it's just inferred what these mean, because within the Excel sheet I couldn't find anywhere where it defined it in those exact words, and even on Kaggle the definitions are slightly different to what ChatGPT has given. So that's pretty cool already.

Let's step into the mind of a complete newbie. It's asking us what analysis we want it to perform on this, but let's ask it. Okay, cool, so it's basically giving us options. Again, this is my first use of this, but instinctively it feels like it's doing what would have happened, before we could upload stuff, if you were to describe all of the columns to it and then just ask it what is possible to do with this data set. But now it can just read the CSV directly and then infer and tell us.

Okay, cool, so let's dig deeper into this customer segmentation. Again, I want to give it the minimum possible instructions, just to see how good it is at inferring and deciding what is the appropriate course of action. Okay, so it's asking me to confirm. Yep, that's fine. Okay, so it's presented us with this table and says, "From this table we can draw several conclusions: female members made the most transactions and had the highest total expenditure." I mean, I would have to do the analysis myself, but if this is just taking
averages, I mean, it's good, but these are quite definitive statements based on just using the average and nothing else. But pretty cool that it could do that.

Let's see how it worked it out. Maybe there's a deeper explanation. Okay, so basically, like OpenAI has told us, what's happening is that instead of just giving us the Python code, it now has the ability to execute that within ChatGPT. So that's what it's doing here when it says "Show work". It basically did a groupby, which gives us those four different combinations: female normal, female member, male normal, male member. And then for number of transactions, it used the sum on the total column, average, average. Okay, that seems legit, but personally, in the real world, I would dig a little bit deeper rather than just going with the averages, because there may be other factors that come into this.

But let's see if it can visualize this for us. Oh, that's interesting. It made a mistake, and then it's running it again to try and get the correct answer. That's cool, that's cool, man. So it's brought us what is a... this looks like Seaborn. So it's brought us a Seaborn visualization, and basically these are pretty basic visualizations, but they are correct. One thing I would say is, why are both of them green? Wouldn't you want more of a contrast there, just to make it more evident? And something else I've learned is that, as much as possible, avoid using red, orange, and green unless there's a specific need for it, because in our minds, just because of how we've been trained, those always mean negative, neutral, and positive.

But dude, I'm not complaining about this. What I'm imagining is if you're a super small business, either a one-person business or a startup, and you don't have strong data skills or you just don't have the time to do this, this can be super cool to just drop in there. So imagine if you had a Shopify store or, like, an Etsy store. This would be sick.

Okay, let's do a couple more things, because I want to focus on the discussion on
the impact on the data science market for junior data scientists. But let's do some temporal analysis before we go on to that.

Interesting. So again it's just giving us different options, which is really cool, because it's not just prescribing what we should do, it's giving us options. So I guess you have to be able to decide what is most useful to you. I think here I would like to see how the customer behavior has changed for the different segments over time, since that's what we were working on before, so I'll just be lazy and copy-paste that.

Interesting. It's saying this analysis assumes the data spans a single year. I wonder why it can't just read the data and check if it does. Let me look at the original source. What is this? Okay, so evidently there's only three months' worth of data. I didn't look at the original data set. But again, this is super cool, but I hate this visualization. I would never give somebody this, because, first of all, say what month it is, and then a line graph for just three months? I mean, fair enough, but, you know, what do you guys think? Is it in my head?

The weird thing about data science is that this isn't wrong, but everybody has their own style of data science. Being a data scientist is kind of like being an author, where, hey, this sentence isn't wrong, but I wouldn't write it like that. So I wouldn't visualize this like this. But it's super cool, guys, who are we kidding.

But can I export this? Okay, interesting. It won't let me explicitly export them, which I guess I don't need to, because I could just save the image, so that's not really necessary.

So hypothetically we've sourced our data and it's done the analysis. Let's see how well it writes a report. This is not groundbreaking, because I'm sure a lot of people have been doing this already, and this is just basically a summary of what it's told us before. But what I'm really curious about is the next steps. That's interesting, because up here it said that in most months normal customers have slightly more transactions than
members, which is what we're seeing there (that's the number of transactions), and then down here it says that members tend to spend more per transaction, so we should launch initiatives. So does that mean that, even though I haven't asked it to, it's checked out what the average spend per member per transaction is? Or have I missed something? Okay, interesting, so it's actually done this up here in this code. I had missed that. But that means that if I hadn't seen the working being done... oh no, yeah, it did do it before. Scratch all of that.

But let's get on to the most important part, which is: what does this new tool mean for us as junior data scientists, or even people who are still considering hopping into the field? Well, at the moment it's very exciting. If you've been following the channel, you know that I work at a startup. That obviously means we can't afford to hire an army of data analysts and data scientists, so this can just make my data science department super, super productive, because we can spend more time doing the higher-value projects. So, like all AI tools released in the past few months, super impressive, can't even deny that. But I would need to do more digging before I can make any definitive statements, because this was my first look at this.

But let's get to the crux of this video, which is: if you're still in the consideration phase or early stage, what does this mean for your job prospects? And to be honest, I'm in the same situation. I'm a junior data scientist.

Now, the first thing we need to address is sensationalism. Whenever something like this happens in the AI world, if you hop on Twitter, the thread bros will be out in full force. The thread bros are basically people who say things like "data science is dead", "we don't need doctors anymore", "ChatGPT will do everything for us". But whenever you do click onto their profile, you'll see they've never worked as a data scientist. They probably haven't worked in tech. Maybe they're a marketer or something like that.
But the end goal of all of those tweets is to be as sensationalist as possible and get as much engagement for them as possible, which is why in their replies it's only other thread bros replying to them, saying, "Oh yeah, thank you, this is so cool," when in reality those other thread bros haven't read what the main thread bro has said. What I'm trying to get at is: be selective about who you receive AI news from, because a lot of these sensationalist sources will just cause you to have a bunch of anxiety around your future prospects. Look for people who give you practical, level-headed advice or are actually experts in the area. I'll try to link a couple of sources that I trust down below. But just calm down, first of all.

However, there is no denying that technologies like this will have an impact on the industry in general and will cause a shift in certain aspects. A few weeks ago on the channel I had an interview with Dave Ebler, who is on top of all of this AI stuff, and he and I came to a similar conclusion, which is that in prior years, as a data scientist, you could heavily lean on the fact that you knew how to code, because almost nobody knew how to do that, and you might not have had to be great at anything else, just because that was such a rare skill set.

Like, imagine if you worked in the UK but all your business was with somebody in China. Somebody who knew Mandarin would have such a big advantage, because they can help translate. In this example nobody speaks the other language but this one guy, and that person who has the ability to translate wouldn't really have to do much else, just because that one skill is so valuable. But now everybody can speak basic Mandarin at least. So they still do have an advantage, because they're a pro, but now they have to add to their skill set.

That's basically just like how we as data scientists now have to expand our skill set and find other ways to add value to the business. This can be using the extra time to provide increasingly complex or
more useful code, it could be increasing your output, or it can be growing your domain knowledge, so that you're not just super knowledgeable in data science, you also know a lot about your specific domain, so you have a specialist skill set.

And to be honest, once you get out of the schooling system, coding is still super important, but I was having a discussion with my mate Ed, who's also a junior data scientist (interview coming on the channel soon), and we both spoke about how coding is literally just a part of being a data scientist. You only realize that once you're outside of the schooling system, where you're no longer just worried about getting the highest mark possible. So all this is doing is accelerating that little piece where you have to go from just being focused on being able to code to being able to implement that in the real world.

Essentially, this is something that I will be trying to teach, or to get across, in my new newsletter, which I've just launched and which I'm excited about, because I'm rarely going to be emailing you about "hey, this is the best way to code" or "hey, this is how you specifically do XYZ". Instead I'll be giving you the other skills that you need as a data scientist, from a higher-level perspective, because that is what's going to separate you from most other data scientists. If we have a room full of data scientists right now, everybody there can code, everybody knows maths, so it's these other skills that will help set you apart. So that's what that newsletter is going to be about, and if it's interesting to you, it's free, it's down in the description.

But moving on, the other thing with data science is that it's deeply embedded with machine learning and AI. So who exactly is better placed than we are to learn the underpinnings of these skills? Not just putting it into a chat box on GPT, but understanding the underpinnings of these skills so that we can bring more business value in a unique manner. I think we're well positioned to be
able to do that. And on top of that, more and more data is being integrated into the world. I mean, most companies do not even use data science, so now these skills will mean that more companies will be interested in using data. And who has the best opportunity of being the person to do this for them in a competent manner?

So what I'm saying is that we have to keep leveling up as data scientists. Make sure that you keep learning those fundamentals to a good level, but also keep one eye on what's coming down the pipeline with AI, as well as other tools, so that you can integrate them quickly and help yourself become more and more effective as a data scientist.

If you're new around here, I'm Data Nash, a junior data scientist documenting my journey from being a newbie to one day being a lead, so that you can avoid the mistakes I made and get to where I get to, or higher, in a much quicker manner. If that sounds interesting to you, hit subscribe.
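
For anyone who wants to reproduce the kind of working the captions describe (grouping by gender and customer type, summing and averaging the total column, then counting transactions per month), here is a minimal pandas sketch. It is not the code ChatGPT generated in the video. The column names (`gender`, `customer_type`, `total`, `date`), the helper names `segment_summary` and `monthly_transactions`, and the six toy rows are all assumptions made up for illustration, not values from the Kaggle data set.

```python
import pandas as pd

# Toy rows for illustration only; they are NOT taken from the data set in the video.
# The column names are assumptions and would need to match the real CSV.
sales = pd.DataFrame(
    {
        "gender": ["Female", "Female", "Male", "Male", "Female", "Male"],
        "customer_type": ["Member", "Normal", "Member", "Normal", "Member", "Normal"],
        "total": [120.0, 80.0, 100.0, 60.0, 140.0, 40.0],
        "date": pd.to_datetime(
            [
                "2019-01-05", "2019-01-20", "2019-02-03",
                "2019-02-17", "2019-03-08", "2019-03-21",
            ]
        ),
    }
)


def segment_summary(df):
    """Per-segment transaction count, total spend, and mean spend per transaction."""
    return (
        df.groupby(["gender", "customer_type"])["total"]
        .agg(transactions="count", total_spend="sum", avg_spend="mean")
        .reset_index()
    )


def monthly_transactions(df):
    """Number of transactions per calendar month, one column per customer type."""
    month = df["date"].dt.to_period("M").rename("month")
    return df.groupby([month, "customer_type"]).size().unstack(fill_value=0)


summary = segment_summary(sales)
monthly = monthly_transactions(sales)
print(summary)
print(monthly)
```

Keeping the count, the sum, and the mean side by side makes it easier to see when a conclusion rests on averages alone, which is the caveat raised in the captions. Labeling the months explicitly also addresses the complaint about the unlabeled line graph.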
Info
Channel: Data Nash
Views: 4,646
Keywords: data science, data analytics, data science job, data engineering, tina huang, study md, ali abdaal, ken jee, code interpreter, code interpreter chatgpt, code interpreter chatgpt data analysis, How I use ChatGPT as a Data Analyst, fireship code interpreter
Id: k9yTdvS2lgk
Length: 14min 24sec (864 seconds)
Published: Sun Jul 09 2023