ChatGPT Code Interpreter vs Noteable Plugin: Which is best for data analysis?

Video Statistics and Information

Captions
Hi there, my name is Chad Skelton. In previous videos on this channel, I've been investigating how the Noteable plugin for ChatGPT Plus can be used to do some pretty sophisticated data analysis and data visualization without knowing how to code or having any technical skills like that. One of the reasons I started playing around with the Noteable plugin was that another tool I had heard about, Code Interpreter, did something similar, allowing you to basically tell ChatGPT to run Python code for you and do data analysis and data visualization. It was in alpha and was not available to most ChatGPT users. I was on the waitlist for a really long time, and finally this Noteable plugin came around. It seemed to have a lot of similar functionality, so I started playing around with it instead.

If you have ChatGPT Plus, though, you now have access to Code Interpreter, just like you have access to all the plugins. I thought in this video I would talk about some of the differences between the two tools, because I've been trying to do some of the same things I did with the Noteable plugin in ChatGPT Code Interpreter, and I've discovered some interesting strengths and weaknesses of each of the two tools and been thinking a little bit about how each is better than the other for certain things.

Just a reminder: unfortunately, this is only available to people who have the paid version of ChatGPT, ChatGPT Plus. The way to enable both Code Interpreter and the Noteable plugin is to go to your settings, click on Beta features, and make sure that both of these toggles are on. You turn the plugin toggle on and then you turn the Code Interpreter toggle on; neither one will be on by default. Then when you start a new chat, you're going to want to make sure that you're in GPT-4, not GPT-3.5. If you've enabled those, you'll have a choice. If you want to use Code Interpreter, you just
click on Code Interpreter. Plugins is a few more steps. You click on Plugins, and I've already got the Noteable plugin enabled, but if you don't, you just click here and scroll down your list of plugins, which if you're just starting out will be empty. Click on the Plugin store, and in here you can search for Noteable. It's "note" with an e and then "able", Noteable, as opposed to a notable person. Then you'll have a button here that just says Install, and there might be a few other steps to link your account on ChatGPT with your account on Noteable.

So I'm going to talk about the differences between the two, but for those who are not familiar with either tool, I thought I would start with a really basic thing: what can Code Interpreter or the Noteable plugin do when you give it data and ask it to do some analysis and some data visualizations? The data set I'm going to be using, which is the same one I used in my first Noteable plugin video, is one that I worked on when I was a journalist at the Vancouver Sun and which I've used in a lot of my training: bike thefts in the city of Vancouver. This data set is about 7,800 rows long, and it's every single bike stolen in the city of Vancouver: the date and time when it was stolen, whether it was a bike worth less than five thousand dollars or more than five thousand dollars, the district that it's in, and the location as latitude and longitude.

In Code Interpreter, once you've selected that model, you'll have a little button here that allows you to upload a file to ChatGPT, which is probably the biggest thing that's different from what you're used to with ChatGPT. You can upload CSV files, Excel files, PDFs, and image files, and then ask ChatGPT to do things with that file. So I'm going to upload that bike thefts.csv file. You can ask really specific questions of ChatGPT, and when it gives you charts you can ask it to change the
charts and all that kind of stuff. But I think the best way to show people what ChatGPT can do, and this is true of both Code Interpreter and the Noteable plugin, is to show you how powerful it is with very, very limited prompting. So in this case I'm literally just going to say, "Please analyze this data set for me and make some charts," and that is all I'm going to do. As in previous videos, when things are taking a little bit of time, when it's thinking and all that kind of stuff, I'll fast-forward.

One little note here, though. With both the Noteable plugin and Code Interpreter, you'll see this little "Working" and then "Finished working" button, and you'll see this "Show work" button. If you click on that in the case of Code Interpreter, you'll actually see the Python code that ChatGPT has run and then the output of that Python code. As we'll see in a second, with Noteable it's a little bit different: it shows you the API call and the API response. You don't have to show that, but if you're curious about what specifically ChatGPT is doing, this is a really useful feature.

OK, so we can see that it starts by talking about what the fields in the data set are. Again, this is one of those things that I think shows the power of this tool. There's nothing in the data set itself, and I can just show it to you here, that actually tells you what the fields in that data are. A human being could guess what they are, but there's no metadata telling you what those fields are. But you'll see here it's actually making some pretty good guesses about what they are: date time, the date and time it occurred; district, the police district where the theft took place. I haven't even told it that this is policing data, and it's drawing that conclusion. It provides some basic summary statistics: how many unique values there are,
what the range of those values is, stuff like that. Then it says, "Let's create some visualizations." Again, remember my only prompt here: "Please analyze this data set for me and make some charts." I've not told it anything about the data. I haven't even told it that it's in the city of Vancouver, that it's policing data, any of that sort of stuff, and it jumps right into the analysis.

It shows the number of thefts per year, rising pretty substantially in the last few years. Now, I know from working with this data set that these are actually outliers. This data set is actually just for the last five years, so these might be data entry errors or something like that, but it's showing us that data here. It says, "Let's analyze the data by district," and shows which policing districts have the most bike thefts. Then by time of day: night, morning, afternoon and evening. That's interesting, because there's not actually a field in here for time of day, so it's interesting that it would do it that way. It's telling you what the periods are: morning is 6 a.m. to 12 p.m., afternoon is 12 p.m. to 6 p.m., and the highest number occurred in the afternoon.

Then it says, "Let's create a heat map to visualize the geographic distribution of thefts." This is something Noteable has done a few times as well: it takes the latitude and longitude fields and makes a scatter plot, which I honestly find a bit strange as opposed to making a map, but it's something both tools do. And then it provides a nice summary: thefts are most common in the afternoon and evening, certain districts experience a higher number of thefts, and thefts are concentrated in specific geographic areas. It's actually pointing out that the data for the years 2001 and 2023 may be incomplete. I don't know why it's giving us those dates, because those dates are not... oh, 2001 is, but 2023 is not. So, pretty sophisticated analysis here. I can prompt it a
little bit more, but not by much, by saying, "Can you do some more analysis and charts for me, please?" And it's actually telling me what it's going to do: it's going to look at thefts by month, thefts by day of the week, and the variation of thefts by hour. So we've looked at these broad categories, but we haven't looked at it more specifically. It provides me a nice chart here showing thefts by month, and then it does a little bit of analysis. Again, not rocket science, but it makes the point that thefts are higher in the summer months of July and August, and this is probably because more people cycle when the weather is nice. It looks at day of the week, which is relatively flat, and then it says, "Let's look at thefts by hour," showing me some of the spikes in when thefts are most likely to occur, and again providing a nice summary.

So again, pretty sophisticated analysis and pretty sophisticated charts, and my only prompts were "Please analyze this data set for me and make some charts" and "Can you do some more analysis and charts for me, please?" That's it. I can tell you, from when I did my stories at the Vancouver Sun based on this data set and from when I teach with it, that this caught most of the findings I focused on in my analysis, with the exception of the geographic ones. Because it didn't make a map, it didn't really talk about some of the geographic patterns in the data, but other than that it picked up on a lot of the other interesting patterns.

OK, so now let's take a look and see what we get when we ask Noteable to do the same analysis. The first thing to say is that one of the benefits of Code Interpreter is that it's super easy: you basically just upload your file, ask it a question, and away you go. Noteable requires at least a couple more steps. First, you obviously need to say that instead of using the
Code Interpreter I want to use plugins. It's already got Noteable selected here, but if for some reason Noteable is not selected you would just select it, and again, if you haven't already added it, you can add it from the Plugin store. The big extra step with Noteable is that you need to sign up for a Noteable account. It's free and takes about 30 seconds, so it's not a big deal, usually. Then you also need to have a project. Noteable is organized into different projects, and you're going to want to have a default project and tell ChatGPT that that is the default project. The way you do that is you copy the URL for the project and say, "Please use this as my default project."

Then, if we want to analyze files, there are two ways you can do this in Noteable. If you've got the file somewhere else, like on GitHub or some other server, you can just paste the URL into ChatGPT and Noteable can fetch it from there. If not, what you need to do is upload the file using the upload button here, and then it will be in that project. You can see here that in my Noteable project I have that exact same file, bike thefts.csv. So we're going to try to use the same language and see how ChatGPT does things a little bit differently.

OK, so I've copied and pasted the exact same prompt that I had for Code Interpreter: "Please analyze this data set for me and make some charts." But of course that's not going to work exactly here, because I can't upload the file. So I do need to prompt a little bit differently by actually telling Noteable where the data set is. In this case I'm just going to say, "Please analyze the bike thefts.csv data set in my default project for me and make some charts." I find referencing the default project specifically is helpful because it just reminds Noteable, "OK, I'm going to be doing the analysis." Sometimes you get into a bit of a weird
situation with Noteable, and with Code Interpreter actually, I find, where ChatGPT forgets that it has this ability, because up until recently ChatGPT couldn't actually run code, couldn't actually fetch things from the web, all that kind of stuff. So sometimes it just seems to forget that it has these added capabilities as a result of plugins. But other than that, it's basically the exact same prompt. Let's see if it analyzes the data any differently than Code Interpreter.

Similar to Code Interpreter, you have this little thinking box that will stay closed but that you can open up to see what is happening under the hood. The style here is a little bit different, because what we're seeing is the API call to Noteable and the response from Noteable. So there's an extra step here: it's not the code being written directly in ChatGPT; it's making a request of a separate service and then seeing what response it gets back. It looks like this first one is basically just looking for the file. This next one looks like it's creating the notebook. I talk about this more in the other videos I've done, so I'm not going to belabor the point, but one of the things about Noteable is that it's a Jupyter notebook, so it's basically a document that has sections of code in it and then the output of that code, and you can both edit those documents and share them. The last one is pulling the first few rows of the data. In essence it's the same thing: this is Python code; it's just that ChatGPT is sending the instructions to Noteable to write this Python code in the Noteable notebook, and then Noteable is sending back this response.

One thing we get which is different from what we saw with Code Interpreter, and this seems to be very typical of how things work in Noteable, is that we see the first five rows of
the data, whereas you may recall in the Code Interpreter version we were getting more of a high-level summary about the number of values in the data, the range, and all that kind of stuff. Now it's creating some charts based on the data, and these are quite similar to what we saw before: the number of bike thefts per year and the number of bike thefts per district. And then it kind of stops in this case. This, I find, happens idiosyncratically with both Code Interpreter and Noteable, where it just stops doing things. Sometimes it's because it reaches the end of its own thought process, if that's the right term for AI. Sometimes it seems to stop mid-paragraph, which seems to be a limit on the number of words or tokens that it has. In my experience, whenever it does that, for either reason, you can literally just say "Please continue" and it'll pick up where it left off. In this case it's saying, "Sure, let's continue with the analysis. We can look at the time of day when most bike thefts occur."

The whole time ChatGPT is doing this, within my testing project it is creating a workbook. This one is called bike thefts analysis 12, because I practiced these things many times before I decided to record my videos; that's why I have so many of them. If I click on that, you'll actually see those chunks of code and the output of that code in the notebook. This is a permanent document, or at least permanent until you delete it. In the same way that Code Interpreter's biggest advantage is that you can just upload your file and start working, I think this is probably the biggest advantage of Noteable: it's creating a record of what you've done outside of ChatGPT. I'm certainly not an expert in notebooks, but from what I know about them, one of their advantages is that you can add things: you could add a bit of text description, you could remove some code,
you can share things. Within ChatGPT itself we also have these new share buttons, but that only shares your chat at the state it was in when you hit the share button. As we all know from working with ChatGPT, you often have a lot of false starts and things like that, so you don't necessarily want someone to see all your work in all its messiness. If you make three or four attempts at doing something and the first two fail, using this notebook you could clean that up and delete those failures. Then when you want to share just the high-level analysis or the visualizations with other people you work with, you can do that within the notebook. There's also some odd stuff I've noticed with ChatGPT: images will sometimes no longer be there if you go back to an old chat a week later or something like that. So that's another advantage of Noteable, at least for now: these images will stay in this document, so you don't have to worry about losing your work.

OK, so I asked it to do some more analysis. It says, "Sure, let's continue with the analysis. We can look at the time of day," and it did a little analysis of the number of bike thefts per hour. What I find interesting about this is that the analysis is very, very similar. The charts even look quite similar in terms of their color and stuff. I'm just going to flip back to the Code Interpreter version... well, actually, here you can see what I was just talking about: the images are already gone, so I can't directly compare them, but you can rewind the video and see for yourself. The colors are very similar, and the types of analysis it did were very similar, although weirdly they're just a little bit different. Code Interpreter did that time-of-day one, for example, although I have to say in previous tests it did not. I honestly
don't know how much of that is that Code Interpreter is going to give you different results than Noteable, or whether it's just this idiosyncratic thing I've noticed about ChatGPT in general, that any prompt will give you a different answer this morning than it does this afternoon. There's a little bit of randomness in there, and that might have something to do with it. But it's also possible that it's the way in which the output comes back. I'm assuming that in Code Interpreter, ChatGPT has access to the actual code and the actual response, whereas in Noteable it's limited to what Noteable is sending back. Particularly with large data sets, I could imagine a situation where there's some limit to how much can be sent back to ChatGPT, and that might affect its responses.

But generally speaking, it's handling these general prompts, again very generic ones like "please provide me some analysis and make some charts," in a pretty similar way. That's not all that surprising, in that my discussions with the folks at Noteable have clarified for me that the heavy lifting, the thinking and analysis and coding, is done by ChatGPT. The Noteable API is really just the bridge that allows ChatGPT to talk to a Noteable notebook and then get some response back. So it's not surprising that when we use Code Interpreter it's coming to similar conclusions and making similar assumptions about how it should analyze the data. If you ask it to make some charts, it's going to make very similar charts.

So there are a lot of similarities between the two tools, with the biggest difference being that Code Interpreter just lets you upload a data set without having to set up any extra accounts or anything like that, whereas Noteable requires a little bit of extra work to set up those accounts but then gives you a separate external document that you can edit, that you can
change, that you can add things to, and that you can then share publicly or just share with a few people within your organization.

But there are another couple of interesting differences I've noticed that I wanted to draw your attention to as well. One of the things I've been playing around with a little bit more, as discussed in my last couple of videos, is using ChatGPT and this functionality to do some things that I otherwise could not do. Some of the visualizations we've done so far I know how to do in Tableau or a tool like Datawrapper. But there are some more advanced techniques I've heard about, like entity extraction, sentiment analysis and fuzzy matching, that I don't know enough about coding to do myself but that, with a tool like ChatGPT that can code for me, I can start to play around with.

One of the examples I want to show you is this idea of entity extraction, which is looking at a document and pulling out the most common names of people or places in that document. For this I'm going to use as an example a PDF version I found online of The Return of Sherlock Holmes. I've downloaded that, I'm going to upload it here, and I'm going to tell ChatGPT, "Please analyze this document and give me a list of the 10 most common named entities," or maybe even the 10 most common people named in it. OK, so: "Please analyze this document and give me a list of the 10 most common people named in it." Let's see how it does.

It says, "First, I'm going to extract the text from the PDF." If we look at what it's doing here, it's extracting the PDF. It says, "This document appears to be The Return of Sherlock Holmes by Arthur Conan Doyle." Sounds good. So it's going to use named entity recognition, which is a subtask of information extraction that can identify organizations, locations, time expressions, etc. To do that, it's going to use something
called the spaCy library for Python. So it gives that a try, and we can see it right here trying to load the spaCy library and load a specific, what's known as, tokenizer to be able to do that work. But we can see here we're starting to get some error messages, and then ChatGPT tells us that there was an issue loading the required named entity recognition model: "Unfortunately, due to the environment restrictions in place here, I can't download new models, so let's take a different approach." We're going to use a chunker that classifies tagged words to try to count the most common words. So it tries to do that; you see here this tokenize thing looking for words in there. It gets a bunch of error messages, and then it says, "I'm sorry, but there seems to be a memory issue while running the named entity recognition with the NLTK package. Given the restrictions of the environment, it seems that this task might not be feasible using this method. A more powerful computing environment would be needed to handle large documents like this one, particularly when performing complex tasks like named entity recognition." It says you may want to try to do this on a local machine or in a cloud-based environment, and it gives you some code that you could try yourself. But if you don't have Python installed on your home machine or some other way of accessing Python, and I don't, you're kind of stuck.

So basically, it tried to get a Python library that it thought would be good for this task. Interestingly, ChatGPT knows that's the right library to use, but Code Interpreter doesn't have access to it. So it gives it a try, it doesn't work, and then it says, "Let's try this other thing," but then it's like, "Ah, this takes too much memory, I can't do it." It's interesting, because I've heard some references to how big the files you can upload to Code Interpreter can be, and they can be quite large. I believe they can be something like 100 megabytes, and this is nowhere
near that size. Let me just see here... I think it's about one megabyte. So that's not really the issue. It's not that the file is too big to upload; it's that the analysis task I'm asking it to do on that document (which, fair enough, is a pretty big collection of stories about Sherlock Holmes) is something it's not able to do in the memory environment that it has. So it runs into two problems, and I have to say, in my playing around with Code Interpreter over the last week or so, those are the two things I kept bouncing off. If I was doing basic analysis, I'd be fine. But if I was trying to do something a little more unusual, often either it knew which libraries would help solve the problem but didn't have access to them in Code Interpreter, or the data set I was using was too big and it would start to run into these memory limitations. I don't know enough about Python to know exactly what's going on, but I know there are these defined packages called libraries that can do different tasks in Python, and evidently, however Code Interpreter works, it doesn't have access to some of those libraries.

So it didn't work in Code Interpreter. Let's now see if Noteable is better for this task. I've switched over from Code Interpreter to plugins and enabled the Noteable plugin. I'm going to use the exact same prompt I used before: "Please analyze this document and give me a list of the 10 most common people named in it." But I can't upload the document through the Noteable plugin; that's its one big limitation. What you can do in Noteable, if you want to analyze a data set or a document that's already online, is just give it the URL for that document. So if you've got a bunch of data files on GitHub or some other server, or you want to analyze a report that's on a government website, you can just put that URL right in there. ChatGPT
itself doesn't have direct access to that through a web browser at the moment, but it can send the instructions to Noteable, which can then pull that document off the internet. I've found, to be honest, that Noteable sometimes has difficulty finding a file even when I've uploaded it to the proper project. I've noticed that's a particular problem with PDFs, not so much with CSV files. So this is a nice backup option if you're having trouble getting the Noteable plugin to find the file for you. Let's see how this goes.

At first things don't look great. What happens is ChatGPT comes up with a great strategy for how to do it, very similar to the strategy Code Interpreter came up with: download the PDF, convert it into text, do named entity recognition, and count the occurrences of the entities. It knows what libraries to use, it comes up with a good script to try, and all that kind of stuff. But then it says, "Unfortunately, as an AI language model, I don't have direct access to the internet to download files or use external libraries that aren't pre-installed in my environment." This is something I find comes up a lot with these plugins: ChatGPT doesn't know what it can do. It's so used to not having the connections to the rest of the world that plugins and tools like Code Interpreter give it that you sometimes have to push it along a little bit. So in this case I'm going to say, "Please use the Noteable plugin to do this." I'm basically just saying: you can do this, give it a try. Let's see if that helps.

OK, so with my little "please use Noteable," it gets the self-confidence it needs to actually start using the Noteable plugin to make these requests, doing for me what it was asking me to do on my own. It runs into a little technical error where it was trying to read the PDF using an old program or a deprecated technique, so it moves on. This sometimes
happens with bigger tasks: it's saying the cell is still executing. What it's doing is checking on Noteable and saying, you know, it's not done thinking yet. This is probably because it's a pretty big task. It says, "You can check the progress of the cell execution by visiting the notebook at this link." So you can go to the notebook, and in cases like this where it's still thinking, I just go to the bottom of the page and wait for it to be done. You can actually see here that it's finished thinking, so I can now go back here and just type in "Please continue." It says, "The analysis of the document is complete. Here are the 10 most common people named in the document": Holmes, 436; Watson, 186; Mr. Holmes, 186; Lestrade; Hopkins; Milverton. There are some slightly odd choices here: Mr. Stanley Hopkins, Peter Carey. It says, "This analysis is based on the named entity recognition model from the NLTK library. Some names may be missed, and some non-name entities may be incorrectly identified as names."

What's interesting here is that it did not use the same library that Code Interpreter tried first. So why don't we prompt it a bit and say, "Can you please try this again, but this time use the named entity recognition in spaCy?" It comes up against this model again, and this is where Code Interpreter got tripped up. It says, "It appears this en_core_web_sm model for spaCy is not installed. This model is necessary for English language processing tasks in spaCy, including named entity recognition. Let's install it." This is where we ran into problems with Code Interpreter: it was not able to install that model. It said, you know, we don't have access to that in our environment, and it just wasn't able to complete the task. In Noteable it is able to do that work, and then it just takes a while to complete. It says, "The cell is
still executing. Named entity recognition can be a time-consuming process." Again, we could go check and see how it's doing, or we can simply say "Please continue," and if it's done it'll tell us, and if it's not it'll say to wait a bit longer. "Please continue." OK, and it gives us what looks like a slightly cleaner list of actual people. Holmes is the most common, then Watson, Lestrade, Sherlock Holmes (it's treating that as separate, I guess, because on second reference it's probably just the last name), etc.

So this, I think, illustrates what has certainly been my experience so far playing around with these two tools. As discussed before, the biggest plus for Code Interpreter is that you just press the plus, upload your data set, and away you go, whereas with Noteable you have to set up an account and you sometimes have to upload your files to a project, and it's a little fiddlier with that sort of stuff. The big, broad and I think continuing advantage of Noteable is that you have whatever the analysis is, whatever the visualizations are, in a separate document that you can edit, that you can add things to, that you can send to people, all that kind of stuff.

But the other thing that has been really noticeable for me is that the number of libraries Code Interpreter has access to seems to be somewhat limited, so as you start to do more advanced things or play around with different things, you start running into that. And again, because there seems to be a bit of a separation between ChatGPT the large language model and Code Interpreter, sometimes ChatGPT knows the solution, it knows how to do it, but Code Interpreter doesn't have access to the specific tools it needs, whereas Noteable seems to have access to a wider range of libraries. The other one is the memory component: more often, when you're dealing with larger data sets, you seem to run
into the limitations of how much memory is allocated to you in Code Interpreter, whereas that hasn't been such an issue so far for me with Noteable.

Now, these tools are always changing. It would not surprise me at all if over time OpenAI gives ChatGPT access to more libraries and gives users a bigger memory bucket to work with. But for now, I would definitely play around with both tools and start to learn a little bit about the strengths and weaknesses of each. And for sure, if you're running into barriers with Code Interpreter, give Noteable a try, because there are definitely some things it can do at the moment that Code Interpreter cannot. Hopefully this has been a helpful video. I'll be making more as I go along and play around with these tools a little bit more.
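
The time-of-day grouping described in the captions, where a category that is not a field in the data gets derived from the theft timestamp, can be sketched in a few lines of Python. This is only an illustration, not the code ChatGPT generated in the video: the morning and afternoon ranges are the ones read out in the video, while the night and evening boundaries, the function names, and the sample timestamps are assumptions.

```python
from collections import Counter
from datetime import datetime


def time_of_day(hour):
    """Map an hour (0-23) to a named period of the day."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be 0-23, got {hour}")
    if hour < 6:
        return "Night"      # assumed: midnight to 6 a.m.
    if hour < 12:
        return "Morning"    # 6 a.m. to 12 p.m., as stated in the video
    if hour < 18:
        return "Afternoon"  # 12 p.m. to 6 p.m., as stated in the video
    return "Evening"        # assumed: 6 p.m. to midnight


def count_by_period(timestamps):
    """Count ISO-format timestamp strings by period of the day."""
    return Counter(
        time_of_day(datetime.fromisoformat(ts).hour) for ts in timestamps
    )


# Hypothetical sample timestamps, not rows from the real data set.
sample = [
    "2022-07-04 08:30:00",
    "2022-07-04 13:15:00",
    "2022-07-05 17:45:00",
    "2022-07-06 22:10:00",
    "2022-07-07 02:05:00",
]
print(count_by_period(sample).most_common())
```

With a real CSV, the same mapping would presumably be applied to the parsed date-and-time column before counting; the column name is not given in the video, so none is assumed here.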
Info
Channel: Chad Skelton
Views: 5,403
Id: nBJaSaPIb0k
Length: 31min 10sec (1870 seconds)
Published: Wed Jul 12 2023