Tips and techniques to move from Excel to Python webcast

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone welcome to this presentation this live event michael kennedy here so great to have you here with chris chris moffitt he's got a great presentation put forward for us put together for us that's going to help us escape excel hell and use excel i use python more than we might excel maybe just so everyone out there just to make sure a little quick tech check if you can hear me throw a thumbs up or something into the chat i can hear you you you can okay if you can hear me chris i'm sure we're all good so all right awesome i've got just a couple of things i would like to share thank you kevin everyone's like where's the thumbs up button so i have just a couple of real quick things i would like to share with you here on my screen so let me pull it up so here we are at tips and techniques for moving from excel to python and our presenter today is not me it's chris moffatt you can follow him on twitter where he's at chris1610 and chris is probably most well known for his practical business python blog and all the work he's doing over there his open source projects and of course he's been on talk python to me a couple of times to talk about excel was super popular apparently people are excited to not use excel as much and do more python or blend these things i think is also a way to think of it so uh yeah so welcome chris um thank you thanks for being here i'm gonna flip it over to you and first i'll give you your video and then when you're ready let me know i'll flip it to your screen okay so good morning uh good afternoon good evening depending on your time zone really appreciate everybody joining us today i think this is going to be really good discussion and what i'd like to do is i've got kind of some some code examples i i figure it's best to show you uh versus the versus just talking through things so we'll uh hopefully have a good discussion uh with some concrete examples of how you can use python to replace some of the cumbersome tasks that you do in excel today so if you want you can go ahead and share my screen all right it's up all right so uh like i said i want to really focus on code so i only uh have one uh powerpoint presentation to walk through and mainly because i want to orient everybody on how to think about this problem so as you can see on my slide i have my excel hammer really high quality graphic that i put together there but i i think it really you know what it's meant to illustrate is excel is a tool that does a ton of things in an organization and it may not be the best approach for all the phases and as i think about how i've used python and various tools in in my world to solve real world problems these are some of the categories that i that i focus on and what's really important when you start to figure out how to leverage python in your world that you want to tackle the right problem so you could focus on gathering data wrangling your data cleaning the data analyzing and presenting it these are the different categories you need to be smart about where to use python and addressing the right pain points is really a critical part of getting started correctly so what i wanted to do in the rest of this crowdcast is walk through some very specific examples of maybe where excel falls down a little bit and python is a better approach for solving some of these problems so um that will or throw out something for the audience as well if you guys have a question please use the ask a question section and don't throw it in the chat because the chat like can fly by and we'll miss it also if you do in the ask a question section in the recording which yes there will be a recording it will actually tag the recording where your question is answered so you can go back to it all right take it away chris perfect so um michael does this look like it's big enough for everybody that they can read it or should i blow it uh blow it up a little bit more it looks maybe i just hit a command plus that look good for everybody hope so all right so so here's here's what i want to do i want to go through some like i said concrete examples of things where um python really um excels compared to um excel so the first one let's let's take a look at a file i have here so it's called 2019 customer transactions it's 96 megabytes which is not um terribly big but it's it's big enough and so when i try and open it with excel i think most people are going to understand what's what's going to happen here we're kind of cramming through this file and then showing up on my other screen that essentially it didn't load all the data so it only loaded the 100 and or the little over a million rows that excel can handle it looks like it loaded everything but everything is not there so let me show you how you do that in pandas so we we've got our jupiter notebook here one of the things that i do you know just kind of my editorial on how i like to manage things because when when you're working with files so much in the windows ecosystem or mac or linux but especially on windows file paths are kind of painful so i like to use path lib so in this example i've got my raw file that 2019 customer transaction file that i tried to load in excel and you saw that in excel it took i don't know 10 20 seconds didn't even load at all but when i use pandas it loads it really quickly and you can see here that i have the 1.1 million rows it's about 83 megabytes of memory and you know just if you want to maybe do a quick aggregation of that data so we want to see okay well how many books pencils or pins did we sell and for for those of you that maybe aren't as familiar with pandas i did a quick group by category aggregated the data so that i can sum the invoices and then actually get the number of unique invoices and then style it so my point here is think about if you were to try and do this in excel one you couldn't open that file as as it stands without doing some manipulation but two excel tends to get bogged down when you have these large files and then three if you look at this so how many lines of python do i have so i've got two import statements i'm defining my file i'm reading it the head and info just kind of tell me information are not really required and i've really got one aggregation function so if you start to think about how you do this in excel or how you'd explain to someone in excel it's not necessarily the most complicated thing but it's not very efficient so pandas and python working together is a much more efficient process for working with large files so hope uh hope that made sense to everybody let me uh go through chris let me ask you a quick question how much data could i load like if a million rows is too much for excel yeah i mean essentially you're constrained by how much the the easy answer is you're constrained by how much memory you have in your computer um with core pandas and then you do have options to do other things where you can do like out of memory manipulations or analyzing data so you can even analyze data you know more data than you have memory so on my laptop here which probably has i don't know 16 gigs of ram which is probably a fairly low end for a laptop these days you know i could probably load several gigs worth of data and manipulate it relatively easily nice so that is very helpful all right so let me um so the other example i want to go through let's do another example of how you can consolidate information so one of the other things that excel that i expect you've all seen with excel is you get data and you have data across multiple tabs so here i have sales data and i purposely chose kind of a small subset of data and i've got different sales data on different tabs for different months and this is a scenario i see all the time hopefully you've seen this as well where okay now how do you combine this if you get this excel file what would you have to do mostly what you would do is probably a cut and paste maybe you would copy each of these tabs and then um and then paste them all together and do your analysis but with pandas you have a lot more flexibility so let's let's go ahead and use a clean notebook and same sort of process where i use path liberty to find my file and this is a little little trick that pandas does which i think is pretty interesting uh when you read in an excel file you can normally specify what sheet name you want to read but if you specify sheet name equals num none sorry i need to close my file wouldn't be a demo without some errors right so then if you if you close the file now it reads the the file in and what it's returned is if you look at all dfs it's now taken each sheet and read it into a dictionary and then i can use pandas to concatenate all those together and now i have one data frame that has all my 300 rows all in one place so my point here is just to start to give you some ideas of some of the hopefully common manipulations that you do with your data that you may not even think are are really that hard but if you're doing this a lot on a daily basis or if you have people in your organization that are doing this maybe not um really understand there are options like python out there that you can use once again it's really just a couple lines of code to bring in that data set combine it and now you have a single data frame and you can do all the manipulations you need to in one place and then kind of piggybacking off of what we talked about on the first example this was obviously a very small file but there's no reason you couldn't do this on really large files as well chris another sort of follow under that is like if you've got say monthly excel reports and you want answers for the year you know it's easy to just throw those all together right exactly the other other thing i'd like to emphasize that you've talked a lot about on the podcast is you know when you're doing this manually merging copying pasting in excel there's always a chance for an off by one error where you skip you miss miss a row or some other kind of manual mistake and here it's the computer's just going to read it and roll exactly and then this was a really simple example but if you had a situation where you had to change the data or manipulate the data or clean the data or you know the the file used to work and now it doesn't work well you can just insert the code in here in pandas to actually fix it instead of trying to you know manually document those steps all in a word document or an excel file somewhere all right so kind of continuing with the theme of you know trying to get data from other places so one of the things you know excel has gotten a little bit better with this but it's i don't know that it will ever be as powerful as python that if you want to get data from the web if you want to capture information from other data formats or data sources python just has a ton of libraries that can help you with that so this is one example i actually wrote about it in the blog but if you wanted to for instance scrape data from a wikipedia page so you know wikipedia is going to have all sorts of interesting data that you may want to collect one of the things that pandas can do is it can actually read html tables and kind of scrape them off of the web page without using you know it uses beautiful soup and other libraries behind the scenes but it makes it a lot easier so if i wanted to scrape the minnesota election results then i have this this table so now with that one command i have that table all loaded into into a data frame and can do whatever manipulations i want and you know unfortunately the data is a little bit messy uh because we have these percentage signs in here so need to clean them up a little bit so if you um look at the link and i know there's a question uh in the chat about am i going to share these notebooks and the answer is yes so i'll share this out there you can click on my article where i talk through this a little bit more in a little bit more detail but now i've replaced those percentage signs and then converted them to numeric so now my data is nice and clean it's a numeric format and i can do whatever manipulation i want so this is just a really you know one of those tricks that is important to have in your toolkit so that as you start to want to bring in more data you you know this is one example that's out there and i think it speaks to just that that power of pandas all right okay now this one um was actually a question so i i wanted to include this because this was one that i didn't uh actually know the answer to so i thought it was kind of a fun thing to dive into a little bit more and present back to this group so the the question was from uh istavan i believe and basically the question was how do you connect to a table inside a excel worksheet so if you'll notice all the examples that i've done like when when i read this csv file for the most part you just kind of read in everything on the sheet you do have some ability to specify which rows and columns you want to access but this question was if you actually define a table in excel how how can you access that and i didn't know the answer to that so let me let me show you what i'm talking about so that we're on the same page so this is the example so let's say i have a um a table that looks like this so i've actually defined an excel table that for this case is a lookup it'll blow this up a little bit so everybody can see it that is a lookup of the various products and what type of shipping and what the shipping cost is and maybe right now i know the table starts in column q but somewhere else i um you know in another sheet it's another name table somewhere else and pandas doesn't really allow you to do that to just read that file so what you have to do is use open pi excel so this is a library that pandas can leverage to read excel files and let me let me walk you through what i'm doing so you can load the you load the whole workbook and get get your sheet names and then we need to go to their sales vlookup sheet and then you can see a list of all the tables so this you know gets to be a little bit ugly but so i know that it's called table one so i get my lookup table and then it tells me where that table started so now i know that that table is from q1 to s9 i get the data and then this is a little command to turn it into a data frame so now that table that i had in my excel spreadsheet is now in a data frame and i thought this was a neat little trick i like i said didn't know that before but i know that when you're working with excel part of the power of excel is it has a lot of flexibility um and and that's part of the downside so people don't structure excel files like a like a proper database so there'll be little snippets of tables all over the place and sometimes maybe you can't get your users or whoever's creating that file to actually adjust their process and so you have to adjust your process so that it works and this is one example where if you actually needed to read tables out you could use this function this couple lines of code to actually iterate through those rows and columns in the table and turn them into a data frame so i thought that was good and thanks estevan for that question so the next thing that i think people probably one of the most common things that people do in excel is a vlookup and you know i i realize that there are other options like index match and other more even the new i think it's x lookup that's coming to excel is going to be a little more powerful but um it uh you there michael yeah sorry okay on my mouse that if i it's not supposed to but if i hit it the wrong way it will actually navigate away from the kind of restart it so sorry keep going okay um and so i wanted to talk about the vlookup because i think as we talked about cleaning data and bringing data together vlookup is a common thing so um let's let's open up my my shipping tables file again and let me walk through you know what what i want to do from an excel perspective so this is a let me blow us up a little bit this is a very simple uh list of orders uh the state and the product they ordered and i need to figure out the shipping cost so i've done a vlookup on my shipping rates in a separate tab so pretty you know pretty straightforward uh excel process so if we're gonna do that in pandas um pandas i will actually read in those two data frames so this is an example where you can start to specify specific columns that you want to read in so here i'm only reading in a through e because i don't want to read in my vlookup example then i've got my table and if i want to join them to that together i use panda's merge so i merge the sales and the shipping and i tell it i want to do a left merge so if you're familiar with sql joins is kind of like a left join and now i've got my my table so the one thing that is really nice about this is this kind of merge is going to be more robust than like an excel vlookup so how many people have you know had excel vlookups and then you go in and actually change the order of the columns or add a new column and then all of a sudden your vlookup breaks so it's a pretty brittle process pandas merge is going to make that a lot easier so if you have these scenarios where you've got a lot of excel data that you're merging together with vlookups take a look at pandas merge as a more robust example and on top of that so let me let me go through this second example where you know that example was a fairly simple um merge where we're just looking at kind of one key but what if we said um that we wanted our tables to actually look at not just the item type but also the priority so in excel you can't really do that very well right you can't do a vlookup on multiple columns so what what i think most people do and what i've certainly done is you kind of do your own lookup key where you concatenate two columns together so now i've got baby food and i've got my priority and i'll look that up in my vlookup table here so then i can figure out my shipping cost and once again you know you can kind of you can kind of do this in excel and i know some people look like they're talking about x lookup and index match but with pandas you really have a lot more flexibility on how you want to do that so we'll read in the two files close this one up so now we have our sales data and now we have our shipping data and when we want to merge we can tell it what to use for the left and the right so on the left we want to match item type and priority and the right item type and priority and boom it just gives us our shipping cost so once again the this idea of being able to do to do this um without having to concatenate and you know i'm looking the chat and looks like um you know so there are certain other options so people are mentioning x lookup uh people are mentoring uh mentioning alteryx um and other types of options and i i agree that's a completely valid approach if you are at a place where you have a tool that works you know certainly continue to use it but i i have certainly found that there's a lot of places where organizations don't have any tools outside of excel or haven't invested in those other tools and so python can be a way to you know start dipping your foot into the more robust tools that are out there and i think python will then actually grow with you a little bit better but i'm certainly interested to see what other people think um in the chat one thought i have uh chris you know a lot of these extra tools and uh whatnot are probably pretty great but what about doing them on servers or in production or some sort of automation where you know maybe it's running on like a cloud vm right all of a sudden some some nice gui that costs a lot of money is not so effective over there yes yeah yeah exactly um so it's you know you can always uh find vendors that'll they'll give you lots of options and you know to microsoft's credit excel has also um started to build more more tools so i um i think someone mentioned uh power bi and some of the other uh enhancements that excel has put in place so you know they're they're not standing and uh just waiting for other companies to build better tools so they are continuing to evolve but um you know i think there there's a case to be made that this type of python syntax is um you know not extremely difficult to learn and provides a lot of a lot of value so the other thing and this is one where i haven't seen that there are as many options out there so let's talk about working with text so this is one area where i i just feel like excel doesn't have a whole lot to offer and it gets pretty messy in this example once again i did a fairly simple like okay here's um an account number name and let's say i'm dumping out just a text field so it's a customer notes field and there's information about a prompt pay discount there's information about free shipping free shipping you can tell it's not really formatted we've got some some dashes in here but what if i wanted to dump this out and figure out okay which of these customers actually have free shipping uh or which of them have you know i guess just for the sake of this discussion which just have free shipping and so you can do this excel formula to search and find free shipping but it doesn't work because not only do i say free shipping i've also got no charge shipping uh i've got free ship so you know it's it's anything unfortunately where people are entering text by hand and obviously that's not a best practice we don't want to have that but we live in the real world where that's the kind of data you're going to get or we're going to get and have to work with so i i think a lot of people will try and figure out how to expand this formula to search for more of those scenarios and it can be done it's just not very pretty so let me walk through how i would do this in python and we'll read in our file so here's the file again so we've got our customer notes and in this example i'm just going to do you know one command there's probably a couple different ways you could do this but we've got our customer notes field and i'm going to access the string accessor and you know we're getting into dangerous territory here talking about using using a reg regular expression but i i think this is a definitely a legitimate reason to use them and relatively easy to understand so i can say look for free shipping or free ship or no charge shipping i can tell it it's a regular expression and i can tell not to worry about the case and assign whatever it finds back to free shipping and now i've got my true false f that output excuse me values and you can see that it found my no charge shipping it found my free shipping it didn't get confused by ups next day shipping this is where i found python just really really powerful when you're trying to manipulate the data when you've got that text data and it's just you know unfortunately not as clean as we'd like and before you can do more of the complex analysis you want to do you need to work with the text and this is uh pretty powerful and once again not that many lines of code to do something that would scale to much larger data sets okay moving right along okay let me open up this one i hope this is working with everybody i wanted to purposely kind of go through it fairly quickly uh because we will share the share the data's or share the data oh here's the right file sorry about that so we're we're going to continue to use the the big file so this was the csv file that we talked about earlier so i'm going to read that in and you can see that it's my 84 megabyte file and if you think about in excel we talked about using vlookup copy and paste i think the other thing that i know i use a lot in excel is to turn on auto filter and start to filtering your data so you filter it and maybe you want to find a certain date range you want to find a certain quality level uh threshold pandas allows you to do that using what i call boolean indexes boolean filters so in this example let's say we want to look at all those sales that happened in 2019 and let's say that we you know ran a sale from january 1st through the end of march and to qualify for the sale price you had to purchase at least five products and they had to be pins so i can define my filter and say okay i want my invoice date invoice date time to be between these two dates so i like that you can use between so you don't have to you know have two statements there you can have a filter make sure our quantity is at least five and that our products are in the pins category so we define that and if you look at it each one of these is just a list of true false values and so it's it's a true if you essentially show that row and a false if you hide that row and then if you pass that to the loc command we then get a list of here are the 90 000 almost 91 000 rows that have pins at least five quantities well yeah run run that again for me real quick just hit yeah yeah how long did that take sub sub seconds of 100 milliseconds exactly and i remember seeing a little like computing computing formula like my in excel or sheets all the time that's crazy yeah i mean think about that so if that that's a really good point i almost didn't didn't even think about it but if you've got a million rows in excel and you try and do your auto filter so turning it on selecting your date range then you select your quantity then you select your product um it takes takes some time and you can get lost right because you don't know exactly what's uh selected so then we can say okay i want to put a flag in there so that i know which of these pins were on sale so i can you know put all these filters together so i have my day filter my quantity filter my product filter and i want to assign the special pin sale a value of true once again that all happens immediately and so now i've got a flag so i can say okay this was a special pin sale this one was not and then everybody's probably third favorite uh excel formula is the pivot table and so pandas has a pivot table as well and now with one command i can say okay give me my data frame index it on the category i want to have my special pin sale across the top so was it a pin sale or not true or false and aggregate all the invoice totals give me a sum and margins equals true means to add these totals so i can see what my total columns are for books pencils all told i sold 87 million dollars in this time frame five of his was through the sale 82 was not um and then i put the format in there to highlight the um to put the dollar signs in there to make it a little easier to read and what i like about this is you can then you know copy and paste this and maybe look at it a different way so all i really care about is what is the percentage that actually happened through um the sales and so here i used a crosstab you could do similar with the pandas pivot table as well but here i want to understand okay of my pins 24 the volume was sold through the sale program and 75 was not and then if you wanted to see um as a oh this is just the uh style oh i'm sorry so i i was i lost myself there so i've i've got two different normalized options so one is normalizing on the index so this is for the row how much was by uh pins and then this is in total so 60 of my sales were books and not on the sales uh didn't have a special pin sale and only five percent of the total sales were pins under that special program so my point is once you kind of have that data together in in one place then getting comfortable with the various pandas commands you can work with really large data sets really quickly to iterate through and get some insight and figure out you know ultimately what are you going to do differently as a business or as or as an organization now that you have this data okay any any big questions coming up through the we've got we've got a couple of bubbling up to the top you maybe want to do one more example and then we can uh yeah switch to q a i think that would be great i wanted to open up my excel okay so this is kind of a neat little trick i like to do so one of the things you know i i don't know if i really explicitly said this in the beginning but one of the things that i want to make sure i was clear on is like i i still use excel a ton excel is a valuable tool to take a look at the data do some manipulation do some presentation and one of the things that i really like to do is pandas makes it easy to read from the clipboard as well as from files so let's say that i have my excel file here and if i just select some data and copy it once it's in the clipboard pandas can read from the clipboard and now i have my data frame there so this is a pretty cool little trick when you've got some small data sets that you just want to bring in to maybe merge with some of the other files or other data frames you have so i use this all the time and it's it's just a really kind of handy trick and it works both ways so if let's take a look at our big csv file that kind of filter that we just did and let's say okay i want to see which of my transactions are true because i maybe you know it doesn't make sense did i make a mistake in my logic somewhere so then you can actually do the to clipboard now if i open up my excel and paste in there i now have my almost 91 000 rows of excel data or a data frame in excel as well so this is a really kind of a slick way to go back and forth between excel and pandas to check your data make sure it makes sense maybe send off a quick um report or summary to someone if they're asking about it so i i always think that's a pretty cool trick and it's in the pandas documentation but i don't know how many people know about it so really really enjoy using that and then let me know that's great i've got one more example i can go through and then we can uh maybe open it up for for questions sounds good um so this one uh there were some questions about this so there there were there was a question about reading multiple files and there was also a question about i believe this was from edward for how can you actually put a excel formula into a spreadsheet so let me show you the the data that i'm going to be talking about so let me get this over here sorry about that okay so let's say in this example i have a weather directory and there are 160 files in here that have local weather by country and what i want to do is read each of those files read all 160 of those files and combine them into a single data frame so let me do that so the the you know i talked a little bit about path lib which i think is really really useful function the other thing i get in the habit of is defining like where you want your report files to be and putting a time stamp in them because you run this all the time and you want to make sure that um excuse me you you want to make sure that you can track the the runs over uh each day so let's read in those files and what i'm going to do is i'm going to use path lib again to do a recursive glob for all the csv files read in that data frame and i i don't want to gloss over this point i i think as you start getting started with pandas you may not realize how powerful something is like reach csv so we can actually tell it that this time column needs to be converted to a date time we can do other types of conversions here there's a lot you can do and um yeah no i don't think i have covet um uh just too much talking so then you read in the data frame create a list and then you've got the pd concat that we talked about earlier to build one single data frame so now it's reading those 130 files and just to show everybody um i have this data frame list you can see the first one is just afghanistan and now all data is 30 000 rows here's my full data set and so what we've done is we we've quickly parsed through 160 files brought all the data together made sure it's in the right time in the right format so this time is now a date time so this allows us to do some pretty cool things so let's say we wanted to know hey by country what's the average um min and max temperature per day and notice how i read in the the time or a by by month i'm sorry so we have observations for each day we want to convert it to a month uh if you're using excel maybe you'd add a month column but what you can do with pandas you can use this grouper function to group the key the the time column into months and then get the mean for the temperatures and then i use unstack to convert it uh into columns across the top and now i've got a you know with a few lines of pandas code a nice summary table for each country what their min and max average temperature was for each of these months in the data set so let's say that this was a really important report that we wanted to share with people here's an example of how i'd write that report out so we use the the excel writer and tell what type of engine to use so xlsx writer is the the library that will format the code in excel uh proper format and then we can once we dump that file to the report directory i'm just going to show how to set the column width to 18 because if you've worked with excel before or a pandas before when you save to excel sometimes the columns are a little squished we're going to open up those columns and then this was the example that someone had you know how can i put a formula in here so i just put one formula in here to add an average formula for b4 through b 164 put that in cell b165 and then save the file so let's go back to my reports directory here's today's report that's how today is it so now hold on that's the wrong file maybe just print out report file see where it went yeah well we'll we'll use the uh the one that i have there and while we're going through the question oh wait no it's i don't know maybe it was just cash the the one drive one hold on i think we're good there we go well it's still oh well so i i don't know why i didn't save it but the the the point is let's see if i have my my summary here no i don't have it well let me let me figure out where that ended up going but the um the excel writer does allow you to format these columns and put formulas in here and actually then you know you have your excel file as a report that you can distribute to to others okay i don't know where that went but anyway um maybe are there some some questions that we want to oh here it is never mind i was looking in the wrong directory okay so this is a little better so i i looked in the wrong directory but what what i did um you can see now that the file is the columns are a little nicer the dates are a little more nicely formatted and then down here at the bottom i have the average formula that we put in the code so this once you kind of get this analysis going and figure out what that report structure needs to look like then you have that template and you can start to use the excel writer function to add more formulas you can add more formatting you can control the number formats you can control fonts and sizes and all those things um it it works really well it is tedious so i tend not to do that uh in the beginning it's it's more of towards the end of the process once i really understand what the uh final report needs to look like yeah that's that's a great example and i think that's a good place to wrap it up because one of the cool takeaways from this it's easy to look at what you've shown us chris and say okay well either we're gonna go do jupiter or we're gonna go stick with excel and this says you personally can live in this pandas jupiter world but the thing you deliver back to people who want it who need it well they can have it back in excel with like this one line you put here right exactly and you know that's the the other thing i wanted to touch on as we're talking about you know i think there is this elephant in the room about i you know i can hazard a guess that most of the people that if you're taking the time to listen to this crowdcast you're at least somewhat familiar with python and somewhat interested in it um but there are challenges and environment management how do you make sure everybody has the right libraries how do you actually share your solutions with others um is it or is your management going to support this and i think those are all legitimate questions but as i was thinking about this i think these are problems that are going to be solved sooner rather than later if you kind of look at if you just you know do a look for companies that are developing jupiter notebook environments or developing more collaborative ways to share information so look at something like streamlit which has kind of made it really easy to develop interactive visualizations with python or you look at plotly in their dash framework or other startups that are trying to build jupiter notebook environments or you look at what continuum is doing with their anaconda environment i think we're going gonna find a solution at some point to a lot of these and i think back to like where we were 10 years ago if someone would have told me hey microsoft has python in the windows store they have a visual code editor that understands python uh does syntax highlighting manages your python environments you know all those sort of things i would never believed you and we're there now so like python has its foothold and these are some of the challenges we need to get through but i think if we start small if you start in your organization start small get some wins expand the knowledge and you know kind of build some momentum then you can figure out how to bring this more broadly uh into your organization but i did want to kind of you know specifically mention that this is a challenge and acknowledge that i don't have all the answers for it yet yeah it's it is a challenge but really tons of great advice here chris a lot of good feedback in the in the chat here so let's do we've got a few minutes for q a um we'll do what we can here let me see if i can unfocus the screen without bringing us back there we go kind of y'all don't need to see me you can just see chris sorry all right so these are voted on i'll just take them from the top as we can so uh maybe you can open that tab as well if you want but uh easy way to present your data analysis in excel you can always follow the functions to understand what was done in a sheet as an executive or change a pivot table if you want to dig deeper how is this possible with python if you only supply a condensed report how would you provide this report in the first place sure yeah i i think that's you know that's a good question there are certain scenarios like when you go to your cfo and maybe they have a financial model or a balance sheet or an income statement and they're always wanting to kind of see how things are flowing i don't know that pandas is really going to be the right solution there what i would say is think about what are the challenges to get the data in the place where that executive or whoever wants to actually trace and understand the formulas is there something there where python can make it easier to consolidate the data or present the data for further analysis so i i don't think if if someone needs to see all the details for the formulas then you're probably going to need to stick with excel but maybe take a step back and look at are there other pieces of the process to get the data manipulated that you could use python and pandas for all right excellent next question uh missing gui python does not offer a gui if you don't program it ipython notebooks are an option but they're not as universal as excel it's important that other people can use the program which is easy uh when dealing with spreadsheets that automate data connections for example power query yeah yeah so i think the i think there's kind of a couple approaches so one if if you are trying to build something where it's more of a automation process so let's say you had a process where people are doing a lot of excel copy and pasting vlookup and you want to give them a tool that's python based one of the things i've had good luck with is a program called gui which is g-o-o-e-y uh which is a really thin wrapper around arc parse and so you can kind of write your python script and then it gives a nice uh native look and feel gui on top of it so if you want people to select files select date ranges you know make some choices to generate the output uh that works really well so that's one scenario the other scenario really quick sure so that gives you a python application that still requires python if you then bring on pi installer you get a dot app or exe thing you can just give to people and then it's it's truly independent of the setup and details yes yep no good yes you're right absolutely the other option is if you have something that's more of a graphical approach where you want people to interact with the data visually you could use something like dash or streamlit to build that now there are some challenges and like what's the best way to deploy it i've even done things where i've used like okay to generate static html files that still have javascript embedded in there and you can still get a decent amount of interactivity um so so that's another approach um and then you know i like i said i think there will be other environments uh in the future that are a little more robust to to host all the um python functionality you need in one place yeah things like stream load or hosting your jupyter notebook or something like a jupyter notebook server something like that yes yeah exactly all right next one by votes i have a few issues that kept me from switching to python pandas in my business python skills the barrier for other people to edit a python script is much higher my co-workers don't know any programming language it's easy to edit a few cells in an existing spreadsheet but it's much harder to set up a buy that environment yeah so uh let's see um and then i thought well he says i have another question uh longer but let's go ahead and just stick to that because with the ordering i have no idea where it went yeah yeah so so i mean i i agree there is this mindset maybe the way i would think about it is um probably a lot of the people on this session at some point have done vba and you would develop some vba to solve a problem and you wouldn't expect necessarily everybody in your organization to be able to do vba you expect to maybe use it so it's trying to find a streamlined approach as much as possible where python addresses those pain points that can be maybe where you don't need as much user interaction and and so focusing there and not trying to get everybody up to speed on python it's more maybe focusing in on some of the specific power users um that you can spend some time with and get them up to speed a little bit so they can get their environments running but not necessarily trying to roll it out to a department of 50 and expecting them all to use python i think that's just really going to be challenging and not something we can do in the near term right you may convince a couple of folks after they see how amazing it is to start learning but generally speaking it's not going to happen no exactly exactly uh let's see i think actually this one um i skipped that one because that was michael so we already sort of answered the one from so let's let's do another one from pricing hey grayson uh have you implemented the goal seek function in python scipy optimize i have no idea but maybe you do no i i know exactly what he's talking about and i have looked at it and played around with it but i have not done any like a bake off or you build something with goal seek and then use sci-fi optimize or some other library to uh to do that the python supports that and like way more feature and functionality but i do think it is a little bit harder to get started there it's almost more of like you need more of a like linear algebra background to be able to do some of that so it's it's feasible but it's um it's advanced go ahead it's an advanced feature you would say it is i mean i've like the problem i have with it is sometimes i can find an example it's like a really trivial example and then to actually expand it to something more complex i just haven't been able to do very effectively all right how about we can be done all right last last question we've got time for sorry everyone else to ask questions who we didn't get to if my group already uses an excel file that queries multiple tables and databases to build the time series and graphs uh where should we start with migrating to python and pandas yeah what i think is you know what i'd want to understand is what's the what's the pain point you have right now so if you have a solution and it's working why would you want to move to python and is it because the solution's too slow is it because you want to do more with it um if it's just working stick with it but if it's not working then in what ways is it not working so if for instance and i don't know the details but let's say it works but it's really slow then maybe trying to understand why it's slow is there some way that maybe you could bring in the data with python and pre-aggregate it in some way manipulate it in some way so that it's smaller more compact easier to analyze and then do your excel analysis on that maybe that's one approach but in general my guidance would be go closer to the data source and look at any transformations you need to do on the data before you get it to the point where you can actually do your analysis in excel that's kind of the sweet spot yeah and just quick follow-up i mean i love this crowdcast platform for interaction jay just jumped into his question and added too slow it crashes oh yeah so that's where i would say is there a way that maybe let's say for instance you're getting the data at uh hourly level but you only want to do your predictions at the daily level can you aggregate it so that in excel you're only running on that smaller data set so like excel can't handle the two million rows you parse it down to a hundred thousand summarized rows which still give you the data you need to do the analysis that's that's my initial thought without you know looking at the data in more detail all right sweet how about a quick wrap up here share one thing with everyone if i can get to my i'm afraid of my mouth now everyone here we go so i wanted to have a quick offer for you all if you're interested absolutely zero pressure this webcast we just put on for free but if you're interested in going much deeper seeing some really cool real world use cases put together with this chris did an excellent job building a course with us and it's called move from excel to python with pandas and just for uh today i'm offering a discount of 20 off his course so talkpython.fm excel 20 you'll you get the course for 39 bucks so uh with that said uh let's see i will turn off my screen sharing okay and just say thank you all for coming and chris wonderful presentation i can tell from all the conversation in the chat that people really got a lot out of it and appreciated it so thank you for being here thank you everyone for coming and sticking all the way to the end there's so many of you still here uh i would like to point out that the recording at this will be available at this url you can send it along to friends and colleagues and they can register and then just watch the recording and see where the questions are the chat and everything will be here so okay thank you everyone and chris thank you thank you yeah all right hmm here i'll go ahead and put that link up one more time then we'll stop in just a second all right you
Info
Channel: Talk Python
Views: 2,960
Rating: undefined out of 5
Keywords:
Id: b3e8zt3sipE
Channel Id: undefined
Length: 64min 6sec (3846 seconds)
Published: Wed Dec 02 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.