Power Query Using VBA and Sql To Import Text File.

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
in this video I'm going to continue the theme of importing a large text file into Excel and querying it so just like the previous video on power query we're going to work with the same file that's got 1.5 million rows and Excel only has 1,048,576 rolls so but the difference this time is we're also going to pull in VBA into the mix and power queries and programming language okay so step one as usual is we're going to click on the data tab go on to get data from file and from text CSV this will navigate us to the file in question it's got 182 megabytes it's all donated to your megabytes in size and it's got one by five million rows so that's imported as in the other video that's going to be linked up here etc which covers the previous stuff I've done I'm going to click transform data now I'm going to cover quite a bit of stuff that was not in the other video we're going to do some data cleanups that I did not do in the previous video for example one of the things we're going to be able to do is to use SQL to query this data so what I'm going to do right now is I want to clean up the data to make it easier to actually do the queries so for example having spaces in column names can make it more difficult to write the SQL because you've got to put sort of square brackets around each element which can really slow you down and make the query look ugly so step one let's get rid of these spaces so I'm gonna right click on this and I'm going to choose rename and I'm just gonna rename it now a quick hack that we can use to rename the the column headers is if we go to the advanced area so if you go to the advanced editor and that's gonna show you the end code you can see here we've got renamed columns so what I'm going to do now you see item type is here so what I'm going to do is I'm going to do a little cheat to save me having to right click on all of those columns and gonna copy these column headers from above and I'm going to click in here and I've got to put a comma so this is the renamed column area I've got a paste and now what I'm going to do is I'm going to copy all the priority and over type that and we move the space now this saves a lot of clicking in the other area so this is the code that goes behind the scenes and implements a lot of this stuff and I'm going to show it to you in detail a bit further on on this tutorial but I just thought this was a good place to show you a hack for renaming your column so I could put in anything here that I want all I want to do in this instance rather is we didn't move the spaces okay so that's them all changed and what I like to do as a precaution just in case things go wrong it's do a ctrl a and a ctrl C so I've got that data just in case something went wrong there I don't want to lose it I'm going to click done and you see we've had an error so the good thing about this particular method is in the applied steps everything is just an applied step so if I click on the element above we're back to the previous situation but now I'm going to click here and on the renamed columns that have an error I don't want to go back into the advanced editor and see what I've done wrong ah and one of the things was I didn't change the sales channel I left in the type text from the data type change area which is the low above it so if I just over select over that say it's channel and have I got everything but let's see if it works at this time and ctrl a ctrl C just in case anything goes wrong done and well good so that saved me having to like click and wait and find select in on those you know you might lose may vary it's up to you to decide whether that was faster now the next thing if you've dealt with dates in Excel especially when you import in text files you know the problem with you in the United States and you get data for with dates formatted in the UK European non-american style or you in the UK and you rest of the world and you get American dates this can result in dates getting transposed power queries got your back on this one as you can see here with the ship date column and the order date column the dates are in US format these dates have come from a text file so the purely text so what we want to do is to transform these dates into an internal date number and then it can be formatted any way you like and Excel isn't going to get confused but right now the dates are in the text format so we are going to change these dates to an internal date format so the dates don't get messed up so the first step is you right-click on your column and you choose change type but then you go down to using locale because your computer knows what your date format is and what your locale is this is the area that you got to be careful with so step one you're going to change the data type to date okay that's good but say for example I live in the UK and I'm thinking oh well I want UK date format yes I do but that's not what I'm going to do because power query wants to know what date format is the data it's looking at in so what I got to do is I've got to match the date format and it's English and I need to go down to United States so the actual data it's dealing with is English United States I want English United Kingdom but this particular dialog box wants to know how is the data right now so the data right now is English United States I want to click on that and when I click OK I then get the date in UK format that's day.they month month year year year year ok so it said okay this data is in US for mass we know what it is but the user of power-play we wanted a new case Farmar translated it to an internal system number and then it can output the data format any way we want and it knows that our computer is set to UK format so it gave us the date in that so we just do the same again with ship days light click Change type using locale change it to days go to English and United States click ok and we're off to the races now I actually got this trick farmer an M query book the book on how to use power query using M etc M is for monkey or whatever the link to the book for for reviews etc is below I've got the Kindle version and I just picked out that snippet which was quite good okay so let's remember we are going to be using VBA with this I'm just preparing stuff first stick with me on this because you won't be disappointed you're gonna learn a heck of a lot what we got here as I've said is a text file that's normally too big field to manipulate in Excel but you're going to do some free manipulation in it right now just as if you were using an enterprise database so if we look at this data we've got 1.5 million rows it seems to be products sales and profit broken up by region country and product type item type what we have here as you can see is unit price unit cost total revenue total cost total profit but you know total profit is just a numeric value it doesn't you know profit of $100 on a $50 item is an amazing profit or profit of $100 on five million dollars is like pretty darn poor I think what we really need here is percent profit or profit percent let's see how we can do it here before the day two even gets into Excel so what I'm going to do is we've got total profit and we've got total cost so if I click on the add column tab and then click on custom column you then get this dialog box here so what I want to do is divide total profit by total cast to get traffic percent so what I'm going to do is click on total profit double click and then put in the divide sign then total cost and I want to switch that to percent so I'm going to just wrap that in parentheses and I've got a times at by a hundred but then I don't want a whole load of becoming digits while I want is two decimal places so what I'm going to do now is going to put another parenthesis around this particular calculation because I never quite got on top of order of operations numerical operations so I just like to go crazy with parentheses to the right of the equal sign and what type number and you can see the intellisense kicking in number dot round with a capital R and I'm going to open the parentheses and I'm going to go light to the end here and I'm going to put a comma for two and two for two decimal places and close the parentheses and I'm going to give the column a name and I'm going to call it profit percent or percent profit without a space I don't think okay now we have a new column showing the percent profit so close here got two hundred four percent profits so let's just see that my calculation looks good so the profit was thirty three 338,000 on 165,000 so yeah that's a pretty darn good profit I think we've added a new column now notice again as I've mentioned before everything is done in a step and and so to a certain extent this is an advantage of power pre week so you got your source if we click on source you can see the text with the headers just being part of the data use that then click on promoted headers and the first row of data which was the header is promoted to a header we've changed the data types of some stuff from text to numerical I think that was the order ID for example and now notice also we've lost all the changes I made to the column headers because the renamed column is the next that so I click on the next step rename column you can see the columns will be named change type with locale and change type with locale so this is where we change the data the data types of the dates so if you look at for example all the days it's in the u.s. format month day sorry yeah month day here but it's text not down as it has a date click on the change it's done that change but meanwhile ship date is still in the text format we're going to change that to a days in UK format again just click on that you can see it's changed and we've added a custom column now this is the end query language being used in a visual manner just like a macro if you want to look at how it looks behind the scenes you can click on the advanced editor here and this is your M query code so you can see source promoted header change type rename column change type at locale born change type at locale them change the type of locale change type of locale one added custom so you know I think these look a bit ugly these particular names to the light and it can make everything look very confusing but if you can look at the first row for example it says source equals you've got it's broken into a left and an end so in is going to be your output go figure but so it's operating on a particular CSV file the delimiter is color and some data to do with the encoding of the text file so it's got quote style done promoted headers table dot mode to header source which is that so so the way these particular this petite M code works and this is really me from in just reverse engineering it you can see everything that's done on this particular line here you could say goes in to the variable source and then the next line works on the output so so the item to the left of the parenthesis is the output of what's to the right of the parenthesis so everything here goes into source and then source is packaged and goes into the next line so the input to Table four Mortimer's is effectively the line above there which is in the variable source and the operation table that promote headers on source gives you the results of the operation which is promoted headers and so everything in this line and the line above you could say is in the variable of promoted headers and that goes into the transform column types switch region to type text country to text let's find something different although ID to an integer integer unit caste to a number so you begin to get the picture so if we have a look again as what we've got here so you got source promotion headers to change types now we're going to be programming this in VBA so what I want to do is first of all I want to make this particular code look a bit neater so let's just tidy up the EM code first so I'm going to go to advanced editor and something that I strongly advise you do when you're doing this is you launch notepad so I'm gonna launch notepad right now and what I'm gonna do is I want to copy my original this here into notepad okay so this is what our end code looks like now this is going to be going into the VBA and so we're gonna effectively hack the end code now first of all I think these look very unfriendly so what I'm going to do is we know these are variables so I'm going to change source to V source for variable source and this makes it clearer what we're doing so variable source becomes the input to the next process so if you think of each line as a process in this so table that promote hetal's is processing v source and what i'm going to change this to v for variable for motored headers and you see the hash tag and the quotes here are required because there's spaces in these descriptions so I'm going to call this V promoted headers and then that's going to be the input to the next line so it's beginning to look a bit more like code now that's a good thing for me I like code so let's look at this we've got a whole load of columns that the type was changed on now you know to a certain extent stuff that's text well we know it's text and it's never going to be anything else so let's see if we can simplify things by deleting the stuff that's text because we know that the type is always going to be text because the default type rather is text so if I take that out we know we're changing order of ID to integer ship date is going to be text so what we've got here now is text fields that will change to number so now I'm going to all this be changed you know and we're simplifying the M code I'm kind of learning stuff at the same time so change type now gets changed to V numeric so as you can see the output of the line above becomes the input to the line below what I'm gonna do here is I don't like the line been so long so I'm gonna say V renamed 0 1 and what I'm gonna do is I'm gonna make this a two-step process so I'm gonna copy all of this line and go to the line beneath and paste it and now it knows that a count of two variables the same name so I want to call this renamed too and so we named one so V change numeric is there so renamed one becomes the input two renamed two so let's paste that down and I'm going to delete unit price bus to the end like so so what I got to do here is keep the columns after that point so effectively I'm making it a two line process so units sold we've we've gone as far as units sold in lo and the row above but on this row I'm going to delete all the unit sold and so now we have everything we've made the cord a bit smaller and we have everything on one low okay and everything on two lowers rather so bearing in mind we're gonna be we're going to be copying this code into VBA so we do want to have it as neat as possible and as easy to understand as possible so if we we so we named the one becomes the input here and that creates an output cord renamed or tail now if you look on these two rows here this is where we changed the two dates the older days and the ship date to English us so that we would come true in the United Kingdom for us now however because it's kind of like a macro recorder it basically put stuff on two lines now these two lines are superfluous we only need one line so what I'm going to do in this instance is I'm going to get the ship date you can see each operation is within two brackets this curly braces so I'm copy that out and I'm gonna put that we're gonna put a comma and put that into the line above so and then I'm going to delete this row because I don't need it any longer and I'm going to call this Row V changed date changed two days UK and the previous processed step so we input the previous process step which is V V named or tail and then the output from this process is V changed to date UK now I mean in your normal office processes doing this could make a big difference to your workflow and speed things up because this is going to work a lot faster than opening up stuff in a memory hog like Excel there was much processing before it goes into Excel and your lightening excels workload so let's just copy this in here V changed today's U K becomes the import to this process here so now this is the custom field so I'm going to call this V profit percent so now we've got an output called V profit percent between the left and the in part of all the operations that are done on the file and now the last step goes to the output so you put the last step here and then you're good to go now I've done all this live let's hope we don't get any errors and want to copy all of this to note that so I have my changes that's the before and that's the after so I've added in two steps here but I think it's a bit easier to read I mean possibly I could have shortened these rows a bit more and created maybe three rename steps here but okay let's see what we've got you want to click down so you can see in the applied steps we've got cuff percents change today to UK and you can click on any section and you can step back so you see the good thing about the applied steps thing is it's very auditable you can see if somebody asks you how did you get this number this is how it's better than Excel formulas to a certain extent you can say you can show somebody well this is what the source file looked like the headers we knew that the first row ahead and sort the feathers have been promoted then we changed some columns to numeric when they power query automatically did this and took a guess but you can override as it changed some console numeric we then renamed with renamed the columns and then we changed the date format UK like that and then we created the partial percent now we can save this file now as I said I copied everything to notepad so this is our M code so what we're going to do now is we're going to copy this code here into some VBA this is quite powerful let me just show you what we can do with this taken from me that this cord has been copied into the VBA and I'll talk about that in a minute but that's just well let's just talk about it now I've created a function called get cleaned em query we've got the text file here and we've got our function here so I've got the leche I've put the left like that the reason of space it out like that is just to make it visually simple easier to see the left doesn't have to be on its own line it's just the way I've done it so I've created a string and I'm concatenated I'm basically concatenating all of this are in there so variable equals variable and string now a little modification I've done within the query is I've passed in the file path so if you look at the EM query at the EM code you can see the path to the file that we're working with is howard coded i don't want it hard-coded so with this function i create a variable passing in the file path and then within the m code you can see here i insert that particular variable so we have the file going into the end code and what that does is it allows it to be flexible it also with this particular excel file I've created a config sheet so you can put in the path to the file here you give the query a name like test data which will be is the equivalent of the table name if you're using access or sequel whatever name you put here so this is the actual file but this is the sharp cut name for the file whatever name you put here you will use in your queries this is for internal purposes within the VBA and this here is where you want the downloaded data to appear so I'll just show you what this could do now this particular file will be available for download at one of the links up here or up here or wherever it is you can get this file with the VBA and everything and so to go through the code in detail but let's just have a show you a working example of what can be done so if I initially just to keep things clean delete this connection that we see here let's say you know about the file you have copied in your M code or you don't know about the file but you've put in the path to the file here you can then do something like this so let's see we we happen to know the fire that's got one and a half million volts but what about a file that you don't know how many rows it's got well first of all we can do we can find out something about the file so a little bit of SQL is useful here and you can just learn it from what I'm doing now the one thing to be aware of with this is the SQL and the data is case sensitive in this instance it's not always with databases but in this instance it is case sensitive select top ten star that means the top ten rows star means all fields from test data and that this is where this test data comes in so test data refers to the file and if I click load you could see tend that row is loaded and you can see as a result of doing this we've got a connection to the data and that connection displays the structure of the data you can see a total profit there as well so we've got an idea just now what the big file looks like if you're looking at from the perspective of Excel the reason we've done some pre-processing is so as not to have spaces in the names and to make the data a little bit cleaner and easier to operate with but let's just figure out how many rows are in the data so you could then use a query something like this so what do we got here so select region count region as rows from test data group by region so what this query is going to do is it's going to give us two lows it's going to give us a summary kind of like a pivot table a summary of all the regions but count region is count how many rows for each of this let's just loaded what it does and you can see over here the size of the fire you get an an update on the import so okay so here's proof that we've got one and a half million rows in this file so its summarized all the lows by region so sub-saharan Africa has got three hundred and eighty nine thousand rows Middle East the North Africa has got 186 thousand Europe's got three hundred and eighty nine thousand etc so now you could choose to pull all the data in region by region because it's too big for the spreadsheet but any one of these can come in and to give you a case in point let's just try that so we know reads we know Europe has three hundred and eighty nine thousand odd rows in it so if I do something like this select star from star meaning give me all the columns from test data and within this text box to get a new line you need to press the shift key and enter where we should remember it's case sensitive equals the column names okay is sensitive Europe so now because we've limited the data set to Europe we know we're only going to get three hundred eighty nine thousand and seventy nine rows back and the spreadsheet can handle that so let's load that and you can see one hundred thirty five hundred sixty four thousand two hundred forty seven volts so it's loading all these rows into the connection was processing a lot of data and there we go we've got all of the data for Europe and so we've started that row 15 if I do end down you see three hundred and eighty nine or nine four - the 15 rows but in addition to that we've got it in a list so now you can do extra you can do extra queries straight off the bat here or you could choose to copy this data onto another tab so let's just put in a few test queries in here so let's just clear the filter let's just try another query select region country count country as transactions from test data so we're going to see how many countries and regions we have here this should be a quick one hundred two megabytes hundred thirty-seven megabytes so we have 1.5 billion rows because we're counting the country as transactions so this is yeah so we have basically 8045 transactions within bangladesh okay let's just try something different let's try another query now I've put in code to auto format this so this particular this is the year let's go to home and format cells I'm just gonna choose general so this particular query so you're doing this on a text file and putting it into Excel so select item type year ship date so it's taking this the date of the ship date turning it into a year returning it as a year units so that the quantity of units sold and your total revenue by product so you have the option of course to put all these in the pivot tables if you wanted to but you know this is very powerful so that's have a look at the code so as I said we've copied in the EM code into here which as you can see you can adjust it in power we hack you can even adjust that here and see what you have got but remember copy it in notepad first so when you're programming there's two elements you have to think about you there's the programming of the business rules and then there's the implementation code I try to get my students to code at the level of business rules because the code is a hell of a lot cleaner and easier to understand contrast both of these pieces of code here you've got refresh query here and you've got Bill's query here would you believe both of these procedures are doing the exact same thing in fact the code on the light was a rough code i coded prior to tidying it up to the code on the left and it's a poll on the left you will get so let's look at business rules well we want our code first of all to delete the previous connections then we need to load the config data then add a query connection to the worksheet then delete previous query table create a query table unload quick and dirty form at the table now you don't have to be a coder to look at that code and understand what it's doing but look at this code here on the right well you can start to make out stuff but glancing at this code you're not going to know what it's doing and you know that to me is that it is the sign of a good program or somebody who can write simple who can abstract the code and make it simple and I'll be teaching a lot of this in an upcoming class so let's step through the code now for typical query so I'm going to copy some SQL into this bar so we've got a breakpoint in the code let's run it so the first step is delete previous connections so we've got a connection here which is useful as I said because when you now--so avert you can see the structure of the data and this can help you figure out what your queries are going to look like so now let's just step into the code delete previous connection so that's a subroutine we go into the subroutine and you can see I have a loop here so the business logic and business rule is delete previous connections but you've got a far loop going over stuff so that's so it's looping over everything every possible connection that's in the spreadsheet and deleting so now this is where I use a class module to lower the configuration information now why would I use the class module because if we look at our config sheet you've got all of these variables here okay text file path query name query display name query download start seller and some other information later on the reason I'll use a class module is because it creates an object and the object of an object in this instance is it's one variable that can hold a load of information and therefore it makes my code look a lot tidier and a lot easier to maintain because there's an object that is going to be holding all this config information and it's got to be this object here or settings so if I did not have this object and let me just go through it and load it and you will see so I've created a function within my main module called load config and it is initializing a class module called class config so if we double click on the class module you can see it's pretty simple I've got some properties which are just public variables in this instance don't want to over complicate it with property let's and property gets and an initialize function the initialize function runs whenever the class is used and you can see what it's doing is it's addressing these variables here so these each of these variables has a range name assigned to it quite easily by going formula created from selection left column and everything in the left column becomes the range name on the light come okay so you've got this this config sheet and so the class module is going to load it and all of these one two three four five six variables are contained within the one object variable which keeps everything neat and stops your code from being spaghetti so let's create the class I'm going to step over it so now because I've started creating the variable the class automatically launches its initialize event which is like an auto wall for routine that runs whenever you use the variable so you can see text file as that query name is test data display name display address let's just have a look at this settings class so we've we've used the class module to create setting the variable settings so it's just one variable but if we go to the locals window and expand it you can see that it's got more information so now I'm going to delete the previous query table and you can see from the code it says do the previous query table so let's just step in so as I do that I'm just going to scroll so we can see the code you can see that delete previous query table and I'm passing in the configuration objects the class module which is an object you can see I've created a routine just for one line of code and you know quite did I do that well the reason is again separating business rules from implementation code if somebody's coming to this and glancing at this code in the future she query dated of age OS that display address that resize the yadda-yadda-yadda what's it going on oh it's deleting the previous query table it's saving human beings time because because humans times are heck for that more importance than computer time so code to make your code readable for the next poor chap who's most likely going to be you that's coming along in the future to look at your code so that's the reason to make the code highly readable so to do that anyway I highlight press over it's effectively just getting this range here the range of the query table the the object is passing in the display address and it's saying get the current region and add one sort so the codes is effectively doing is it's getting a 50 it's doing within the range getting the current region adding one mole on to that region for situations where you vote no data has returned so what that happens you tend to need this to be covered so it's effectively selecting that range and then say and you do that to it underneath the complete Bowl so let me go back to the code and press f8 to go over that and there's your data gone and you've got a clean spreadsheet to accept a new query so create query table and load again that line of code is self-explanatory that's business we would now let's go into the implementation of it and so this is the implementation now imagine that you saw this particular code as part of a long 100 line function it's a pain to deal with but you now know when you're glancing at it what it does this particular section of code is creating a query table and loading it so create the query table object there you have the object appearing on the spreadsheet and now apply the sequel that is in the list box here to a variable and then we're saying for the query table the command type is Excel commands sequel what's the command text well it's the SQL select region count region etc refresh Styles insert and delete cells the display name is the business programmer the data display name that we have from our config sheet and we're going to refresh it and on refresh when we refresh it you will see the queries and connection moving here and this data will with loads so let's do that now and so watch you can see the data thirty point one megabytes 32 Mike about eight hundred eight megabytes all that data is coming in from big data and now the data has loaded and we have some contingency for the data failing and now the final procedure is quick and dirty format table it's called quick and dirty format table because it's not hugely accurate but it's just for the purposes of this tutorial so get a worksheet we pass in the display name which again is in the object business programmer data so OS is an object business program data which came from the config sheet business program and data all this data has been loaded from the config sheet and just go through the code here so I mean the little trick really is when you're writing the code you often tend to write a long program because you sort of in closest flow but then come back and tidy it up and that's exactly what I did here so there you go that's the code set worksheet equals to nothing and this particular spreadsheet is available for download it will with the code and with the CSV file to test that all out it would be linked on the above or below but I think it's above and so it'll be business programmer calm for Cisco force that's whatever number is displayed to you you can get the spreadsheet and the data to hack this file and learn from it if you have any questions if there's anything that wasn't for you leave a comment below and I'll see if I can either answer the comments or create a new video as a result of that and if you found this useful please consider liking and subscribing thanks for watching
Info
Channel: Sean Johnson
Views: 1,769
Rating: 4.909091 out of 5
Keywords: power query, power query excel 2016, excel power query, excel, power query excel, vba excel programming tutorial, vba power query, vba sql excel, sql power query excel, power query import text file, power query editor, power query custom column, power query in excel, introduction to power query, power query tutorial, microsoft power query, how to use power query, microsoft excel, power query for beginners, excel power query training, learn power query
Id: QR1XjfWhw3Y
Channel Id: undefined
Length: 39min 57sec (2397 seconds)
Published: Sat May 16 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.