Data Analytics with the Google Stack (SQL, Python, Data Visualization, Data Analysis)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
learn data analytics using only free Google services this course teaches key data analytics Concepts using Google big query Google Sheets Google looker studio and Google collab Vias will teach you data analytics using the Google stack hello guys welcome to this end to end free data analytics projects course I'm vas adya currently working as an analytics instructor near Berlin Germany in this course I will be covering all the important data analytics topics like Excel SQL Python and data visualization and use it to solve interesting questions on varied projects we will be using the Google stack here because to use these tools with Google stack we do not need any additional software installation all we need is a Gmail ID before we get started I would like to thank free code camp for the massive impact they creating and I'm happy to contribute to this cost and reach a larger audience so that a lot of people can learn these data analytics topics for free I also run a YouTube channel named analyst Aditya the link is given in the description here I have posted End to End videos on SQL python web scraping projects tblo power Bay and also tips on how you can build your resume how you can leverage LinkedIn to get a job and also some tips around interviews and building your GitHub profile so let's get started with the project series first up we will start with spreadsheets spreadsheets are one of the most common tools tools you're going to use in your data analyst job in this coming project we are going to use Google Sheets to analyze my own travel expenses data of my trip to varied countries we are going to use Simple intermediate and also Advanced functions using Google Sheets and drive insights and find interesting stats from my data so this is the data set we are going to work with it's in fact my original data I always want to showcase original data and we can do some analysis on it that's always interesting so this is like the date this is the country where I've been to this is the city then I have something called as cost let's say this is in Euros there's a column called category ID right it has 1 2 3 4 5 6 7 10 9 all these numbers what this is for this sake we have a second table called category so as you see category ID each of them signifies a specific area where I have spent money right one is break first for instance s is some entrance ticket to some place or some Museum or whatever 10 is like something that has been done with shopping and so on right so we have two simple data sets and uh now we are going to solve some questions using Google Sheets right like plenty of questions plenty of Concepts and topics we're going to do so let me directly jump into the question list we'll try to crack them one by one right this will be great practice for you we going to cover a wide array and range of functions as well so first of all find the unique values for each of the columns and show us how to count the unique values right so unique values for each of the columns what is the formula you can use I'll just show you for one specific uh column let us say I want to find the unique countries that I've been to right I've been to number of countries so how I can do that is using the unique function itself so you can say unique and just select the full range right B2 is to b128 so now I get the unique names of countries that I've been to right around eight countries so suppose I want to count the unique countries what we can do the only difference in the formula is you say count unique and you will select the same range okay so now I'll get eight why because I've been to eight unique countries right so if you put unique you'll get the actual values if you say count unique you'll get the number of unique values right same way if we have to do the same thing for another example let us say City I can simply say unique Open brackets and select the different column right so I can select this and you see I've been to so many different cities right basically I've been to 12 different cities across eight different countries right so unique is a powerful use case uh so do remember that and when you want to count unique values use count unique function right very very relevant so so we have done first question now next question comes how can you combine the category table with the original data so I have category table here where I have the actual name of the category how can I combine it with the original data right maybe I'll just insert one more column here just to kind of create some uh space so how can we combine these two so I have category ID here and I have category ID here right they are the same common column now we can join these two data sets using the V lookup formula because we have a common matching column right how can we do that so I can add a new column and I will call it category itself so I will simply say we look up I want to look for this category ID value where do I want to look for it I'll go to this table I'll select this range right this is the range where I want to find in this which column do I want to look for I want to look for the second column right because second column has the ual value of the category and I want an exact match so I'll put false okay and also remember for the whole formula range I want this to be fixed right always I'm going to refer to the same A2 to b14 range so I will just lock it with the dollar and now I get okay category ID one means break first and now let us say I can just go here and double click the plus I will get the formula copied accordingly right so for example five category five means for travel right basically for flights let us just double check so five is travel okay now we have the column category also in our original data we managed to integrate it with a simple vlookup function very very powerful function next question what is the total cost spent on breakfast overall there are multiple ways to do it I'm going to teach you a simple way using Su if okay so let me go here and like only for B break first I want to count the cost right so what I can do for example I can write break first okay and I can say sum if right what is the range right where I want to check for the criteria I want to check this column so I'll select this complete thing okay the second thing is the actual criteria right what is the criteria I want it to match with break first so I will select this and in the case it matches I want to sum I want to sum the values in this column right so I just select that and leave it okay so I noticed that for breakfast alone overall spent around 400 right simple use of some if right so if we want to verify this for example we can simply put a filter right let us just filter out break first this is just to cross check so I'll filter out breakfast and I can see the total cost as you see the sum is coming to 400 so overall across all these tips for breakfast I have spent 00 so I'm going to just remove the filter this is a powerful use case of su very good function to know okay next uh total cost spent on travel for Spain okay there's an extra twist here there are two conditions okay so travel is one column and also specifically for Country Spain right so in this case what we can do we can use a su ifs function we want to give multiple conditions right so for instance I will uh just write these two words here right for example I can write Spain and travel and now I will use a su use function right so I will say sum ifs here the first thing is the sum range so I want to sum this range okay and now criteria range number one right criteria range number one is basically country so I'm going to select uh B2 to B1 128 what is the first criteria what is the first criteria I'm going to select if that ever matches the value here which is pain and then criteria range two criteria range two is basically the last column right basically our category column and what is actual criteria 2 here criteria 2 is actually if it matches travel okay and I'm going to close and I get the value as 768 right so 768 is the amount I spent on travel to Spain if you see here it's a 280 here on a trip to mayor and there's one more Spain Madrid where travel was 488 if you add them up you'll get that number so that is the answer so some so some a column here in the case cost column but based on multiple conditions Country Must Be Spain at the same time the category must be travel okay another useful use case I would say then how many rows are there in the data that have category as travel okay so we want to count the number of rows here so this is a simple use case of a count if function okay so I'm simply going to say count if uh so this is the range again I want to check in this range right I'll select them all and I want to make sure how many of them equate to travel right so I have the word set up here already so I'll just click that sell and as you see there are 11 times in 11 rows there is some cost associated with travel right simple use case of the county function okay moving on to the next next one find the month of the date using a calculation in a new column so in this data we are given month right how can we find the uh I mean we given the date how can we find the month right I'm just going to show a dummy formula here how you can do it that's actually month function itself directly and you just select month of date okay and you close it you get 12 right so this is December 24 2022 so month is 12 so it gives an auto suggestion if you want you can rock the formula just go to remove it for now but you can use the month function as you see for that question next question use an if formula to show wherever cost greater than 100 are expensive the other value should be shown as cheap okay so I'm going to go back here right so let's call this column like price or something like that so if it is greater than 100 so I will say if this value is greater than 100 right then it is expensive very simple otherwise I'll call it cheap right as simple as that and close the bracket okay so 20 is obviously less than 100 so it's cheap going to drag the formula so this is 230 obviously it's expensive and so on okay easy use case of a simple IF function many of times you might use this in your job as well okay moving on to the next one show pivot table for average cost per country right so what can we do I'm going to just select the whole range right all the rows and columns and now I'm going to just say insert pivot table and I'm going to put it in a new sheet going back to the question average cost per country right so in the rows I can add country right here it is and then in the values I can add the cost right I don't want to show the sum I want to show the average I will select the average so then I get the values right so as you see average cost per country um Colombia it's yeah around 164 uh other countries are like slightly lesser right what could be the reason right if we were asked to investigate one step further why this number is high if we go back into the data and deep dive a little bit you see mostly cost is like like this travel is maybe 200 300 and so on but if you notice specifically for Colombia travel itself cost 2,600 right this is the flight ticket going from Europe to you know South America is quite costly and this is the anomaly right this is the Reon why the average for Colombia is high right you you notice this is very high this is the reason you need to be able to investigate and find reasons like that because there is a abnormal value you should be able to find out what is the root cause of that abnormal value right this is an additional thing you could do for this question then uh simple uh calculation how would you display the first two letters for each country right so I have uh the country as a column here how can I display the first two letters for each country for example for austri I want to show Au we can use a simple text function called left so I would say left I select the string which is this and just say two right that will just show the first two letters so if I copy the same formula for instance for here for Slovakia you see you'll getl right very simple use case of LIF function very powerful function text function moving on write a function to find or check if a city name contains the letter V okay what can we do for this case so I'm just going to copy this data up to price and I'm going to go to a new sheet and just paste special like basically the values so now we want to check if the letter V is there in the word or not we can simply use a find function okay I'll write find so I want to search for the letter V and where do I want to search for I want to search for in the um I think it was city or country let's let's just go back here yeah it is for the city right so if it shows value obviously it's it's an error um because yeah it is not able to find it right so if I drag this formula down you can see bratis laava has the letter V none of these have anything plit has the letter V so the interesting thing to note here is VNA has the letter V but it has a cap capital V right that is the reason it is still showing a value error which means it cannot find V so let us convert this into a capital V and C then we get the value one because it is able to find a capital V in VN right so remember there's a differentiation between the small letter V and the capital letter V if we want a showcase next one formula to show the second third and fourth letters of the column category right I'll go to this new sheet second third and fourth letters right of column category which is here here so what we can do is we can try to use the mid function so I'll say mid of this particular string then I want to start from the second letter and then show the next three letters so I will put three so in break first I'll get second letter third letter fourth letter which is re now I can drag the formula down for lunch you will get UNC for dinner you'll get i n travel you'll get second third and fourth is r and so on okay another useful function to know uh mid right very relevant moving to the next question which country cost the most money overall according to the data right which had the highest cost so for this again we can just select everything right you can go to the pivote table and say create a pivot table so again I'm just going to select country and then I'm going to select in values cost right the total cost right as you notice here and then after that what can we do just think about it so if you notice here we want to arrange it by some of cost but descending right so what can we do we can go here and select sum of cost okay this should work and then we notice of course total cost RS Colombia and Peru are on on the top right overall cost RS these two are the most expensive right let me move to the next question so we are done with question this one right so conditional format rows which have country as Spain with red color date before 12th July 2023 right any date that is before 12th July 2023 with blue color okay so let us try this I'll go to the original data so country as Spain right how can we conditional format let me go here format I'll say conditional formatting so for this whole set I want to apply a rule uh let us say text is exactly and then I will write Spain and I want to do red color right so I'll select red color and you know the rule is written and it's done okay so that means Spain is quoted red right let's see if there is some more Spain later yes it's also given red color right that's perfect okay let us move on to the next one which is for the date right so what they are asking us is anything before 12th July should be blue color so for example let me select one date I go to format conditional formatting and I I say for example this cell and I can say custom formula right this is cell uh A2 so I'll say if this is less than I will use the date function okay so date 2023 July is the 7th month 12 right anything before that please quote it with blue color okay and I will say done so by doing that yes this is before 12th July 2023 it is blue color and what I can do I can format paint and just drag it along all the way down to the end of the data right like this and I stopped here and now let's say which all got blue color so as you see all this date is 13th December so up to any date before 12th July which is like up to 10th July got coded with blue color okay this is how you can do conditional formatting for a particular column in Google Sheets so that question is done what was the highest value of cost in the given data this is a simple on line formula so you can say maximum of the value of cost in this column right which is the full e so I presume it will be 2600 which I already showed you right I think it's the travel that happened with if I remember right it'll be in bota like Colombia this is what it is okay moving on to the next question which category cost the most money in pero here instead of calculating I'm just going to go through the data right because there's not a lot of rows for per if we scroll down we see um you know plenty of uh combinations here so what can we probably do we can try to maybe do the P table again so I'm just going to select everything right these columns and I'm going to say insert P table okay the first thing I want to add is I want to add a filter for country that's the first thing um here we can select what we want to show want to clear everything and just show Peru right so it's data only for Peru now coming to row what can we add here we can add the category and then in the values I can add total cost okay so I have these values now order by I can see descending sum of cost okay so now I notice again with respect to Peru to an extent we can see that um travel is probably uh the costliest right and how do I know this is only Peru because here I selected only for Peru okay so travel is costing around uh 960 right that's probably one of the most expensive now let me move back now they're asking can you create a drop- down list of cities and show the total cost of a particular City depending on the city selected right so for this what we can do I'm going to copy um all this I'm going to go to a new sheet right I'm going to pay special I will say values only right now they want a drop down of cities how can we do a drop down for that we can go to data data validation right we can add rules but before that we want to find the unique values of cities right that will make our calculation easier so I will write unique and I will select the full list of cities right as a first step so I have the list of cities from this I can generate the drop now I can go back to data validation and say add rule uh and then from here I'm just going to say uh drop down from a range okay so it's going to apply to basically uh this cell over here right which is I3 which is written Here and Now drop down from a range I'm selecting and what are the values so I want to select all these values okay want to click okay and say done now now let's close okay so now I have the CT and now it's a simple matter of writing a sum IF function okay I'm going to say Su if then I'm going to say for instance City range so I'm going to select everything right and what is the criteria if it is equivalent to the one we have in the drop down which is I3 then please show the total cost so the sum range of this column which is column e okay so it's 143 for bratis laava now if I change it to Berlin it's 144 banal Luca is 281 and so on right if I change it to split was it's 252 and so on right the total cost uh depending on the city selected right we did data validation created a drop down how many unique months are there in the data how can we find this so uh to find this what we can do I mean we have so many months here right so we can first try to find the month so I'll insert one column to the left again right let me say month and then as you all know we can put a simple formula month of date A2 that's done I'm just going to track this okay and now from this column I can simply write count unique right to count the number of unique months write the formula with the right spelling and I want to do it for H2 to H1 128 and then that's it so we have like we have only four unique months it's a little bit weird let's check we have 12167 and again 12 right more or less yeah so you have December if you notice then you have July June then yeah January there's there and then again December right so the count unique values says that it's only four so 12 1 6 and 7 okay then moving on to the next one so this is a bit more complicated I think they want us to create a grid with countries on the one side categories on the other use the index match to Showcase formulas to display the total cost depending on a combination this question seems very very clumsy but let's break it down and see what it actually means right so for this question what I'm going to do is so they're asking based on countries and then categories and the total cost okay so easiest option I can think about is first I'll create a pivot table as usual I will say insert pivot table in a new sheet okay now step number one I will bring country to row then I'll bring uh category to columns and then for the values I want I'll click and say sum of cost okay so this is done now we have for all the different categories the thing so now they want us to create a drop down right so this is going to be a bit tricky so let us say I will have Austria here for example right just to show you a simple example then let us say I write dinner here okay so now for Austria for dinner what was the total cost right if we see the grid we see the value 12 how can we use index match to show this automatically right so what we can do we will have to use formulas right first I will try to match okay the value of Austria and see over in this crit which row does it match to I'm going to select all this and now I will get it matches the first row right because Austria comes in the first row so if I have to change this to Bosnia then it will show second row because in this list of values Bosnia comes in the second row right so this is one value the same match I will use to see where does dinner come in this full list of categories so I can say match of dinner right and now I want to compare it with this list of categories so I'm going to select all these values all the way up to travel okay and I put a zero because I want an exact match so it shows that dinner is the second column right if I were to let's say select game then it would be the fourth column right because break first dinner entrance game right so let me move it back to dinner so now I have where it comes in the country like which row and where it comes in the category like where where is the column basically now what we can do right once we do these two steps we can use a index function right and now what we need to do is we have to select the whole range right just select the range of the 13 columns and the countries right no need to select the grand totals selecting the whole range and in this we need to put 2 comma 2 right that was the value we had we get 40 right so instead of 2 comma 2 I can make so I can substitute it with this cell values which is b19 and c19 right so I'll make it b19 I'll make this c19 now I get 40 so for Bosnia for dinner it cost 40 right let us say now I change it to Peru so for Peru so this value has changed so Peru for dinner for Peru for dinner it cost $189 right so now the grid changes automatically if I change dinner to entrance for Peru for entrance right for Peru for entrance it cost 200 and we get that value now our value is dynamic so you need to match the column and the row and then put it into an index function for the whole array and then accordingly based on the row and the column your value will automatically change right cool use case of index and match moving to the next question 21st question can you use a filter function to Showcase only data when country equal to Colombia right so what I'm going to do again I'm going to copy this right the whole data I'm going to put it in a new sheet okay I'm going to pay special so filter function to show only the information when country is Colombia right so for this we can use a filter function which is directly there in Google Sheets I'm going to say filter right so let us say I want to filter the whole range of data so I'll select everything right and what is the condition that I want to apply I want to apply it for this column right basically country column and say whenever this is equal to columia we can write like this right so filter what is the range you want to show then which column you want to filter and what is the criteria here it has to be equal to Colombia so I'm going to write like that now you notice I got like with just one formula an array formula I got all the rows just for Colombia right basically Colombia and in that there was only one city which is bota right super powerful function filter so definitely learn it very good to know moving back to the questions they're saying use text join function maybe this is new for us to show cities separated by a hyphen depending on the name of the country right for one particular date of your choice right this seems very very confusing you know so so let us maybe go back to the data right over here so they are to select one particular date right so for example let us select just 24th December right there's only Austria in here but I will still copy this to a new sheet okay so for this date for the particular country right showcase all the cities separated by a hyphen right so in this case host has only one city right how can we apply this function how can we apply this so you'll have to say text join okay and then you want to separate them by hyphen you put that first then if you want if there is empty values if you want to ignore you can leave it blank so I'm going to leave it blank and then what do you want to showcase so I will basically put a filter formula here you say so here I want to basically showcase the city okay and the criteria is this thing should be equal to a particular country right Austria for example let's see what we get so I get VN VN V separated by hyphen right just like that hypothetically if I had something else like salsburg or something then as you see I'll get Vienna VNA salsburg Vienna right so essentially this is what it is if we want to test this formula on a better data I can also go back to my previous sheet right and grab some of the data where I have multiple countries right multiple cities like Peru has Lima on the different days kusco on a different day Machu picu on a different day right so maybe I can just grab some of this to just show you so I just copy all this come back to sheet seven right let me just uh paste it again here like this right so the columns are not sorted let me just delete them for now and also remove this column now what I can do here if I have to do text join I could do the same thing right text join D limiter here is the hyphen then I'm going to ignore the empty and the text so I can as usual put a filter want a filter for this and then I'm going to say based on the criteria that is this is equal to B okay now we get Lima Li basically all the city names they're coming in an order and hyphen is there to separate each of them right that is the whole purpose of this question so good function to not text join is not so complicated but not commonly used also but it's good to know it's like an add-on question which country cost the highest money for travel right so I think we did a kind of similar question somewhere so uh let's go back to one of the P table so here I have uh sum of cost right so like let me go back to edit which country cost the highest for travel right so here I'm going not going to show only one item I'm going to select everything okay and then let me just remove country from here right here I will add category right so I want to see only for travel right so I'm going to clear and select only travel that is Step number one okay and actually here in the row I'm going to remove category and I'm going to add country right and now the filter is only applying for travel as you notice which country cost highest for travel again it's Columbia 2600 as we can see we can also sort and see but yeah it cost the most for travel I think we did a similar question before insert a pie chart to show cost breakdown per category right so I can go back to pivot table 7 uh cost breakdown per category so what can we do um I'm just going to remove the filter here and uh instead of country I can add uh cost break down by category so I have this so I can now simply select this full thing right and I can say the chart and we get pie chart right so we get the percentages stay cost 19.7% travel was the most expensive 57% breakfast 4.8% DIN 4.6% and so on right simple pie chart can showcase okay which cost the most in a very visually appealing format then translate the word travel into Spanish right so here we can use a Google translate function itself so I'm going to say Google Translate what is the text right the text is travel I'm going to put that Source language right I want to translate it from English obviously this is English to Spanish Spanish will be es so if I translate this I get vaha like J sounds like a h sound so that means travel right so you can do that cool so we have done that let's move to the next question display total cost spent per month right Remember December was there in 2 years but they are looking at month level so we can use the same month although it was different years and let us see right so what can we do we can maybe go to the original data again right so I'm going to Simply sa insert first let me copy this to a new sheet right because I want to do uh the month calculation as well um let me just say paste special right I just want to paste the values so I'm going to do month and say month of um this date okay and then drag the formula so step number one step number two select them insert P table create so row will have month then values will have sum of cost and then percentage of total right so for that here you can see uh values is shows as you can select percentage of column then that will show okay 80% of the cost came from December right this is an easy way to do it okay let's move to the next question how many days were spent in Spanish speaking countries right how do we know which is Spanish speaking this like needs some basic knowledge so here I know for example Spain Colombia and Peru are the Spanish speaking countries right so what can we do I can again copy this data and put it into a new sheet I will pay special okay then I can put a filter and only select Spanish speaking countries right what is those Peru Spain and Colombia right so these are the dates now count the unique days right so I'll simply say count unique of this right so now now I have the filter data let's say I can copy this and put it into another new sheet right so now here I have only Spain Peru and Colombia right now I can find the unique number of days so I can say count unique of this column basically so there are about 22 unique days I have spent in these Spanish speaking countries right that is there then concatenate country and city separated by a hyphen right this seems a simple question how can we uh concatenate country and city so let us go to the original data again or this column for example I'm going to remove the filter again right concatenate country and city so we can do it two ways is you can use concate this one which is country then put a hyphen and then select the city right this is option number one Austria hyphen Vienna option number two is you can select this put an Amper sand then hyphen Amper sand and then the city C2 right so either use Amper sand or you can use concatenate function both are going to work to do this then how do you remove duplicates from the country column so this is also very straightforward we go back to the original data let us copy the country column into a new sheet so I'm going to Space special values so now if I want to remove duplicates I can get to data data cleanup remove duplicates and now I get the only the original countries the unique countries which is eight Austria Slovakia Germany Spain Bosnia Croatia Colombia and Peru okay so I hope you enjoyed the video we have managed to solve all the questions I'll post the sheet also in our description have a look at it just to finish off suppose some of the formulas you're not aware of right you can obviously use chat GPT so I'm going to show you some interesting examples where even if you don't know the formula how you could leverage CH GPD let us take simple example of that Google translate question for travel so you can write write me a Google translate function in Google Sheets to convert the word travel from English to Spanish okay simple example chat jpt will probably give you uh the formula Google translate function says travel and Es right is English es the target language is Spanish right so simple level it can easily teach you things like this right if we are to just give it one more scenario for example here I have like the data right let's say is in column B cost is in column B I want to find only the total cost for Country Spain right we want to do some if but let's see if we can generate that using chat gbt right I have column b as country column e as cost write me a sumi formula to show only the cost when the column has value let's say Spain right and it's going to give a simple sum if formula it's assuming that a head us so it simply says sum if B is to be Spain and then do the cost of that column right so if I can copy this code come here and paste that's a double equal to let me just remove that you see we get 1747 which is kind of the cost for Spain okay so if you're not aware of the formula also you can use chat GPD please remember that give the right prompts and you will learn the formulas okay I hope you enjoyed the video I'll see you again in another project if you like the video subscribe to the channel follow us for more videos there are more topic related videos like this with projects on SQL python coming up stay tuned spreadsheets are a great tool to start with however not always you get data in spreadsheets especially if the data is very large it becomes prudent to understand how to write queries and get data from databases in this coming project we are going to use Google be query to write SQL queries and analyze information on my own expenses data from 2023 this expenses data has detailed breakdown of expenses across various categories we are going to write simple and also intermediate and advanced SQL queries using Google B query to drive in sites so once again welcome to this practical project we going to use real data to understand and learn SQL write queries and make insights and we're going to use Google B query okay so I have some data here for example there's the date uh the company basically where this expense was done uh there is something called category ID we'll learn about this further and what was the cost rate like what was the amount spent so there's a separate table called category ID and C category so you have category ID and the category so for example one is rent five is recharge eight is shopping and so on now we're going to try and answer questions by uploading this data to Google bigquery we are going to answer a lot of questions okay so what is the first step we can go and put this URL I'll put it in our description as well and you'll land in a place like Google bigquery okay now you can say create project as a first step give it a name big query analysis right you can give whatever name you want I'm going to say create okay that's the step number one it will take some time to create a project okay and this is where we can actually upload our data and start writing queries so now it's loaded so over here I can go and say create data set so data set let me say uh expenses okay I can give it any name I want rest you can leave it as such and just say create data set okay so now I have a data set called expense in this I can upload tables okay so how are we going to upload tables so I'm going to go and download this as a CSV file okay so downloaded that same way we will download category also as a CSV file now I have downloaded both so I can go go here and say first of all create table so I want to upload a table so I'll select upload so as a first step I'll select uh the data table okay and I can give it a name data over here I will say Auto detect and create table right so first I'm uploading the data table so now we have the data table loaded you can go to 3D buttons and click on query and you can select everything from the table right I will just remove this uh limit so if you select everything from the table you get the data now okay so basically here what you have is bigquery analysis 41260 that's the name of the project expenses is the name of your data set and data is the name of your table okay you can also go here and click on open to understand the data types like there's date column which is date company is a string category ID is an integer cost is an integer okay same way I want to upload the other table which is category so I will again say create table I'll say upload and I want to browse my computer to upload so I'm going to come and say uh category and give it a name category itself right so it's easy to understand Auto detect and create okay again it might take a few seconds and it will finish loading so now I also have the category table so if I say query nice select star I'll be able to see the category table right so as you notice category table as category ID uh the data table also has category this is the common column between the two okay now we have both the data sets ready and the tables ready so let's start U writing our queries and answering questions first question find the unique categories in the data table okay so I'm going to go here so how do you do unique categories I can say distinct category ID right as simple as that and then I just run the query so of course there is 1 2 3 4 5 6 7 8 9 in the data table okay so I'm going to copy the query and I'm going to save it here okay that's the query so next one show the purchase which had the highest cost okay so let me go back to this table and select everything so the purchase which had the highest cost so basically we want to find the maximum right so I can simply say maximum of cost right this is a very simple formula to get started these are more like warm-up questions so we get 630 right so that is the answer so I'm just going to copy this and come here and paste the query okay write a query to show only first 20 rows of data so what is the logic we need to use so if you say select everything you'll obviously get all the rows right so if you have to select only the first 20 rows of data what can we do we can simply write U the statement called limit okay so you can say limit 20 uh when you write that it's going to show only 20 rows as you see here now we are only able to see 20 rows okay so I'm going to save this query as well so these are all pretty quick um yeah like if it's multiple lines it's coming here so maybe I'll remove it from here and yeah I can paste it here okay so this is question uh number three okay maybe I'll just create question numbers okay so 1 2 3 and so on then you know you can just do like this so this was question number three right now let's move to the next question show the unique company names where money has been spent so I go back here and I say select everything so unique company names where money has been spent so I can simply say distin company right it's very similar to one of the previous questions we just solved so so you see you see the company names like there are like a lot like close to 21 company names these are the unique names where money has been spent okay so I'm going to come back here and I'm going to say question four and paste my query okay let me go back to the questions next question how many unique days has money been spent in each month so in each month how many unique days right so for that first we need to also find the month so how can we do this question so let me say select everything from the table right so now I have a lot of days like Jan Feb March April and so on uh 2023 there I think even May and June so what can we do for month we can try to use a formula called extract of month from date okay so if I run this query let's see what we get we're able to get the months right 1 2 3 4 5 6 the question here is Unit t money has been spent in each month okay so I can select the month and unique days what can we do for Unique days you can do a count of distinct date right um and then you can do that and then uh Group by this uh extract of month from date itself right so that will give for each month the unique number of days let us now run the query right so as you notice in uh one month one which is January 13 unique days some kind of purchase was done in month through February it was 18 and so on okay so the highest seems to be in March when there was something spent on 21 unique days right that's why we are doing count of distinct date so this is going to be question number five so I'm going to paste the query okay let's move to question number six for the above question do the same just show it in descending order right so what is the difference here the same thing I want to show in descending order of this count distinct date so I can simply say order by count of distinct date and then say descending Okay so now we will see the month of March which are the highest number of unique days on top right now this is in descending order so I'm going to copy this and uh this is going to be question number six right almost same we are just also doing some kind of sorting okay that's question number six and anyways I'll put this in our description the solution also so you can use it later let's move to the next question show only data of category ID 3A 4 okay this is question number seven so again I'm going to just delete all this and I'm going to say select star okay uh category ID 3 comma 4 only that so what we can do it's a very simple thing so you can say where category ID in 3A 4 right you need to use the inst statement so this will only show all the data all the expenses of these three Cate I mean these two categories three and four as you see right if you see a lot of four uh Category 3 is very minimal I think just once so that's the answer okay this is question number 7 so I'm going to paste the query here okay let's move to the next question question number eight what is the highest category ID of expense in March right so we also want to filter for the month year let's go here highest category ID of expense in March okay so first of all what we can do we can say where extract of month from date is equal to three okay because we want for March category ID with the highest expense so what we can do we can write category ID can do the sum of cost right the total cost and then we want to group by category ID right and not only that they want the highest right so I can also do order by sum of cost descending right very similar to the previous question so what I'm doing here is I'm just filtering for month number three which is March for each category ID the total cost and then grouping by category ID obviously and then also ordering by sum of cost descending so that we can see the highest on top okay so now let me run this query and we can see category ID 1 had the highest expense it was 630 we don't know what this is we'll see it later but category id1 had 630 EUR worth expenses so I'm going to copy this query so this is question number eight okay so we're done with question eight let's move to the next question question n which store I think by store they mean the company had the highest expense in May okay so now it's May so I'm just going to change this uh some of cost is the same so when they mean store it means company so I'm going to select company and and then Group by company right and order by sum of cost descending because I want to see the highest and this is for the month of May so let me run this I notice okay Prima again right like this is the rent that I spend usually every month so Prima has the highest the next one is re which is a kind of Supermarket okay so this is how you can do this question month number five me uh which company or store had the highest expense so I'm going to copy this again this is question number n right so we'll keep moving next is question number 10 uh question number 10 which category had the lowest total number in February right lowest total number I think they mean lowest total cost for February which category okay so now there is a small twist so here right in the data table if you noce I say select everything just going to close this and say start so if you notice here I have only category ID I don't have the name of the category category comes from this table right so if I click on this table there is category ID and category right so we can try to join on category ID between the two tables uh and we need to find for uh February okay so what can we do here so I will take from this table data table I will call it as a and I want to join this with basically um the other table so the other table name is also the same almost so I can copy like this and instead of data the table's name is category right I'm going to substitute it like that and say as B right so we going to try and do a join and what is the common connection both have category ID right so I'm going to say a do category ID equal to P do category ID right and now from the second table I can select the category and from the first table I can select the total cost okay I can do this and then I group by B do category and what else uh for f February right so I need to also do One V condition so where extract of month from date right that filter is equal to two right two means February so let's run this and see It'll be again rent only rent is highest usually but then next one we see it is grocery right 276 which makes sense to rent grossery is usually my highest expense so some restaurant and other things we can see as well okay so this is how you can join on a common column put a filter and also Group by to see the cost right all right so I'm going to just copy this this is question number 10 right so I'm going to paste it here and let's continue question number 11 show the data only where shop name I guess this means company name contains the letter W okay so this is just one table again so I will say select everything from this table shop name right so we where we can say company like percentage W percentage so this means that using the like we can find whichever contain the letter of the world W so you have for instance re right majorly and WW Worth right capital W it starts with that letter so these are the shops right so I'm going to copy this and paste it here this is question number 11 moving on to question number 12 find a way to get the category based on category ID this is what I showed you just a while back so let us say from this table I select everything I'll say a. star I will call it as a how can we join with the other table can simply do a join and yeah just copy this table name and then just replace the last part instead of data it is category table so I can do that as B and then you can say on a. category ID equal to b. category ID right and I can say b. Star as well right so I'm joining both the tables let me see whatever is possible so now I can see okay company Prima category ID 1 second table also has a category ID so since it's a duplicate column name it's giving underscore one and what is the actual name of category so one is rent two is Crosser and so on right so we'll if I move forward uh I see six is ticket seven is cosmetics and so on right this is how you can join both the tables I will just copy this okay this is question number 12 okay let's move to next question question number 13 is there any category ID not present in the data table okay how can we do this so first of all let us sect this table right category table the original table and let's see what is there so over here I see 1 2 3 4 5 6 7 8 9 10 right so 10 unique category IDs but if I were to see the original table the other one data table and say from that table distinct category ID so I see 1 2 3 4 5 6 7 8 9 right so that means category ID 10 is actually missing from data table but how can we show this right is there a way to show this by comparing the two tables right so what we can do here so we can select disting category ID this is obviously an option and then manually compare but other than that what we can do we can make our category table like primary like this as a this time okay and then I'm going to join this with data table as B okay so for so good on a. category ID equal to b. category ID right this is very simple so again I can select everything from both tables for sake of Simplicity and then here you have to write a join but I'm going to do a left join right so what left join will do is it will show all the category IDs from the category table if there's a match in data table it will show the match otherwise it will show null Okay so do like this let's see what happens we get 115 rows okay so you can scroll down can even make this 200 so I see all the results in one page if you notice I will see everything right all the data but at the bottom right category Ed 10 is not there in the data table so you get everything is null right so how can we now spot only the ones which are null you can say where uh B do let's say d date is null right any column which is null you can just pick that because when there is a match obviously there'll be a date and data table then we see this right then we see this category ID this is the one that is there in the category table but it is not there in the data table right so there is no not been any expense related to this category others so I'm going to copy this this is I think question number 13 okay again a simple use of left join now let's move to question number 14 show categories with expense more than 150 for the month of April right so again uh let us delete this so this time I I will make our data table as primary as usual I will make our category table as secondary okay and I just want to do a normal join right wherever there are matches so that is all fine on will still remain the same now from the first table again I want to know the category right and what do we want uh expenses so I'll say uh category actually comes from the second table right B table so I'll do B do category and from the first table I want to do cost so I will say sum of a do cost is there any other filter month of April right note that down so where extract of month from date right so April is the fourth month so we can say is equal to 4 then Group by B do category right this is Step number one now let us see what we get so we get a lot of expense okay and now they want to show only those with more than 150 okay so what can we do here we can simply say having right on an aggregated calculated column we can put a filter like using having having sum of a do cost greater than 15 now we'll see only those categories it's going to be rent and grossery the rest are below 150 right nice use of the having Clause this is question number 14 okay so let us move to question number 15 they're asking any patterns in ticket expenses over time right so what do they mean let us uh combine both these tables together right and I'm going to say a. star comma B Star right so let's see what we get so there is a category um let's see if there's a category like ticket so I'll say where category equal to ticket okay not sure if there is something like that but let's see okay so there is no data Maybe it's capital T let's try like that and you see DB is do bond company where I spend for tickets yes there are ticket columns they're asking any patterns Okay so so maybe what we can do now we can try and do it like per month right so I'll say as usual uh maybe I'll extract the month right so I'll say extract of month from date which we usually do and uh patterns so I'll try to put some of cost itself right and here it should be b. category equal to ticket and uh now group by basically this extract of month from date so I'm going to copy that I'm going to paste it here okay if you notice here just by looking at it there's a lot of 33s but then there's a 98 as well so let's see what has happened okay so January was 33 Fe Fe was 33 uh for some reason in March there's been a lot of expenses it's become 99 April is 66 then from May B basically this uh Dand ticket started so me and my wife each took for €49 so it became standardized okay so for some reason March there have been lot of trips so it has increased right so that is the trend in terms of tickets so I'm going to copy this this is question number 15 okay I'm saving all the queries so you'll have access next which restaurant has received the maximum orders based on days right right so that means like which restaurant has received the orders on the unique number of days right that's what it means actually so restaurant obviously is a category so I'll put restaurant okay and here the company right company is where the name of the restaurant or the name of the shop is there so I will say company and the question is maximum orders based on number of of days so we want to count the unique days which we kind of did little bit earlier so count of distinct date right for the restaurant category so first let's run that and see so so many restaurants are there right sound is there Kebab shop is there so Panda seems to be the highest we could also simply say order by count of distinct dat descending right so pandas is a restaurant where we buy MOS that has been the most prominent in the six months being there nine times right that's the highest so another interesting use case of count distinct order by alongside group bu okay so this was question number 16 so I'm going to paste the query let's move further 17th question calculate average spend per day for restaurants okay average spend per day that means you have to find the total cost and divide it by the number of unique days you been to the restaurant right this is one way of doing it so the cost is what sum of cost divided by count of unique days right this will kind of you could also do count of date uh this is also another option right assume that maybe on the same day you been to two restaurants then this number can change so anyways let us do both the calculations and see what is coming so basically average spend per day for restaurants so I'm going to do with unique days so it shows on average when we go to a restaurant we spend around 15 right based on unique days if I change this to days right let's see what happens yeah slightly lesser 14.5 but still it's 14 to 15 Euro on every visit on average to a restaurant okay so that's what happens so I'm going to save this question number 17 right that's basically that 14 to 15 is the amount we spend on average when we go to a restaurant right some may be lower some may be higher we we are not sure about that but basically that is the overall average which day of week saw the highest spend in me right so we need to know the function for day of week is it there maybe we can also check with chat gbd highest spend for the month of May so remember it's for the month of May so this is question number 18 right which day of week so here we don't even need the category table so I'm going to remove that so I have date right so we have this function called format date and you can say like percentage W so if you're not sure we can go and ask chat GPT as well right so always chat GPT is there to help so Group by week day or day of week in Google big query and find total cost please tell me formula something like that you know you can type something like this should give and it's going to give you a general example it's giving sales date and it's saying format date okay percentage a right that seems to be useful so we will just copy that so as you see Sunday is 1 um Saturday is 7 right so let's go and try that here we have date the question is highest spending me okay so we still need to keep that in mind so I'll say sum of cost okay that is is obviously there and then Group by this so This step is covered but before that it is May so I have to say where extract of month from date is like it's month number five so I have to put five right let's run this and see so as you see we already get the day so we get different ones so it looks like Tuesday is the highest right and this is all categories put together it looks like Tuesday is highest obviously this is 7:34 that is because maybe on that day I paid the rent right rent itself is 6:30 that is the reason right that becomes a big number but other than that you see wedness day is also quite high right for the month of May so I'm going to copy this this is question number 18 okay I'm going to leave it here let's move to the next question question number 19 calculate total cost for grocery per month and show month in year and month format right separated by hyphen so total cost for grocery per month so let's go back here and let's do the join again right because we need to identify uh this particular uh category called as grocery so for that we need to join back with the category table so I'm going to again write category that is Step number one and what is the common column on a. category ID so we just going to repeat the same thing again this is done now where Clause we have to put where this B do category should be equal to grocery right it's g with the capital and I think the spelling here is wrong okay and now we want to find total cost per month and show the month in year and month format so for that what we can do like percentage a you also have percentage y percentage M right this will show in year month format total cost okay I think that is pretty much it and I can also do a group by right and what do I want to group by I want to group by this year month so I guess this should be it let's run the question is total cost of grocery per month so for example January is 210 May is 183 Feb is 276 March is 178 April is 174 and so on right June is also pretty less and what else have they asked they just asked to calculate that so I'm just going to copy this right I'm going to put it here as question number 19 okay let's move further question number 20 calculate total spent for shops starting with capital letter r okay I think they want like for example stuff like R so I'm going to just say sum of cost from the original table I don't want to join and uh the column name is company so I'll say where company like R percentage right so that starts with the letter A so just give us the value 859 so if I want to see what are the companies you can also do a group by so you can just do like this and then say Group by company not sure if there is anything else anyways let's see right there's R and Rosman right that's pretty much it so I'm going to just copy this this is simple question number 20 right so we kind of getting closer to the end let's continue 21 how many unique companies exist in the shopping category right so again we have to go back to the join right so maybe I'm going to come here and uh pick this up right I'm going to put it back here so the question goes like this unique companies that exist in the shopping category okay so we can say where B do category right equal to shopping so I will just say select Star right first let's see if there's something like this actually for spelling is right for shopping yeah we have for instance Teddy Amazon and wwor I think that is pretty much it anyway I will say distinct company that will give us the distinct values so Teddy Amazon Worth right only three companies are unique that exist in the uh shopping category okay that was question number I remember 21 okay move to question number 22 what is the spending pattern at re month-wise and any insights okay so we can go back and pick this question up right I'm going to copy this so R and month-wise right so I don't really need the join here I think we just need the original table so I want buy month so I'll put company comma per month total cost and then Group by company as well and uh we can say where company equal to reev right I think this should work so the question again let's see it's question 22 spending pattern at R month-wise okay so R January 128 February 128 right very very standardized March 104 so March and April 145 March has dropped a little bit I think right and then um yeah it's 12 13 it's very very stable pretty much around the same Mark as you see more or less so it's like 120 130 that kind of range right so there's no Trend but March I can see it's 104 that was the month like I made lesser purchases because my wife had gone home that also makes sense it's so another use case of this format date function so this is question number 22 so I'm going to paste okay so we have I think about 12 more question any Trend with respect to eating at dominoes okay so same thing here uh the company name will become if we see the original data right it will become dominoes okay so where just remove this and say dominoes okay any Trends so I'll just see month-wise 1722 it's standardized I think in March I've ordered a lot more I didn't cook much so March it has been a lot 49 otherwise it's 17 14 15 right this is the standard if you order once a month uh so we can also see distinct dates I have Ed in each month as well just to see the patterns so if you notice yeah most months it's one in March alone I ordered three times with dominoes okay that's the pattern um that is question number 23 I'm going to save the query let's move to the next question 24 is there any month where grocery expense has is a bit different or as it changed a little bit right so I think we did query similar to this right so I'm going to copy this so I'm going to run this again so grocery expense yeah as you see it's kind of standardized right January is 210 febr is 276 there picked up and then yeah March and April as you see it's below 200 so like there's been a drop and may is again 180 right so it looks like February had a peak right like February had a lot of grocery purchases and then uh June again dropped June we went for a trip as well we were not there for a few days in Germany so I guess that's why it's dropped to 156 so I can strongly see February has peaked at it's a month where we have purchased a lot of grossery right that was question number 24 so February had a slight jump in terms of grocery expenses then move to 25 show the show only the company with highest Spen in each category for April right so this is probably a question where we might need to use something like Windows functions right only only the company with the highest pin in each category for April okay so I'm going to I probably don't need the category ID I just remove that the join so I we have the original table so I probably need to find the total cost right so what we can do we can do category ID then uh the company so all this is good and and then this is only for specifically the month of April right so I have to say where extract of month from date is equal to four right and then say Group by category ID for comma company so I could just copy this right so I get each month I mean each uh category company and then this right right now what if I want to show only the highest like for example uh Category 2 is grocery in that there is R there is interest store right so what if I want to show only the highest right same way in four which looks like a restaurant there is Panda jaur Domino's kebab and S so only the highest right s is the highest spending how can I pick and show that so this is interesting so I'll say sum of cost as total cost right this is Step number one now I can put this whole thing into a CT right so with Clause I can say with let us say uh data new or something as right so I can do step number one like this and now from this data I can select company comma category ID and then I can do a rank okay so I can say rank Partition by category ID so in each category ID and also do a order by that total cost column descending right to show the one on top and say as ranking right and then I say from final new right that is the name of the temporary class or table I created with the width so now I can run this table final new must be qualified with the data set right so there is some error here it's not final new it's data new right put the wrong name now it should work okay so now I see okay in category4 sound is one in category labar that is one in Amazon and Teddy are category 8 which is shopping Amazon is one right now I want to show only the ones that are one to do that what we can do in big query we have a formula called qualify you can say qualify ranking equal to one right this is like a V Clause but meant for window functions so if we write that then we will see only the ones that are ranked one in each category right so rev is coming as the highest in grocery Rosman comes in cosmetics s in restaurant Amazon in shopping and I think DB is the only thing in tickets so we are getting all that right so this is a nice use case of the with Clause normal Group by and also learning to use the window function like Rank and filtering rank or window function we can use qualify right so great question this one question number 25 I'm going to save okay question 26 let's move on we have I think few more questions percentage change change in total cost for each month and find the month with the highest percentage change so anytime you have percentage change year on year it's time to use a window function like lag okay so for each month percentage change in cost they're asking so I'm going to start with select and uh we need month right so we can write uh format date function and say percentage y percentage M right that is Step number one and we want to uh find the total cost right so I can say sum of cost and then copy this and put it in the group I okay so there is no filter here so I will remove where and I'm just going to run this okay I think there are two commas I will remove the one of them format date it should be for the date function right I mean date column so I forgot to put that over here as well right so sometimes be careful you might see errors like this silly mistakes only right so I'm getting total cost all months okay so this is great so now what I can do I can put this in a with final as in a CTE right that is the first step now I have to calculate percentage change so now I can say select let us say this format date whatever is called as year month I give it an alas name so I can say year month total cost and also to see the previous value I can say lag of this column lag of total cost then over here there is just month and year right so we don't really need a partition so we can say order by this year month column right ascending order and I can call this as previous month cost right and say from final let's see what we get so we see each month so for example uh January 1062 is total cost previous month is not there it's null February total cost 1166 previous month value was 1062 we get that and you know here it's not in order but for example March 1071 previous month cost is February that's 1166 and we are getting that here right so so this is like we getting them side by side how can we now calculate percentage change so for that it just becomes little more complicated so I will put this whole thing into another CTE so with I will say final 2 as or something like that okay close the brackets just drag this down and now from this final to I can select year month then I want to do this percentage difference so I will trite total cost minus previous month cost divided by previous month cost right from final two that's the second CT and then I want to order by year month right and if I want to do percentage I can also put this whole thing and multiply it by 100 right let's see what we get okay so we don't really see the values so if I want to also see the values over here I can put those columns as well right so total cost previous month cost as well and run this just for you to have that appeal so from for January there's nothing makes sense February you see um it's basically a 99.7% increase right from January so from 1062 it became 1166 then from Feb to March it dropped again so minus 8% decline from here to April again it declined by about 2% then from here it increased by about 7% then from 1125 it dropped to 1084 so 3.7% drop so the highest has been from Jan to Feb where we saw 99.79% increase okay this is how you can use multiple CTE and also use lag to do this percentage change okay interesting and slightly complicated question was question number 26 I'm going to paste the query here let's move to question 27 okay do the same as that question but only for restaurant category okay so if if you have to do only for restaurant category the query is going to remain pretty much the same and we know we have to just put a filter on the restaurant category right so if I go here and open this table in a new tab I see for example restaurant is category ID 4 so I can go back to our query and over here I simply put where categ category ID equal to four right so that will filter the total cost but only for restaurant right nothing else changes so it's running let's see what we get now we see some start differences for example from FIB to March the jump is significant 50% jump so I told you march I didn't cook much I used to order outside that kind of makes sense right and then same way then it drops back again in April and then again drops and then from uh May to June again there's a jump right so 73% jump so it's like a lot of trends like going up coming down but this is what it is for this restaurant category q27 okay let's move to question 28 we're nearing the end find the date with highest number of unique categories where money was spent okay this is a fairly simple question so I'm going to remove everything else okay delete all that is there find the date with the highest number of unique categories okay so I say select date and then count of distinct category ID right on a particular date what were the unique categories and then we want to find which was the highest date in this so I'll say Group by date order by discount distinct uh category ID and say I put double remove that and see descending as simple as that let's see if there's a particular date okay so I think 7th June there have been three different categories where there's been some kind of purchase money has been spent right that is the day rest of it is 2 or one so 7th June is that date so I'm going to copy this this is question number 28 simple question 29 use Cas statement to category is response as Indian versus non-indian based on name and show total cost for June right so for June we need to use some case statement logic let's go back here right so first of all I will remove all this and I will say where extract of month from date equal to 6 okay step number one let's select everything and then for restaurant so we also need to format it by restaurant category so if I go here for example jur is restaurant so category ID should be four right so I also put that as a filter equal to four so let's see what what is been the restaurant expenses so you see lot right so in this we have to tactically separate Indian non Indian so the only Indian is jpur the rest are all non- Indian right so I will simply say case when company equal to jur pretty simple then Indian else non- Indian okay end as restaurant type you can give it a name then you sum of cost that's it right so I'm splitting jur versus the other ones the others are non Indian and now we can say Group by this uh restaurant type write the column that I created for month of June for restaurant category Indian versus non-indian non-indian 50 Indian 21 right that is the split up Indian only jur non Indian had Panda and other restaurants as question number I think this is 29 okay so we getting to the end 30 ratio of total spend for restaurants versus grocery for April okay so April so first thing I'll do is I'll put this is four um can remove this so let us go back to the category table and uh so restaurant is four a groceries is two right category ID so let's note that down so we want to do the uh total cost only right so I'll do this is a use case of a sum sum if so I can say ratio of uh restaurant versus grocery Right restaurant is four grocery is two that is the ID so sum of if category ID equal to 4 right if it is four then do the cost otherwise zero this will show the total for the restaurant and I can simply divide this by sum of if category ID equal to two whenever it is category id2 which is grocery please sum the cost otherwise zero so if I divide both these and I'm putting equal to four in the filter because it is the month of April right let us see what we get so we get 36 right what does that mean so if I put a comma and see the values right what is the restaurant spin what is the category spin uh I mean for grocery so restaurant spend is 63 grocery is 174 the ratio is about 36 right so you can put divided by and that will give you the answer so that is question number 30 cool moving on 31 average spend per month at interest store interest store looks like some grocery store so I can come back here can say where company equal to let's see if there is something called interest store right so I'm going to just say star yeah interest store is there so average spend per month okay so what can we do we can say company comma let's put average cost I think it's AVG but we also want to do per month so I'm going to put per format I mean we can even put just the month so you can say extract of month from date we want and group by we can say these two right so instead of writing company and extra extract of month from date we can also write Group by one comma two group by the first and second column this is also possible then we notice okay it's usually 12 15 7 this is average of each purchase right so we can do that or to make it even more correct we can say sum of cost divided by count of distinct date so like cost per each unique date that we visit if I do something like this then you see it's around that right March was a bit cheap we didn't buy much but the rest of the months is around 12 to 15 right so this is basically calculating total cost divided by total number of days we have been this will give you average cost for that month we are doing the same for all the months that was question 31 next one which company in shopping category had the highest total cost I think we did something like this already uh shopping category is what category ID is8 so I can say where category ID is 8 that's a easy thing to do which company had the highest cost right so I can simply say company comma sum of cost again I can Group by one group by one means Group by the First Column which is company so I can run this I think there only two or three companies so we can see Amazon had the highest cost okay you can also do an order by if you want question number 32 okay and then use Union Clause to show total cost for Kebab shop and also Panda using two different queries uh so let us uh see the distinct names of these shops total cost using Union so that is keop shop K and Sr capital and what else was there Panda okay so we can easily do this we can say company comma sum of cost from table where company equal to first let me put Kebob shop um Group by company this is done and then I can use Union all and simply copy the same query right only difference is here I will substitute it by Panda they want us to write two different queries to do this give us the total cost for Kebab shop and Panda Kebab shop 34 Panda 66 right simple use of Union all question number 33 so we are almost there and then finally last question is there any fully duplicate value in the data right that means every row like every column should be the same is there any kind of duplicate data like that this is a bit of a tricky question so I can say select star from and do this right I will run so for this we need to do a check of all the columns if there is any kind of duplicate okay so what we can do I will select everything and then I will say row number and then I can say over and here we need to Partition by every single column so by date company category ID and cost and say row number okay alas name let's try to run this query right and we see if I select all the 200 rows I see row number is only one right that kind of means means there's no duplicate what does this mean Suppose there was row number two somewhere that means that particular set of data is repeating multiple times right so if row number is more than one that means it is potentially duplicate data right so we don't have that so to check that we can put this whole thing in a CTE so I will say which CTE as um and then say to check for duplicates what we will do is we'll say select star from CT where row number is greater than one right if it shows some value then that is a duplicate here it is no data that means all our values are unique unique in each column right that means there's no two rows where all the values for all columns are the same right this is also very challenging and commonly asked question to find duplicates so you can use this method so I'm going to copy that question 34 okay so I hope you enjoyed the video it was a long video you can practice all these question we have tested a lot of Concepts I'm going to put the solutions all the data in our description stay tuned for more videos I'll see you again in another video next week till then take care bye now that we have learned using Excel and also writing SQL queries you need to understand many of times in your data analyst job you'll be dealing with business stakeholders they might not have the relevant technical skills in this case it becomes important to tell your story and drive your analysis through Visual medium so in this coming project we are going to use Google looker Studio to analyze my own social media data from LinkedIn YouTube and Instagram and build basic graphs charts and understand how you can also join data in lucer studio and drive simple insights so here we have some social media data this is my actual social media data I post across multiple platforms Instagram YouTube and also LinkedIn I post regularly on all three platforms as you may all know so I've just taken the month of the post the year of the post and collected the actual data right views likes and comments for each of these posts right and what was the title what was the idea of each of these post then there's a column called category ID what does this event mean there is 1 2 3 and so on so for this sake I have a second table okay so it says category ID and the post category right for example category ID one is tips two is projects five is like some interview four is more of fun post and so on right three stands for information now what we're going to try to do bring these two data into looker Studio this used to be called Google data Studio before and try to build some simple graphs simple analysis and drive insights okay what we are going to do is the social media data we going to upload it to Big query if you want to understand how exactly to do it do check out my big query video which I posted a couple of weeks back back where I've shown in to end how to upload your data okay step number one we need to download this as a CSV file okay the social media data I've already downloaded it so I'm going to go to big query okay I already have a project I'm going to create a data set okay I'm going to say create data set let's say social right I can give it a name and I can create the data set okay that's it step number one is done so I have a data set here now I can click and go and say create table I will upload it from uh something that I already downloaded so I can simply crawl down and say social media sheet one right I have downloaded this data I can call it social unor media okay this is simple step number one schema autod detect and create table okay it will take a few seconds to load let's wait for that and once it's done we have the table okay so if I go to query and I say select everything let me run the data and see and now I get the data itself right as you see uh month year views likes comments everything is there okay so we have now connected this data set to bigquery the second data set will stay as a Google sheet now we going to try to combine both within Lucas Studio let's see how to do it so for that you go to Lucas studio. google.com this used to be called uh data Studio Google data Studio before so I'm going to type that and this is how you land right you land in a homepage something like this so I'm going to Simply say blank report okay when you click on blank report basically you'll have multiple ways to connect right you might connect it to your Google analytics data you might connect it to your big query you might connect it to your local computer right upload some file you might also connect it to a Google sheet right and there are so many other options to connect to other social media platform so today we are going to start with trying to connect it to bigquery okay because I have that table in bigquery right so I go here and I know this is my project in this social is the data set I created and I click on social media okay so I can connect to the table inside big query like this so I can simply say add right it will take some time to load and then simply say add to report okay step number one as you see we we have all the values right so if I pull for example the total views and uh see it across three platforms so Instagram had 750 7 lakh right like 7571 0 right basically 757,50 7,100 right depending on the notation you use so we have the data like this okay this is a nice and easy step number one now let us go and see some questions right so step number one is they are saying import data from bigquery we have already done that right uh now we will try to connect the Google sheet right with this before that let's see this one right they say write a case statement to categorize fun and tips content as one category other as a separate category right so if I go here I see fun is ID number four and tips is ID number one right so whenever it's tips or fun I want to put it as one category how can we do this in looker studio so first of all before that I'll come and give this a name social media report okay I can click social media report give it a name like this then I can go to Resource manage added data sources right and click on edit right now I can add a new field okay calculated field um and then what else can I do I can say for example category new right I can give it any name I want so the idea here is so I can use a case statement very similar to SQL so case when category ID right that is the column right when this is one or four right that's what we said then we can call it uh let's say fun tips right otherwise we can call it other right so we're splitting like this and we can say end okay in SQL we'll say end as here we can simply say end right as you see the formula is correct I can simply go and save right and I can click on done right so we have now created a new column called category new uh which is appearing here right so if you want to see category new for instance um as you see most of it is in other category right so uh fun and tips I have not created as much content so the views is only half right this is just giving us a general idea but you now learned how to create a case statement in Luca Studio as you see it was very very similar to how we do it in SQL okay so first step is done now let us come back to the uh previous question which is like also connect to the Google sheet data and see how can we merge them or combine them together okay so to do this first of all I'm going to say go to Resource manage added data source and I'm going to add a new data source right and this time it's going to come from Google Sheets right and what is it basically this category ID post right that sheet so I'm going to just select that and that's it right you can simply go and add right this way now we are connecting it to a second data source as well right so we have now social media as well as this uh category ID post Google sheet data okay now we have an option to blend or merge or combine the data so now let us try and uh do that okay so I'm going to say blend data okay now what do I want to blend so I want to join social media table from here I want everything right so I will select category ID category new is already there I want to see the month of of course the platform the year and the title as well right I'm just going to pull all of them into dimensions and in metrics I want to see the views what else I want to see the likes I want to see the comments I guess that's pretty much it right from social media table I'm pulling all the things for this blending or smudging process join with another table from this one category ID post I want the category ID obviously and I also want the post category right that gives us the name now that I have pulled both right I pulled everything from here and both the columns from the other table uh we can configure the join we can say select inner join right as you see category ID table is common in both so it will join based on that similar to the SQL principles so I'm simply going to save right and uh it will be named as Blended data one name doesn't matter so I'm going to Simply Save okay so as you see Blended data come now uh so now I have category ID and category new as well right apart from that the name of the post category right so if I pull post category here as you see in other itself we have information we have projects we also have interview in fun tips we have fun and tips right now they showing separately so we have been able to blend or merge the data similar to join in SQL right so we have done this this is cool we also have all the metrics here and the category ID right if I I want to see the category ID as well I can pull that in so for instance uh information is category ID 3 tips is category ID 1 and so on right now we have all the information set up we have also done the merging right now let us go back to the questions so case statement is done so they're asking to observe Trends in platforms and also categories right now let us do step by step so we will make sure we don't overpopulate too many graphs too many tables in one sheet so this is uh the first page right so over here let me do something very very simple so I'm going to just remove this right I'm going to cut this out right and I'm going to click on Blended data and let us say I want to insert a simple scorecard scorecard will have simple values like this right so first of all what do I want to see here let us say I want to see the total likes right I'll drag that in okay so the total likes that I received right across all these platforms is 25,338 so I can copy and paste so I can get a duplicate like this as well and now let me just put the total views okay so total views is uh more than a million right as you see across all the platforms put together and then I can again just copy and paste right so I can uh replicate as well and this time let us say I want to put total comments okay honestly we don't get many comments so total comments is less than 500 so we have these three right now this is like just simple top level metrix that we are seeing right uh before that if you want you can also go and add a text box on top right I can place it wherever I want maybe I'll just bring this down a little bit I think it just got stuck so let me just drag this down a bit also drag this down little bit and I can say insert text right I can place a text box here right I can say important overall information right I can give whatever name I want right can play around with the formatting and stuff right so as you see you can make it bold right by selecting the text we can make it bold uh if you want you can underline it and so on right you can play around with all that just going to drag this make it a little bigger right let's make the size even bigger maybe I can select the whole thing and make it 20px okay so it became a little bit bigger uh if we want to give it a color for the background or filling also we can do stuff like that okay if I want some background can also do like this right so um yeah going to just remove this other one so this is like important overall information so I have likes I have views and I also have comments okay you can share it with whatever color you want but this is just giving simple important overall information okay now let's say add new page okay so I'm creating a new page I've created a new page here let us say I want to add something important right so I can simply say table okay I'm going to insert a table let's say I want to see by platform right so I want to bring platform to Dimensions the total views the total likes and total comments so I'm going to drag uh total views here total likes as well and total comments okay the three main metrics for all the platforms this is across the complete time period okay so as you notice Instagram is on top right in terms of views YouTube views overall is not that much same Trend with the likes but if you see YouTube is getting more comments than Instagram so Instagram doesn't get as many comments right although the views is high likes is side can we show this visually is it possible so yes we can try to do it let's say we want to show different colors for views I can go to style right and I can scroll down now we have three metrics right views likes and comments so for metric one let us say I want to show something different can click heat map right and I say heat map you see Instagram is the highest it will show darker blue color right LinkedIn is slightly lighter blue the YouTube with less views is very much very light blue right for instance for the second metric for lights I can show like a bar right it shows Instagram has more likes and if I click on show number then it will also show the values right you can depi like a bar same way for comments right you can come to metric 3 and select whatever you want right I can say again heat map it will show a different color right you can play around with the color but this shows that LinkedIn received much more comments than the other two right so simple Trend we can see so it looks like Instagram performed well on the views and the likes but in terms of comments LinkedIn is the best right so this is what we get just at overall Picture level now if you want you can also add a filter right so you can say insert a drop down list and over here for example I can put month as a filter right I can bring that here now I can see per month right let's say I want to see only for no can select no and see the trend right so even there you see Instagram is dominating in views YouTube had a lot of comments right so I can select multiple months or just select one month for example December or I could select everything right so I could play around and see the overall trend right so I put month as a filter right so this uh is indicating me that yes Instagram is doing well in views and likes whereas LinkedIn is doing well in comments right this is a simple thing we can play around this is in terms of the platform right so this information is basically about platform now let us say I go to page and click on current page settings so we get something like this so we can go to style and you know you want to change the size and all you can play around as well right I can go to view mode right so if I go to view mode for example what will I get I can play around right your stakeholder can go select a particular month and they will see only that right for example there's no August data here for LinkedIn I guess in this so only Instagram and YouTube are showing right so I can select everything if I want if I come back to edit mode I get the access to edit the report right so this is by platform now let us move to something further let us say I add another new page right this time let us say I want to add some more calculations right so what I can do this time let me again add a table okay so this time let's say I want to see by category right so I have post category so I can select that right by category how things are going right so I I don't need the month so I can just again pull likes then comments and views right same so I can just reorder them so that uh first is views then it's likes and then it's comments right so now I see the values as you see across these five I can apply the same logic right I can simply come to uh the style section and for the first metrics let's say I want to show heat map and second metric also I want to show heat map okay I can select heat map as simply that and then third metric let's say I again select heat map okay so going to show different colors but now as you see here in terms of post category it looks like informative posts are doing really well overall right uh tips is also doing reasonably well the others there's not a lot of use maybe there's not a lot of content this could also be the case same way information and tips are getting a lot of likes as well and very good amount of comments right in fact projects and interview did not get as many comments maybe it's because number of project videos are also lesser right this is also something you need to look into but at least you can see what kind of posts getting views likes interactions and stuff like that on top of this right I can drag this down a little bit and I can say insert same way right I can insert a filter and what filter can I put in here right I can put in for instance platform itself right and see in each platform is there a different Trend right so let us say I want to see only Instagram in Instagram what is happening as you see I not really uh posted all types of content here right that's just information tips projects and fun there's no interview post here and you can see the trend right again information is doing pretty well tips is doing well as well right now if I go to LinkedIn let's see how it's different here there's little bit about projects but again it's predominantly information and tips and they are doing pretty well as well right and finally going to YouTube YouTube there's also projects but again information and tips are predominantly dominating right in terms of comments likes and views right so this way you can put a filter for a platform and see in each platform which post category is doing well in terms of overall numbers right now let me go and add another uh page so I'm going to Simply say page and add a new page okay now what I want to actually do I want to this time insert a table again right but this time the purpose is going to be little bit different right so what do I want to do here instead of just pulling likes comments views just like that I want to also do some calculation for example what is the ratio of likes to the Views what is the ratio of comments to the likes right or what is the ratio of comments to the overall views right if we get some kind of Trends here so for this what can we do let's say I want to see it by platform I can drag platform to Dimensions right that's step number one now I can do some calculations right so so I already have uh metric here if I want I can add a new metric okay how do we do that you can say add metric and say add field okay now let us say the first one I want to do is the ratio of likes to the Views I can type it like that I can come here and write a formula right I can say sum of likes right total likes divided by the total views right you get a formula like this and let's say I want it decimal with two percentage points can select like this and say apply okay so this is how I get the likes to views ratio right I can make this a little bit smaller if I want and drag this a little bigger so likes to views I'm getting right this is one calculation I doing already I can also add another calculation if I want right I can simply go here and I can say add field this time I want to do comments to likes right this is another interesting ratio let me again say percentage but two decimal points so this time it's going to be uh sum of commments right divid by the total number of likes right that's it I'm going to apply this right so this is a second calculation I'm doing this will be a very small number as you see uh comments to likes right these are all below 1 percentage but still it's fine to show them right there's likes to views and there's also comments to likes right if you we want we can add a third uh trick as well I'm going to go here and add this so this is nothing but comments I'm getting to the overall views right this might be a very minuscule number but anyway let me just add it right I'm going to say percentage two let's say if this calculation even shows some insight so I'm going to say some of commments but this time I'm going to divid by some of use right it's a very minuscule very small number that we going to get as you see 01 0 right doesn't really make a lot of sense uh so for instance maybe I can just remove it right I was just testing it but it's not really making much sense very very small numbers so I'm just going to keep the other two okay I'm going to remove comments to use now with this what else can we do we could for example bring in uh post category right can bring in post category so we see values across the board right so and I can drag this to see everything at one place and now we can also do an execute our heat map thing right to change the colors and stuff like that so for example I could come here and for the first metric I want to see heat map right so we get a distribution and then for the second metric Also let's say I want to see eat map right you can play around so now what we notice right the dark blue ones right for example YouTube tips is having uh very solid likes use ratio right it's not very high but it's 0.05% same with Instagram tips as well right looks like tips is doing fine whereas LinkedIn tips is not working right it's 0.01% same way YouTube information is having 0.05 YouTube fund content is also having 0.04 right and if you notice here generally uh in LinkedIn right the likes to view ratio is very low right whether it's tips projects or information so that is one thing we are observing then if you go to Instagram if there is an information or projects content that is also not doing that well so in Instagram what is primarily doing well is uh to give tips right so it looks like executing tips gives a good likes to views ratio right so out of people who view the video good number of people click on the like button and in YouTube generally tips projects and information all are working even fun videos are working right even interview is doing quite well 0.03% is not bad so it looks like the likes to views ratio is pretty good in YouTube right as a starting point now looking at the other metric which was basically the one we calculated that is comments to likes you see um whenever there's a project video compared to likes there's a lot of comments right and same with the interview right when I interviewed someone about their data analyst journey and so on uh these two are working and information video is also working fine in terms of comments to likes13 but you see Instagram and all this is non-existent right what is the reason like overall all comments only is very low in Instagram right if I go back to our first page we see it's only 490 not only that if you see here right you see the comments are very very low in Instagram right compared to the Views and the likes that is the reason why you see um this value is very very low for Instagram at least LinkedIn is fine 05% 04 but it's not still high right YouTube is still doing well even in comments to likes right so that is an interesting Trend to note so if I come back to this page you notice that YouTube has lower views and likes lower comments as well but the likes to views ratios and the comments to likes ratios are good right so although overall numbers are not that high but still there is reasonably good engagement compared to the Views happening in YouTube and if we want to pick up specific things it looks like uh information videos are doing well in terms of likes and when I post a project video or an interview video the comments I get compared to the likes is really solid right so this is some idea I can take to further develop my content in the future right very very simple insights so that is there now if I want to uh go and add a new page I can again say page and insert another page this time let's say I want to add some simple uh charts so for example I could select column chart right column chart I could for example uh put uh post category right as a simple example and see okay what is the trend in terms of likes and also in terms of use okay so we could do simple things like this right so I make it bigger as you see fun and interview is very small numbers right they kind of non-existent and one other thing if you notice is there's already two sets of graphs right because views is in millions right that is the light blue color and this dark blue is the lik one so as you notice information has like lot of likes and then tips and these two also get the maximum number views right fun and interview there were not many videos I guess so they're kind of non-existent so I can move this down right as a simple graph and uh I can again insert a drop down filter right pretty simple here I can bring in platform as the filter so I can select by platform and see for example only LinkedIn what are the values and so on LinkedIn there's only three types of content I put and yeah you can play around right so it's good to give interactive filters now I'm selecting all the platforms can select by platform and see any sort of insight simple bar graph with multiple metrics that can indicate some kind of trend right so this is how you can create a bar graph on top of this you can also add some insights right like if you notice information and tips is working you can add this as a bullet point as well right so this way you can play around little bit with uh you know your filters as well right so let's say I have this one uh graph uh can I can I do a second graph yes we can right so there are many other options as you see you could like I mean we already saw what's the Heap map uh I also showed you what is a scorecard so you could also create a pie chart for example right simple pie chart uh would be to understand in terms of platforms right which has the highest share in terms of views right so I can say insert pie chart right I can place a second chart here very very simple and in this pie chart what I'm going to do I want to see by platform the trend for the overall views okay so I'm pting views in here and uh here I want platform right I don't want the post category let me just uh drag platform here now you see right so Instagram has the line share in terms of the total views almost 60% then LinkedIn YouTube is a minus cule just 1.5% in terms of views right that's hardly 20K views right this is a simple Trend right so on top of this you can also insert a simple heading if you want right so you can say insert can come here you can add a text if you want right simple text box right and you can move this around and say for example view share by platform right of course you can do the formatting and all right I'm just uh building out the graphs and the calculations right you can of course format it the way you want uh you know you can select the text make it bigger as I have shown you in the beginning as well right so this is all there right you can give like a color as well you can come here select a random color right let's say select this one so you can share it and so on right you can give headings for each of these right you can play around with all this right so this is also possible so keep this in mind right on top of that what else can we do so if I go here like we have already done this color coding for most of it let's see what other features are possible right so if I come back here into this I have views likes already right if I come back into this graph let us say I want to add another metric let's say I just want to see the overall views also as a third metric I will just drag that in right so this way I can see the sum of views as a extra column here right for this column we can also do a few things right so if you go to St right there's a way to add conditional formatting similar to excel right you can play around with this so I can say for example add right let's say I want to add a color and what is the condition let's say if views is um greater than let's say 5,000 I want to give it a specific color right for the entire row I'm going to come and select the color let's say I'm going to select the color yellow and save right I did a simple conditional formatting so wherever as you see it views is more than 5,000 it will share the whole thing in yellow right so this is also something you can play around and do right if you want you can do it you can go back and also uh go to the rule and like simply delete it right this is just to play around this is also possible apart from that as you notice if I come to another uh chart like for example here I can also go to um the setup right there's also a way to do sorting right you can sort by descending and ascending you can play around you can also add a filter to the table right I'm just going to click and what can you do for example you can say include or exclude right and over here you can like select what kind of filter you want to you know impose right is it for the platform title so title for example only for certain type of title right if I come back into the data only for resum or something like that you can also put and play around right that is another possible option so yeah that is that now let's go back to the questions so we have done calculations for total views comments we did some calculation uh they are asking to observe trend-wise for months and drive key insights right so we found a few things already so one thing that we have still not done maybe I will add this as an extra page right is to see by month is there any kind of trend right so for this let me again go and insert a simple table right and I'm just going to do it by month right I think I have the column month already ready so I can just select this um and yeah go to properties right and then I can drag the month okay and month I want to see the total views total likes and total comments right I'm just going to drag these three to see if there is a trend right there's like multiple months so again we can simply go and do heat map or bar chart right that's up to us this time just to do something different I can go here and select bar chart and also show the value I'm going to do the same with the other metrics as well right I want to do a bar chart and show the value comments I'm just going to leave them as such so now if you notice again November seem to be the strongest month right and then um of course October was also strong right you see the buildup right August was only 16k views overall across platforms September 80k then 275,000 or 275 1,572 and then so on right November peaked right so we see a steady increase in Trend in terms of views over months so on top of this I can also add a filter I want for uh in terms of platform right so again I'm going to I'm not going to insert text I'm just going to insert a drop- down list so I can come here right and here I can give platform as a filter right as simple as that so month I don't want as a filter so let me just cut this and just platform right so I can select each platform let's see if there was specific Trends in LinkedIn LinkedIn there's only data for 3 months uh in LinkedIn October seems to be the strongest month right in YouTube let's see YouTube actually August seems to be the strongest month right but but in Instagram as you see from August September October November I mean December November in that November seems to be the strongest right so there's a steady uptrend up till November but when we select um overall all the months together then we saw see that there's a steady increase right there's August September October November December actually dropped a little bit right so I think that is something to keep in mind um but there's been a steady increase right with the drop in December August September October November showed an overall uptrend right so you can drive insights like this and then find which platform contributed to that increase right so this is how you can create a report and basically if you see here you can also go and download it as a PDF if you want you can also share it with people right you can share uh uh click the share option and then you can create uh for example a link right instead of restricted you can say public so I'm going to share the link of the dashboard right uh and then then I can copy the link and you know anyone can view it right on the internet right so we can do all these kind of things as well right play around so now if I go into View mode I mean all the pages don't have a title yet but as you see we have uh shown a few things right each page has their own unique story so make sure each page doesn't have one table or maximum one or two charts right keep that in mind now one last thing to do how can we actually uh give the pages some kind of names right this will be very relevant so for example I can go to manage pages right so I if I go to first page this is overall information so I can give it a rename overall info right I have not put a lot in this page just overall numbers right second page this is I think a platform wise metrics so I can click here and rename it platform wise metrix okay so I can give it a name like this so names are getting saved next page this is I think uh post category wise overall metrics again right so rename post category wise metrix right you can give some legible name right that should be enough then here it's I think this uh specific KPS that we are calculating by platform so like let's say specific kpas per platform and category right I can give a name like this this is overall likes and view share right so I can rename this as likes and view share and I'm mostly I'm considering by platform and category so I'll say platform and category right simple names and then the last one I think was by month right month-wise performance right so we can easily give names like this right I'm saving now if I go into View mode I also can jump across pages right I can go to this page this page CH change and play around with the filter and so on okay so I hope you enjoyed the video right we didn't do a lot but we we learned basics of Lucas Studio how to use it how to blend the data bring data from Google sheet from Big query how to create calculations right for example in these sheets as you see I've done some calculated field right like comments to likes I showed you how to do that how to do heat map coloring say a story keep this in mind I hope you enjoy the video see you again in another video till then take care bye now that we have learned Excel SQL and data visualization it becomes important to learn one more tool that can help you differentiate yourself in this job market in this era of AI it is very important to learn python in this coming project we are going to use Google collab and write python code to to analyze my own food intake and calorie data and drive insights we're going to use some simple functions from packages such as pandas matplot lib and num pipe additionally at the end we will solve a few general purpose python questions using Loops if statement and so on so here we have a data set it's basically the item I had there is a column called time ID I'll explain what that means then the amount of calories of that particular food and also the date right this is some sample data from Jan 2024 of the food I actually had like I eat dosa Corn Flakes C rice potato fry and so on okay what is time ID here so for time ID we have a second table so time ID one means breakfast two is lunch three is a snack evening snack and four is dinner okay it's like the time when I have that food so that is also marked and we have date of I think up to 15th of January right 1st to 15 January 2024 we're going to answer some questions based on this using python right this is going to be there plus we'll have some additional python questions to solve other general concepts as well so here we given list of questions so basically we will start doing them one by one but before that uh we also have this small clue to start with this right what does this mean what to do how to do uh how to use Python right this is what we're going to see first so for that step number one you simply type cab. research. google.com I will put this link in our description as well this will take you to place like this so over here you can click on new notebook right this is a place where you could actually write python code and execute it right it will set up like this so to begin uh they just asked us to put these two lines of command I'm just going to copy right uh let me just copy this okay this one and then the second one okay what does this mean is we want to be able to connect to Google Drive Right from this because the files right are in Google Drive right so basically these are the two data sets right if I go to my Google Drive I've already downloaded them and uploaded them as csvs okay calories. CSV and time day. CSV they basically contain the same data so you can upload it to your Google Drive and then over here we are trying to connect to Google Drive so for that we write this set of command as step number one let's see what happens right it will take a little bit to connect little bit of time let's just wait patiently and it will pop up like this give a permission to Google right so we can just click on that and say continue right that will enable us to connect to the dve and I think that's it right a few steps to start out if it is done it will give a green tick that is how we know that it's working okay is still taking time it says mounted so that means this is worked so we could add this plus code to keep adding more and more lines of code right first of all we are going to import some packages right to actually perform some operations one package is called as pandas so we will say import pandas as PD that will be step number one now I want to read one of the data set right as I told you there is calories and then there is time day CSV two files so to the calories file let me give it a name data so I could use this command from pandas pd. read CSV right and then all I need to uh do is I have to copy this okay and then I have to write my drive right because it's coming from my drive and then the name of the file right this is just a syntax so I'll say calories. CSV okay and now if I want to see what is there inside data then I am able to read the file so from our Google Drive we have been able to bring the data into Google collab which is able to let us execute python code right this is that the same way I can just copy the same thing for time data I would just give it a different name right time data I'm going to change this to time data okay and the name of the file was timecore off day. CSV okay it's showing some error let me go and check oh it's time day not time of day so I'll remove the off okay just make sure you put the correct file name and we have that data also right so we are nicely set up so data is there which is about my food and the calories time data is about uh the you know the time of the day basically was it breakfast lunch and so on right and time ID is the common column between these two data set now let us uh jump right into the questions and start start solving them one by one read data from the folder this is already done we have finished first question write a command to see what type of columns are present in the data how can we do that so for that we could simply say so like I'll say command to see what columns are present right so this is a way to write commments to make it more readable so I will say data. info right so in in the data table we have item time ID calories and the date right so this is how we can see by putting info we can see what is present in the data notice one thing date new is still an object object or like more like a string it's not yet date right keep that in mind we will need to address this at some point okay I'm just going to keep adding more lines of code show only the first 15 rows of the data how can we do that so first 15 rows so for that we can simply write data head of 15 okay this is like limit 15 in SQL if you say head of 15 you'll get the first 15 rows right starting from zero all the way to 14 this is how we can see the first 15 rows of data so that question is done convert the date new column to date or date time data type right why are they asking us to convert it as you notice here it's object right so how can we convert so I can come here convert date new I'll just give it a heading so I can say date of data of date new right this the column and say data of date new right and I want to convert it so I can say do apply and pandas has this function called to date time right so I want to convert it into date time so I will do this as a first step okay now that is done if I now write data. info let's see what we get now you notice date new has converted to date type type because I did the conversion here right so now it is no more an object we have been able to convert it to daytime this is simple operation commonly asked question sometimes now let's move to the next question combine the calories and time date table right the two tables with the common link right they want us to join or merge the two tables how can we do that so if you notice this is data the other one is called as time data time ID is the common column okay so we need to join on that so so I will say combined data right so I'm joining both how can we do that I will write data. merge of the other table time data right this is how you'll do now what is the common column on which we want to join here we have to notice one thing for the data column data table the column name is time ID t with the Cs whereas for the time data it's time ID but T small letter right so we have to mention both so the first table or the the left table is actually the data table so I will say left underscore on right what is the left column that is common on which we want to join it is this one right time ID but T SCS this is one and from the right table which is the time data table what is the column is one I copied now right so it's time ID but T is small right this is how we can combine the two tables now let's just quickly see how the new table will look like yeah we have time ID So based on time ID We join now we are able to get the time also in the same data right we have breakfast or dinner or lunch whatever we know that this is a duplicate column right it is same as this so we want we can remove this how to do that can say combine data. drop right you can write like this and let us say I I write the small T time ID then you can say a is equal to one right this basically means drop the column and then if I want to drop it in this existing data set itself I can write in place equal to True right when I do that this combined data table will lose that particular column as we are dropping it so now once I write that if I write combined data again now you notice that other time ID column is gone right now we have only five columns because they were duplicate I removed the other column right that is how you can drop a column so now we've been able to uh combine them and also drop the duplicate time ID column right both these are done now show the data filter the data only for break first right how can we do that so before that I'm just going to give a heading to this so what did we do here is merging the two data sets and also dropping duplicate column of time ID right this is what we did here so now let's move to the next one right so I'll keep adding more and more code so I'm just saving it again okay so so I'm just saving it something got stuck there it's okay so we'll continue so only data for breakfast data for breakfast how can we filter only for breakfast if you notice here breakfast is there in the time column right so if I want to show only data for breakfast can write data breakfast as a new data frame it's basically combine data right but within this combine data I want to filter for this column time right and it should be equal to only break first so whenever this combined data of time double equal to break first This Is How We compare and put a filter in Python please show or pick up the data right that is going to be stored into Data breakfast now if I read data break first I will see now I get only the data where the time is breakast this is only purely breakfast data right that is how you can do this question show only data where it is lunch and also calories total calories more than 250 two conditions right so time has to be lunch calories must be greater than 250 so let me put the condition lunch and calories above 250 how to put this condition so let me say data lunch and 250 above right I can give any name so I will say combine data that is the data set again within that combine data I have to introduce a filter so what is the filter the first filter is time has to be equal to lunch right that's the first rule and also you can put an Amper hand and say the other column is about calories so I'm going to copy this right uh column name is calories with the c and this has to be greater than 250 that is the condition okay and remember when we give multiple conditions we have to put each of them within a separate bracket right so this condition in this bracket then an Amper or an end and then the other condition again within this uh normal brackets okay so once I do this now let me run and see data lunch 250 and above what data we get now we only get data as you see off lunch and only above 250 right so basically there's only card rice which I eat during lunch it's more than 380 calories and then pizza right pretty much it and then once fried rice which was 350 right these are the only three items that I eat during lunch that have more than 250 calories right so that is how you could solve that question next one is Group by time ID time ID is like by breakfast lunch dinner whatever see which had the overall highest calories intake right so how do we solve this question let me just keep adding some more code so calories by time ID right so time ID column is there but I could also do by time because time contains lunch breakfast and all for this what we can do we can simply say combine data. Group by here we have to do group by similar to SQL so I want to group by time right and what I want to do is I want to do for calories right the total sum right so to see which is the time when I eat the most amount of food so I can see like this I run it and as you see breakfast 6650 dinner is 6175 little bit less snack is not everyday so it's less lunch is where I eat the most right more than 9,000 calories in this 15-day time period so lunch seems to be the highest amount of calories the time when I eat a lot next one sort the data of the given uh you know information by calories in descending order right so how can we do that so this is sort by calories this is also very straightforward question okay so I can say combine data dot there a like order by in SQL we have sort values a command right and then I can say by what do I want to sort by I want to sort by calories right just put that column name and what is the rule I want to sort it in descending so there is a function called ascending here I mean not a function an argument you can say ascending equal to false that means you want to sort in descending order now let me run this now you see Dosa is come on the top because that's the highest 400 banana is at the bottom because that has the lowest calories right so it is getting sorted in descending order of calories the highest calories is coming on Top This is how you could do a simple sort here just let me save it one more time and yeah I'm going to come back here next two seem very straightforward so show the unique values for the item column how many unique dates does the data have right so let's see here unique items right this is very straightforward so I can say combine data of items I think it's item right not items so item what are the unique values I could simply say do unique so you see Dosa cornflakes bread smoothie POA carrot cheese beans Tamar and R chips piz all this right all these are the unique values of items that I have eaten in these two weeks time frame if I want to find unique dates but also do a count right what is that so I can simply copy the similar formula right so here I the column name is date new that's the only difference but I want to count right so unique is similar to distinct in SQL right if you want to count distinct we can write Y unique okay Y unique is another function that will count the number of unique values so it says 14 values okay there 14 unique dates in the given data set that is what they wanted us to count how many unique data set is dates is there in the data set next one they say rename the calories column to intake right the calories column name should change to intake and the time column which is this breakfast lunch dinner thing to time of day how can we rename okay this is an interesting question rename is a function that's also sometimes asked in interviews so first of all what is the rename I can like create a new map in a dictionary okay first of all what I will do I write okay calories is there please rename it to intake okay that is what they want us to do and the other one is there's a column called time so you write this colon symbol and write I want to rename that to time of day okay this is what they're asking step number one now in the original data right combine data you can simply say combine data do rename okay okay and what is the renaming that should happen so the columns should be changed to what is given in the map okay you can say columns equal to map what does this mean so calories will change to intake time will change to time of day now can we cross check our combined data let's see now if you notice instead of calories we got intake instead of time we got time of day right so you create a normal map using a dictionary and then use the rename function to rename the columns right pretty interesting question then coming to next one from 12th January to 13th January can you show the percentage increase in total calories okay now remember calories column has been renamed to intake so we want to find percentage increase in intake from 12th to 13th January how can we accomplish this right so I'm going to keep adding more and more rows so 12 to 13 gen percentage change in calories right whether it was increase or decrease first of all let us see the data right we have combined data like this right now for each day right we want to First do the total sum of calories right how can we do that so first of all I will say Group by date new right I want to do for the overall day right not buy breakfast dinner and all for the overall day so I'll do that and then I want to do total calories so intake is the column and do some okay so if I do like this you will get for each day for 1 Jan 2024 20 150 is the overall calories second gen is 2,130 and so on right but here if you notice we don't have the column name here so if I write reset Index right if I use this now you'll notice we have date new and intake okay this will be step number one now step number two what can we do I want to compare it with the previous value right this is similar to lack function in SQL so what can we do I can write combine data of previous value right I can create a new column how can we do this I can say combine data of intake right and simply say do shift of one right this will basically pick up the previous value right and put it in a new column called combine data previous value maybe I shouldn't do it on the combine data I will do it on this calculated group right so what is this let us say aggregated data I write AG data equal to this okay so so I'm going to run this code and now I want to do it on the AG data not on the original data so I will also convert this and change it to AG data okay just observe what happens now what is this shift doing okay so for example first gen intake was 2150 there is no previous day value so obviously it is null right na is like n 2 January 21130 previous value is 2150 we are getting that here 4th January for example 2055 previous day or previous value is 1750 we are getting that here okay if you notice for 11th it's 1225 but previous day is 1,600 that is coming here which is good but previous day is 9th okay so that is missing we will still not be able to address that just go to pick up the immediate previous value of date okay for us we are just focused on 12 to 13 so basically from 13 and 12 how did it have change right so 12 was 1475 it increased to 2100 but we want to do this calculation right so what can we do I can just come here and add aggregate data and create a new column okay percent change right I can give it a name so this is going to be nothing but aggregate like want to put a bracket so aggregate data of intake value right minus aggregate data of the previous value right percentage changes the current value intake minus previous value this overall thing divided by aggregate dat of the current I mean the previous value so that will show how much it has increased from the previous day or how much it has decreased right so something like this I think there's an extra symbol here this should be good okay let's now see aggregate data what value we get okay so we getting for example on 13th right from 12th if you compare 1475 became 2100 so that's almost a 42.3 7% increase from 12 to 13 gen I increased my calorie intake by 42.3 7% okay this is how you can do this question so this is nice use of use case of the shift function right it's very similar to lag so I'm just going to add more and more lines let's go back to the questions use some function in NPI nump is another package to create a new column which which says small meal if that you know the time of the day is snack otherwise it has to say main meal right this is a very simple use case right similar to the IF function in like Excel so I'm going to go back here what I can do so this is numpy function so I can first import the package numpy and call it as NP now for this new function I can say combine data of let me say meals right new column so what should it says it the function is npw n.w so basically when the time of day right that is the column right whenever the time of the day is dou equal to snack right I have to call it small mean otherwise call it Big mean okay this is what they want us to do simple calculation like if now if I see comine data I'll have the new column okay so break first is a big meal dinner is a big meal when wherever there is snack right so if I want to quickly verify this data for snack it should show small meal right so wherever this time of day is equal to snack make sure you put the spelling right let's see so wherever is snack you see it's small meal so our logic is working right dinner breakfast and lunch will be big meal snacks is small so that is how we can do use np. whereare it comes from the nump package okay moving on to the next one rank the food so we need to do ranking which had the highest overall intake for breakfast dinner and so on right what does this question actually mean is in breakfast which was the food which had the highest total number of calories was it Dosa was it bread same way for lunch was it C rice was it Tamarind rice whatever same way for dinner was it pasta or was it something else for each time of the day right breakfast lunch dinner which food had the highest total calories how can we do this this is another interesting question so like rank food for each time of day right this there is a rank function similar to SQL in Python we will try to explore that right so I'm going to first load the data combine data and what I want to do first is I will say combine dat let us say this is ranking right so I'll give it a name rank data so I want to do combine data do group by right and I want to group two things right so based on time of and also the food right basically the item because I can eat the same item on multiple days and for this I want to do the total sum of intake right let let me do this let's see what I get okay so now let me see rank data what is there okay I'm getting like this so if I do reset Index right I will rerun this let's see what we get now now we get three columns so break first totally bread I have taken th000 calories right so for instance if I come to the data here bread is 250 per day so maybe I've taken bread four times right in this two week time period and same with so on right poha total calorie intake is 400 and poha was part of breakfast right so in breakfast Dosa is 1,600 but in dinner also some days I have taken those are that is 700 right this is how we are splitting the data now we want to give a rank right within breakfast which is the topper right so if we look at the data here can quickly see I think looks like conflix is the highest in breakast in terms of intake how do we give a ranking for that can create a column called Rank and see how this logic works so I can say rank dot Group by okay and I want to group by time of day okay so this group by is very similar to the Partition by we use in rank in SQL time of day and what do I want to rank I want to rank the intake so I'll say do rank off and here you have multiple methods okay method you see within that you have average minan Max first and dense dense is similar to dense and then we have first mean Max you can play around for now I'm going to just put first right this is one of the methods and then ascending or descending right this is like order by ascending is false right because I want to do it descending order of calories now let us see what rank data is showing right so Group by time of day is like partition each time of day separately then total the intake intake value is already total and find the rank right so within breakfast which is the highest within lunch which is the highest we will be able to find this out now let's start the query and see so notice we got the rank right so within breakfast rank one goes to F 2,000 and I think the least is seven rank seven which is vermi which is just 200 if I come to dinner for example the topper is milk like every day I have milk for sure so that is the highest then when I come to lunch C race is leading the way right more than 4,000 calories right and in snack category also there is milk so I drink milk a lot both for evening and night so milk is the Topper Again rank one right so this is how you could Partition by time of day and based on intake do the rank there's also dense rank like method equal to dense just play around with it but this is one way to Showcase okay in each time of day what was the food that had the highest amount of intake overall across this two we time period okay next one uh simple plotting question plot the time of day category and the overall calories using uh plotting function okay so for this we can just go to chat GPT and ask uh plot package M plot Li how to import suppose you don't even know the function for example it should ideally give you some sort of an answer import M plot Li as py plot right I could come here and copy that so this is plotting I'm going give the name okay now what do I want to do each time of day total calories so we have done this multiple times I'll simply say plot data this is nothing but combine data do group by right I want to group by time of dat step number one calories is now called as intake do sum right so if I see plot data it'll be like this okay now if I want to actually plot this what I can do can see pl. bar if I want to show a bar graph I can say plot data. index index is nothing but the time of of the DAT and also plot data. values okay values is the values we have so if I run this I'll be able to plot right okay I see breakfast dinner lunch is where I have the most intake right simple way to plot you can also see ways like how to add xaxis y axis and all right try with chity but this is a simple way to plot right so you import the package do the group by and then you say PLT which is nothing but this package bar want to plot a bar graph plot data. index index will be the time of day values is the calculation right the sum of intake that we did this is how you can do simple plotting one more question write a code to find rows which contain name pasta right so I think there is pasta somewhere um if I scroll down you see so you want to see the row where there is pasta how can we do that I can create a new column combine data of let's say check pasta okay you can give whatever name you want so I can say combine data of this item column right it should contain right the string should contain pasta so I'm going to say string. find of pasta and now if I run this you'll see mostly minus one right wherever it is not found wherever it is found there will be some value right so how can we try and do that so I'll say combine data of combine data of check pasta right this column wherever it's greater than zero right wherever it's not minus one let's see what we get okay there's like literally no value like that right that's a bit weird let's check let's say not equal to minus1 okay yeah okay it's it shows zero here right so it's able to find it at the zeroth place that's why you know we should not put greater than zero but should put not equal to minus one only two values right I think on 2 Jan and 7 Jan I had dinner as pasta right it was able to locate that that is that question now coming to few general questions beyond the data set little bit about loops we will see write a for Loop to display only multiples of five starting from five all the way to 45 right I want to show 5 10 15 20 so on till 45 so for that let me just add a few more lines of code so how can we do multiples of five you write a simple for Loop so you can say for I in range uh you can begin it at five take it all the way up to 45 in steps of I okay and in each of these cases you want to just print the I right print the value here you will get only till 40 right because whatever you include here it will check only till one number before that so here it will go only till 44 so before 44 the closest multiple of 5 is 40 so if you want to show till 45 you can increase this by one number now we show 5 10 15 20 all the way up to 45 right simple for Loop you can use to show the multiples of five all the way up to 45 same way for Loop to display squares of numbers from 1 to 9 so very simple I can just copy the similar formula squares of 1 129 okay so here what I have to do I have to start with the number one go all the way up to number I want to show till 9 so I will say 10 and in steps of one that is okay and print the square of the number so I will say print I into I that will show 1 4 9 16 all the way up to 81 right I into I means I square right so for each number whenever I is 1 it will show one y is 2 it will show 2 into two which is four and so on since I put 10 here it will go all the way up to I equal to 9 when I equal to 9 show 9 into 9 that is 81 we get that right then one more question write a simple code to declare a variable a as 8 and write an if statement to say if the value is greater than eight or not so this is if statement so they say declare a as8 and I will write so if a is greater than it then I will say print is above it okay simple and then I can put else condition so else what should I print print a is not above it okay now let me execute this code and it will say a is not above eight as you know a greater than 8 condition won't work the else condition is working right if I put a is six again a is not greater than 8 so the else condition will work a is not above eight okay okay if I put a as 13 then if a greater than 8 condition is true a is greater than 8 so it should show a is above 8 right so you can play around this is how you can write a simple if and else make sure this I and E are aligned in the same line that is very very important okay one final question declare a simple list with B with four values 0 1 2 3 write a for Loop using try and accept for running I values from 0 to 6 Whenever there is no value print value on available what does this even mean right looks bit complicated let's see right so this is a list question so we they ask us to create a list B with four values 0 1 2 and three okay step number one so I'm going to execute this okay now within a list if I want to read an element if I say B of Zer it will read the first element that is zero if I say B of one it will read the second element what is second element it is one right if I say B of two it will read the third element that is two right that is how a list works right L index start from zero now they want to show us these values and run it in a for Loop for I ranging from 0 to 6 okay so you notice if I write B of three it will show the fourth value that is three but if I write B of four it will show index error right because B of four means the fifth value in B B has only four values there is no fifth value so it shows index error right so we have to be careful just going to change this to something else now if I write for right for I in range they want from 0 to six I'm just going to put like this right and they want us to print the value within B right so I'll say print B of I if you notice here so when I is Zer B of 0 is zero right I is 1 B of 1 is 1 I is 2 B of 2 is two i is 3 B of 3 is three but when I becomes Four B of four is not there it's an index error that that is what is showing up and they are saying use try and accept statement to remove the index error and say when there is no value present print value unavailable how can we do this so for this we have to add this in a try and accept statement so first you will say try right wherever there is no error in those cases the for Loop should work right make sure they are aligned step number one so I'm printing BFI right in the cases where there is a problem where is there is an index error you can say accept index error right wherever there's index error in those cases can simply print what they're saying value unavailable or something like that okay now let's see what we get now we we get value unavailable right so 0 1 2 3 is coming that is great but when I becomes Four B of four is not there since value is not there it shows Val unavailable this is better than showing an error so you can use try and accept for these cases where there's some specific type of error like index error okay so what I will be doing is I will uh share the link to this whole notebook right you can play around with the code I will also upload the data sets please upload them as CSV and then use this code to start reading them okay I hope you enjoyed the video I'll see you again in another video till then take care bye
Info
Channel: freeCodeCamp.org
Views: 134,712
Rating: undefined out of 5
Keywords:
Id: NnSIKA77pD8
Channel Id: undefined
Length: 177min 10sec (10630 seconds)
Published: Mon Mar 25 2024
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.