SAS Training | SAS Tutorial | Intellipaat

Hey guys, welcome to this session by Intellipaat. In this session we'll be learning about SAS comprehensively. SAS stands for Statistical Analysis System, and it is widely used in the field of data science. Also, guys, before moving on with this session, please subscribe to our channel so that you don't miss our upcoming videos.

Now let's take a quick glance at the agenda. We'll start off with what SAS is and understand where exactly it is used. Moving on, we'll look into SAS applications, and after that we'll look into the program structure used in SAS. Finally, we'll do a demo in SAS so that you can understand the concepts better. Also, guys, if you are looking for an end-to-end SAS for data science course, we at Intellipaat provide that, and you can check the details in the description. Now let's begin with this session.

So what is SAS? It is an analytical tool. With the help of SAS we can perform a variety of analytical operations like time series analysis, predictive modeling, and data management. SAS is also a visualization tool: with it we can create graphs and build dashboards to represent and analyze data.

Now that we've understood what SAS is, let's look at some of its applications. SAS has wide applications in the finance sector: it is used in calculating credit risk on loans given by banks, credit unions, and other fintech companies. SAS can also be used for fraud prevention by continually monitoring transactions and applying behavioral analytics, which enables real-time decision-making. In the healthcare sector, SAS is used to identify potential issues before they become a reality by analyzing diverse data sources to predict and medically investigate patient safety signals. It is also used to gain a more comprehensive view of patient care across a variety of conditions and procedures by analyzing huge volumes of
structured and unstructured clinical data. SAS is also used in the automotive industry for tasks such as warranty claims and service parts optimization.

Now we'll look at the SAS programming structure. Any SAS program basically comprises two parts: the DATA step and the PROC step. The DATA step is used to create and manage data, while the PROC step is used to run different procedures for analyzing and visualizing the data.

Right, we'll head on to the demo part now. We'll be using SAS University Edition for the demo, so let's go ahead and open it up. I'll be using a DATA step to create my data set, which will be student. data is the keyword with which we create a data set, and I've given the data set the name student. We need to end every statement with a semicolon. Now I'll input some columns for the data set. A student has a name and marks in three subjects, so let's say sub1, sub2, and sub3, and the student also has a gender. These are the columns for the student table.

Now I'll give the values for this data set, using the datalines keyword. Let's say the first student is Sam, these are his marks, and his gender is male. The next student is Ann, these are her marks, and her gender is female. Next in line is Julia, who scores 90 in the first subject and 12 in each of the next two subjects; her gender is female. The next student is, let's say, Bob, who scores 50 in each of the three subjects and whose gender is male. Next we have Jeff, who scores, let's say, 78 in the first subject, 24 in the second subject, and 1 in the third subject; his gender is male. The final student is Mike, who scores, let's say, 90 in the first subject, 1 in the second subject, and 50 in the third subject; his gender is male. Now we'll add a semicolon to signify that these are just the values which are supposed to be
included in the data set, and then we'll run this. Let's do that. What we see here is that we get the marks in the three subjects, but in these two columns we have dots, which means there are no values. So in the name column and the gender column we have just dots. Why is that happening? It is because we have not told SAS that these two columns are of character type. We need to follow the name of each such column with a dollar symbol, which tells SAS that these are character columns. Now let us run this and look at the result. Right, now we get the values in the name and gender columns as well. So this was a DATA step, with which we created the student data set.

Now we'll implement the PROC step. The first procedure will be the print procedure, with which we will print the data set. I'll say proc print data=student and run this. Right, this is the data set we have printed with the PROC step.

Now we'll look at PROC SQL. PROC SQL enables us to run SQL-like commands in SAS. Let's say I want to look at all of the columns and rows of the data set; I'll use the select star command from SQL. I'll just type select * from student, and I'll say quit. This command, select * from student, selects all of the rows and columns from the data set. Let's run this, and we get the same result: the entire table, comprising all the columns and all the rows.

Now we'll use PROC SQL to find the maximum, minimum, and average marks in each of the three subjects. I'll say select max of sub1 as highest_sub1, max of sub2 as highest_sub2, max of sub3 as highest_sub3, and then from student. That is, we're
selecting the maximum marks in subject 1, subject 2, and subject 3 from the student table, and we're naming those columns highest_sub1, highest_sub2, and highest_sub3. Let's execute this. Right, so the highest marks scored by any student in subject 1 is 90, the highest in subject 2 is 78, and in subject 3 it is 89.

Similarly, we will calculate the minimum marks scored by a student. I'll say minimum of sub1 and change the column name to lowest_sub1, then minimum of sub2 as lowest_sub2 and minimum of sub3 as lowest_sub3, from student. Let's run this. Right, so the lowest marks in subject 1 is 34, and in subject 2 and subject 3 it is 1.

Now let's calculate the average marks. I'll change this and say select mean of sub1 as average_sub1, mean of sub2 as average_sub2, and mean of sub3 as average_sub3 from student. Let's run this. Right, so the average score in subject 1 is 68, the average score in subject 2 is 24, and the average score in subject 3 is 35.

Now I'll implement some more SQL statements. I want to see the rows separated into male and female students, so I'll do that one at a time. I'll say select * from student where gender is equal to male. Let me run this. Now I have a data set comprising only the male students; I have four male students here. Similarly, if I want just the female students from the entire data set, I'll change the gender to female. Let me run this. Now this is a data set where we only have female students, and we have two female students here.

Now we'll find the total across all three subjects. I'll say select *, and I'll add a new column, which will be sub1 plus sub2 plus sub3; that is, I'm adding the marks in these three subjects and I'm creating a new
column, and I'll name that column total, and I need this from the student data set. Let's run this. Now we have a new data set with one extra column, which gives us the total across the three subjects, and with this we can see that Sam is the topper of the class. So this was an implementation of PROC SQL.

Now we'll create a new program to understand looping in SAS, and we'll work with the do index loop. For this we'll create a factorial program. A factorial is where you multiply an integer by all of the positive integers below it. If we want five factorial, we multiply 5 by 4, 3, 2, and 1, and the result is 120. So let's create a factorial program that gives the factorials of the first 10 numbers.

I'll use a DATA step, create a new data set, and give it a name. I'll create a new variable, name it factorial, and give it the value 1. Now I'll use the do index loop and type do i = 1 to 10, which tells us this loop starts from 1 and goes till 10. Inside the loop we'll multiply factorial by the value of i. Initially the value of factorial is 1, so 1 into 1 is 1. After the increment the value of i becomes 2, so it becomes 2 into 1. After another increment i becomes 3, so it becomes 1 into 2 into 3, and after one more increment i is 4, so it'll be 1 into 2 into 3 into 4. We'll output each of those steps, and then finally we'll end the loop. Let's run this. This actually needs to be i, not 1, so let's run it again. Right, so we get our factorial data set. 1 factorial is 1, 2 factorial is 2, 3 factorial, that is 3 into 2 into 1, is 6, 4 factorial, which is 4 into 3 into 2 into 1, is 24, and that is how we get the entire factorial table for the first 10
numbers. This was an implementation of the do index loop in SAS.

Now let's import a new data set into SAS and do some visualization on it. I need to upload a file, so I'll select the mtcars data set and upload it. Now I need to import it. Let me select the file that I've just uploaded and open it. Here I get the code to import this file, so I'll copy all of it, open a new program, and paste it there. This is the path where my file is stored, this is the type of the file, and here I give the name for the data set. I'll change the name to car, because that is what I want the data set to be called. Let me run this. We've successfully imported the data set.

Now let me use a PROC step to look at the data set. I'll say proc print data=work.car, because this data set is stored in the work library. Let me run this. This is the data set, with columns such as mpg, the number of cylinders, displacement, horsepower, drat, weight, qsec, and vs, which stands for the type of engine, V-shaped or straight. Then we have the type of transmission, automatic or manual, and then the number of gears and carburetors.

Now that we have the data set, we'll do some visualization with the help of the SGPLOT procedure. I'll say proc sgplot and give the data set, which is work.car. Let's say I create a scatter plot showing the relationship between the mpg of the car and the horsepower of the car. For this the command is scatter x=mpg, so mpg is mapped onto the x-axis, and y=hp, so hp is mapped onto the y-axis. Let me run this. Right, we get our graph, which is a scatter plot, and what you understand from this graph is that there is an inverse
relationship between these two variables; that is, as mpg increases, horsepower decreases. This was an implementation of SGPLOT in SAS.

All right, now the libraries. A library is nothing but a place to store data sets. I have shown you the default libraries that are created in the system; you are able to see them by default. But in future, if you want to create a library and store something in it, you can definitely do that. So let's read why we need libraries and what the significance of libraries is.

They have given an example here. Rahul stores all the photos of his whole career under his My Pictures folder. He has created various folders under My Pictures: graduation, home, relatives, Ooty trip, college year one, college year two, college year three, and college year four. These are the folders he has created, and for each occasion he stores the pictures in a separate folder. Why does he arrange the photos in this manner? What is the use of grouping them this way? You can easily understand that it makes it easier for him to search: whenever he needs a picture from the Ooty trip, he can go directly to the Ooty trip folder. So arranging photos systematically reduces the time it takes to find those photos in the future. But what would happen if you stored all the pictures in one folder? Suppose I create a folder called Ramu, and whenever I click a picture with my camera I
put that picture in the same folder. Then you will not be able to figure out which picture belongs to the Ooty trip and which belongs to graduation or college year four. So rather than getting into this situation, it's better to create a hierarchy of folders in which you can store your pictures. SAS does the same for you: you have the libraries option there. You can create any number of libraries in SAS and store a particular type of data in each. If I want, I can create a library for the healthcare domain and store all the healthcare data in it. I can create a pharma library and store all the pharma data sets in it. I can create a medicine library and keep the list of medicines and the details about them in that library. It's up to you how you want to arrange your data sets.

You can create any number of libraries in SAS, but the processing and storage of these libraries will depend on the system where SAS is installed. Suppose you have limited memory in your system but you are storing a lot of data; that will definitely affect performance. So it's better to look at the resources available on your system and then store the data in these libraries, because although SAS is a very fast tool, it is limited by the system on which it is installed. It works in line with the system's performance, so keep that in mind when you store data.

All right, so again, it's telling you that there are default libraries, sashelp, sasuser, and work, which you get on your system. Let me tell you an important point, a question that we normally ask in interviews. When you open SAS for the first time, you do not need to create any library; you get some libraries ready-made, and you get a
lot of data sets in the sashelp library. My main point in showing all these libraries is this. You might have worked with temporary tables in SQL Server. What basically happens is that if you do not mention any library for your data set, it goes by default to the work library. It gets saved to the work library and stays there unless you move it into a permanent library. So first of all I'll tell you which libraries are permanent and which are temporary. Work is the temporary library, and sashelp, sasuser, and any other library, like the maps libraries or whatever library you are going to create, will be permanent libraries.

Now, how does it make a difference whether a library is permanent or temporary? Anything you save in the work library is temporary. When I say temporary, I mean that as soon as you close the SAS environment, that data set will vanish, and you won't be able to recover it. Another thing: suppose you have created a data set and not mentioned any library; it will by default be saved in the work library. If you want to save it in any other library, you have to mention the library name. I will show you how to mention the library name before the data set name. Right now the important thing to know is which libraries in SAS are temporary and which are permanent: work is the temporary library, sashelp and sasuser are SAS-defined permanent libraries, and everything else, whatever you create, is a user-defined permanent library. How this matters to you will become clear as you work with live data sets and create programs. But before that, in one line: a library is nothing but the collection of data sets
or a reference to the location where you save them. All right, that's all on libraries, so let's move to the next slide. As you can see, these menus we have already gone through.

Now it says: what is a SAS program? I told you that I'm going to write SAS programs in the SAS editor window. A SAS program is a sequence of steps that the user submits for execution. DATA steps are used to create SAS data sets; I will show you what a DATA step is. Then you have PROC steps. A PROC step is used to process data. When I say process, it could be that you are sorting the data, producing a report from the data set, or summarizing the data set. There are any number of manipulations you can do with the data, and if you want to see the output in the output window, you use PROC steps. So basically a SAS program has two kinds of steps, or two parts: the first is the DATA step and the other is the PROC step. When we write a SAS program, and I will show you how to write one in a couple of minutes, you will see how we distinguish the two parts, whether it is a DATA step or a PROC step.

Now: keyword, statement, step, and program. Consider this a hierarchy. We have keywords; we combine keywords to make statements; a group of statements makes a step, either a PROC step or a DATA step; and a combination of PROC steps and DATA steps makes a program.

Examples of keywords: infile, input, data, by, proc. These are keywords, and you will get to know many more of them. Statement: a statement starts with a keyword and ends with a semicolon. This is very important: every statement you write in SAS has to end with a semicolon. If you are not using a semicolon at the end of
your statement, or any statement, it's going to give you an error. So all programmers, whether you are new or experienced, make it a habit to put a semicolon at the end of every statement.

Now, step. A step starts with DATA or PROC. As we have already discussed, there are two kinds of steps in a SAS program, DATA and PROC. Each step starts with DATA or PROC and ends with a step boundary. What a step boundary is we will discuss in detail.

Okay, so here is a quiz question: how many keywords are there in the following program? As I already said, keywords could be infile, input, data, and by. Similarly, you have a lot of keywords here: data, then run; proc print data, where, run; proc sort data, then by; and then the same thing with proc means data. These are all keywords. How many statements are there? You need not do anything special, just count all the semicolons in this program, because I said that each statement ends with a semicolon. If we count the semicolons, we will find the number of statements: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. So there are 11 statements. And how many steps are there in the program? A step starts with either DATA or PROC, so count all the DATAs and all the PROCs: one, two, three, four. So there are four steps in this program. The answer is ten keywords, including data, run, proc, by, and where; eleven statements; and four steps. Is everything clear? All right, if everything is good to go, let's move ahead.

Step boundary: that's the term we just used, that each step ends with a step boundary. So what is a step boundary? Let's take an example. They have shown a map here, and there are various states. Now the
example which they are making, let's just read it. It says a step boundary is like a border between two things. You can see that Tamil Nadu has Karnataka on one side, Kerala on one side, and Andhra Pradesh on one side. So a border has one state on its left and another state on its right, and a step boundary likewise lies between two steps. If you talk about this map, the boundary is nothing but the line between two states, and similarly, when we are working with a program, we have step boundaries between steps, which signal that one step has been completed and we'll now be working with the next step.

All right, step boundaries in a SAS program. It says SAS detects the end of a step when it encounters one of the following: a RUN statement, a QUIT statement, or the beginning of another step. What is the beginning of another step? Either a DATA keyword or a PROC keyword. When SAS encounters any of these three things, it ends the current step and starts the next one, and so on.

A very important thing you should understand: every step begins with either PROC or DATA. To find the number of steps in a program, you can count the step boundaries between the steps and add one. Let me show you how this looks. The editor draws a line for each step boundary. The program we see here has three step boundaries and four steps: data through run is one step, the next proc is another step, the proc after that is another step, and then again a proc is another step. So there are four steps and three step boundaries.

All right, now let's move on. How many steps are there in this code? If you will just
read it, you will automatically know how many steps there are. To know how many steps, look for either DATA or PROC: one, two. There are two steps, and there is only one step boundary between the two steps.

Comments in SAS. While using SAS, you can add comments anywhere in the program. Anything inside a comment will not be processed by SAS. So use them judiciously, so that everyone who looks at the code can understand what the purpose of the code is. Sometimes even the author of the code will benefit from them. Every programming language has a commenting facility, so you can comment your code so that the next time you go through it you can understand what is written in it, and anybody who is new to the system and goes through that code will understand what exactly the code is doing.

To write a comment, I start with an asterisk and end with a semicolon. I know that this program is going to print the baseball data set, so I write it like this: asterisk, this is going to print the baseball data set, semicolon. You can see that I have put a comment here, and it tells me that this is going to print the baseball data set. When I run this program, it will print the baseball data set, but SAS is not going to read this first line, because it is commented. This line will not be compiled and will not be executed. So that is where you may need comments: as information for the user or whoever is reading the program.

This is a single-line comment that I have just made. If you want, you can make a multi-line comment, and for that you can use slash star and close it with star slash, and I will add my text here. You can see this is a multi-line comment, and the system is not going to read these comments. I can comment things out like this, or if you want I can show you how to run the complete
have selected all the rows again, I will get only mine, nothing else apart from that. So now these are the results. If you want to go to the last result you can click here, like this, and if you want to go to the first result you can click here. So this is how you navigate the results; this may differ from one system to another.

Okay, it can be on a single line or on multiple lines; I am just showing you that it could be multi-line or single-line as well. These are the examples they have given here: data hello, then hello is equal to 'Good morning SAS user, how are you today', and then run. So this is how you create a data set. Let me demonstrate this for you. Let me say data hello, then hello is equal to 'how are you'. I'll end my statement with a semicolon, and then I will submit. So this is the data set that I have created. In the log window I can see that the data set has one observation and one variable; it was created successfully. I'll go back to my program and print it, and I'll see in the output window that it says hello, how are you.

All right, now the important thing I want to point out: here I have not mentioned any library before my data set name, whereas in this one you can see that I have mentioned sashelp dot and the name, so I'm referring to that library. But in this statement I have not given a library name. As I told you, if I do not mention any library, it will by default go to the work library and be saved there. We can check that now: here is the work library, and this is the hello data set. Double-click on it and you can see hello: how are you. So this is how a data set gets created, and you submit your program in this way. But as soon as I will you know
close the SAS environment, I will not get this data set back. If I close the SAS environment, open it again, and go back to the work library, nothing is there. That's why I said that if you want to save anything permanently, save it in a permanent library, not in the work library. But normally, when we are working on a process and create a lot of intermediate data sets, we save them in work, because we don't need those data sets in another program.

Now the current folder. The current folder is nothing but the location where your operations are being carried out. Here in the SAS window you can see a path under the users folder; this tells you that this is the current folder, where your operations are being carried out. If you want, you can change it to anything else and click OK, and it will change.

To run a program, you can either use the keyboard shortcut or click the run icon, and that submits your program. So we have seen how to write a program. This is the basic program which I have written, and it's going to help you a lot. What you can do is work with some basic programs and try to get familiar with the log window, the options under Help, the explorer, and the results. Try to get familiar with everything you see here in the menu bar and the toolbar. Once you get familiar, we can go ahead in our next class, and I will tell you how to create libraries and how to use functions. That's all for today. Thank you so much.

We will be working on various techniques to import and export, PROC COPY to copy data or to create data sets in SAS, and then we'll be working with a lot of data sets and files. After that we will work on subsetting SAS
data sets. So basically, let me tell you that SAS is just Statistical Analysis Software, or System. It was started at an agricultural institute in North Carolina in the 1960s, where they developed this tool for their own analysis. After that it became so famous that most organizations started using SAS for their analytical work. You must have heard of the FDA, the Food and Drug Administration of America. They use SAS for every sample they receive for medicines, so they want the data regarding every medicine in SAS. So it is a well-known tool for analysis, and in most health organizations, people who are involved in statistical analysis use SAS as a primary tool. There are other tools in the market, like R and SPSS, which are used for the same purpose, but SAS is among the most widely used across the industry.

So, the components of Base SAS. The SAS software whose window you see right now is Base SAS. The software itself is called Base SAS, and the course material we go through is also known as Base SAS in the market. But when we talk about macros and PROC SQL, we say those topics come under the Advanced SAS course. That is how the courses are categorized: the commands or SAS programs which we write in this software come under the Base SAS course curriculum, but PROC SQL and SAS macros come under Advanced SAS.

Let me tell you that apart from this component, SAS has provided another GUI version with which you can interact, and that is called SAS Enterprise Guide. So let me open that
first. If you go to the SAS folder, here I have SAS Enterprise Guide version 6.1. If I click that, it is again a SAS tool. So either you can use Base SAS or you can work in SAS Enterprise Guide as well. You will be working on the same thing, but SAS Enterprise Guide is user friendly. The SAS tool you saw earlier is on your local machine, so whatever data sets you create get saved on your local system. When you are using SAS Enterprise Guide, you have the advantage that you can access the data available on servers and interact with data sets located on servers.
Right here in the server list you can see Servers, then Private OLAP Servers. OLAP, you must have heard of this, is online analytical processing. There are databases which are specifically used for analytical services or analytical processing, and you access those databases from this server list. I am not connected to any server, which is why I am not getting them here. But if I click on this plus sign, I get Local. Local is nothing but my own machine: whatever databases or data sets are located on my system will be listed there. Right now I'm not connected to anything and I do not have any local data sets here, so you might not see them, but after some time I'll show you how we interact with them and navigate to them.
You can see that I have got all the libraries from my system. You get a lot of folders, and if I navigate from this folder to other data sets, I will definitely get some of them. Give me a minute and I'll show you how to navigate to the data sets on my system. I have one data set on my desktop, so I will go there next.
Here it is: this is a .sas7bdat file. All the files which you see right now are from my desktop. So this is the way you can navigate through SAS Enterprise Guide to any data set you want to look at. You can see that the data set opens in the SAS environment like this. This tool is also used very extensively in the market. SAS Enterprise Guide is nothing but, you can say, the client version of the Base SAS software which we use for our day-to-day work. I think that's enough for the SAS introduction and the tools available for it.
I initially said what I wanted to cover today, so let's go ahead and start with the topic we mentioned: various techniques to import and export. The first topic for today is PROC COPY. PROC COPY is the procedure using which data sets can be copied to an output library from an input library. You must remember what a library is: a library is nothing but a reference to a location where you save your data sets. In the SAS environment I did show you the libraries, and you must have some data in them, like this; these are the data sets in the SASHELP library. I also told you the difference between the Work library and SASHELP or a user library: whatever you save in the Work library will vanish as soon as you close your session. There was another library, Satya, which I created, so whatever data set I save there will stay for as long as I want.
Now when we talk about PROC COPY, it says that PROC COPY is a procedure. I told you that there are two parts of a SAS program: the first is the DATA part and the other is the PROC part. So this is a PROC part, and it is used to copy data sets
from one library to another. The syntax is given below. For example, the following code copies two data sets from SASHELP to the Work library. Since an Excel workbook can be used as a SAS library after the appropriate LIBNAME statement, one can use PROC COPY to read from or write to an Excel workbook. An Excel workbook is nothing but a type of database, you can say, with worksheets in it acting as tables. So that is how an Excel workbook behaves like a database, and you can treat an Excel file as a database and import or export data into it as well.
The next thing we are going to do is the Import Wizard. The Import Wizard enables you to read data from an external data source and write it to a SAS data set. If you look at your SAS window, the first menu is File, and here you have options like Import Data and Export Data. You can use these to import data into SAS or export data from SAS. Let me show you the options, but first let me create a file so that I can show you how to import and export data. I'll create a text file here, test.txt, and put some data into it. The first column is Name, then I have ID. I enter four rows: the first one with ID 10, then Brown, then John, and then Jeff with ID 20. So this is the data which I have created. I save this file on my desktop and go back to the SAS system. Here you can see a list of the types of sources from which I can import data into the SAS environment. We have Microsoft Excel
workbook, Microsoft Access database, workbooks and databases on a PC Files Server, then comma-separated values, tab-delimited files, and other delimited files. These are the various types of sources from which you can import data into the SAS environment. I have a tab-delimited file, so I select Tab Delimited File and hit Next. It asks where the file is located, and I have to browse to that file. It's on my desktop and the name was test, so I browse to the test file like this and click Next.
Now Library and Member. Member is nothing but the data set name, and Library says where the data set goes. If I am in the Work library, I have the list of these data sets. Basically, Member also lets you point to an existing data set: if you already have a data set in your library, you can pick it here. Right now I want to create a new data set, so I have to give it a name. Right now I am just using the wizard, and if you are using the import and export wizards, SAS automatically generates the syntax for you. So let me save that code under the name hello: I browse to my desktop, give the name hello, and click Finish.
As soon as I click Finish, you can go back to the log window and see where my step started. It tells you that the External File Interface generated the code, then it shows the code, and you can read the notes in the blue lines. They give the file name of the text file and other details, like when it was last modified and created. Four records were read from this file, and the minimum record length was 6, with the maximum record length reported next to it. So it has created a new data set in the Work library that has four observations and two variables. If you go and look, you can see that
this is the data set which got created here. If I click on it, you can see that these are the records which were in the text file. So by using the Import Wizard I have imported the data from the text file into the SAS environment. Now you remember I told you that whatever operation I am doing here, SAS creates the code for it, which I saved from the wizard. If I open that file, you can see the program which SAS has created for us. Next time you want to import the data from that text file, you can directly run this code. It says PROC IMPORT with OUT= pointing to the data set in Work, DATAFILE= giving the path to that file, DBMS=TAB for tab delimited, and GETNAMES=YES. YES is for the headers; if I say NO, the header row will not be used for the variable names. So let us change it to NO and run it again. We already had that data set, so it gets replaced with a new one, and you can see it does not have the headers; it names the variables VAR1 and so on. So that YES or NO specifies whether you want the headers used in your SAS data set or not. This is the code created by SAS, and it is reusable: you can use it any time to import data from the text file.
Now, before I do anything with the Excel file, let me tell you that on my system a component is missing that SAS needs to import Excel data into the SAS environment, but I can show you the same thing in Enterprise Guide. What I mean is that I am going to import some data from a file through Enterprise Guide. So I go here to Import Data, click on Import Data, and it gives me this wizard. Here I have some files, test1 and test. I select test1 and click Open. You can see that it has picked the test1 file from this location, and it is going to create a data set in the Work library called test1. If I click
Next, the wizard moves on. This is exactly how each ETL operation works: extraction, transformation, and loading. Again you have an option if you want to see the code which SAS is going to create for you: if you select the embed option, it will create the code for you right here, and if you do not select it, it will not. So whether you are working in Enterprise Guide or in SAS, both are usable for this. This is the data which I have just imported from the file. If you want to see how it looks, you can go to the Work library: Local, oh sorry, Libraries. Click on Libraries, I have the Work library here, and you can see the test data which got created.
But as I told you, in the Work library all the data sets are session specific. Whatever data set we have created here is only for this session, and the data sets I created in the other SAS window are specific to that Work library. I don't see test1 there, and you do not see those data sets in this session. Both are separate sessions, each with its own Work library. As soon as these sessions or the system are closed, these data sets will also vanish; they will no longer exist.
All right, let's go to the slides once again. So this was the Import Wizard, which we have just used in SAS Enterprise Guide and in the SAS environment. The slide lists the steps we have just followed, and you can go through them once again. Export works similarly. To export any data set from SAS, let's say this is the data set I'm going to export: I go to Export Data, then choose the library you want to export from, whether SASHELP or Work. I have selected Work, so it shows all the data sets from the Work library. Let's suppose I have selected one and click OK. It
says "Write variable labels as column names". I have selected it and click Next. These are the types in which you can export your data. Let me select Excel, though I know that Excel will give some error on my system. It asks me to select a workbook located on my system, so let me create one here. OK, it's not working. As I already told you, there might be some issue between Excel and SAS on my system. Let me create a text file then. All right, these are the files which have been created. So I have selected one Excel file and clicked Save. Replace or append: you get both options. If the file is already there, you can either replace or append. I'll say append and click OK. It is exporting some data into it; let's give it a minute and then we will go back and check.
So this is exactly what I was saying: there is some problem using Excel, and a component for Excel is missing for SAS. So I'll export the data into a text file. Let me select Tab Delimited File. I'll save the file on my desktop, select the text file, and save. It tells me that append is not supported, so we replace, click OK, and Finish. Go back to the location. OK, this was the Excel file which we created initially. It is large, which is why it takes time to open. All right, it does not have any data in it. But I remember that I created a text file, so let's see where that text file is. This was the Work library, and these are the data sets which I have here. I'm not sure what happened with it, but let me try again and export some other data set. File, Export Data, from the Work library, Next. I'll select Tab Delimited File, Next. I want to save my file to the desktop as a tab-delimited file. Let me
give it a name, test file. Let's select this, OK, replace, Next. This is the place where it asks me where to save the code. Let's say I save it on the desktop again, and I will replace because I know there is already a file there. Now if I go back to the location and check for the file, you can see that we have a file here and it has some data. This is the same data: we have exported a data set from the Work library to this particular text file, and the code which we saved is here.
Let me create new code for it. If I do it once again: Export Data, Work, Next, Tab Delimited, which is exactly what we used. Where am I going to save the file? The same location, like this, replace, Next. Then it asks whether the statements should be generated. I select hello. OK, I did not mention the location; replace and Finish. If you go back and check this file, you will get the code to export data. The first code which you see was for the import, that is PROC IMPORT, and then here you have PROC EXPORT. So this is how you can create the code, as we have created the code for import and export, and you can use this code further or modify it as per your requirement. Right now we are using a text file, but you can work with Excel, Access, SPSS files, or other file types.
I'll show you the same thing from Enterprise Guide: how you can export your data. This is the data which I have, test1. I go here and click on Export. It offers to export the test1 data that we imported. I go to the Intellipaat folder and pick the file in which I want to save the data once again. If I go back to the Intellipaat location, this is the data set which got exported, in the default format. If you want to
change the format, you can do that any time. So these are the ways we import and export data from SAS.
Now let's move to the next thing: reading and subsetting data sets. This is similar to SQL, where we can filter the data and create subsets of the data; we are going to do the same kind of activity here. It says that we can create a data set using data stored in another data set. Let's suppose we have a data set in one library, say SASHELP or another library, and we want to take a subset of the data from any of these data sets and put it into another data set. The syntax for that, as you can see here, is DATA libref1.B, where libref1 is a library reference and the data set name is B. On the next line you use SET libref2.A, where libref2 is a reference to the second library and the data set is A. What this does is create a new data set, libref1.B, and put into it all the data from data set A in libref2. So the code will create a data set B with the data and descriptor portion identical to data set A. libref1 and libref2 can be the same or different. A libref is just the library reference which you want to use, and it can be temporary or permanent. When I say temporary, it would be the Work library; permanent would be any library you have defined yourself.
Earlier we saw the DATA statement. The function of the DATA statement is to give the name of the new data set being created. Note that if the data set specified in the DATA statement already exists, it will be overwritten, so be careful. That is the thing to keep in mind: if libref1.B already exists and you run this code again, the previous data set will get replaced, that is, overwritten.
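The DATA and SET pattern described on the slide can be sketched roughly as follows. The librefs, the folder paths, and the data set names A and B are placeholders for illustration only; they assume that both folders exist and that a data set A is already stored in the input library.

```sas
/* Hypothetical folders: replace with locations that exist on your system. */
libname libref1 'C:\sasdata\out';   /* output library (assumed path) */
libname libref2 'C:\sasdata\in';    /* input library (assumed path)  */

/* Creates LIBREF1.B with the same data and descriptor portion as LIBREF2.A. */
/* If LIBREF1.B already exists, it is overwritten without any prompt.        */
data libref1.b;
    set libref2.a;
run;
```

Using `work` in place of `libref1` would make the output temporary, so it would disappear when the session ends.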
So let me go back to my editor and write this. Let's say I write DATA with Work as the library where I want to save my data, and I give the data set a name. Now, what do I want to SET? I want the new data set to have the same data and the same descriptor portion as a data set in the SASHELP library, so we need to mention the libref SASHELP and the data set name. This is the exact name which I gave to PROC PRINT earlier. Let me go back to the log: it says that there were 123 observations read from the data set. If I go back and check my Work library, you can see that a new data set has been created. It has the same data and the same descriptor portion as the SASHELP data set. Earlier you saw the PROC step; here we have used the DATA step to create a new data set. These are the two kinds of steps in a SAS program: one is the DATA step and the other is the PROC step.
OK, let me go back to the slides. Now the SET statement, which is exactly what we have just used. The SET statement reads observations from a SAS data set for further processing in the DATA step. By default, SET reads all observations and all variables from the input data set. The SET statement can read temporary or permanent data sets. On terminology: from this slide onwards till the end of Base SAS, we will refer to the data set given in the DATA statement as the output data set and the data set given in the SET statement as the input data set.
Let's move to the next slide. It lists the things to remember while you are creating data sets using the SET statement, so that you do not go haywire. In case you interchange the two data set names, and your output data set exists, then the input data set will be overwritten.
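As a rough illustration of copying a SASHELP data set into Work and then checking both its descriptor portion and its data, here is a minimal sketch. SASHELP.CLASS is only an assumed stand-in for the input, since the data set used in the demo is not clear from the recording, and `class_copy` is an illustrative name.

```sas
/* SASHELP.CLASS is used here only as a stand-in input data set. */
data work.class_copy;      /* output data set: created, or overwritten if it exists */
    set sashelp.class;     /* input data set: must already exist                    */
run;

/* Descriptor portion: variable names, types, lengths, number of observations. */
proc contents data=work.class_copy;
run;

/* Data portion: the observations themselves. */
proc print data=work.class_copy;
run;
```

Keeping the SASHELP name only in the SET statement, and never in the DATA statement, is one simple habit that avoids overwriting a sample data set by accident.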
Understand what I am saying: if I put the input data set here in the DATA statement and the output data set over there in the SET statement, then as soon as you interchange them, your existing data set will vanish. So you need to keep this in mind and be aware of exactly what you are going to use. In case you make this mistake and the output data set does not exist, then according to the slide the input data set will be overwritten with an empty data set (in practice SAS normally stops the step with an error instead, but do not rely on that). In case you overwrite any data set in SASHELP, you might get an error message, or worse, you might lose the data set entirely, because those are the default data sets provided by SAS. If you lose one, you cannot get it back unless you obtain it from SAS again, and in some cases you might have to reinstall SAS, or else it might not launch. That is one possibility in the worst case.
So please decide beforehand which is the input data set and which is the output, till you master SAS syntax. I don't want you to learn this the hard way. First keep in mind exactly what you want to do, and then do it. The input data set goes in the SET statement, the second line, and must exist. The output data set is the new data set which you want to create, and if it already exists, it will be overwritten as soon as you run these statements.
The next slide says: create the data set orsales from SASHELP.ORSALES in the Work library. This is an assignment for you; do it and see whether you are able to. Keep in mind that you are taking the input data set from SASHELP and creating a new data set in the Work library.
Now, subsetting data sets. Here is the WHERE statement, which is how you create a subset of your data. The WHERE statement subsets observations that meet a particular condition.
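A minimal sketch of the assignment and of WHERE subsetting, assuming the SASHELP.ORSALES sample data set that ships with SAS and its variables Product_Category, Product_Line, Quantity, Year, and Profit. The output names `clothes`, `big_orders`, and `subset2` are illustrative choices, not taken from the slides.

```sas
/* Assignment from the slide: copy SASHELP.ORSALES into the Work library. */
data work.orsales;
    set sashelp.orsales;
run;

/* Character condition: the constant is quoted and compared case-sensitively. */
data work.clothes;
    set work.orsales;
    where product_category = 'Clothes';
run;

/* Numeric condition: no quotation marks around the constant. */
data work.big_orders;
    set work.orsales;
    where quantity > 100;
run;

/* Combined condition using IN and AND. */
data work.subset2;
    set work.orsales;
    where product_line = 'Children'
          and year in (2000, 2002)
          and profit > 500;
run;
```

After each step, the log reports how many observations were read under the WHERE condition, which is a quick way to confirm that the filter did what was intended.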
The general form of the WHERE statement is WHERE followed by a where-expression; this is the one line you need to put into your SAS program. A where-expression is a sequence of operands and operators. Operands are constants and variables. Operators are symbols that request a comparison, an arithmetic calculation, or a logical operation, such as AND or OR. Subsetting means reading selected observations and creating a smaller data set from the original data set.
Now consider the data set orsales which we discussed; we have created an orsales data set in the Work library. What we are going to do is create a subset where Product_Line is equal to Children; this is the example they have. Let us check that we have ORSALES in SASHELP. I will not go through the whole library; I will check for the data set by writing PROC PRINT DATA=SASHELP.ORSALES and RUN. If I have data there, it is printed for me. OK, so we have this data set.
Now I will create a subset of it. I write DATA, and let us say I create it in the Work library, then I give the name, and then I use SET, with SASHELP.ORSALES as the input. The condition I was supposed to use: let me print the data again so that we can see what condition we want to look for. That's Product_Category, and the value is Clothes. So I'm going to create a subset where Product_Category is equal to Clothes: I take only the records with product category Clothes and create a data set with those. I run it, and if I go back to the log, it says there were 240 observations read from the data set SASHELP.ORSALES where Product_Category is
equal to Clothes. Go to the Work library and you can see that the subset has been created, and it has all the records with product category Clothes. It has loaded 240 records, all the records for Clothes. So we have created a SAS data set, a subset of ORSALES from the SASHELP library, in which we have taken all the records for Clothes. This is a subset created using the WHERE clause. There are other ways to create data sets, and we will get to them as well.
Let me move to the next slide. Here there is a data set with data pertaining to years, and again this will be your assignment: create a data set where the year is 1999 only.
Operands. A constant operand is a fixed value. Character values must be enclosed in quotation marks, and numeric values do not use quotation marks. You must be aware that in SAS we have only two data types: one is numeric and the other is character. For character values you need to use quotation marks, as I did with Clothes. For numeric fields you do not need single quotes; you can directly give a numeric value. A variable operand must be a variable coming from an input data set. For example, in WHERE gender = 'M', gender is a character field, which is why the value is enclosed in quotation marks, but salary is a numeric field, which is why they have given the number as it is, without quotation marks. In the other two statements, identify the variables and say whether the constants are numeric or character. It's pretty easy.
Let us do one example with a numeric field. We have just done an example with a character field; to do it with a numeric field, let's say we use Quantity. So what I am going to do is, let's say, create subset two,
and this time I will take Quantity. I copy the variable name Quantity, and I want to create this data set where the quantity is more than 100. As soon as we run this statement, SAS creates another data set for us. It says there were 866 observations read from the data set SASHELP.ORSALES where Quantity is greater than 100. If I go back to the Explorer and open the Work library, this is subset two. If I double-click on it, you can see that all the records I have here have a number of items, that is the Quantity, greater than 100; there is no value in this column that is less than 100. So this is how I have created another data set, this one having 866 records. This is the way you will be using the WHERE clause. The difference is that for character fields you use quotes and for numeric fields you do not. So this is how you can create data sets using the WHERE clause.
Now let's move to the next slide: comparison operators. In SAS we have these comparison operators, like the equal to and greater than that we have already used. The symbols are in the first column, and you can use either these symbols or the mnemonics, such as EQ or NE. The meaning of each is given here, along with an example of how to use it. The first one is equal to, written = or EQ. Then not equal to, which you can write with symbols in three different ways, or as NE. Then greater than, less than, greater than or equal to, less than or equal to, and IN. If you know SQL, IN works the same way here. So these are the operators you can use to create a subset or a new data set from a data set in a library.
Next are the arithmetic operators. Arithmetic operators
indicate that an arithmetic calculation is performed, and the symbols are given here. Two asterisks are for exponentiation, the power basically. Multiplication is one asterisk, and division is a slash, the same as you do in mathematics. You will be using these when working with huge data or wherever some calculation is required.
Then logical operators. Logical operators combine or modify expressions. AND, OR, and NOT are the operators which you can use while creating or subsetting a data set or while doing some data manipulation. This is the example they have given here. They are trying to create a subset with DATA subset2 and SET SASHELP.ORSALES, where Year is 2000 or 2002, Product_Line is Children, and Profit is greater than 500. If you want, we can run the statement directly and see how it creates subset3 for us. As soon as we have run it, you can go back to the Explorer, Work, and subset3 has been created. It has 63 records, and if you go to the log window you can see that there were 63 observations read from the data set. You can also see how the log is colored: notes for successful steps are in blue, and warnings appear in green. We haven't got any warning yet, but there will be times when we see warnings as well.
So what we are going to cover today is reading data sets and subsetting data sets, and we'll talk about labels and formats: what formats and labels are in SAS. Before I go ahead with reading and subsetting data sets, let me tell you that in the previous class we saw some options or functions which
were used to subset data sets, like WHERE, and we worked with a lot of operators and operands. Let me continue with them. These are the comparison operators. I told you in the previous class that we can use them to subset our data sets. You can use a symbol like the equals sign, or you can use EQ in place of it: either you write a = 3 or you write a eq 3. In the same way, not equal to can be written with any of these three symbol forms, or you can write NE, as in the example here, a ne 3. The same goes for greater than, less than, greater than or equal to, and less than or equal to, and similarly IN. IN works the same way as it does in SQL Server.
Now we have arithmetic operators. Arithmetic operators indicate that an arithmetic calculation is performed, and that could be multiplication, addition, division, subtraction, or exponentiation. Exponentiation is nothing but the power, when you want to raise a number to a power. So these are the operators you can use to perform mathematical calculations.
Now we have logical operators. Logical operators are the same everywhere: AND, OR, and NOT. But in the SAS environment we also have symbols for them. For AND we use this symbol, the ampersand. For OR we have the vertical bar symbols. For NOT we have these three symbols which you can write. In place of these symbols you can write the words as well. The symbol you use for OR or NOT depends on the operating environment, so on UNIX you will have a different sign and on Windows you will have a
different sign.
So, the exercise: create a data set subset2 from orsales that lists only product line Children, for even-numbered years, with profit greater than $500. You should get 63 observations. Let me go to the SAS environment so that we can perform this task. I have already opened a SAS session. They want us to create a data set subset2 from orsales. I told you that in the SASHELP library we have a data set called ORSALES. Right now you cannot see it, but let me show you with PROC PRINT. You might remember how to write it, because I have shown you several times: you need to give DATA=, so I give the reference to the library, then the data set name, ORSALES. I terminate my statement and then I write RUN. As soon as I run this, I get an output that shows the ORSALES data set.
Now you might wonder why, from the beginning till now, I have been using PROC PRINT and getting the result in a separate window, whereas I already have a window called Output. That is because I had set an option to see the output in that results window. Otherwise, you can go to Tools, then Options, then Preferences, then the Results tab, and uncheck Create HTML. I do that and click OK. Let's see how I get my results now. OK, it says "No output destination active". I was supposed to get output here, but I haven't got it yet, so I have to change one more setting. I go back to the same Options, then Preferences, and check Create Listing. Listing is nothing but the output listing which I'm going to get here. So if I run it now
you will see the result in the Output window like this. All right, now the other window does not pop up; I'm getting the result in the Output window only. So let me go back to the exercise. What did it say? Create a data set, subset2. I told you how to create data sets: to create a data set or a subset you need to write data, and then give a reference to a library. Let's say I have a library called satya here, so I will say satya.subset2. This will be the name of the new data set that I'm going to create. Now how do I want to create it? For that I will have to use the input data set that I am getting from the SASHELP library, and my data set is ORSALES; I have already shown you where this data set is. Now the condition they want us to put is only product line Children, for even-numbered years. So first of all let us put only Children as the product line. You might have seen in the output which we got that we have a column here, product category, or rather product line; they want us to put a filter on product line, and only Children. So I'll go back to the SAS editor window and write where product line is equal to Children; I'll copy Children from the output, paste it inside the quotes, and then terminate the statement. All right, my syntax is complete, but before I run this program let me tell you that SAS is not a case-sensitive environment: SAS reads satya, SATYA or Satya all as the same name. But a value that you specify in quotes is read as case-sensitive; unless you put quotes around it, it will not be case-sensitive. So if you want to check or validate a particular value, you will have to put it into a string, or rather into single quotes, so that SAS understands that it
has to be case-sensitive. Now I'm going to run the query which I have written here, so let's see what I get. All right, you can see that subset2 got created. Before I look at this data set, I will go and check my log window. It's recommended for you as well: try to look at the log window, and try to get familiar with it as soon as you can, because the log window tells you each and everything that is happening in the back end. Here it says: data satya.subset2; set sashelp.orsales; where product line is equal to Children; run. There were 176 observations read from the data set SASHELP.ORSALES, where product line is equal to Children. The data set SATYA.SUBSET2 has 176 observations and 8 variables. And it says how much real time and CPU time it took. So we have created a data set. Now there are two ways I can see the data set. Either I can double-click on it and it will open the data set for me. Wait a minute, it is opening the data set. Here we go: we have got all the records where the product line is Children. All right, I'll close this. The other way of looking at the same data is to use PROC PRINT and run, and you'll get an output for our subset2 data set. Now let me clear this window first, Clear All, and we are back in our editor window. But you might have a question here: till now, whenever I have used PROC PRINT, I have given data= and then the reference to the library and then the name of the data set, but this time I have not done that; I have just written proc print and run. So let me tell you that whenever you run this statement without specifying your data set, SAS understands that you want to print the last created data set. So whenever you run just this much, without specifying your data set name or the library name, it will
automatically pick the most recently created data set. In our case, if I go to the library, subset2 was the latest data set that was created, and that is why it printed that data set. But let me tell you that looking at the output in this Output window is not that user-friendly, so I will switch back to the setting which I had earlier. Now you may understand why I had enabled that window: I like to see the output in the Explorer window, so I have turned that on. It takes some time... no, this time it didn't work. You can see in the log window it says writing HTML body file sashtml.htm, but it did not open. OK, so I will have to close the listing first of all (listing is the Output window, whatever you get in the Output window), and it will start using Create HTML. So I will open it again, go back, and run it again, and I have got the result in the Explorer window again. This result looks good; it looks beautiful. All right, now if you press F9 you get this window. I just pressed F9 and I have got this window, which tells you all the keys and their definitions, what they do. So here you can see that F1 is for help, F2 is for reshow, F3 is for end, or gsubmit buffer=default, F4 is for recall, and F7 is for output. Similarly there are keys which have not been assigned anything, and you can see that clear has been assigned to Ctrl+E. Apart from this, if you want to assign anything, you can write it here and click Save. OK, now I am going to close it, and if I press F12 here, you can see the log got cleared and the output got cleared; if I go here and press F12, you can see all the logs which were there in the log window got cleared. So this is how you can assign a definition to a key. So this is again useful: if
you need to work with a lot of shortcut keys, you can assign operations to those keys and work with them. This is exactly what good programmers do; it is a good practice to follow, so I would recommend you create some shortcut keys for yourself so that you can use them. All right, so it says Ctrl+B is for libraries; let me just press Ctrl+B. OK, I've got the list of libraries here, so yes, it's working. Now let me go back to the slide. The slide said: create a data set, subset2, from sales that lists only product line Children, for even-numbered years, and profit greater than $5,000. Till now I have put only one condition, and that was product line Children. Now I'm going to put another condition, which says the profit should be greater than $5,000. You can see here that this is the Profit column. You will be using the AND operator with the new condition, so I will write: and profit greater than 5000. All right, I already have subset2; if I do not give a new name it will overwrite my previous data set, subset2, so I'm going to create a third data set called subset3. I run the program, subset3 got created, and if I run the print query it will print the latest one, that is subset3. You can see subset3 here: in all the records the profit is greater than $5,000; you will not see a single record with less than or equal to $5,000. Now this time I have got 122 records, but the exercise says that I should be getting 63 observations. Why do I have 122 records? Because we are still left with one condition, which says for even-numbered years. So now we need to add even-numbered years. What is the way to find only the even numbers? Let me tell you. So, even-numbered years: you can see
in the Year column. OK, so let us use MOD and see what I get. I'll put another condition into this; I have the Year column, and I want even-numbered years. So in the editor window I'll write mod, year divided by 2, should be 0. Let's see what it does. It ran; I need to check the log. OK: function MOD requires at least two arguments. So the MOD function as we have used it is not correct. OK, I may be missing something in the MOD function; let me check. This is the best place to show you how to look for help in SAS. I will go to this book icon which says Help, or you can press F1; another window will open, and then we will look up the MOD function. You should also see how to get help from SAS. I have typed mod; it shows MOD function, and it says the MOD function returns the remainder from the division of the first argument by the second argument. What is the syntax? It tells you how to write it. So basically, in the place where I used the divide sign, you just need to use a comma. Now if we run it, let's see what it does. I will go back to the log window again, and it says that there were 63 observations read from the data set SASHELP.ORSALES. So you can see that we have got the same number of observations as we were told on the slide: you should get 63 observations, and we have got 63 observations. And if I run the print query to see the output, here we go: you have the result here, and you can see that we have got 63 records. So what have I done? I have put a filter and created a data set for product line Children, profit greater than 5,000, and where the year is divisible by 2, or I would say the year is an even number. All right, now another thing which I wanted to show: before this program, I created subset3 when I put two conditions.
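For reference, the three-condition subset described above might be written as the following sketch. It assumes the SASHELP.ORSALES columns are named Product_Line, Profit and Year (the captions only give the spoken names), and it writes to the WORK library instead of the presenter's satya library.

```sas
/* Sketch of the exercise: subset SASHELP.ORSALES with three conditions. */
/* Column names are assumptions; WORK is used in place of the satya library. */
data work.subset3;
    set sashelp.orsales;
    where Product_Line = 'Children'   /* quoted values are case-sensitive */
      and Profit > 5000
      and mod(Year, 2) = 0;           /* MOD takes two comma-separated arguments */
run;

/* With no DATA= option, PROC PRINT prints the most recently created data set. */
proc print;
run;
```

Writing mod(Year/2) passes a single argument, which is what triggers the "requires at least two arguments" message mentioned in the narration.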
Now what I have done is put three conditions and run the same program. So what has happened? The data set subset3 got overwritten. If I give the same output data set name and change the conditions, then irrespective of the conditions, whatever new data set is created will replace my existing data set. So you need to be aware of it: make sure whether you want to overwrite your old data set or not. If you don't, change the name of your output data set; if you want to replace the old one, you can give the same name. OK, so it was a good practice for us; let's move on to the next slide. So we have done this exercise, and this is what they have given here. Now they have done the same exercise in a different way, with year in 2000 or 2002. What we did was use the MOD function to find the result, but the other way of doing it is to give the years in an IN clause. For this you must know what all the years in the data set are. Whoever wrote this example must have been aware that 2000 and 2002 are the only two even years, and that is why they have mentioned those two here. Otherwise, to make it dynamic, you can use MOD; the MOD function remains the same across Excel, SQL and SAS, so you can use it any time to find even or odd numbers. If I had wanted the odd numbers, I would have written not equal to 0, so in each condition it gives me the remainder. So this is how you find all the even numbers. Let's move on to the next slide. It says assignment: navigate to Help and, in the index, type in BETWEEN-AND operator; explain the following with examples. So basically our next topic is to work on all these things, whether BETWEEN-AND, IS NULL, IS
MISSING, CONTAINS or LIKE. All right, so let's move to these topics. To explain the following, BETWEEN-AND up to LIKE, we will need the following data set called cars, which we will create conditionally from sashelp.cars. So let's see whether we have the data set in SASHELP or not. I'll again write the same thing: proc print data is equal to sashelp.car, then run. As soon as I run this proc you will see some output, and if not, it means I'm doing something wrong. You can see that it said file SASHELP.CAR.DATA does not exist. So I will have to go manually into the SASHELP library and check whether we have a car data set there or not. OK, we have cars, not car. So I'll go back and change my query to cars, and then run it again. You can see we have got the result: we have Make, Model, Type, Origin, DriveTrain, MSRP, Invoice, EngineSize, Cylinders, Horsepower, MPG_City, MPG_Highway, Weight, Wheelbase and Length. So we have got the data for the SASHELP library and the cars data set. Now what they want us to do is run the following code and create the data set. They have put an IN clause, sorry, an IF clause, and they are using _N_. What is this underscore N underscore? Let us use it, and then we'll find out what _N_ is. OK, let me go back to the slide. I will go to the SAS environment, my editor window, and I have pasted that code. I will create this cars data set in the satya library and run it. Let me go through the log window; it will tell us what happened. It said there were 428 observations read from the data set SASHELP.CARS, and the data set SATYA.CARS has 100 observations and 15 variables. A hundred observations means a hundred rows, and 15 variables means 15 columns. It also gives the time the DATA statement used. OK, now let's see the
data set, what kind of data set it has created for us. So I run this, and I get output in the results window. This resulting output has only a hundred rows. All right, now let me create another data set which is going to show you what exactly the code is doing. First of all I'll eliminate these two statements, run this code, and go back and print. As per my code, I wanted _N_ to be less than or equal to a hundred. This was the first condition. Now basically this is nothing but the observation number, so this is the Obs column; it just puts a condition that the observation number should be less than or equal to a hundred, not more than a hundred. Now the second condition was: if _N_ is in 10, 20, 30, 40, then Cylinders is equal to missing; for these rows Cylinders should be missing. Before I implement the second condition, I'll go back to my data set. For row 10, Cylinders shall be missing; right now you can see 6 here. Now let me put in that code and see whether we are getting the right thing or something else. I have run it again, and I'll go back to the results window. You can see that where my observation is 10, I have got a missing value for Cylinders. Earlier I was getting 6, but now we have a missing value. So it says that wherever the observation number _N_ is in 10, 20, 30, 40: if I go and check the 20th row I have the same situation there, the same in the 30th row, the same in the 40th row, and the same for row number 50. All right, now let us implement the third condition. What was that? It says that if the observation is in 15, 25, 30, 45, 55, then Type is equal to blank.
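The slide code itself does not appear in the captions, so the following is only a reconstruction of the three conditions as narrated. It assumes a subsetting IF on the automatic variable _N_, writes to WORK instead of the satya library, and takes the row numbers as heard in the captions, which may be garbled.

```sas
/* Reconstruction from the narration; not the verbatim slide code. */
data work.cars;
    set sashelp.cars;

    /* Subsetting IF: keep only the first 100 rows. _N_ counts DATA step iterations. */
    if _N_ <= 100;

    /* A numeric missing value is written as a period. */
    if _N_ in (10, 20, 30, 40, 50) then Cylinders = .;

    /* A character missing value is a blank. Row list as captioned; an assumption. */
    if _N_ in (15, 25, 30, 45, 55) then Type = ' ';
run;

proc print data=work.cars;
run;
```

An IF statement is used here because _N_ is an automatic DATA step variable and not a column of the input data set, so a WHERE statement cannot reference it. That is also consistent with the log reporting all 428 observations as read.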
Right, so for the 15th row, let's see what is there in Type. For the 15th row we have Sedan as Type. As soon as I implement this condition, I will have a blank in Type. So I'll go there, write another statement, run my query, and PROC PRINT my data set once again. You can see that the 15th row has a blank here. Now, in Cylinders I was getting a dot, but in the case of Type it's blank; there's nothing. All right, so looking down the slide, we see what the code does. For the inquisitive: we are creating a subset of sashelp.cars where we initially filter only the first hundred observations. Within that, we change the value of the variable Cylinders to missing for five observations, where the observation number was 10, 20, 30, 40 and 50, and do the same for five observations for the variable Type. So we have changed the values to missing values for these observations. These were the topics which were left from the first session. Now let's move on to the next slide, which I wanted to cover today: continuing with the WHERE conditions, we have BETWEEN-AND. So let's read what it says, and then we'll use it. It says the BETWEEN-AND condition outputs all observations that fall within a range specified by the WHERE statement. For example, our data set cars has various engine sizes between 1.6 liters and 6 liters; the following code will print out the observations that have an engine size between 2 liters and 4 liters. So basically we continue with the cars data set we have been using; now we are going to put another condition, and this time I'll create another data set, enginesize. So I'll change the name of our output data set and say data satya.enginesize. I've given it the name enginesize, and now my conditions
will change. So what condition do I want? We specify here: where engine size, and engine size is given in this column, EngineSize, is between, and the values I need to give are 2 and 4. So I want all the records for cars where the engine size is between 2 and 4, and those are to be saved into the enginesize data set. I'll copy this proc so that I can reuse it, here we go, and I will run all of these once again; I have selected both of them and run. So it has created the data set for me, and you can see here that engine size is between 2 and 4. When I say 2 and 4 and I'm using BETWEEN, it will take 2 as the minimum value and 4 as the maximum value, so it gives me all the values which fall between 2 and 4, including 2 and 4. So in all the records you see the minimum value is 2 and the maximum value is 4. I hadn't seen a 4 yet, but yes, we have a 4 here. So it shows you all the records where the values in the EngineSize column are between 2 and 4. This is how you can use the BETWEEN-AND operator in SAS for subsetting. It says the two values need not be in ascending order, and using the condition between 4 and 2 will output the same result. So it's a very good thing to know that it's really not necessary to give 2 and 4; you can give 4 and 2 as well, and you will get the same output in both cases. When using character values, the same condition selects in alphabetical order. They have used another condition here, so let me go back, and I will also write the same kind of thing: let's say I'll take another variable, one with alphanumeric values; let's say Make. I'm going to take make between Acura, so it's nothing but it will see the
alphanumeric thing, all right, or rather the alphabetical order, not exactly the alphanumeric order; and let's say Porsche. I run all these queries, and this is the result which I wanted to see: I have Make as Acura at the start, and if I go to the end I have Porsche. So it has given me all the records between Acura and Porsche. This is how you can use BETWEEN-AND, either for numbers or for string or character type variables. OK, let's move on; the next slide we have is IS NULL and IS MISSING. It says IS NULL as well as IS MISSING can be used to output observations that have a missing value. They can both be used for a character as well as a numeric variable, as in the two examples below. So it says: proc print data is equal to cars, where cylinders is missing, for a numeric field; or for a character field you can write proc print data is equal to cars, where type is missing. Before I go ahead with this example, let me tell you what a missing value is in SAS. It is similar to NULL in SQL Server: a null value means nothing, and in the same way in SAS, suppose you leave something blank... OK, let me show you. I will create a data set right now for you and then put a missing value into it. I have not told you yet how to create a data set using datalines, though I have shown you examples. So let me create a data set; I'll save it in the satya library and call it test. I guess I already have a lot of test data sets there... in satya, oh great, I don't have any test data set. So test, and then I will write input, where I need to specify my variables. Let's say I will give the first one as id, and id is to be an integer. Let me tell you that there are only two data types in SAS. Normally, in SQL Server or other tools which we use, we have a lot of data types available, but in SAS we have only two data types: one is numeric and the other is
a string. So for numeric, when we create a data set, we do not specify anything after the name of the variable, but for character or string type we need to use a dollar sign. So let me use a dollar sign here, like this, and after this dollar you can specify the length; let's say I have given 10, and I will end my statement. If I do not mention the length, by default it will take a length of 8 for the name variable, but I have given 10, so it will be up to 10. Now I will write datalines, and below datalines you can give the values for these variables, id and name. So let's say I give 1 for id, a space, and then a name, SAP Andra; then 2, then Shawn; then 3, and I will leave the name out; then I'll give 4 and then RAM; then I'll leave the id blank, give a space, and give chef; again I will give 6, a space, chains; 7 Thomas; 9 Kathy; then Eileen. OK, so I have given various values here, and when I run this statement a data set will be created, but you will not be sure of the order in which the values got inserted. Right now it may be a confusing state for you; I would say try to understand it on your own, and later on I will tell you the mechanism that is followed to insert the data. This is the main reason why I haven't told you yet how to create a data set using datalines; I will be covering that later. But right now I am running this, and you can see what kind of data set it creates. It has created test, and if I run the PROC PRINT statement, you can see the data set it has created, and this is the output. You might not understand what it has done to create this data set, and right now I am not going to tell you the logic of why it ended up like this, but later on I will
explain each and everything, why it ended up like this. Basically, if you look at this data set, you have got a dot here in the id field, and in the name column you have spaces, or blanks. So I talked to you about missing values. In SAS, depending on the data type, we have two types of missing values: it could be in a numeric field or it could be in a string, or rather I should say character. For character and numeric, I have shown you the data set: this was the numeric field and I got a dot here, and for the character or string field I've got blanks. This is exactly what happens when we get a missing value in the data set: we get a dot for an integer or numeric field, and we get blanks for character or string type variables. So this is exactly how missing values behave. I'll go back to the example where I was: proc print data is equal to cars, where cylinders is missing. Basically, in the first example I'm not going to specify a dot, and in the second example I'm not going to specify a blank; in place of that I'm going to write IS MISSING. So IS MISSING will do the same job that a dot for a numeric field or a blank for a character field would do. And IS NULL is again the same thing: it says IS NULL as well as IS MISSING can be used to output observations that have a missing value. So let me check where we have a missing value. OK, I can use the missing value in this column or this one. So I will write the same statement: proc print data is equal to satya.test, and I'm just printing, so I'll write where, sorry, id is missing. Done. If I run this statement, let's see what I get. I have clicked on
F8, and you can see that I have got only one record, because there was only one record, observation 5, where I have a missing value. Had it been the name column in place of id, we could use that as well: where name is missing, so I can get the list. OK, so right now I have got nine records, nine observations I should say, where the name column is blank, and I have got all those records. So this is how you can use IS NULL or IS MISSING while subsetting a data set. Now let's move to the next slide. CONTAINS is a condition that checks the value of a variable to see whether the specified characters are found, and outputs the observation only when the condition is true. The condition here is case-sensitive. Why? Because you are going to give your value in inverted commas. For example, consider this code: proc print data is equal to cars, where model contains convertible, and run. This will output all the observations that have convertible included in the model name. If any of them has convertible written in a different case, it will not be included, because in that case the case is different. So it says that CONTAINS is case-sensitive, because you are writing the value in inverted commas. I told you already that anything you write in inverted commas or single quotes is case-sensitive, and unless you write it in inverted commas it will not be case-sensitive. Now, CONTAINS works more or less the same as LIKE in SQL Server: in SQL Server we have LIKE and here we have CONTAINS, and you must have used contains in Excel, so it works exactly the same way. So let me show you. Let me write proc print data is equal to sashelp.cars, the same, where make contains... done. Now what Make will contain I will have to check from the data set,
because I really don't know. OK, let me say Pontiac, P-O-N; let's see whether it works or not. I need to go to the quotes and write P-O-N-T here. OK, now let's see what I get. I have run this statement and I have got the result: all the records where I have Make as Pontiac. All right, so it worked for me. Now suppose I check Model and look for Grand. In place of make I will change it to model, then contains, and I will write Grand here. Let's see what it gets. I've run this query, and it has given me all the records where the word Grand was used in the Model variable. So this is how the CONTAINS operator works for you. All right, let me show you one more thing. As I said, convertible and convertible with a different case will be different things for you, so I'm going to write grand with g in lower case. Now let's see whether I get any record or not. OK, I've done that; let's see what the log window says. The log window says that no observations were selected from the data set SASHELP.CARS; there were zero observations read from the data set. Why? Because I have just changed the case of G here. If I change it back, I get all the records for Grand again, 8 observations. So it should be clear now that CONTAINS is case-sensitive, and how it works for you. OK, now I'll move to the next slide; it says LIKE. LIKE is again the same kind of thing: I told you that just as CONTAINS is working here, LIKE works in SQL Server as well. To explain LIKE we need to use another data set, with data about some aircraft operated by Air India. We create the following data set. They have taken an example where they write data with the airline data set name, then length: callsign 8, departure 10, arrival 10. After that they are inputting them, so: input callsign departure arrival.
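A data set of the shape described on the slide could be built as in the sketch below. The rows are hypothetical and only for illustration, since the slide's actual data lines are not in the captions; the data set name airindia and the WORK library are likewise assumptions.

```sas
/* Hypothetical rows for illustration only; not the slide's data. */
data work.airindia;
    /* $ marks character variables; the number is the length in bytes. */
    length callsign $8 departure $10 arrival $10;
    /* List input: values on each data line are separated by blanks. */
    input callsign departure arrival;
    datalines;
AI891 Delhi Mumbai
AI892 Mumbai Delhi
AI860 Pune Delhi
AI101 Delhi Chennai
;
run;

proc print data=work.airindia;
run;
```

Because simple list input splits on blanks, a value with an embedded space (such as the Port Blair mentioned later in the session) would need a different input style than the one sketched here.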
They have taken three variables, callsign, departure and arrival, and then they have used datalines and then the rest of the program. So let us create a similar program for ourselves so that we can work with it. Let me run the same program for you; I'll go ahead and paste the code. OK, let me create this data set in my library so that it is available whenever I look for it, so satya. And if I use proc print and run, I get the output here; yes, it's there for me. Now let's see what they wanted to do with the example. The following code outputs all observations where the value of the variable callsign starts with AI89 followed by one character, hence one underscore over here. Our data gives as output those observations having callsign values AI891 and AI892: proc print data is equal to airindia, where callsign like this pattern. Again you are giving the value in inverted commas, so this is also going to be case-sensitive. Now let's say I make some changes in the same data set: data is equal to satya.airindia, where callsign like, inverted commas, and let's say I give AI, because it's a callsign, and I have given 9. Let's see what I get. Oh no, I haven't got anything. It shows proc print data is equal to satya.airindia, where callsign like AI9, and you can see that I have got errors, highlighted in red. It says syntax error while parsing WHERE clause. You can go through the log and find out what could be wrong: it says syntax error, statement will be ignored. My statement is ignored because I have not ended it; you know that in SAS you need to terminate every statement with a semicolon, and if you don't, you are making a mistake. So I have added it, and I will run it again. Let's see what happens now. It said that no observations were selected, so now the query ran but it did not select any observation. OK, so let's go back to
the slide and what it said. It said that you need to use callsign like this. OK, so how many characters do you want? Let's see, I will give two underscores. OK, and then if I run it: no observations were selected from data set SATYA.AIRINDIA. So it's not working for me; let's see what I'm doing wrong. I've written proc print data is equal to satya.airindia, where callsign like AI... OK, I understand: after AI9 there are three characters, and that is why I didn't get any record. So if I run this now, I'm sure I will get two records. OK. So basically you need to specify the width, and if a value of that width is available in the data set, then you will definitely get output; otherwise you won't. So this is how LIKE works, but you need to be sure of the length. Let's do the same example with the departure column. This time I'll use departure in place of callsign, and I will write De, and for Delhi I add three underscores: one, two, three. OK, now let me run it and see whether I get any record or not. OK, I've got three records. So this is how you work with the LIKE operator. Now after this, it said that the following code outputs all observations where the value of the variable callsign starts with AI8 followed by any two characters, hence two underscores over here. Our data gives five values as output, all starting with AI8: proc print data is equal to airindia, where callsign like AI8 and two underscores. As soon as they gave that, they got five records, or five observations, as output. Now, a single underscore allows us to have one character that takes any value; if we use the percent symbol, it can substitute any number of characters. The following code will output the callsigns with any number of digits following AI: proc print data is equal to airindia, semicolon, where callsign like AI and a percent sign. OK, now this is the best thing, or I would say the best
practice to use. In place of giving those underscores, had I given a percent sign, I would have got all the records in one go, and this is the best practice you can follow. I have got the same result, but this time I used the percent sign. Now you need not know how many characters you need to match. Let's say I just give P: I will get all the records, whether it is Port Blair or Pune; I get both records in my output. I've just run that, and you can see that Pune and Port Blair are there, because this is exactly how it works. Now there is another thing. Let's say I change it to B and use a percent sign before it and a percent sign after it. It will give me all the records where B is in the departure name: Mumbai, Allahabad and Port Blair. I've got four records, or I should say observations, and each observation has a B in it. So this is how you can use LIKE to filter your data. Now it says the following will output only those that have a departure ending with I. They have used the percent sign before the character, so it will give you all the records where the departure ends with this character. Note: whenever you use LIKE with a percent sign, keep the value enclosed in single quotes and not double quotes. Why? Later on you will learn macros, and you will see that if you use double quotes, the macro processor will try to resolve the value, potentially leading to wrong output. In advanced SAS macros we write macro references inside double quotes with the percent sign, so to avoid that we write our values in single quotes with the percent sign when using LIKE. So today we have covered BETWEEN-AND, and then we have used WHERE with LIKE and CONTAINS, so we have covered some of
them, some of the major topics, for subsetting our data.

Till now we have seen how to work with Base SAS software: how you can import data from a raw text file or from an Excel file, how you can create a data set, and how you can create subsets from an existing data set. We have gone through the various SAS libraries which I showed you; when I say libraries, I mean we have already worked with SASHELP and SASUSER, we have the WORK library, and we have created our own library. So we have already seen how we work with a lot of objects in SAS software. What we are going to cover today is the input buffer, the PDV, and how our code is compiled and executed in SAS. I know that input buffer and PDV are new terms, but we'll go through them and discuss how exactly they impact our day-to-day work with Base SAS. So let's get started. When we write a SAS program, what exactly happens? Let me talk about just a simple thing. Let me go to SASHELP; here you can see that we have a lot of data sets. I'll write a simple program: DATA, and I'll save the data set as work.air, and then I'll say SET sashelp.air. It's a very simple program. What I'm doing here is creating another data set in WORK: I'm just creating a simple copy of sashelp.air in the WORK library with the same name, air. As soon as I run this program, you go to the log window, and what you can see here is the note: it says that there were 144 observations read from the data set SASHELP.AIR, the data set WORK.AIR has 144 observations and 2 variables, and the DATA statement used a total process time, given as real time and CPU time. So it gave you the complete information that our data set had been
created. But do you know exactly what happens when you write such a program? There are two steps in which this program gets completed. This is what the slide is talking about: whatever DATA step you write in SAS, it gets processed in two phases. The first phase is compilation and the other is execution. Compilation is what happens in every programming language: first of all, whatever code you write gets compiled, and in the second step it gets executed. In the same way, whenever we write a SAS program, it gets compiled in the first phase, and in the second phase the compiled code gets executed. Sometimes we get errors in the compilation phase. Errors in the compilation phase are basically syntactical errors: any syntactical error gets caught in the compilation phase itself, and that shows that the program you have written is not correct. But sometimes you get errors in the execution phase. That shows that the SAS code you have written is correct, but you still have a problem; such errors are basically because of the data. So errors caused by the data occur in the execution phase, while errors or warnings due to programming or syntactical mistakes occur in the compilation phase and show that your program is not correct. It says that in the SAS help, under the heading DATA step processing, the
complete flowchart is given of what happens in the back end when the DATA step is processed. Now, the compilation phase. During the compilation phase, SAS checks the syntax of the DATA step statements, as I already told you. Then it creates an input buffer (you remember I said that we'll be talking about the input buffer and then the PDV) to hold the current data file record that is being processed. Then SAS creates a program data vector, which is nothing but the PDV, to hold the current SAS observation; an observation means a record or row. Then it creates the descriptor portion of the output data set. So let's talk in detail about this compilation phase. During the compilation phase, SAS checks the syntax of the DATA step statements. This is the first thing SAS does: it checks each and every piece of syntax you have written and validates whether the program is correct or not. I am specifically talking about the DATA step right now, because the first slide is about the DATA step. So in the first step it checks the syntax of the program. Number two, it creates an input buffer to hold the current data file record that is being processed. Let me go back to the SAS system and click on the air data set; it's going to open the data set. Here you can see that we have two variables: date, and international air travel in thousands. In date we have a date in month-year format, and then we have international air travel in thousands. Let me view the column attributes; I have just double-clicked on it, and it says the name is air. So I guess this may be the distance, or this may be the amount in dollars, a
thousand, I don't know; I'm not sure what exactly it is. But yes, we have two variables here, and the number of observations is 144 in total. The program I have written is going to replicate the same data set in the WORK library. You can see that I have written DATA work.air and then SET sashelp.air, so it takes the data set air from SASHELP and replicates it in the WORK library. Now, what actually happens in the back end when we run this program? As a first step, according to the slide, SAS checks the syntax of whatever I have written. Once it has validated each and every piece of syntax and the program is good to go, SAS goes to the next step. The next step, as per the slide, is that it creates an input buffer to hold the current data file record that is being processed. What happens is that SAS reads the data set one row at a time; in one go it processes one row. The input buffer is nothing but a temporary memory allocation which happens in the back end, so SAS places each data record into that temporary memory for the time being. Once the complete record is in memory, it gets shifted to the destination, such as work.air. Let me open Excel here so that I can show you in detail what exactly happens in the back end. Give it a minute; let Excel open on my system. We call it what happens in the
black box. The black box is nothing but the back end of the SAS system. Here we go. Let's say we have two variables, A and B, and a number of observations; we have four observations. Whenever I try to create a duplicate of this data set in SAS, what will happen? As soon as I write code similar to what I have written here, DATA work.air and SET sashelp.air, SAS will first look at what type each variable is, whether it is character or numeric, and depending on those data types it will create an input buffer in the back end. For A and B it allocates space in memory, not exactly the memory of SAS but of the system, in the back end. First it creates the spaces, then it fills in the values one by one, and as soon as this step is complete it shifts these values to the destination I have given to SAS; in my case the destination is the WORK library. This is what it does on the first pass. Now the iteration goes to the second row, and it picks up the next two values. It overwrites the original ones: basically it deletes them from the input buffer and then fills the new values into the same temporary memory allocation. As I already said, it works record by record. So as soon as the first record has been shifted to the destination, for the second record, the counter or
the pointer goes to the second record; the input buffer which was created for the first record is emptied, and the values of the second record are shifted into that input buffer. How does it happen? It happens for one variable at a time: first the value of variable A is shifted here, then the value of variable B, and as soon as the complete row has been filled, it is shifted to the destination and the buffer is emptied again. Next the pointer goes to the third row and shifts its values to the input buffer. So, one by one, each record gets shifted to the destination via the input buffer. Each line is processed in order, from top to bottom and from left to right: top to bottom means it starts from observation number one and goes to observation four, and left to right means variable A is shifted first and then variable B. So, one by one, the values of each variable go into the input buffer, and from there, through memory, to the destination. This temporary area is what we relate to the PDV, the program data vector: it is where the values coming from the input buffer are held for the intermediate period. Then SAS creates the descriptor portion of the output. I told you in the beginning what the descriptor portion basically is. To see it again, we can go back to the SAS environment and I will write another PROC: PROC, then DATA=work.air. I need to give the name of the PROC, which is PROC CONTENTS, and I will say RUN, and as soon as I
run this PROC, you will see the output. Okay, something went wrong here. Let me close it and go to the log window. It's the procedure name: it should be contents, PROC CONTENTS. Now it's running. As soon as the PROC completes, you will get some output; it opens another window with the results. This is PROC CONTENTS, and it tells you the name of the data set, the member type, how many variables and observations there are, whether there are any indexes, the observation length, whether it is compressed or not, the date when it was created, and the data representation. So it is complete information about the data set. PROC CONTENTS gives you the information about the data set, and this is nothing but the descriptor portion of the data set. I have already told you in another session that a data set has two parts: number one the data portion, and number two the descriptor portion. The data portion has the actual data, while the descriptor portion has the information about the data, or you can say the metadata. So once the PDV is created, SAS creates the descriptor portion of the data set. That is the last step of compilation; with this we have finished the compilation phase of a DATA step, and we move to the next phase, which is called execution. In execution, SAS iteratively inputs the individual records; as we already said, it works from top to bottom. In case of a data error, it writes an error message to the log and continues to the next record. So if you get an error in the compilation phase, your program will not run at all, because there is a programming error; but if there is a data error, then for that particular record or particular observation
your program may not run, but as soon as the iteration moves to the next record it will run successfully. So the execution does not stop till all the records are read. However, if there are mistakes in a record, the result may appear confusing, and notes will be written to the log for them. Under files we have a raw data file called revenue2. Let's read the data using the INFILE and INPUT statements, similar to what we did last time, and see what we get. Let me search for this file. You can see here, this is revenue2, and this is the data we have in the file: it has the location name, then freight, then direct. Give it a minute; I'm going to move the data into SAS. I will be writing the same program as we have in the PPT, so let me close this file and go back to my slide. This is what I want to write: DATA revenue2. Before this, I will create another location where I will save the data set I'm going to create. In this folder I will create another folder named Satya, and now I can create a library with this name. I will write LIBNAME, give it the name satya, and then give the location, and as soon as I run this it will create the library for me. I go to the log, and it says that the libref was successfully assigned as follows. If I go to the Explorer, you can see that satya has been assigned. Now the data set I am going to create is satya.revenue2, and this is where you need to concentrate on how I am going to import the data of this text file, revenue2, into SAS. Now just imagine this text file here and the
code which is written here in the SAS environment. It says DATA satya.revenue2. satya.revenue2 is a two-level name: the first level is the name of the library and the second is the name of the data set which I have given here. Then LENGTH hub $ 15. Hub is nothing but the location, like Frankfurt or London; that is the name of the hub. $ 15 is the length; if you remember the session I took on formats and informats, for a character variable the length is given after the dollar sign. So I have given a length of 15 for hub. The next thing is source, and I have again given it a length of 15. Then type, which is direct, indirect or other; its length is 10. And then revenue, the amount of revenue that was generated; for that I have given a length of 8. So first of all I have given the lengths; I can define them in this way, or I can decide the format for each variable later on as well. After that I need to give the location where my file is saved, so I go to that location, come back to SAS and change the path here. I need to give it in double quotes, and then DLM. DLM is nothing but the delimiter which is used; you can see that the txt file is comma separated, so the delimiter is the comma. Then INPUT with the names of the variables: hub, source, type and revenue. As soon as I run this, you can see that revenue2 got created in the contents of satya. If I double-click on it, you can see the data set has been created, but unfortunately we don't have the revenue values: I am getting missing values all across. You may know why: when I was writing
the program, I gave it as revenue and I didn't say that it is a character variable, so by default SAS took it as a numeric variable. But the data I had was formatted with a dollar sign, and that is why SAS didn't import it and there is an error. What I can do is try to import it as a character variable. Let's see what that does; it will overwrite the existing data set. I have double-clicked here, and now you can see some of the values. But again, my values themselves contain commas, and that is why you're not getting the full value here. We will look at other options for importing such delimited values later, but right now I want to show you how the PDV and the input buffer work in this situation. So you can see here the same situation we were talking about: when we imported this data, everything worked fine except the last column. The reason is that SAS, by default, using the INFILE and INPUT statements, can read only standard numeric data, but the data in the file is not standard: it is formatted, and data with a dollar sign in the column is nonstandard numeric data. If the data is nonstandard, SAS will not read it in the required format and will treat it as missing. The slide also mentions what SAS does by default when data is missing at the end of a row; we will come to that. That is why, in the first case, when I didn't use the dollar sign here, the column was missing. I'll go back to the same program, run it again, and this time I will write PROC PRINT DATA=satya.revenue2 and RUN. As soon as I run this, you can see that everything in the data is all right except the last column, because all across you can see missing values in the last column. So this is exactly what the slide was talking about. So in such a situation, where it
misses the last column, the reason here was that the data was formatted, and that is why SAS didn't accept any value. But by default, when there is missing data at the end of a row, suppose you didn't have data in the last column, SAS loads the next record to finish the observation and writes a note to the log. The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached. The syntax is INFILE, the path and file name, and then MISSOVER. If SAS reaches the end of the row without finding values for all fields, the variables without values are set to missing. What happens in the back end? This is exactly what we were talking about, but let's take an example of MISSOVER first. Let me write a datalines, or cards, example: DATA satya.test, and here I will write INPUT. I will take three variables, a, b and c. Then I will say DATALINES: let's say 1 2 5, then 2 5 6; here I will take only 3 and 5; then 6 7 8; let's say 7 8 9; then again 9 0; and then 1. I'm going to run this program, and you need to observe what exactly SAS does. After this I will write PROC PRINT DATA=satya.test and RUN. You can see here the basic difference which I wanted to show. Look at this data. First of all, I gave three values for the variables a, b and c, so a is 1, b is 2 and c is 5; we got values for all variables. The same thing happened for the second row: 2 5 6. But in the third record we have 3 and 5, and for c we don't have any value. So what did SAS do?
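A minimal sketch of the program being typed at this point, reconstructed from the spoken description. It assumes the WORK library in place of the speaker's satya library so that it runs without a LIBNAME statement, and the data values are taken from what is read out in the session; treat it as an illustration, not the exact on-screen code.

```sas
/* DATALINES example without MISSOVER (default behaviour).
   Assumption: work.test stands in for satya.test, and the
   values are reconstructed from the narration. */
data work.test;
    input a b c;        /* list input, blank is the default delimiter */
    datalines;
1 2 5
2 5 6
3 5
6 7 8
7 8 9
9 0
1
;
run;

proc print data=work.test;
run;
```

After running it, the log is worth checking for the note that SAS went to a new line when the INPUT statement reached past the end of a line; that note marks the rows where a value was borrowed from the following record.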
It filled in the values 3 and 5, and then we have got a 6 here. Where did the 6 come from? It came from the first value of the next row. After that it skipped 7 and 8; we don't have that 7 and 8 in the data set. It moved directly to the next row. I told you that, with the input buffer, once the record is complete the SAS pointer automatically moves to the next row. So this 6 got shifted into the previous observation, and the 7 and 8 got skipped. Now the SAS pointer starts from here, 7 8 9, so we have 7 8 9, and then 9 0 and 1. Had there been a 5 after the 1 here, it would have been skipped. Let me show you that: I add the 5 and run the program once again. Now you can see that the 5 got skipped; we don't have it in the data. Now let's see what happens if I use MISSOVER, as on the slide, with INFILE. I can use it here as INFILE DATALINES and then MISSOVER. Let's see what it does. Okay, this is the new result. As soon as I used MISSOVER, you can see that the data we entered as 1 2 5 appears as 1 2 5, and 2 5 6 as 2 5 6. Then 3 5 and a missing value in the input: similarly we have 3 5 and a missing value in the result. So by using MISSOVER we have overcome this challenge: where there was no value for a particular variable, we get a missing value in the data set. So I was about to talk about what happens in the PDV. Suppose we are not using MISSOVER: SAS automatically uses the value from the next row in place of the missing value, and it skips the remaining values of that row. I know it may be a bit difficult to understand, but this is the main crux. It fills these three values for the input buffer; I'm showing it in
Excel here. Let's say I have these values for a, b and c. As soon as we run this code, SAS creates an input buffer and space for a, b and c in the PDV. Let's take this as the input buffer, and this as the final data set, the destination. Now what happens? As soon as we run this program, the first values get shifted into the input buffer: 1 for a, 2 for b, 5 for c. As soon as these values are filled, SAS, having identified the descriptor portion, shifts them into the final data set like this, and then it empties these fields; for each field it already knows the data type it should have. Now it moves to the next row and fills 2 for a, 5 for b and 6 for c. Everything goes well: it shifts these values to the second row of the final data set and again empties the fields. When it empties them, it assigns a missing value to each field. Now the iteration moves to the third row. Here we have 3 for a and 5 for b, but for c we don't have anything, so SAS moves directly to the next row, which starts with 6, and fills in the 6 here, if we are not using MISSOVER. It moves these values to the final data set and again fills the fields with missing values. We were at the 6, so the pointer moves to the next row; the remaining values on the row with the 6 are skipped. It fills 7 8 9, and the next iteration works the same way. Let me run it for the other rows as well. It moves to the next row and fills 9 and 0, then takes 1 from the next row, and shifts these values into the final data set. So this way it creates the complete data set without MISSOVER, and we get a value for each variable. But what happens
in case of MISSOVER? If we use MISSOVER with the datalines, then in the first case it fills the three values 1 2 5, moves them to the data set, empties the fields and moves to the next row. It fills 2 5 6; the input buffer is filled, the values are shifted to the final data set, and the fields are emptied again. This is the complete cycle which is followed every time. Now it moves to the third row, 3 5. It inserts 3 and then 5. We are using MISSOVER, so it accepts a missing value, because there is no value for the third variable in the third row. It moves these values to the data set, empties the fields again, missing, missing, missing, and then reads the next row, which is 6 7 8. It fills in 6 7 8, the record is complete, it moves it to the data set, empties the fields and moves the pointer on to 7 8 9. It follows the same cycle: it moves the values to the data set, empties the fields and moves to the next row. There it encounters only 9 and 0; nothing is there for the third variable, so it sets that to missing, shifts the values to the data set and goes back to the input buffer: missing, missing, missing. Then it goes to the last row, where it finds only 1 and 5, with a missing value for the third variable, and it shifts this too. Finally, as there is no further record in the data, the input buffer is released and does not hold any value. So, following this, with MISSOVER
and without MISSOVER, these two types of data sets got created. If you go back to the SAS data sets and the results we had, you can compare them; they show the same kind of result. You can see here the first one, without MISSOVER: you have 1 2 5, 2 5 6, 3 5 6, 7 8 9 and 9 0 1. And this other result got created when we were using MISSOVER: you can see that we have three missing values, and the rest of the values in the data are exactly the ones we planned for. So this is how the data set got created with and without MISSOVER, following exactly the PDV and the input buffer: how SAS allocates the temporary input buffer and how it holds the values to pass them through to the final data set. We have already seen what happens in the back end, but let's go through the slide: reading from a raw data file versus reading from a SAS data set. There is a small difference. Reading from a SAS data set is easier for SAS, because it need not create the descriptor portion again; when reading a raw data file, SAS needs to create the descriptor portion again. That is the major difference between these two processes. Now consider two examples, the raw data file and the SAS data set; we have similar programs written for both. Let's move to the next slide. When reading from the raw data file, the following is done in the back end by these statements. Just read what is happening in the compilation phase. First of all, we are reading from the raw data file: DATA checks the syntax in the compilation phase; then the new variable is created and its length is defined; the input buffer is created; the PDV is created; and RUN creates the descriptor portion of the output data set. So this is what
all happens in the compilation phase. Then, in execution: DATA creates the data set, INFILE locates the path, INPUT moves the raw data record into the input buffer and from there into the PDV, and RUN performs an implicit output and an implicit return. So from the input buffer SAS fills the PDV, and from the PDV it creates the actual data set. This is what happens in the case of a raw data file. But what happens when we process the same thing from an existing data set? When reading from a SAS data set, the following is done in the back end by these statements. In compilation, DATA checks the syntax, SET locates the input data set and creates the PDV, KEEP and DROP create the keep and drop flags, and RUN creates the descriptor portion of the output data set. In execution, DATA initializes the PDV, SET transfers the data from the input data set to the PDV, KEEP, DROP and RUN act on these flags, and the program executes and creates the final data set. So from the point of view of optimization or time saving, creating a data set from an existing data set is definitely easier: fewer resources are used and it is created in less time. Now a question: when SAS reads from an Excel file and creates an output SAS data set, which route does it follow in the back end: reading the data from a raw data file, reading a SAS data set, or a completely different method? What do you think? Your answer will depend on your understanding of what we have done, but let me tell you: it follows the first one, reading data from a raw data file. You need to think about why this happens, and why I said it is the raw data file route and not the SAS data set route or a completely different method. Let's move to another question. To read a raw data file containing standard values into a SAS data set, we use the following syntax. We have already done this several times.
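A minimal sketch of that syntax, for reference. The slide's own code is not visible in the captions, so everything here is an assumption made for illustration: the fileref rawfile, the data set name work.revenue_demo and the two records are made up, and a temporary file is written first only so that the example runs on its own.

```sas
/* Hypothetical raw file with standard, blank-delimited values. */
filename rawfile temp;

data _null_;
    file rawfile;
    put "Frankfurt Freight Direct 1200";
    put "London Passenger Indirect 850";
run;

/* Read the raw file with INFILE and INPUT (list input). */
data work.revenue_demo;
    length hub $ 15 source $ 15 type $ 10;  /* default would be 8 bytes */
    infile rawfile;                         /* blank is the default delimiter */
    input hub $ source $ type $ revenue;    /* revenue is standard numeric */
run;

proc print data=work.revenue_demo;
run;
```

The LENGTH statement comes before INPUT so that values longer than 8 characters, such as Frankfurt, are not truncated.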
We have created several data sets this way, from a raw data file or from datalines. In this case the code is the following; it is the same code we saw in the previous slide, which created this data set.

Let's briefly talk about what a delimiter is. A space (blank) is the default delimiter in SAS. You may remember that I was just putting a space between these numbers; that is the default delimiter. I can use a comma, a colon, or any other special character and specify it as the delimiter, but a blank is the default. The DLM= option can be added to the INFILE statement to specify an alternate delimiter. DLM is nothing but the delimiter: if you are using any delimiter other than a space, such as a comma, a colon, or any other character, you can specify it. The example given here is INFILE with the path and file name, and then DLM= with the delimiter in quotes; they have given a comma here.

List input is another way of reading data into SAS from a raw data file. The method for reading text data into SAS this way is called list input: the variables and values are given as a list, and the values are all standard values. Standard values are values that SAS reads directly without any special instructions. We have done this multiple times, so there is nothing new for you. It should be noted that by default the length of a character variable is 8 bytes. To accommodate longer values, we need to explicitly define the length of the variable. We talked about this with formats, where you can specify whatever length you want, but by default SAS takes it as 8. The syntax given here is the LENGTH statement: LENGTH, then the variable name, and then the length.
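The DLM= option and the LENGTH statement described above can be sketched together as follows. This is only an illustration: the data set name work.people, the variables name and age, and the values are hypothetical and do not come from the slides.

```sas
/* Hypothetical example: data set name, variables, and values are illustrative. */
data work.people;
    /* Declare the length before INPUT; otherwise list input would
       give the character variable the default length of 8 bytes */
    length name $15;
    /* DLM=',' makes the comma the delimiter instead of the blank */
    infile datalines dlm=',';
    input name $ age;
    datalines;
Christopher Lee,42
Ann,35
;
run;

proc print data=work.people;
run;
```

Without the LENGTH statement, the first name would be cut off after 8 characters. For an external file, INFILE would name the file path in quotes in place of DATALINES.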
With LENGTH, if you do not specify anything, the length is automatically 8, and if you specify 15 here, a dollar sign and then 15, it reserves 15 bytes for that variable.

Next question: a text file has dates in a format like mmddyy. How can you read this data? I have already covered this in the previous sessions; you need to think about which informat or format you would use to import this data.

This is the PDV, which we have already talked about. Let me show you how you can see the PDV and how it works. I just need to write put _all_ here. Let's see whether it works. Right now it is not working, so let me go back to the previous example and use it there. Here you can see the PDV. It shows the values for A, B and C, all three variables, and along with them two system-defined variables, _ERROR_ and _N_. _N_ is the observation or iteration number: we had five records, so we have got six iterations, the sixth being the one that reaches the end of the file. _ERROR_ is set for each row or observation, and it has only two values, 0 or 1. If _ERROR_ is 0, there is no error; if its value is 1, there is an error in that particular record. Its value cannot go above 1; it will always be 0 or 1. For the rest, you have all missing values in the PDV. As I have already told you, we use the input buffer and the PDV, and by default you get missing values in the PDV because the variables are reset to missing after every iteration. That is exactly what you see in the PDV.
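A minimal sketch of viewing the PDV in the log with put _all_ is shown below, assuming the same hypothetical five-record data as before (the names work.abc, a, b and c are illustrative). One PUT is placed before INPUT and one after it, so the log should show the variables reset to missing at the start of each iteration and then filled in once the record has been read.

```sas
/* Hypothetical example: names and values are illustrative only. */
data work.abc;
    /* PDV at the start of the iteration: a, b, c are missing */
    put 'Before INPUT: ' _all_;
    input a b c;
    /* PDV after the record has been read from the input buffer */
    put 'After INPUT:  ' _all_;
    datalines;
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
;
run;
```

With five records, the first PUT should run six times, because the sixth iteration starts before INPUT finds the end of the data and stops the step. _ERROR_ and _N_ appear in the log but are not written to work.abc.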
To see the PDV, you just need to use put _all_ in your program, and the PDV gets printed in the log window. PUT prints to the log window, and _all_ stands for all the variables involved in the processing. Here we have all the variables: the two system-defined variables, _ERROR_ and _N_, and the three variables that were in my data. So: 5 observations, 6 iterations, 0 errors, and 3 variables. This is how my data set got created.

Now let's move to the next slide. Actually, let's go through the previous slide once more. The PDV is a temporary area held in the computer's memory; it acts like a transit area. Whenever SAS data sets are created, the PDV works in the back end. The contents of the PDV change in every iteration, and they can be viewed by entering the code put _all_. The contents of the PDV, when using put _all_, are shown in the log window. After the data set is created, the PDV is deleted automatically. So when the data set is created, the values are first set to the default, that is missing, and then the PDV is deleted automatically. This applies whether we create the data set from a raw data file or from an existing data set.

Let's look at the other case, creating SAS data sets from an existing SAS data set. We have already seen what happens in the back end, but let's see it again. Both phases are involved, the compilation phase and the execution phase. Compilation is a one-time step, done whenever we submit a new program, and then comes execution, which we have already seen. In the compilation phase there are step one and step two, which you can go through and check. Once compilation is over, we move to execution. Here the data has four observations,
and that is why they have shown it in four steps, one for each observation. We have already gone through in detail how it looks and what values you see in the input buffer and the PDV; if you want, you can go through it step by step. Let me go back to the log window once again. The values you are looking at on the slide right now are nothing but these values: how the values are being set to missing, and how the _ERROR_ and _N_ variables are being set. You can see that the automatic variables _ERROR_ and _N_ are dropped from the output data set; they are not written to the final data set that we create. Since the end of file has not been reached, iteration three is completed next in the same way. We have already followed these steps.

So, while creating various data sets, we have already covered the following scenarios: creating a data set from an existing SAS data set, copying an entire data set, changing the order of variables, and keeping or dropping variables. Similarly, with the combination of LENGTH and KEEP or DROP, LENGTH works first, creating the variable, and then KEEP and DROP are applied.

So the major task we covered was to see how the input buffer and the PDV work, and then we saw how MISSOVER is used. MISSOVER is also a very important option, and there are other options like it, which we will look at in later sessions. We also saw how to create a data set from an existing data set and how to create a data set from a raw file.

Today we are going to cover the KEEP and DROP statements, then labels, and after that, if time permits, formats. Just to let you know, in our previous session we covered a lot of statements to subset
and read data sets and to create data sets, and we used the SET statement and PROC COPY. I said in the last session that we would cover labels and formats, but due to time constraints we were not able to cover them in the third and fourth chapters, so we will do labels and formats today. Before that, I would like to show you how to use KEEP and DROP. And before I show you how to use them, it is necessary to explain what exactly they are and where we use them.

Let's move to the slide. It says that the DROP statement specifies the names of variables to omit from the output data set. Before I read any further, let me show you how we use variables. Don't get confused by the term: a variable is nothing but what other systems call a column. You must remember what I keep telling you: what is called a column in other databases is called a variable in SAS, and what is called a row in other systems is called an observation here. So when I talk about variables, I mean the columns.

Now let me show you some examples first, so that I can connect them with today's topic and it is easier for you to understand what exactly I am talking about. Let me show you a PROC PRINT. I will write DATA=, and let me see whether I have the data sets I created yesterday in my Satya library. Yes, I have a lot of data sets here. So I will write satya. and then, let us say, subset3, then a semicolon, and then RUN. As soon as I run this statement, I will get all the variables available in my data set. This is subset
three, which I guess we created from ORSALES in the SASHELP library, and we had a lot of variables, or columns, in this data set. As soon as I ran this, I got all the variables from the data set: year, quarter, product line, product category, product group, quantity, profit, and total retail price. So we have all the variables here.

But here is a question. Suppose I don't want to see all the columns; I want to see only some of the variables. Let us say you just want to see product group and quantity, just two variables. What will you do? It's a very good question, and we are going to use the VAR keyword here. Let us see how. I'll go to the SAS editor window. Instead of submitting the statement as it is, I will add VAR and then give the names of the columns, let us say product_group and quantity. So I have mentioned the names of two variables, and I terminate my statement. Product group is the first column and quantity is the other. I'm going to submit these statements, and you can see that I have got only two variables in the output now: the first is product group and the other is quantity.

If you want, we can add a third column as well. I don't remember how many columns there were, so let me scroll back to the previous result. Let's say profit. I add profit to my list, so I have written three variables, and I run it. You can see I have got three variables in the output: first product group, second quantity, and third profit. So this is how we have been using the VAR keyword. VAR is nothing but variable: it says which columns you want to see.
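The VAR statement demonstrated above can be sketched as follows. The session prints satya.subset3; this sketch reads SASHELP.ORSALES directly, on the assumption that it has the same columns, so that it does not depend on the Satya library being assigned. The OBS=10 option is an addition to keep the listing short.

```sas
/* Print only three of the columns; OBS=10 limits the listing to 10 rows */
proc print data=sashelp.orsales(obs=10);
    var product_group quantity profit;
run;
```

Removing the VAR statement prints every variable in the data set, which is the behaviour shown first in the session.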
In the same way, while creating a data set, we might try to use the VAR keyword. I told you how to create data sets: first you write DATA, then the library reference, and then the name of your output data set. Let's say subset4; that is the name I have given to the output data set. Then SET, and then the input data set, for which I'll take the same SASHELP.ORSALES. And now I will use the VAR statement. Let me run this. I will get an error, as VAR is highlighted in red. So how do I overcome this, when SAS is not permitting what I really want to do?

We can go back to the log window. First of all, you will see the message that the statement is not valid or it is used out of proper order, which means something is wrong with this statement. You might have got the gist of what I wanted to do: I wanted to pass only three variables into the subset4 data set. If you read the log properly, it says that the data set SATYA.SUBSET4 may be incomplete, and that when this step was stopped there were 0 observations and 8 variables. If you go back to the Satya library, this data set has been created, but if I click on it you get a pop-up. I have got the message here that the data set has zero observations, and it also shows the eight variables that we saw in the log window.

But what I really wanted to do by writing that code was to have only three variables go into this data set. How to do that is what we are going to cover in today's topic, KEEP and DROP: how we are going to use them and what the KEEP and DROP keywords mean in a statement. Let's now read the slide. The DROP statement
specifies the names of variables to omit from the output data set. It is written as DROP followed by a variable list; the variable list contains the variables you want to exclude, separated by blanks. The KEEP statement specifies the names of variables you want to write to the output data set. It is written as KEEP followed by a variable list; the variable list contains the variables you want to include, separated by blanks. So it is as easy as it sounds: DROP if you want to exclude variables, KEEP if you want to include them in the output data set.

Later I'll be talking about how we can use these keywords with both input and output data sets. If we use them with the output data set, they have exactly the meaning we are reading right now: DROP excludes and KEEP includes. You might expect the opposite to happen when you use the same keyword with the input data set, but it does not: if you use DROP with the input data set, it drops those variables and includes the rest, and KEEP includes those columns. So it is the same thing; whether we use them with input or output is just a small point of confusion. It will be very clear once we take some examples.

Let's move to the next slide. It gives an example. Consider an input data set with the following variables: year, movie name, hero, heroine, villain, genre, and box office. If you use DROP with year, movie name, hero, and heroine, then the output data set will have the variables villain, genre, and box office. If you use KEEP with the variables listed on the slide, then the output data set will have exactly those variables. Let's do the same thing with our example. Right now what I have
is that DATA step, so let's say I write KEEP there instead: keep these three columns. Now let's see what happens in our output data set. I will again write PROC PRINT and then RUN. I'm not giving the name of a data set, so it will print the most recently created data set for me. I've used KEEP, and now you can see that my data set has three variables. But before that, let me go to the log window so that it is clear what was created. You can see that there were 912 observations read from the data set SASHELP.ORSALES, and the data set SATYA.SUBSET4 has 912 observations and 3 variables. So you can see exactly what we did: we used KEEP with these three columns.

Now I'll do just the opposite. I'll copy the same code, go below, and write DROP instead. It should now behave just the opposite of what we did a moment ago. In the previous example we used KEEP with product group, quantity, and profit, so we got these three in the resulting data set. Now I'm using DROP, which means I will drop these three variables and get the rest of the variables available in the ORSALES data set in the SASHELP library. You can see that I have got one, two, three, four, five variables here, and each of the variables I mentioned in my code, product group, quantity, and profit, got dropped from the list. So it should be very easy now to see how to use them.

Before I move ahead, let me tell you that the KEEP and DROP we have used so far, we have used as statements. There is also something in SAS called options. I have not yet described what an option is; when I explain what an option is, I will also tell you how to use KEEP and DROP as options. I will cover that later.
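The two DATA steps just described can be sketched as below. The session writes to the Satya library; the WORK library is used here as an assumption so that the code runs without that libref, and the name subset5 for the second data set is hypothetical.

```sas
/* KEEP: only the three listed variables are written to the output */
data work.subset4;
    set sashelp.orsales;
    keep product_group quantity profit;
run;

/* DROP: the three listed variables are left out, the rest are written */
data work.subset5;
    set sashelp.orsales;
    drop product_group quantity profit;
run;

/* List the variables in each result to compare them */
proc contents data=work.subset4;
run;

proc contents data=work.subset5;
run;
```

Going by the log messages quoted in the session, the first data set should end up with 3 variables and the second with the remaining 5.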
For now, we have used KEEP and DROP as statements in a SAS program. Let's move to the next slide: type the following code and see how the KEEP and DROP statements differ. They have used the same example we have just done, so there is no need to do it again, because we have already covered all these examples. I'll move to the next slide, which is an assignment.

It says: create an input data set called cars as a replica of SASHELP.CARS. Subset only those observations that have the make given by Audi, BMW, Chevrolet, Ford, Honda, Hyundai, Mercedes-Benz, Suzuki, Toyota, and Volvo. Within that, filter to keep only the vehicles with drivetrain as front-wheel drive. Keep only these variables: make, model, type, and engine size. The end data set must have 95 observations and four variables alone.

Let us do it together, so that you understand exactly what we should do and how to do it in a SAS program. We will have to write a bigger SAS program now. First of all, let me check whether we have this data set, SASHELP.CARS. I'll write DATA=SASHELP.CARS and run this statement. It tells me whether I have the data set available, because that depends on when you installed SAS on your system; it is possible that you do not have this data set on your system. I have run this and got the output in the results viewer window, so I do have the cars data set in the SASHELP library.

Now to the problem: create a data set cars as a replica of SASHELP.CARS. I will write a DATA step first and create it in Satya, and call it cars. You can see that I already have a cars in the Satya library, so I'll create cars2 instead. Then I'll give the SET statement. It says
to subset only those observations where make is given by this list. First of all I need to write the statement SET SASHELP.CARS and RUN. That alone would create an exact replica of the SASHELP.CARS data set in the Satya library under the name cars2. But that is not all I have to do; there is a condition to follow. I need to subset only those observations where make is given by these values, so I have to put them into a WHERE condition: WHERE make IN, and I guess you remember how we used IN. The first value is Audi, the second is BMW, the third is Chevrolet, then we have Ford, then Honda, then Mercedes-Benz, then Suzuki, then Toyota, and last but not least Volvo.

What else do they want? Within that, filter to keep only the vehicles with drivetrain as front-wheel drive. So I have to write the next condition: AND drivetrain, which I guess is one of the variables here, equals front wheel drive. I have to write this in the exact case. My WHERE statement ends here, but they still want something more: keep only the variables make, model, type, and engine size. So I have to write a KEEP statement as well: KEEP make, space, model, space, type, space, engine_size, and then I terminate the statement. So I have put all the required conditions into the code. Now I will run it and let's see what happens. It ran, and very fast.

As per the expected result, the end data set must have 95 observations and four variables. Let me go to the log window first and check what happened. It says there were 0 observations read from the data set SASHELP
CARS, with WHERE 0, an obviously FALSE WHERE clause. So it is saying there was some problem with the code I have written. What could be wrong? I need to check. The log also says that the variable engine_size in the DROP, KEEP, or RENAME list has never been referenced. So there is a problem with engine_size. Is the spelling correct, E-N-G-I-N-E, underscore, size? It says something is wrong with it, so I have to go back to the SASHELP data set and check. Engine size is not written with an underscore; you can see that here. So I have to change that in my code. These are the things you need to be careful about. I have deleted the underscore; now let's see whether it runs.

Again it gave me a message that there were 0 observations read from the data set. Earlier it gave me a warning that the variable engine_size in the list was never referenced; this time it only tells me that the data set SATYA.CARS2 has zero observations. But why were zero observations read from the data set? I must be doing something else wrong, so let's see. The condition says drivetrain as front-wheel drive. In the data I have Front, Rear, and All. I guess this is exactly what I'm doing wrong: I need to select the value Front only. I cannot find any other value in the data apart from All, Front, and Rear, so SAS is not finding the complete value front wheel drive. I'll remove everything else and keep just Front, and let's see what it gives. I have just run it; let's go back to the log.
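Putting the corrections from this walkthrough together, the assignment might be sketched as follows. This is not the slide's official solution: the WORK library replaces Satya, and the exact spellings of the make values (in particular Hyundai and Mercedes-Benz) are assumptions that should be checked against the data, since the comparison is case-sensitive. A PROC FREQ step is included as one way to check the stored values first.

```sas
/* Check the exact spellings stored in the data before filtering */
proc freq data=sashelp.cars;
    tables make drivetrain;
run;

data work.cars2;
    set sashelp.cars;
    /* Values must match the data exactly, including case;
       the spellings below are assumptions to verify with PROC FREQ */
    where make in ('Audi', 'BMW', 'Chevrolet', 'Ford', 'Honda', 'Hyundai',
                   'Mercedes-Benz', 'Suzuki', 'Toyota', 'Volvo')
          and drivetrain = 'Front';
    /* The variable is EngineSize, with no underscore */
    keep make model type enginesize;
run;
```

The number of observations in the result depends on the copy of SASHELP.CARS installed and on which makes are actually listed, so it may not match the count reported in the session.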
The log says there were 76 observations read from the data set SASHELP.CARS, and the data set SATYA.CARS2 has 76 observations and 4 variables. That is not exactly the 95 we expected. The data might be different here, or it is possible that the values I have given for Ford, Honda, Hyundai, and Mercedes-Benz are not written correctly, so that is one possible reason why it did not show me all the records. You can see that I have Audi and I have Chevrolet, with types sedan and SUV; I have not put any filter on type. I can see Chevrolet, Ford, Honda, Suzuki, and Toyota, but I don't have Mercedes-Benz here. So maybe I'm missing the records for Mercedes-Benz because the text I typed is wrong.

So you can see what challenges I'm facing right now. Working with real data, you will face similar challenges, because unless you write the values exactly, in the correct case, inside the quotes, you will get errors or missing records in your results. To pull the correct records, you need to be sure of what you type.

In the given solution they have written a similar query. The one difference you can see is that I have written the KEEP statement before the WHERE condition, whereas in the given solution they have used the KEEP statement after the WHERE condition. It's up to you where you write it. So that's all.

OK guys, a quick info: if you are looking for end-to-end training in SAS for data science, we at Intellipaat provide that course, and you can check the details in the description. We've come to the end of this session. I hope it was helpful and informative for you. If you have any queries regarding this session, please leave a comment below and we'll be glad to help you out. Thank you.
Info
Channel: Intellipaat
Views: 29,697
Rating: 4.8583331 out of 5
Keywords: sas training, sas programming training, sas tutorial, sas course, what is sas programming, sas certification, sas programming videos, sas programming structure, sas as fourth generation language, sas programming for beginners, data analytics, why sas, sas framework, sas programming concepts, proc data step sas, sas format and informat, what is sas analytics, sas intellipaat
Id: xnnlxTAimOQ
Length: 238min 39sec (14319 seconds)
Published: Tue Jul 14 2020