SSIS Tutorial Part 61-Lookup Transformation Modes Full Cache, Partial Cache & No Cache Demo

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome to tech brothers tsunami today we are going to learn about lookup transformation and specially the modes in look upper so we will learn full cache a mode partial cache a mode and no caching mode what we will do will create a simple SSIS package in which you will read the data from a flat file and then we will get the data in the lookup from a sequel server table and we will find out if there is only matching record then we want to load that record and if there is no matching rule redirector to the no match output is very simple package so we will be comparing the data from the source to the lookup reference data set next we are going to enable a sequel server profiler so we can see that how many times the SSIS packets and the query to read the data from the sequel server when we select different modes of lookup as we will be reading the data from sequel server table in the lookup of transformation next we will experiment and repeat our execution for full cache a partial cache a and more cache a and then observe the results and understand these modes in detail now let's go and create SSIS package but before we do that let me show you the source file we are going to use so we have customer ID customer name street and city and region so here what you see on a customer name we want to look up against a table that we have in the database so we want to say if this incoming record from the file does match with the sequel server table column value that's a customer name column then just load that record or let it go to the matching output and if that does not match you send to the no matter we are not selecting any column from here or the goal is just to make sure the values does exist in the reference table or reference data set before we load the record from the source so now we need to go to SS DT sequel server data tools or bids and creator SSIS package so we are in SSIS packages a tab new SSIS package bring the data flow task here as we will be reading the data from the flat file source go to the sources and here we have flat file source we are going to make a connection for the flat file browse select the file now column name is in the first data row that's correct good columns all the common columns are command correct go to advanced chain the data type of these columns so for customer ID we have a it should be integer for a customer name bar car 50 is fine Street Parker 50 is fine City Bar cut 50 is fine state is fine and region code is only 2 correctors who are chained a 1 now we go to preview everything looks good retain null values from the source as the regional values in the data flow that means if we are getting some blank values in source and we want to convert them to another in the flat file output we can use this option hit columns we are good now let's bring the lookup transformation so we go to common transformations bring the lookup transformation and open it by double clicking and then here we have full cache a partial cachet and no cache a so we will experiment with this part of a lookup for now selected redirect rose to no match when we are handling the rows as if there would be matching record we want to send them to the matching and if there would be no matching then we will be sending to the no matching output go to connection make a new connection and here we have the connection I'm going to read this one create a new one so you can see what we have to provide we will be providing sequel server name or instance name and then we have to provide the database name and the schecter connection is what can find it ok here you can select the entire table or you can write a query I always recommend writing a query because you do not need all the columns from the table so in this case we need only the customer name so we copy paste our query paste it here and provide column name but we need so it is a customer name review the data that has come in correct and you can see that there is a duplicate data so we have omelet three times here and there is only one M rasa so let's see it okay or close go to columns now we'll be mapping the incoming go from column and that is our source columns coming from the flat file today lookup reference data that's our customer name so if you see here if I try to map these it will throw me error cannot map the input column customer name to do look up a column customer name because the data type does not match so what I have here in the table actually these are Anwar card I have defined them when I was creating a table and you can go ahead and take a look up so you go to the customer to table let me refresh the table list and here we have customer - if we go columns and then we have customer name and worker 6 so that's what it is now what we can do we can convert that to the N worker in the our query here in the lookup we can say cast as and worker and we can say 6 as customer name and go back to the columns and no map it it should be working just fine and if you have watch my movie s-video sign movie if you have liked my video regarding the warnings and this this is the place it should place their send a warning say okay customer name is coming as a worker fifty and you are trying to put into your match to the worker or six or n worker six but we would not see any visible warning but as it can work it is expectation like okay you are expecting the datatype you have defined is a worker fifty but you will not be getting more than six character for your customer name so if there are values come in they will be going to no match anyways so here we go to last option we have full cashier and we have everything set what we are going to do now we bring your to multicast here and then we are going to join our outputs so output you know a match output will be going to do one multicast sorry multicast not to look up why I'm using multicast because I can put data pure or I can just use them as a dummy here and I don't have to write to this equal table otherwise I have to create a sequel table for the entire load so I really don't need to load the data anywhere I just want to see the lookup functionality so I don't care about that you can put in sequel server table or you can write to the file whatever you like now as we our lookup isn't full of cash a mode what does that mean to us when we have a lookup in the full cashier mode it would read all the records from the source table and then put into the memory before even start processing or matching from the source Rosa so this is what is going to happen let's go back up quickly and enable profiler go to tools sequel server profiler and then you hit connector go to the temple itself I will choose the tuning and then here in the even selection you will see the RPC completed our pieces statement and batch completed and here you have column filters we are only interested to get the queries for the database called tester because that's what we are using they look up so I have put the filter tester database name is equal to test and hit OK now we run it if you see that there isn't any query I'm running in SSMS over even in the visual studio or beta or SSD T so there isn't any anything coming right now now we can go ahead and run our exercise package and see what happened I'm taking it very slow and telling you all the details I want to spend time on this video it is very important and now you see that the lookup transformation has processed the four rows even they were some duplicate records it has processed all four of them and it has a it is telling us ok it has a cache to total of 4 rows and the number of unique row is added to the cache is 2 so it is processing all of them better in the cache a there are only two unique rows or values are there now we can go back here on the sequel side and see profiler and if we look at here what we see so SP statement completed and that's called selector this customer name from this table customer - so that's the statement we see here that's the only statement is completed so it will the statement first and then it completed here so we do not see multiple statements if we are coming here these are the statements is just starting and all that but it is only one statement at the end that's a completed so it's not passing the values or anything so just it is running one statement and getting all the records 2d cachet so if you go back here let me see if it tells us number of times it try to connect it to the database so anyways it's connected only one time and ran that statement and got all those results for us in the memory and we can use that against the flat file sewer so now what we are going to do we are going to run for the partial cachet so right now how many rows we have we have 15 rows here and let me go back and you when you will run for partial cachet you are going to access the database multiple times so that depends per number of rows we have in the source and at the hall uniquely they are so first of all let's say if we have the same record here as we are selecting up-armored we are going to query our database for our MERIS wom this is the only column we are querying gates we in the lookup we have only the customer name so what is happening here each time do we get the value from the source it go to the database once it's good at database and then it will bring the matching record in the memory and then it keep that rakin in the memory and expect that next room might match with this already cached value if it would not match then it go back to the database and bring a new value so that's how it will keep building the cache it and bring in the new records in the cache it and we will be using against out the source first and if there is no value available it will go back to the database err so that's why it is going to work out so if I show you here you will take a look for our marriage can't go grab the record and the next time is not going to go to the sequel server table and it will just find the value in the cachet so it is it will have okay for next summer's will not go to database because the value is already existing there and then when M rasa will come it will go to the database and bring that value if there is a value existing so in our case there is a value resisting for M rasa and next what will happen then next time Armour come even after three roses so still the value is in cachet so it will use that and then again it will keep using for this one now let's do this experiment and take a look close this one close the file change to the partial cachet it okay let's run this one and you will also see information in the output as well so if you see here it is sent total number of data rows processed is nine so we are reading nine rows from our flat file and let me show you here varying from good information that I wanted to show you it is seen the lookup has performed the following operation process eight rows okay because we are getting eight rows from here and then what is happening issued to database commands only why to the reference database why it is issue only to sequel commands because there are only two unique values here so when we have armor armor armor and we have M rasa so it does not need to go to the database every time because the values are already existing if I was change this one today Robert let me show you pour in profiler and then we will change it up next part it is sin and perform the six lookups against the cache it so as the values were in cash a so and it not need to go to the database again and again it performed to look up against the already cached value so first time it cached armor it went to the database and then cached this value especially this column you know what we are selecting from the lookup table here in the lookup we have select and only customer name so it found that and then what happened it cached that it kept up getting the value for this one and this record and this record and this record from the cashier next time it went for Raja to database so there are two queries we sent to database one for me and one for Raza and then these will cache it as the values were found in the database and the next time any value came from the source it was able to find a match in the cache a so it didn't go to database again and again okay let's go back here and take a look on this query and let me show you here if you see here this one is a good armor and that's what it is going to the database and next is completed and it's it is sent as a rate of P one parameter so it made a parameter because we are selecting the customer name from the table and we have we are matching on the customer name so it took as a reference and then the increment value or source value came as armor so it replaced that value here and then to run that query against the database and next when it find up the value for increment value coming for Raza it ran this statement against the database ur so that's what it did it ran only two statements against the database and we can see here in the query as well that's the that's intelligent but every time you have different recorder coming from the source file that has to go to the cache a if it is sorry it has to run are you against the sequel server if the the value is already not captured now let me change the file little bit and show you what happened now if I change its value to the Robert now when we run this one and maybe change this one as well to Raza instead of M Raza now we have four values here we have armored that's the kind of duplicate we have M Raza that's duplicate we have Raza and Robert and learn if we run this one in a partial cached it is going to run for four times as there are four unique Rosa so if you see here it should for database the sequel statements to the database and it has passes eight but also it has performed for lookup using cash as it catches the the first record parameter then next it was able to use that value for these two rows and the same thing happened first it took cash for Raza and then it was able to use the M Raza when the second of source record came it was able to use from the cache it so that's what we learned here with the no cache a if there are values it will go try to find out in the database if it will find it it will be put in cash a and then it will look use that for next incoming source records from the cache it if there is no value existing for that one it will go back so let's take a look on this one let me so for Robert you can see it went back and get that for Raza M reside went back and get that and then for Raza it went back and get it and for armour it went back and get it now we don't have any robert here so you can don't see the robert here you don't see em Raza here as well and if I will go ahead and make a change here I will have another Robert here so what do you think as there is no matching for Robert it will query twice to the database it will not put this there is no Roberto for the lookup table so it can't put in Turkish a so each time value coming from the source that does not have matching the value in the reference direct has to go to the sequel server or any other will a DB or relation database you are using in the blue copper so let's go run this one and now we have to Roberts at the end it will not be able to use those the values so you see here issue five database commands and if you go back in the profiler we see it too queried for Robert it did query for Robert and didn't query for M Raza it did query for Raza and it query for Armour so you see that v query is a trainer with different parameters but the last ones are for Robert it was not able to catch because there is no value in the lookup table that it can cache it for Robert so it has to go every time to the table and try to find and if there isn't any value it will be going to do no matter so that's how it will be building the cached for matching records and if available it will use from cache a if not it will go to the database every time next what we are going to do we are going to use a no catch it when we say no cache a each record would we get that will go to the database and try to find the matching record so it is not using a the memory in the SSIS as we have seen in the full cache a it put everything in the memory and use it and then in partial cache it kept building the cashier with the matching record and use it in no cache a it will go to the sequel server for each of the record but there is a one difference it will go to the sequel server but if there are values right after each other it will be able to use the lost value so here if you see are made three times coming here it is not going to go three times to the sequel server to adjust to go for the first one and match I'm a charmer on the Duke up reference data from the sequel server table and bring that record and next time when it has the armor it will use that value but if the values are coming right after each other it will keep only one value or the row for the cash it will not keep building it so if Raza comes here am Raza and all and at the end you have Ahmed it has to go back to the database and use it so let's go and take a look now run our SSI is packaged with no cachet and here you will not be seen whole lot information like the way a number of time is went to database and all that I don't see that here I was trying to find out ur I don't think so missing anything but in a partial cachet at least it give you more information how many times it went to the database and how many values it use from the cashier but here I don't see that much now we have to come back to the sequel server profiler and start from here and say let me find out the very first value used so here you can see that arm it was the first value it used and then next that it was just a building a query it completed and statement this is a statement it was placed with dead the rate Ahmed next you see that it did not run again for armor it went for Raza so that's off if you see our file I told you if it will try to match for Ahmed it find a match and then it is going to keep only one value or rec row in the memory so it was keeping Ahmed it was keep it so it used on next time I find Armour it's used it next time it find armory to use that value from the cache a and then it came to the Raza because that was different now we have two M rows as so you will see there is only one where it made to the sequel server let's go back and take a look here you will see only em Raza and then in statement and now you will see armor again so it did not use the cachet for armor again because the last row it was saved it had saved was em Raza and then armor came after after that so sorry this is different file so here so right here so the last value was em Raza and when armor came again it did not use that if armor would be here on right after another these three values it will not go to the database so that's what I'm trying to tell if it will find the last matching record and in the next matching you have the same values then it is going to use the previous whatever it is available in the cashier so it is keeping one value or the row in the memory and if the value does change for the query and it does it has to go to sequel server and get that value again so it is the kind of every time I will say it is very expensive process every time it does go back to the sequel server and get that value so if you see Robert here it is going to do in sequel server table and getting the value now let's do the last experimenter as I was telling you I I here if we have armor here and then again arm up here right after Robert you will see that's what I'm trying to tell you like if one value is different and that's how the data that he'll you will have it you will not have hundreds of thousands of the same values right after each other so there is if you are using a no cash a you have back you have chances like you for each of the row you are going back to the sequel server and running that query so this is running pretty much for each of the row you are going to have from the source and go back to the sequel server and run it it might be good idea when you have a very smaller data set coming from the source maybe 100 200 records and you have 20 30 million maybe 40 million records and you don't want to put all the records in the memory and you or use the full cachet because you have very small data set coming up from the source and that it is okay to run maybe 50 or 10 or 20 endo sequel statement directly on the 20 million or 40 million records instead of bringing in 40 million records in the memory and by using full cache and then use that so it depends I will have another video on on where I will explain which your cache a transformation we should be using but coming from coming from the detail you have learned in this video you can make decision by yourself as well you understand how exactly these modes work up so let me do one more thing last time so for these three okay it will go run very first and get the matching record so there is a matching record and keep that and use them from cashier for these two records for M rasa it will go okay and then sorry a for other it will go next it will go for first M rasa and find the matching record and then use for the next record here and the finally next it will go for ma it will not have a matching record because last record it has m rasa in the memory so it will go to database to get the value and then of a Robert it will go and then again it for honor it has to go because the last value is Robert that we have saved or queried against databases so let's go back and with that we are going to end our video start the package let's come back to the profiler and now we are going to take a look up so right now what we see here the very first is running from here and that's correct next time it is the just it find the value in the cache so it is going further razza so next if we see that it is 1 for aza and next if right after our site is 1 for M reside find the value as we have the value in the table so it is only running one as there are two right after there is M R as R right after the first one so it is it would run only one statement and then it has to run three statements because each value is different so if you see here it is running for armour and then it is running for Robert and that's how it will run I think so I did not save this value here that's the one reason you can see if I'm saving this value now we saved it the last one I was expecting to be working from let me see we have Robert and then we if we query Armour because armor is the last one after Robert so run this one now it should go for armor at the end because the last value is Robert okay one one thing I missed it here if you guys see the loss is the Robert so in Y because there is no matching record for Robert so it was not able to over write up in the memory for Robert so it is still keeping the last record armor so that's why it did not go for the second time even there is a 1 record between them but there is no matching record for this record in the memory in the lookup table so it could not find it so it kept the armor and then that's why put last it did not go back to the database so if we will change this one to M razza let's say it find that now it will go for sure because it has it has found the value and now for this armor it will be overwritten by this and then it has to go to get the value from database so let's run this one one more time I don't know somehow I like experimenting in front of you guys and something show you all those outcomes even I never came you know when I do demo by myself and learn it and say ok it is how it should work and then when you are doing on the video sometime you have different scenarios so here you see almond here you will see rasa and here you will see I made again so if you want to take from the top again this is the first time it and then you have rasa open the file sorry this is really slow video that's what I'm makin but I want to take down so first are made then Raza then M Raza here and then it has armored from here this one and then next matching it will find forum rasa that will be overwritten in the cachet so for last summer it has to go to the database er so if you go here for M rasa and the last one it ran for army so if it will find a matching record it will bring and put in memory and then use that for the next record if it will not find a matching record for the next one it will kept that matched value till up you know it will find the next matching and it will use that matching value in d memory or cache it so thanks very much for watching this video and I hope you learned something out of this and you can make use of these modes and you can understand how exactly the work and how beneficial they can be or how expensive they can be depending on the resources you have like you have for for fuller cachet you have tons of memory on your system you never use full cache you always use no cache or partial catch it so you can maybe make use of a full cache a when you have a lot of memory and put every record in the memory and start using data and when it comes to no cache or partial cashier when you have small datasets and instead of bringing millions of record in the memory maybe it is okay to go back and you know run those queries against your database and bring the results back so that totally depend upon your data so you study your data and by having the knowledge of these modes select the best that works for you thanks once again and I will see you next video
Info
Channel: TechBrothersIT
Views: 34,643
Rating: undefined out of 5
Keywords: Lookup Transformation in SSIS Package, SSIS Package, SQL Server Integration Services Interview Questions and Answers, Cache Transformation in SSIS, SQL Server 2014, SSIS, Database, MSBI, Business Intelligence, Business Intelligence Tools, ETL Tools, SQL Server development, SSIS 2014, SSIS Tutorial for beginners, Tech Brothers, Real Time SSIS Scenarios and Solution, Full Cache Mode Lookup, Partical CAche Mode SSIS Lookup, No Cache SSIS Lookup Modes, Lookup Table
Id: Lo8f45O26Q4
Channel Id: undefined
Length: 33min 32sec (2012 seconds)
Published: Mon May 25 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.