How to set up a NetApp SAN (Part 1: Introduction to SANs)

Video Statistics and Information

Captions
Welcome to this very first video in a collection of videos about SANs. In this series we're going to take you from beginning to end: sizing your SAN, understanding the fundamentals, through to actually installing a NetApp. In the first few videos we're going to cover mainly theory, just to understand what a SAN is and its advantages over local storage, and then as we move through the videos we will get into practical, hands-on work.

So to start with, what is a SAN? SAN stands for storage area network, and it's a piece of hardware that sits on your network and communicates with your servers to provide a single location for storage. SANs are generally classed as enterprise-level hardware; as you'll see from this first set of slides, they have a lot of specialist software and redundancy built in for mission-critical kinds of application. There are a lot of manufacturers out there, as this slide shows: we've got Dell, famous for their EqualLogics; we have NetApp with their filers, which used to be called toasters; there are some new ones coming up on the market such as Nimble; and you've got your EMCs. These vendors all have their advantages and disadvantages as well as price points, and within their range of SANs they'll have different levels.

This video is predominantly aimed at the SME market: the kind of four to five hosts with maybe 30 to 40 guest operating systems in a Wintel environment, typically a few SQL servers, a few file servers, a couple of Exchange servers, all with different workloads. As we go up into much larger organizations they'll go for the higher-processing SANs and typically blade-type servers, but when you're going into that field you will have specialist, dedicated teams. The reason I put these videos together is that I think it's important as an IT manager to make sure you've got a good solid understanding of your infrastructure. While most companies will still probably want to get outside help setting up their SAN environment, hopefully these videos will give you a solid enough understanding that, whoever sets it up, whether in-house or external, you're able to get the best out of your vendors and understand exactly what you want.

Here we have a typical demonstration of how a SAN communicates with its environment. This slide shows Fibre Channel: the SAN, or disk array, communicates with the hosts, which could be hypervisor-type hosts (Hyper-V or VMware) or what we would call physical servers that have an operating system stored directly on them. Up until a few years ago Fibre Channel was the way forward, and it typically required knowledge of fibre networking, fibre switches (which are different to typical Ethernet switches) and HBAs, knowledge that wasn't necessarily available to your average SMB. This made not only the SAN expensive but the infrastructure required for it prohibitively expensive. In the last few years a protocol called iSCSI has emerged. What iSCSI does, effectively, is encapsulate the SCSI protocol into a standard IP packet. Most of us will be familiar with SCSI from local storage, before we had SAS and SATA; the concept is exactly the same, but the commands are encapsulated in IP packets and carried over standard switches and standard Ethernet ports between the hosts and the disk array. This makes the equipment and knowledge easier to transfer from your standard networking infrastructure into iSCSI.
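To make the encapsulation idea a little more concrete, here is a loose Python sketch. It is not a real iSCSI initiator and the envelope fields are invented for illustration; the only genuine piece is the ten-byte SCSI READ(10) command block, which is the sort of payload iSCSI carries inside ordinary TCP/IP traffic instead of over a Fibre Channel fabric.

```python
# A loose illustration of the idea behind iSCSI: a SCSI command (here a
# READ(10) CDB) is just a small block of bytes, and iSCSI's job is to wrap
# such blocks in a header and carry them over ordinary TCP/IP. This is NOT
# the real iSCSI PDU layout - the envelope below is a made-up stand-in.
import struct

def build_read10_cdb(lba: int, blocks: int) -> bytes:
    # SCSI READ(10): opcode 0x28, 4-byte logical block address, 2-byte length.
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)

def wrap_in_fake_pdu(cdb: bytes, target: str) -> bytes:
    # Hypothetical envelope standing in for real iSCSI header fields
    # (initiator task tag, LUN, data segment length, and so on).
    header = struct.pack(">H", len(cdb)) + target.encode().ljust(32, b"\0")
    return header + cdb

pdu = wrap_in_fake_pdu(build_read10_cdb(lba=2048, blocks=8), "iqn.example:target1")
print(pdu.hex())  # these bytes would travel inside a normal TCP segment
```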
That's not to say there aren't differences and things we need to be aware of. As with standard networking, we have to understand the kind of bandwidth that's required. We also need to think about segregation of traffic, to make sure that our management traffic and the data going to our client machines on the main network isn't overloading the iSCSI network. Throughout this series we're going to concentrate predominantly on iSCSI, because it is the cheapest and arguably the easiest protocol to set up.

Here's the standard set-up: you've got your host servers, your switching, and a disk array. What you'll notice is that almost all the hosts have diverse routes to the switches, and the disk array has diverse routes to both switches. One of the key things when we set up this storage infrastructure is to make sure we've got resilience, because we're putting all our eggs in one basket. So, like I said, why would we use a SAN? Is it not too risky? If that SAN breaks, or the switch serving that SAN breaks, we lose everything. At least with local storage on each of our servers, losing one server only loses one collection of guest operating systems. So there has to be a good, compelling argument to put a SAN in, and there are real advantages over local storage.

One of these is ease of administration. It's far easier to get an overall view of your storage infrastructure from one management tool than to have to go into each individual server and review it individually.

When we talk about storage there are always two main aspects: capacity, i.e. how much storage we need, and performance, and these are not mutually exclusive. For instance, we may want huge amounts of data, terabytes, where we could quite easily purchase two or three 4 TB disks to serve the amount of storage space we need. However, we also need to consider performance, and we'll get into this in much more depth in the next video, but ultimately a physical disk can only provide so much performance, so often we need multiple disks in our disk array to meet the performance requirements. This means we may need more disks for performance than we need for capacity, or vice versa. And if we're trying to calculate this for lots of different servers, the workloads can change on those servers, and unless we're very specific about what a server does (it's just an archive server, or just a file server, or just an Exchange server) it's difficult to calculate the loads. By using a storage array we can tackle both requirements at once: we can have a large number of disks to handle the performance side of things, and we can also match the capacity of those disks so our overall capacity requirements are met.
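As a minimal sketch of that capacity-versus-performance point, the following Python snippet compares how many disks each requirement demands and takes the larger figure. The drive sizes and per-disk IOPS used here are illustrative assumptions, not vendor specifications.

```python
# Disks needed is driven by whichever is larger: capacity or performance.
# Example figures are assumptions: a 4 TB NL drive at ~75 IOPS, and a
# 600 GB 10k SAS drive at ~140 IOPS.
import math

def disks_needed(capacity_tb, iops_required, disk_tb, disk_iops):
    for_capacity = math.ceil(capacity_tb / disk_tb)
    for_performance = math.ceil(iops_required / disk_iops)
    return max(for_capacity, for_performance), for_capacity, for_performance

# 8 TB of data but 3,000 IOPS: capacity alone says 2 big disks, performance says 40.
print(disks_needed(8, 3000, 4, 75))     # -> (40, 2, 40)
print(disks_needed(8, 3000, 0.6, 140))  # -> (22, 14, 22)
```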
Enhanced resilience might seem a little strange when we're putting all our eggs in one basket, but think about it: with local storage there's only so much resilience we can put in place on each server. If we're putting everything into one single storage array, we can invest money and time into making sure absolutely everything has failover and redundancy: redundant power supplies, redundant controllers, redundant cache, redundant paths. We can build this redundancy in and we only have to do it once, so you get much more bang for your buck. SANs also have advanced software capabilities, above and beyond just serving data.

They are able to do complex caching, which increases performance. The caching algorithms can do read-ahead: as you access a file, the SAN tries to work out what you're going to do next and copies it into cache. And we can put great big pools of cache in one place; instead of having lots of little pools of cache on individual local servers, where sometimes one is empty and another is full, we can pool it all together, put a large amount of cache in one place, and get the best use out of it.

There are two types of cache in a SAN, volatile and non-volatile, and they serve different types of operation. When you're writing, a SAN will cache the write into non-volatile memory and send the acknowledgement back to the server to say that the data has been written, even before it has been written to disk. It will then build up those cached writes into a sequential write and flush that cache to disk when the disks are at a level where they can accept those writes. This way the SAN can respond to peak loads by writing to cache and then flushing to disk as and when it can. This cache has to be non-volatile because if we suffer a power failure on our SAN, the information held in that memory needs to survive; otherwise we would lose data when the SAN comes back up. We also have volatile cache, and this is the read side: as I said, information that's been read can be held in cache in case it's going to be re-read by another server, or in some cases the algorithms can read ahead and predict what you are likely to want next. This can be cheaper RAM, because if the SAN crashes or needs to be rebooted it doesn't matter that the information is lost; it's already on disk and can be re-read into cache when it's accessed again.
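Here is a toy Python sketch of the write-caching behaviour just described: writes are acknowledged as soon as they land in (battery-backed) cache and later destaged to disk as one sequential sweep. The cache size and the write sizes are invented for illustration, and the destage step is only a placeholder.

```python
# Toy model: acknowledge writes from non-volatile cache, flush sequentially.
class WriteCache:
    def __init__(self, capacity_mb=512):
        self.capacity_mb = capacity_mb
        self.pending = []          # writes acknowledged but not yet on disk
        self.used_mb = 0

    def write(self, lba, size_mb):
        if self.used_mb + size_mb > self.capacity_mb:
            self.flush()           # cache saturated: must destage first
        self.pending.append((lba, size_mb))
        self.used_mb += size_mb
        return "ack"               # the host sees the write as complete here

    def flush(self):
        # Sort by address so the destage to disk is one sequential sweep.
        for lba, size in sorted(self.pending):
            pass                   # a real array would write `size` MB at `lba`
        self.pending.clear()
        self.used_mb = 0

cache = WriteCache()
for i in range(5):
    print(cache.write(lba=1000 - i * 100, size_mb=64))   # random-order writes
cache.flush()                                            # destaged in address order
```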
In addition to this, the software can do higher-level things like deduplication, which reduces the amount of data that's physically stored on disk; because the array holds the data for all of our servers, we get much more efficient deduplication, and we'll cover this later. We also get management tools such as snapshots, replication and thin provisioning, again covered later, meaning we don't need to carve up all our storage for our host machines in advance; we can allocate space almost on demand. So yes, it is riskier putting everything in one place, but we can build enough safeguards around it that the risk is certainly outweighed by all the advantages we get.

Here's a demonstration of a particular problem we have when we've got physical machines and we're storing everything locally: when we're sizing our servers we have to take into account peak loads. Take email servers: typically first thing in the morning, 9 o'clock, everyone turns up and your Exchange server gets hammered as everyone downloads their emails from the night before, and then throughout the day it plateaus off. A SQL server could be sitting idle for 80% of the day, and then a department runs a lot of high-intensity reports and you suddenly hit peak demand. The problem is we have to size these individual machines to take account of their peak requirements, so quite often we've got disks sitting there doing nothing for 20, 30, 40 percent of the day, and yet we're investing a lot of time and money to make sure each server can serve the requests when that peak does arrive.

As we scale this out over many servers and look at those workloads throughout the day, we'll see that quite often the CPU is very spiky: idle for a few minutes, then a spike. The idea is that if we move all of that into a single location, then instead of having systems sitting idle for 20 or 30 percent of the time, we can take those diverse workloads and, as long as we size our SAN correctly, get much more from fewer disks, because sometimes one load is quiet while another is at its peak. Obviously, if we've got 20 servers that are all at a constant 100% peak load, putting them onto a single SAN makes no sense at all. So when we start to look at sizing a SAN we need to make sure we fully understand our workloads, what our peak and idle times are, and what our overall performance requirements will be. In the demonstration here we've got two disks sitting idle, then later on another two disks sitting idle, and at some point four disks sitting idle; if we bring these all together we can actually reduce our number of disks by two, or indeed keep those disks and utilize them for something else.

This is the back of a standard NetApp, and what it shows is the resilience that's built in. Look here: we have dual power supplies, and in the event of a single power supply failure the other power supply can run the whole system. They're shown in orange, and orange means hot-swappable, so even when we do have a power supply failure we can simply pull out the old one, get a replacement from NetApp (if you've got the right support contract, within four hours) and pop the new power supply back in. So you're running at risk for four hours, potentially, but a double failure is highly unlikely. This isn't unique to NetApp; it's standard even within servers, where we run dual power supplies, and all SAN technologies will have dual power supplies.

The second item of resilience is the dual controller. We're going to talk about the different types of dual controller later (active/active and active/passive), but ultimately the controller is the brains of the system: it's processing the requests, requesting the information from the physical disks, handling the caching, and it's where all the RAM sits. Having dual controllers means that in the event of one failing, the other one can take over and serve those requests, so we have a very high level of redundancy there. In terms of the disks themselves, we will traditionally use RAID 6, or in NetApp, RAID-DP, which is a type of RAID 6, and this gives you redundancy in the event of two disk failures. You will also set up hot spares, and most advanced SANs will look for predictive failure: as soon as a disk shows any kind of corruption at all, before it fails, the array will copy the information from that disk onto a hot spare, because it is far faster to copy the data from a disk that is in the process of failing than to rebuild a disk from RAID parity. Typically the NetApp will copy that information to a hot spare, put the disk it thinks is failing into a maintenance mode, and carry out a series of checks on it; if it decides the disk is going to fail, it will flag the disk as failed and that will be sorted out with a new disk, again generally within four hours, and the new disk then becomes the active hot spare.
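As a back-of-envelope illustration of what double parity plus hot spares costs you in usable space, here is a small Python sketch. The RAID group size, spare count and disk size are assumptions chosen for the example, not NetApp defaults.

```python
# Rough usable capacity for a RAID-DP style layout (double parity, like RAID 6)
# with hot spares. Group size and spare count here are illustrative assumptions.
def usable_capacity(total_disks, disk_tb, raid_group_size=16, hot_spares=2):
    in_raid = total_disks - hot_spares
    groups = -(-in_raid // raid_group_size)      # ceiling division
    parity_disks = groups * 2                    # two parity disks per group
    return (in_raid - parity_disks) * disk_tb

# 24 x 900 GB disks, 2 spares -> 22 in RAID, 2 groups -> 4 parity -> 18 data disks.
print(usable_capacity(24, 0.9))   # -> 16.2 TB usable, before filesystem overhead
```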
And it's not only dual controllers: on each controller we have dual network ports as well, and this will help us when we look at the whole topology of multiple switches and multiple paths, so that we can handle NIC failures on the SAN, switch failures, or a NIC failure on the server itself.

As we touched on before, there are two types of dual controller: active/active and active/passive. Active/passive is really simple: you have one controller that handles all the load, and an identical controller that sits there idle until the first controller crashes or has a fault, at which point the second controller takes over and handles 100% of the load. This makes sizing really easy, because you know that in the event of a failure your other controller will handle the load exactly the same; provided your first one had no performance issues, your takeover one will have no issues either. The disadvantage, of course, is that you've got all this expensive cache and processing power sitting there doing nothing.

The other type is active/active. In an active/active configuration both controllers are active and both of them are serving data. The advantage is that you're getting as much bang for your buck as possible: double the amount of cache and double the amount of processing in use. The problem is that in the event of one of the controllers failing, the surviving controller takes over the duties of both. If the first controller has been operating at 100% and the second controller is also at 100%, the survivor obviously can't handle 200%, and that introduces latency and processor issues. The two main resources here are really CPU and bandwidth; as we'll see later, bandwidth isn't really an issue on SANs, because the disks will always be far slower than the bandwidth available, so the main thing to look at is CPU. This is something you will have to monitor on an active/active system, because CPU trends differ throughout the day. For example, during normal operation the CPU may be quite low, but in the evening, when the array is doing one of its fancy tasks such as deduplication, it may be running at 80 or 90 percent. So active/active can actually be quite beneficial: you use your processors effectively during the day, and in the evening you can ramp both up to 100% to do the dedupe. In the event of a failure it's not a big issue to pause the dedupe in the evening, because it won't make a big difference to your environment, so the extra CPU that process needs isn't a problem while you're failed over. And likewise you'll only be in that situation for approximately four hours while a new controller is shipped to you and installed. So the question is whether slightly reduced performance for four hours is a real disadvantage, set against having two controllers that are both working fully, serving data, and getting the benefit of all that costly processing equipment. It's a tricky one and people have different opinions; personally I loathe equipment sitting there doing nothing, it's just not right, so I quite like the active/active set-up, but it does put a management responsibility on you to keep an eye on the CPU load and make sure you're not running both controllers flat out.
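The failover concern above boils down to a simple check: would the two controllers' combined load fit on one survivor? Here is a small sketch of that check in Python; the 100% ceiling and the sample loads are illustrative, and in practice you would feed it the array's own CPU counters.

```python
# Can one controller absorb both workloads if its partner fails?
def failover_headroom(cpu_a_pct, cpu_b_pct, ceiling_pct=100):
    combined = cpu_a_pct + cpu_b_pct
    return {
        "combined_pct": combined,
        "fits_on_one_controller": combined <= ceiling_pct,
        "overload_pct": max(0, combined - ceiling_pct),
    }

print(failover_headroom(45, 40))   # fine: 85% on the surviving controller
print(failover_headroom(70, 65))   # trouble: 135%, expect latency until repaired
```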
Now, in an active/active set-up the disks are evenly split between the controllers, so it's not as if you're going to have major I/O problems when you go into a failover: the disks follow over to the surviving controller, you're still using the same number of disks, and it's just the CPU that takes the extra hit.

Another advantage of using a SAN: most people know of VMware or Hyper-V as virtualization solutions. They're great because they allow you to sweat the assets of the server; in the same way as we spoke earlier about the SAN and workloads, it's exactly the same with servers in terms of memory and CPU, in that for a lot of the time they may be sitting idle and at other times the loads will be really high. We can put all these guests onto a powerful server and make sure that server is running at 80 to 90 percent of its capacity the whole time, getting the most out of our hardware. In the event of a failure on a host, if we're using shared storage another host can take responsibility for serving those VMs, because it has access to the physical disks those guests are stored on. Of course there are more questions here about making sure you've got enough capacity on the remaining hosts to take that over, but we also get things like resource scheduling: with DRS, for example, you have several physical hosts and the VMs are moved around depending on workload, and in the evening you can even shut down entire host servers to save energy if you can place those VMs onto other servers that aren't being used as much overnight. All of this requires shared storage, so it's vital that we have a SAN, and this is just another good advantage.

We've spoken about the redundancy inside the NetApp, or the SAN itself; now let's talk about paths. Here we've got a standard SAN set-up: two switches and a host, and the idea of this slide is to show the concept of multipathing. The idea is that in the event of a particular component failing, we can still get data from the SAN to the host; each route the host can take to reach the data is known as a path, and that's why we talk about multipathing. These slides show each different component failing and how the data continues to be served. In the event of this controller failing, data can still get to the host via the other controller, which, as we saw in the last slide, takes over responsibility for serving the data; in fact the host still has dual paths. Likewise, if the other controller fails, this one takes over and again we have dual paths back to the host. What happens if one of the switches fails as well? That's fine, because each controller has a path to each switch and each switch has a path to the host; we're now operating on a reduced system, because one path is no longer valid, but the host can still get to the SAN. Likewise if the other switch fails: the SAN can still talk to the remaining switch, and that switch still has at least one path back to the host. If we have a NIC failure, or someone unplugs the network port on a particular path, we can still see that we have access to both controllers; and if the other NIC fails, we still have access via the other switch to both controllers. So provided our usage doesn't exceed 50% of the available bandwidth, in the event of one of these failures we would actually see no performance degradation at all. This is a standard set-up in an iSCSI environment.
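To illustrate the multipathing walk-through above, here is a toy Python model of that topology: two controllers, two switches and two host NICs, with every controller cabled to every switch and every switch to every host NIC. It simply enumerates which host-to-SAN paths survive a given failure; the component names are invented for the example.

```python
# Count surviving paths through a dual-controller, dual-switch, dual-NIC fabric.
from itertools import product

controllers = ["ctrl_a", "ctrl_b"]
switches    = ["switch_1", "switch_2"]
host_nics   = ["nic_1", "nic_2"]

def surviving_paths(failed=()):
    return [p for p in product(host_nics, switches, controllers)
            if not any(component in failed for component in p)]

print(len(surviving_paths()))                               # 8 paths when healthy
print(len(surviving_paths(failed=("switch_1",))))           # 4 paths remain
print(len(surviving_paths(failed=("switch_1", "ctrl_a"))))  # 2 paths remain
```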
Now, I spoke earlier about one of the advanced software features, deduplication, so let me give you an understanding of how dedupe works. Say we have a server running Windows 2008 R2 as its base operating system; we all know that install will take something like 30 GB of disk. We have another server sitting as another guest on the same host, and it's another install of Windows 2008 R2. We now have two pretty much identical copies of an operating system taking up 30 GB plus another 30 GB, so 60 GB of space used, where a lot of those files are the same. Of course each server will have additional software and applications, and there will be files within the Windows and System32 folders that are not the same, but a large proportion of the files are identical, including things like Windows updates: install Service Pack 1 and that's roughly another gigabyte duplicated across the two servers.

What deduplication does is this: there's a file on server 1 and a file on server 2, and because both these servers have their data stored on the same SAN, the SAN has the overall view of everything. The C drive of server 1 and the C drive of server 2 are both stored in a datastore called, say, "Windows servers", and as a demonstration here's one of thousands of files, 100 MB in size. The SAN scans at regular intervals, normally out of hours, looking for data that is identical, and when it finds an identical file it replaces the additional instances of that file with pointers. When a request comes in for that file, it goes to the pointer and is directed to the physical data. That gives significant savings, certainly when you put all your OS drives onto the same datastore, and we'll talk about this later when it comes to design.

Likewise, there are other places where you'll have a lot of duplication of data. If you are backing up, say, a SQL database to a datastore, and you take a backup today, another tomorrow and another the day after, a large proportion of that database is going to be identical; obviously there will be some changes, but the idea of deduplication is that it will say, well, across five instances of that backup 80 percent of the data is the same, so immediately you get an 80 percent saving on the duplicated portions. This is all handled behind the scenes; you don't see anything except a statistic telling you how much space has been saved. Likewise, if you've got a large file server with hundreds of Word files, even things such as your company logo embedded in all of those files will be the same bits of data within each file. The key here is that although I said earlier it was looking at the whole file, it's actually looking at the individual blocks of data, and it will match those blocks and put the pointers there; so even if you've got two Word files with the same 25 KB image embedded in them, although the rest of each file may be different, the part of the binary stream that is the same will also get savings through deduplication.

One word of warning: salespeople push this heavily. "You could have 50% savings," they'll say, and they'll try to get the overall cost of the SAN down by telling you that where you think you need 500 GB you only need 250 GB, because of that 50% saving. Be very wary of this and always try to size appropriately; if you really are being pushed and need to cut down the amount of space, I would suggest you ask for a loan unit so you can test the level of deduplication you'll actually get.
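The block-level matching described above can be sketched in a few lines of Python: identical blocks are stored once and later copies become pointers to the first occurrence. The 4 KB block size and the fingerprinting approach here are assumptions for illustration, not how any particular array implements it.

```python
# Minimal block-level deduplication sketch: store each unique block once,
# record a pointer (fingerprint) per logical block, and count bytes saved.
import hashlib

BLOCK = 4096
store = {}            # fingerprint -> block data (stored once)
volume = []           # logical volume: a list of fingerprints (the "pointers")

def write_file(data: bytes) -> int:
    saved = 0
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        fp = hashlib.sha256(block).hexdigest()
        if fp in store:
            saved += len(block)      # duplicate block: just point at the existing copy
        else:
            store[fp] = block        # new unique block: store it
        volume.append(fp)
    return saved

os_image = b"windows-system-file" * 10000       # stand-in for a guest's C: drive
write_file(os_image)                            # first server: blocks get stored
print("saved on second copy:", write_file(os_image), "bytes")  # nearly all shared
```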
And this brings me to the final slide of this introduction to SANs. We've looked at a lot of advantages, and a lot of them are quite scientific and easy to describe. I call this last one the magic part, because it's not really an exact science, and therefore there can be quite a big gulf between what the salespeople tell you, what the SAN vendors publish on the web, and what you get in real life. However, there are some significant performance advantages in using a SAN above and beyond local storage. Every single physical hard disk, including solid state (though I'm predominantly talking about SAS drives here), has a performance figure: how many IOPS it can serve and what its latency is. You would expect that if you put two hard disks together you'd get double the IOPS at the same latency, or the same number of IOPS at half the latency, and I'll explain all of this in the next video. What the magic does, using RAM caching, is take those hard figures and increase overall performance. This is great, but it is very difficult to account for when sizing your SAN, because it's very dependent on the kind of workloads, what you're doing at the time, the amount of data and so on. So I personally recommend sizing your SAN for what would be required without the magic.

What is the magic? On the write side, data comes in, hits the RAM and is stored in the RAM; when the workload on the disks beneath is significantly reduced, or the RAM becomes saturated (i.e. full), the array commits that data to disk. As we all know, RAM is vastly faster than physical disk, so the operating system gets confirmation of the write in fractions of a second; provided the RAM can flush to disk fast enough that it doesn't stay full, we see a real performance advantage. The additional advantage, certainly in some SANs, is that sequential writing of data is far faster than random writing, so by filling up the buffer in RAM we can assemble sequential writes and flush them in one go. So in terms of writing, the RAM really does make a significant difference, and my experience in storage is that write latency is always minimal; if you actually start to see high write latency, then something is seriously wrong in the design or in the infrastructure.

Reading is somewhat different. When a server or user requests a piece of data, the request comes through the RAM and the array checks whether that piece of data is already held there. If it's not, it has to go to the disks and read from disk, and on the way back out it copies the data into RAM as it serves it to the individual host machine. There are two varieties of benefit here. One: if another host, or the same host, comes back and requests that data again, it's already sitting in RAM and can be served much, much faster. The other is the algorithms, and this is the magic I'm talking about, predicting what you may be about to read next: if you are paging a table out of a database and you request a page, the array reads that from disk and writes it into RAM, and it's quite possible that the algorithms in the SAN will recognise that it's a SQL table and read the next page into RAM before it's actually required. It's this magic, the code written to understand predictive reads, that varies between SANs, and you'll hear the salespeople saying theirs is the best technology and the most predictive; that's how data gets into RAM.
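Here is a toy Python sketch of that read-side behaviour: repeat reads are served from RAM, and a miss prefetches the next couple of blocks on the assumption the workload is sequential. The cache size, prefetch depth and eviction policy are invented for the illustration and are not any vendor's actual algorithm.

```python
# Toy read cache with LRU eviction and simple sequential read-ahead.
from collections import OrderedDict

class ReadCache:
    def __init__(self, capacity=1024, prefetch=2):
        self.cache = OrderedDict()   # block number -> data, in LRU order
        self.capacity, self.prefetch = capacity, prefetch
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)        # refresh LRU position
            return self.cache[block]
        self.misses += 1
        for b in range(block, block + 1 + self.prefetch):
            self._insert(b, f"<data {b}>")       # read-ahead from "disk"
        return self.cache[block]

    def _insert(self, block, data):
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict the least recently used block

c = ReadCache()
for b in range(100, 110):
    c.read(b)                                    # a sequential scan
print(c.hits, c.misses, round(c.hits / (c.hits + c.misses), 2))   # cache hit rate
```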
There are a couple of issues here. Firstly, RAM is not infinite, so if you're absolutely saturating that RAM, not everything you read can be cached; or, as new data is cached, data that hasn't been accessed recently gets pushed out of RAM, so when the next request for it comes in, it has to be read from disk again. That said, it's very difficult to work out in advance how much benefit the RAM and its predictive behaviour will give you. When we actually look at the performance stats, certainly on a NetApp, we can see the cache hit rate; what that is effectively telling us is how much of the requested data is coming from RAM and how much is having to come from the disks, and this gives us a good indication of whether we've got enough RAM. Of course, when sizing your SAN the sales and pre-sales engineers will be able to give you an indication of how much RAM you need, and as RAM comes down in price we see the filers shipping with more and more of it.

What's happening more and more now, while solid state is still too expensive to use as your main storage, is that it's used as a kind of bigger RAM, because it's cheaper than having gigs and gigs of volatile RAM; it's almost a permanent caching tier. Certain SANs such as Nimble do this very effectively and can achieve some amazing IOPS and latency figures; some of the other vendors, NetApp being a little further behind, offer it too, and of course it's a licensed feature. Eventually we'll see more and more solid state being used as the primary storage, and we'll do what we call tiering the data: your solid state is where your SQL and Exchange databases will sit, your file server data is on your second tier, and your backups are on SAS or SATA disks. The filers will become intelligent: they'll understand, by analysing the traffic, what you're doing and actually store the data in the right place. This is already happening, and again this is the magic that makes better use of those physical resources.

I'm James Sillett, and I'd like to thank you for taking the time to watch this video. If you have any comments or questions you can contact me by any of the means shown below.
Info
Channel: James Sillett
Views: 160,278
Rating: 4.8463578 out of 5
Keywords: netapp, storage, SAN, VMWARE, Iscsi, Sillett, Aggregate, computer, Storage area network
Id: PpEVpjoFiXQ
Length: 35min 0sec (2100 seconds)
Published: Tue Feb 17 2015