Archiving Files in Linux

Captions
In this video lecture we're going to talk about data backup and compression on Linux. We'll look at the basics of how to take our files, create archives, and then compress them to save disk space, so let's look at the fundamentals of how this works and why we might want to use it.

All of the archiving tools in Linux trace back to tape drives. UNIX has been around for 40 years, and when it was developed the primary way of archiving data was to tape. So even if you are not archiving your data to tape today, the commands you're using have their lineage in data centers with tape drives. You may notice things in the man pages that refer to this, such as the fact that tape drives are accessed serially: they have to be fast-forwarded or rewound. You may never have used tape; I have. It's worth keeping in mind that the core of these tools was developed when tape was the primary archiving medium, and plenty of companies, especially larger ones, still have tape archives, so you may find yourself in front of one of these machines someday. In this lecture, though, we'll use these commands the way you would today, archiving things to files on disk.

One of the main commands we'll use for archiving files in Linux is the tar command. tar stands for tape archive, and the idea of creating a tape archive, or tar file, is to take a bunch of files sitting around your system and put them into a single container. It's similar to shopping at a store: when you leave, you put all the items you've purchased into a bag so they're easier to carry. I'm sure many of you in the Windows world have dealt with zip files, and on a certain level tar is very similar to zip, but there is one main difference.
tar files do not compress the contents of the files. tar simply collects a number of files into a single bundle that's easier to manage, which we can then email to someone or back up to tape. Just as putting groceries into a bag makes them easier to handle, putting a couple thousand files into a tar archive makes them much easier to move around. So we have one file aggregating many others, and the command we'll use to create tar files is the tar command. We'll look at how to create archive files and how to view the contents of an existing archive when we get to the command-line section of the lecture.

One more thing: we always say that you don't need file extensions in Linux, but it is nice to put a .tar extension on the end of tar files. When we get to the command line you'll notice I'm using that convention, because otherwise it can be difficult to tell whether you're dealing with a tar file or just a regular plain-text file; the UNIX command line won't give you a lot of information about that.

Let's visualize what a tar file is. On the right-hand side I have a series of individual files. They could all be in the same directory, or they could come from anywhere in the filesystem, and tar lets me bundle them into a single file called an archive file, or tar file. The files just get bundled into this single container, and they don't get compressed. I think this image is a good indicator of what happens. You may have run into this if you've ever tried to attach a whole folder from your hard drive in an email program.
That usually doesn't work. What you usually need to do is take that folder of a couple hundred files and zip it using the Windows zip utility. zip exists on UNIX too, but I'm not going to talk about it, because most UNIX systems use tar together with gzip or bzip2, which we'll get to in a second, and it's good to learn the traditional archiving tools. So again, a bunch of files get wrapped up into a single bundle. When we want to open that bundle and get files out of it, we untar it: we take all the files out of the bundle and put them somewhere else on the filesystem. By default all the paths will be relative, and we'll see what that means, and its ramifications, as we go.
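As a quick check of the point that paths are stored as relative paths, here is a minimal shell sketch. It assumes GNU tar or bsdtar, both of which strip a leading / from member names when creating an archive unless told otherwise. The directory and file names are hypothetical, and the -cf and -tf options it uses are the create and list options covered in the demo.

```shell
#!/bin/sh
# Sketch: tar stores member names without a leading "/".
# Assumes GNU tar or bsdtar; src/ and note.txt are hypothetical names.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src"
echo "hello" > "$work/src/note.txt"

# Archive the file by its ABSOLUTE path. GNU tar warns on stderr
# ("Removing leading `/' from member names"), which we discard here.
tar -cf "$work/abs.tar" "$work/src/note.txt" 2>/dev/null

# List the members: the names come back without the leading "/",
# so extracting later recreates them under the current directory.
tar -tf "$work/abs.tar" > "$work/members.txt"
cat "$work/members.txt"
```

Because the leading slash is dropped, extracting this archive never overwrites the original absolute location by accident.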
Let's jump to the command line and see some of these operations in action. Right now we're looking at my home directory, and you'll notice there are a number of things in it, including a directory called stuff to back up. Let's say that every day you put some stuff in here that's valuable to you, and you want to archive it: maybe copy it to an external hard drive, email it to yourself, put it on a USB drive, or even write it to a tape drive or some other type of backup device. It is a directory, and if I do an ls of it you'll notice there are two files in there. They're fairly small and don't take up a lot of space, but they'll be helpful for demonstrating how to use tar.

My goal is to create a tar file of that backup directory, and the command we use is tar, with a few options. I'll use c for create, so we create a new archive; v for verbose, because I like to get some feedback and see what happens; and f, which gives the name of the tar file we create. I'll call it my first tar backup.tar. In the slide section of this lecture I mentioned that you should put the .tar extension on your tar files, and you really should, because otherwise it becomes difficult for the user to know what they're actually dealing with. What I want to put in the archive is my stuff to back up directory. That's the only thing I name, but all the files inside it get added to the tar file too. When I hit Enter, you'll notice tar reports adding the stuff to back up directory and then the two files inside it. That was printed because I used the -v option for verbose, so I can see what's happening.

Now if I do an ls -l of this directory, you'll notice I have a tar file (what's nice is that tar files show up in red), and its size is 10 kilobytes. One thing to point out with tar files is that no compression occurred here; this is just a collection of files. Back to the grocery store: if you have 4 pounds of apples and you put them into a bag, you still have 4 pounds of apples. It doesn't get any lighter.
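The create step can be sketched like this, assuming GNU tar (bsdtar behaves the same for these options) and using hypothetical names in place of the lecture's directory. The size check illustrates the "no compression" point: tar writes 512-byte blocks, and GNU tar by default pads the archive to a 10240-byte record, which is why even two tiny files come out at about 10 KB.

```shell
#!/bin/sh
# Sketch of the "create" step: bundle a directory into an uncompressed .tar.
# Assumes GNU tar or bsdtar; stuff_to_backup, test.txt, demo.txt and
# my_first_backup.tar are hypothetical stand-ins for the lecture's names.
set -eu

work=$(mktemp -d)
cd "$work"

mkdir stuff_to_backup
echo "some text worth keeping" > stuff_to_backup/test.txt
echo "another small file"      > stuff_to_backup/demo.txt

# c = create a new archive, v = verbose (print each member as it is added),
# f = the next argument is the archive file name.
tar -cvf my_first_backup.tar stuff_to_backup

# Nothing was compressed: tar writes a 512-byte header per member, pads file
# data to 512-byte blocks, and (by default) GNU tar pads the whole archive
# to a 10240-byte record.
size=$(( $(wc -c < my_first_backup.tar) ))
echo "archive size: $size bytes"
```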
One thing we can do is look at what's in that file. The way to do that is to use the tar command again, this time with tf, on my first tar backup.tar, and it just shows me what's in the file. So when someone gives you a tar file that you didn't create, you can explore what's in it using the -tf command-line options.

What are some other useful things we might want to do? Let's say I've got this backup of my stuff to back up directory, and for some reason the directory gets deleted. It's gone, but I can restore it from the tar file. To create the archive we used cvf: create, verbose, and file name. To extract from a tar file we use xvf, so I'll clear the screen and run tar xvf on my first tar backup.tar. It unarchives the file and puts everything into the current directory, using the relative file hierarchy stored in the tar archive. It tells me what happened, and if I do an ls now, you'll notice my stuff to back up directory is back, because I originally bundled it up in my tar file and then brought it back by unarchiving it.

That's the fundamental concept of tar: you can create tar files and extract them. We're really just touching on the fundamentals here, and there are a number of other options in the tar man page. You can append things to an existing tar file with the r option. If you scroll down through the options, you'll see you can also concatenate two tar archives together, delete individual files from an archive as long as it's not on tape, list the contents of an archive as we just did, and do a bunch of other things, so take a look at the man page for the full set.
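Listing, appending, and restoring can be sketched the same way. This assumes GNU tar or bsdtar, and the file names are hypothetical. The -r step shows the append option mentioned above (it only works on uncompressed archives), and the final -C extraction shows that the stored relative paths can be recreated under any directory.

```shell
#!/bin/sh
# Sketch of list / append / extract with tar.
# Assumes GNU tar or bsdtar; stuff_to_backup, test.txt, extra.txt and
# my_first_backup.tar are hypothetical names for the lecture's example.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir stuff_to_backup
echo "important notes" > stuff_to_backup/test.txt
tar -cf my_first_backup.tar stuff_to_backup

# r = append a file to an existing (uncompressed) archive.
echo "added later" > extra.txt
tar -rf my_first_backup.tar extra.txt

# t = list the members without extracting anything.
tar -tf my_first_backup.tar

# Simulate losing the originals...
rm -r stuff_to_backup extra.txt

# ...and restore them: x = extract into the current directory,
# recreating the relative paths stored in the archive.
tar -xvf my_first_backup.tar

# Because the stored paths are relative, -C can restore them elsewhere.
mkdir elsewhere
tar -C elsewhere -xf my_first_backup.tar
```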
There's a lot of power here, but today we just want to focus on how to create basic tar files, how to add files to them, and finally how to extract those files back out of the archive.

Once you have a tar file, you might decide to put it onto an external hard drive or write it to tape (we won't talk about how to do that in this lecture), and then realize it would be nice if you could make it smaller. Linux supports compression just like every other operating system, and when you receive a tar file from somebody it's usually compressed, because a smaller file simply makes more sense. Even today, when a lot of people have plenty of bandwidth and hard drive storage keeps getting cheaper, we still want to take up the least amount of space we can. Just as an mp3 takes a music file off a CD and makes it smaller, compressing a tar file is a way to compact it.

There are two primary algorithms you'll see used for compressing files on Linux, and both have zip in their names: gzip and bzip2. First we'll look at gzip. These are different formats from the zip files WinZip creates on Windows. You can work with zip files on UNIX, but I'm not going to cover them in this lecture; a YouTube search will turn up some excellent discussions of how to work with them. gzip is a pretty standard compression algorithm: it compresses your file, makes it smaller, and automatically appends the .gz extension to its name. It's very common to compress tar files with gzip.
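A minimal sketch of compressing a tar file with gzip, assuming gzip and tar are installed and using hypothetical file names. It shows gzip replacing the original file with one that has .gz appended, and the compressed file coming out smaller for plain text.

```shell
#!/bin/sh
# Sketch: gzip compresses a file in place, replacing NAME with NAME.gz.
# Assumes gzip and tar are installed; file names are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir stuff_to_backup
# 500 identical lines of plain text: the kind of data gzip shrinks well.
yes "a line of plain text in a log file" | head -n 500 > stuff_to_backup/log.txt
tar -cf backup.tar stuff_to_backup
before=$(( $(wc -c < backup.tar) ))

# Compress: backup.tar is replaced by backup.tar.gz.
gzip backup.tar
after=$(( $(wc -c < backup.tar.gz) ))
echo "backup.tar: $before bytes, backup.tar.gz: $after bytes"
```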
To compress a file you use the gzip command, and to decompress it you use gunzip. Those of you who have worked with Windows zip files will note that it's a single-step process there: you right-click on a file, make an archive, and it both collects the files into the archive and compresses them. In UNIX, just like everything else, we can do things individually or combine commands to do things the way we want. Archiving with the tar command is one step, and if we choose to compress, we do that with the gzip command. We'll look at these on the command line in a second.

First, let's talk about bzip2, which is another algorithm used for compressing data. You'll see things like the Linux kernel compressed with bzip2. There are probably other compressors out there, but these are the ones I'm most familiar with, and it's important to know how to work with both. You compress a tar file with the bzip2 command (you can compress any file, by the way; it doesn't have to be a tar file, but it often is), and bunzip2 decompresses it. You'll notice I made a bad copy-paste on this slide: bzip2 files have a .bz2 extension, not .gz. As written, the slide's example gives you an error, because it tries to use bunzip2 on a gzip file, which you cannot do.
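The decompression commands, and the wrong-tool error from the slide, can be sketched as follows. It assumes both gzip and bzip2 are installed; the names are hypothetical. bunzip2 is expected to reject gzip data with a "not a bzip2 file" error and a non-zero exit status, which the script records instead of aborting.

```shell
#!/bin/sh
# Sketch: gunzip undoes gzip; bzip2/bunzip2 work the same way but use ".bz2".
# Assumes gzip and bzip2 are installed; file names are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
printf 'hello archive\n' > notes.txt
tar -cf backup.tar notes.txt

gzip backup.tar          # -> backup.tar.gz
gunzip backup.tar.gz     # -> backup.tar again

bzip2 backup.tar         # -> backup.tar.bz2 (note: .bz2, not .gz)
bunzip2 backup.tar.bz2   # -> backup.tar again

# Using the wrong tool fails: bunzip2 cannot read gzip data.
gzip -c backup.tar > wrong.tar.gz
if bunzip2 -c wrong.tar.gz > /dev/null 2>&1; then
  wrong_tool=ok
else
  wrong_tool=failed
fi
echo "bunzip2 on a gzip file: $wrong_tool"
```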
Let's jump over to the command line and see how this works in practice. We're back at the command line with our files, and if I look at the size of my first tar backup.tar, you'll notice it's 10K because it has a couple of files in it. We'll compress it and then look at the size of the result. First I'll run gzip on my first tar backup.tar and then do an ls -lh of just that file to see its size. It's now 279 bytes; it got really tiny. That was better compression than I expected, but the files inside are very small, and most of that 10K archive is tar's zero-filled padding, which compresses extremely well. Going from 10 kilobytes to 279 bytes gives you an idea of how effective compression can be on files that aren't already compressed. Remember that if you try to compress things like mp3s, or other files that are already compressed, they really can't shrink much further, but text files can be compressed a lot.

Now if I want to decompress my file, I use gunzip. If I look at it again, it's been renamed back to just .tar, and it's back up to 10K. So it's always a two-step process: you tar your file and then gzip it, and later you gunzip it and then untar it. There are ways to combine those steps, but first let's look at bzip2. Since I've decompressed the file and it's back to a plain uncompressed tar file, I'll now run bzip2 on it. If I do an ls -l of the file, you'll notice it's 263 bytes, slightly smaller than the 279 bytes gzip produced. The process for decompressing is the same: I use bunzip2, and when I look at the file again, the .bz2 extension is gone because it's no longer compressed. So we can compress and decompress files relatively easily using these commands.
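The point that compression depends on the data can be checked with a small experiment. This is a sketch that assumes gzip, yes, head, and /dev/urandom are available; random bytes stand in for already-compressed files such as mp3s, and the compressed size of the random file will vary slightly from run to run.

```shell
#!/bin/sh
# Sketch: repetitive text compresses very well, while data that is already
# compressed (random bytes stand in for an mp3 or a .gz file here) barely
# shrinks. Assumes gzip, yes, head and /dev/urandom are available.
set -eu

work=$(mktemp -d)
cd "$work"

# 2000 identical 44-byte lines of text (88000 bytes in total).
yes "the quick brown fox jumps over the lazy dog" | head -n 2000 > text.txt
# 65536 random bytes: no patterns for gzip to exploit.
head -c 65536 /dev/urandom > random.bin

text_orig=$(( $(wc -c < text.txt) ))
text_gz=$(( $(gzip -c text.txt | wc -c) ))
rand_orig=$(( $(wc -c < random.bin) ))
rand_gz=$(( $(gzip -c random.bin | wc -c) ))

echo "text:   $text_orig -> $text_gz bytes"
echo "random: $rand_orig -> $rand_gz bytes"
```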
You don't actually need bunzip2 or gunzip to decompress. If you look at the gzip man page, you'll notice there's a -d option, so you could use gzip -d to decompress instead of gunzip, and bzip2 has the same -d option. I just like using gunzip because it's more obvious what I'm trying to do.

Now let's talk about some shortcuts. tar and the compression utilities are used together so often that you can do everything in one step, just like WinZip does on Windows: you can tar and compress files at the same time. The -z option gzips the archive, and the -j option bzip2s it (the letter may be hard to read on the slide; that's just a font issue). So if I create a tar file with one of these options, it gets created and compressed in a single step. You can also use tar to decompress. Before, we used xvf, but for a gzip file, xvzf will decompress and untar it, and for a bzip2 file, xvjf will decompress and untar it. What's nice is that you don't have to run the tar command, then gzip, then later gunzip and tar again; you can get in the habit of doing it all on one line.
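The one-step shortcuts can be sketched like this. It assumes a tar that supports -z and -j (GNU tar and bsdtar both do) plus gzip and bzip2; archive and directory names are hypothetical. -C is used only so the two extractions don't overwrite each other.

```shell
#!/bin/sh
# Sketch: the one-step forms. -z filters through gzip, -j through bzip2,
# and "gzip -d" / "bzip2 -d" do the same job as gunzip / bunzip2.
# Assumes GNU tar (or bsdtar), gzip and bzip2; names are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir stuff_to_backup
echo "keep me" > stuff_to_backup/test.txt

# Create and compress in one step.
tar -czvf backup.tar.gz  stuff_to_backup   # gzip-compressed tar
tar -cjvf backup.tar.bz2 stuff_to_backup   # bzip2-compressed tar

# Decompress and extract in one step, into separate directories.
mkdir from_gz from_bz2
tar -C from_gz  -xzvf backup.tar.gz
tar -C from_bz2 -xjvf backup.tar.bz2

# -d on the compressors decompresses, like gunzip / bunzip2.
cp backup.tar.gz copy.tar.gz
gzip -d copy.tar.gz          # -> copy.tar
cp backup.tar.bz2 copy2.tar.bz2
bzip2 -d copy2.tar.bz2       # -> copy2.tar
```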
Let's go back to the command line and try this with gzip, which is enough to see how it works. I'm going to delete the my first tar backup.tar file we have here, because I'm going to recreate it from my stuff to back up directory, but this time compressed. Last time we used cvf, but this time I want to compress, so I'll use czvf. Again, you don't need the v; I just like to get some feedback while I'm compressing files. I'll call it my second tar file.tar. Oh, and there's an interesting piece of feedback: I forgot to add the files I wanted to put in the tar file. I'll leave that mistake in. If I'm going to create a tar file, I need to tell tar what to put in it, so let's add stuff to back up. There we go. That's a good error; sometimes you forget what you're doing. If I do an ls of this directory, you'll notice the file only has a .tar extension, but if I look at its size, you'll see it's actually compressed.

One thing I want to point out is that when you use the combined tar and gzip options, tar doesn't add a .gz extension for you. How does that affect you? Well, if I try tar xvf on that file right now, it actually works. Let me rm -r the directory and try again: yes, it's putting it back. tar was smart enough to realize that the archive was compressed, even though I didn't tell it to decompress, and it decompressed it for me, which I wasn't expecting. So if the archive is compressed, it looks like tar handles that automatically.

Next I'll rm my second tar file.tar and recreate it with tar czvf, this time calling it my third tar file.tar.gz and putting stuff to back up in it. Now you'll see it created the file with the .tar.gz extension. I'm much more of the opinion that you should use the .tar.gz extension, so it's easier to communicate to other users exactly what format the file is in. If I do an ls -l of this directory, you'll notice it's still the compressed file size. Let me show you one more thing: if I just gunzip that file, clear the screen, and do an ls -l of the directory, you'll notice it's back up to 10K. So I can decompress the file on its own, too.

In that previous example I just wasn't expecting tar to handle the compression by itself, which is kind of cool, but to be polite to other users and to your future self, you should definitely use .tar.gz. So that's a quick introduction to creating tar files and compressing them using gzip and bzip2.
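Finally, a sketch of the naming issue from the demo, assuming GNU tar (bsdtar also detects compression when reading). The file name is only a label: an archive made with -z is gzip data whether or not it ends in .gz, which can be confirmed from its first two bytes (the gzip magic number 1f 8b). Extracting with plain -xf then works because tar detects the compression itself. The archive names are hypothetical.

```shell
#!/bin/sh
# Sketch: the archive name does not control compression; -z does.
# GNU tar (and bsdtar) detect compression when reading, so plain -x works.
# Archive and directory names are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir stuff_to_backup
echo "keep me" > stuff_to_backup/test.txt

# Compressed with -z, but misleadingly named ".tar" (as in the lecture).
tar -czf my_second_backup.tar stuff_to_backup
# Same content, named honestly.
tar -czf my_third_backup.tar.gz stuff_to_backup

# gzip data starts with the magic bytes 1f 8b, whatever the file is called.
magic=$(head -c 2 my_second_backup.tar | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $magic"

# Extract without -z: tar notices the gzip compression on its own.
rm -r stuff_to_backup
tar -xf my_second_backup.tar
```

Naming the file .tar.gz changes nothing about its contents; it only tells people (and tools like tar's -a option) what to expect.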
Info
Channel: Jason Wertz
Views: 21,412
Keywords: Linux
Id: _CItUGbd3dw
Length: 21min 19sec (1279 seconds)
Published: Fri Sep 20 2013