Snapshots vs Backups vs Replications

Reddit Comments

Time to re-evaluate?

No. It's never been a viable strategy.

👍︎︎ 42 👤︎︎ u/nyintensity 📅︎︎ Jun 13 2020 🗫︎ replies

1990 called. They want their debate back.

👍︎︎ 25 👤︎︎ u/mdj 📅︎︎ Jun 13 2020 🗫︎ replies

This is good. I have similar conversations regularly about the difference between snapshots, replication and backup.

The only point I disagree with, which he only mentions briefly: backups are NOT archives.

In one job, I had to restore 12 or so users' Notes email boxes from about 5 years prior - several years' worth of monthly full backups. If I recall correctly, they'd all expired from the NetBackup catalog, and they'd changed from DLT to LTO tape in that time, so we didn't have the catalog information for where the data was, but they still kept the tapes.

So basically I spent 6-9 months running phase-1 then phase-2 imports on hundreds of DLT tapes on the 2 rusty old DLT drives they'd kept, then restoring the data to a dedicated PC and burning it to DVD to give to the lawyers (2 copies - defence and prosecution).

About 20% of them failed, either at the import or restore point. But the case went through.

The prosecutors won. I have no idea if they used any of the evidence I'd restored via my very long restore process which ended up with about 100 DVDs being created.

Oh well, it kept me off the street for 9 months, even if it was boring as hell. But things have changed a lot in that time. I'm talking about 15 years ago, restoring data that was already about 5 years old.

👍︎︎ 8 👤︎︎ u/birdstweeting 📅︎︎ Jun 13 2020 🗫︎ replies

I don't actually see an issue here if you actually architect the solution properly. Yes, snapshots stored locally on the same array aren't a backup. But if you're replicating the data to another array offsite and keeping snapshots on it as well as the source array then you actually are implementing a backup strategy. In a perfect world you would then also be taking periodic offline archives of the data as well but even without that you've met the basic logic of protecting the source data and making it recoverable in the instance of a site/array loss or logical corruption.

👍︎︎ 3 👤︎︎ u/irrision 📅︎︎ Jun 13 2020 🗫︎ replies

Veeam leverages snapshot technology for backup, right?

👍︎︎ 2 👤︎︎ u/paul1939 📅︎︎ Jun 13 2020 🗫︎ replies

Don't tell NetApp

👍︎︎ 3 👤︎︎ u/dkrizzy 📅︎︎ Jun 13 2020 🗫︎ replies

We had a client report to us (GFS provider) that someone, they didn't know who, had deleted 5 million files off their system. They used our auditing tool, found who did it and when, and were able to restore all 5 million files from SNAPSHOTS and snapshots alone. No replication or backup (as they are defined in this video) was needed. This was an informative video, but I disagree with a lot of the principal points regarding storage and backup practices.

👍︎︎ 1 👤︎︎ u/lead_reveneer 📅︎︎ Jun 15 2020 🗫︎ replies

Snapshots have never been considered backups... whoever told you that was just plain wrong.

👍︎︎ 1 👤︎︎ u/slingshot8908 📅︎︎ Jul 08 2020 🗫︎ replies
Captions
Have you been told that you don't need any backups or replication because snapshots will fulfill all your data protection needs? In today's video I will share with you the difference between snapshots, replication and backups, and just maybe you will want to reconsider your DR strategy.

"Snapshots are all the backup you ever need" is often something only storage vendors will tell you. More often than not, these are specifically vendors that don't have a complete set of data protection solutions. Don't get me wrong, I'm not saying that snapshots are bad. Snapshots have their place in the chain of data protection, but a snapshot is certainly not a backup, nor does it replace the use case of replication. Having said that, your data protection needs differ from the next person's, so there is a possibility that snapshots are all that you need. Oftentimes, though, most enterprises will have a combination of the three data protection capabilities and technologies. So let's go through the three data protection methods, a little of how each works and what use case fits it best.

Snapshots are also known as point-in-time copies, or PIT copies. By definition, a snapshot is the view of the data at the point where the snapshot is triggered. Snapshots are by far the fastest and most efficient method to protect data; in certain systems it is almost instantaneous. So let's look at how it works. You have the master copy, and you write more and more data; assume the green box is your data, and that keeps going and going. When you initiate a snapshot, what tends to happen is that the system just puts a little bookmark or marker there. Every time you write subsequent new data, there's a journal here that tracks all these changes. Let's make more new data; the next time you trigger a snapshot, it happens here, and another journal starts. So as you can see, the longer you keep this, the larger the journals become and the more they impact performance.

So why do we not like snapshots, or rather, why are snapshots not backup? It is partly because of the tight interdependency between all these snapshots and what you want to recover. Say, for example, you want to recover this point in time: it is actually a combination of this green box and the master copy. So if any of these components is corrupted or destroyed, you literally don't have anything else to recover from. If the master copy is dead or corrupt, for example, you are not able to restore either of these two other copies. Also, in most storage subsystems snapshots live on the same storage: you have your master here and all your snapshots on top of it. This is not the best practice in general, because a failure in the volume or the storage simply means all your backups go with it. It's a bit like having all your eggs in the same basket.

Having said that, because snapshots are so fast and so powerful, and are just doing references and pointers, they are very, very good for recovery purposes. So for example, if I want this point in time, all I need to do is mount this together with this, and I get the recovery point that I want. It's great if you only need to recover and retain backups for just a couple of days, and it's also highly dependent on how often you take them. This is because the longer you keep snapshots, the more resources it uses to keep the journals and all the blocks that change. Many vendors have unique implementations to help alleviate this issue, but that really is just delaying the inevitable; the limitations still exist.

Now let's look at replication. As the name suggests, replication simply means copying or replicating data to another storage system. It can be on another system in the same data center, but often it is remote, to protect against DC failures as well. There are generally two types of replication: asynchronous and synchronous. Let's start with async. Async replication often means that data is replicated at a given interval, perhaps every five minutes; changes are then replicated to the remote site. So in the event of a disaster, the worst that can happen is that you lose up to five minutes of data. This is often articulated as what we call the recovery point objective: RPO equals five minutes.

Sync replication, on the other hand, replicates all I/O as it is written to the storage. The system will commit both local and remote writes before acknowledging to the host that the write is good. In many cases, mission-critical apps that cannot tolerate any loss of data will opt for sync replication. In terms of RPO, sync replication is what we term RPO equals zero, which simply means no data loss. So why would anybody pick async replication then? As you can tell, sync replication's demand on bandwidth will be extremely high, and it is latency sensitive. Comparatively, async often has generous allowances for bandwidth and latency, making it significantly cheaper.

The advantage of replication is its ability to recover very quickly with minimal data loss. In the event of a complete data center failure, or if primary storage is completely lost, you oftentimes have a ready copy of the data and you're ready to resume business. Having said that, it is not without its caveats. Because every data block written is replicated, if a corrupted block is written, or somebody accidentally or maliciously deletes a whole bunch of data, all of this will also be replicated. Like the saying goes: dirty block in, dirty block out. This makes it great for business continuity protection and insulation against primary storage failures, but surely not so great if you want the ability to roll back to any point in time.

Which brings me to my very last point: backups. Backups have been around pretty much since the beginning of time. Over time, backup has evolved to resemble a little of a combination of snapshots and replication. You make a full copy of the primary data every time you run a backup, which for most organizations is once a day. Assuming you run it at 8:00 p.m., you get a point-in-time replica of the data exactly as it looked at that time, similar to a snapshot. So assuming you do it seven days a week, you will now have seven independent copies of the data, as of 8:00 p.m., for the last seven days. If the third copy is corrupted, you still have the second or fourth copy to recover from. Unlike snapshots or replication, backups are also perfect for long-term retention, because as long as there's capacity and resources you can store them for as long as you want.

You may be thinking that the storage consumption for backup will then be massive, and surely that's an issue. Yes, of course, but there are many capabilities out there, like dedupe and compression, that help with that problem; I will not go into depth on that today. The biggest issue with backups is generally time: it takes the longest to protect and also the longest to recover. I am not going into detail on advanced backup and recovery capabilities such as incremental forever and dedupe appliances, which have improved recovery performance over the years; regardless, backup is still the slowest of the three technologies we spoke about today.

So depending on your needs and requirements, you may only need one of the three data protection methods, or a combination of all three. To summarize my recommendations: as the most cost-effective and fundamental form of data protection, backup is a must for every enterprise. I cannot stress this enough: you need to have backups. For short-term data protection, between three to five days, snapshots are the way to go, but optionally I would still recommend backups. For fast recovery, snapshots and replication are the way to go. For mission-critical applications, it will definitely be replication with backups.

Hopefully that has been useful. I know it seems like a lot for people who are new to data protection, and these may all sound the same in some sense, although there are subtle differences. As always, if you find the content useful or it is something that you can use, please subscribe and drop me a like or a comment. Till next time.
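
The snapshot dependency described in the captions (a master copy plus a journal of changes per snapshot) can be illustrated with a toy model. This is only a sketch: `SnapshotVolume`, `Blocks` and the other names are hypothetical, and it does not reflect how any particular array implements snapshots. It shows that every snapshot view needs the master copy, while a full backup taken earlier stays usable on its own.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, str]  # block number -> contents (illustrative type)


class SnapshotVolume:
    """Toy volume: one master copy plus one change journal per snapshot."""

    def __init__(self, master: Blocks) -> None:
        self.master: Optional[Blocks] = dict(master)
        self.journals: List[Blocks] = [{}]  # last entry is the open journal

    def write(self, block: int, data: str) -> None:
        # New writes are tracked in the currently open journal.
        self.journals[-1][block] = data

    def snapshot(self) -> int:
        # Place a "bookmark": close the open journal and start a new one.
        self.journals.append({})
        return len(self.journals) - 1  # snapshot id

    def restore(self, snap_id: int) -> Blocks:
        # A point in time is the master copy plus every journal before it.
        if self.master is None:
            raise RuntimeError("master copy lost: no snapshot can be rebuilt")
        view = dict(self.master)
        for journal in self.journals[:snap_id]:
            view.update(journal)
        return view


vol = SnapshotVolume({0: "a", 1: "b"})
vol.write(1, "b2")
first = vol.snapshot()
vol.write(0, "a2")
second = vol.snapshot()

first_view = vol.restore(first)              # master + first journal
nightly_backup = dict(vol.restore(second))   # independent full copy

vol.master = None  # simulate losing or corrupting the master copy
try:
    vol.restore(first)
    snapshots_usable = True
except RuntimeError:
    snapshots_usable = False
```

After the simulated loss of the master copy, neither snapshot can be rebuilt, but `nightly_backup` still holds a complete copy of the data as of the second snapshot.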
Info
Channel: Charles Chow
Views: 19,163
Keywords: data protection, snapshots, storage snapshots, netapp, backup, synchronous replications, asynchronous replications, replications, netbackup, networker, veeam, san, nas, storage, storage area network, stretched clusters, backup and recovery, data protection explained, dell emc, veritas netbackup, cloud data protection, disaster recovery, data availability
Id: BcA13YbUv-4
Length: 8min 43sec (523 seconds)
Published: Fri Jun 12 2020