Continuous mode backup and restore - Episode 14

Captions
[Music]

Hi everyone, welcome to Azure Cosmos DB Live TV, episode 14. I'm joined this week by my good friend Govind Kanshi. How are you, sir?

Doing well, thank you very much.

Why don't you tell us a little bit about yourself?

Yep. My name is, as he rightly said, Govind, and I work in the Cosmos DB engineering team with Mark and folks. I work with customers, and I also work on a set of features and help customers be successful.

Great, thanks. So everyone, this week we're going to talk about our new continuous backup and restore feature. Govind, do you want to talk a little bit about the history of our backup and restore?

Yep, definitely, let's start there. When we launched Cosmos DB, we first focused on the SLAs: the availability SLA, the latency SLA, and the performance and consistency SLAs. As part of that whole approach of making the database completely self-running, we also provided two free copies of backup from the start. The history there is that we would take a backup every four hours and keep the last two copies at any given point in time. Over the years we added the ability for customers to increase both the frequency and the retention beyond the eight-hour capability. So now, in periodic backup, which is what we call the original backup, you can say: take a backup every one hour or every 24 hours, and retain it for X number of hours. That's what we had. But there were a lot of challenges with it. Mark, I'm sure you have had that discussion, and the folks who are live today would also know that we
would see customers coming in and facing challenges: "We want to restore a backup right now, and we have to wait for the ticket to come in," and so on and so forth. It was challenging. There was no PowerShell or CLI. So about two years ago we started on a long journey, where we decided that we needed to fix this whole experience. The way we tried to fix it is to allow customers to do self-service restores across a 30-day window to address various scenarios. This is what we call continuous backup, which we are now ready to GA.

Okay, great. Did you want to share some slides and talk about our new backup? Oh, I think we've got our docs up here.

Yeah, I do have a doc up here. What I'll do is walk through the scenarios for continuous backup, and that will also be a good validation of how much of what customers require we have taken into account. First of all, the big change from periodic backup is that we take a backup in every region where your account is present. This is different from periodic backup, where we would take a backup in the region where the writes are occurring and then use Blob Storage replication to provide that backup in all the regions. A lot of customers had challenges and asks around that, so we said that with continuous backup we need to follow where the customer's data is and make sure that the backup is also present in all those locations.

Interesting. So why is that? Why would we back up in every single region when they're replicated too?

I think that's a great question. There are multiple scenarios here. Customers wanted to restore the backup in a particular region. Let's say you are present in four or five
regions, and you want the backup restored in only one particular region. That was definitely not possible earlier. With continuous backup you get a backup in every location. The other thing is that when the backup is present in all these locations, for our high-availability scenarios we can immediately roll back without going cross-region. All the data is available locally, so we can restore it locally and don't have to make any cross-region calls to do the restores.

I see. So you're saying that I could do a restore into a new account that's going to have all the data in multiple regions, just like the original?

Let me describe this scenario. Take this particular account, which is present in, I would say, three regions. Let us say you wanted to restore this account in US West. It's very straightforward, and I will show you a little later: you come in and say that this account is to be restored in this particular region only. Then you can go ahead and do your testing, or you can use that backup for your analytics and all kinds of things. Everything can be done in that particular region.

I see, okay, good. What about data consistency with backups in multiple regions?

That's a great question again. Data consistency follows the data commit in each of these regions: whatever data is committed in each region is what gets backed up. So, depending on the consistency that you have chosen, if it is strong, then the data and the backup are completely in sync.

So if you're running multi-master with a relaxed consistency level, say session or weaker, the data backed up is essentially
going to represent where the replication queue was at that point in time, basically?

Yes, basically. You're completely right. The idea is that whatever is available locally there is what is present in the backup.

Cool.

So let me describe a few of the scenarios and a few of the other things. We definitely document what is restored and what is not restored. What we don't restore is a good thing to remember. Because we are a database, we restore data. What we don't restore is all the other things around it: the firewall, the VNet, and so on and so forth. Those we don't restore, but data and indexes we definitely restore.

Makes sense.

So, four different scenarios for restore. I don't know whether it's visible here; hopefully it is. One is this: take this account, which was deleted at T3, and you want to restore it. If you want to restore a deleted account, that is definitely possible with Cosmos DB, and you can restore with PowerShell, the portal, as well as the CLI. The other scenario, the one you touched upon in the beginning, is restoring the data of an account in a particular region. Let's say this account existed in two regions and you want to restore the account in one particular region; you can do that. Then we have something more unique, and that is where the power of the backup actually comes in. The Cosmos DB backup has to be touched only when you have made a mistake. It is not a tool for your regular HA/DR kind of scenarios. What that means is this: let us say you accidentally updated or deleted data, so you want to go back in time and get the state of the data in another account. That is what we provide today with Cosmos DB.

I see. If I delete an account, how far back can I go to restore
that?

Great question. This doesn't change. Today, in periodic backup as well as continuous backup, if you have dropped or deleted an account, we will store it for 30 days, so you can come back within that time.

Okay. So if I delete an account and I get busy for like 29 days, I can go back on that 30th day and say, "Oh sorry, can you restore that?"

Yep. We do have customers who have done that, and I myself have made those kinds of mistakes. So yes, right now it is 30 days, and it works out for most of our scenarios.

I see. And I'm also guessing that if you make a mistake, say overwriting or deleting data, whatever it may be, once you restore it you'll need to replay the data back into your operational system. Would that be correct?

That's completely correct. Today we are very conservative. We are a database that doesn't want to be in the news; that's why we do all the things that we do on the availability front, security, and everything. In the restore scenario, in the version of continuous backup that we have come out with today, we always restore the data into a separate account. Then it's up to you to find out what data is present by querying it, and then to move that data into your present location. For any accidental updates or deletes of your data, that is the approach you should take. And I would suggest you don't hurry that process, because you want to take time and look at what the state of the data is. So yeah, go ahead.

So what's the difference between this and the periodic backup that we used to have? Were there issues with periodic backup that make it less desirable in some ways?

You're completely right. So let me open
up the periodic mode, and I think it will make life simpler. The first issue is that periodic backup would take the backup only to, let us say, the paired region. We have fixed that: we now provide backup residency choices. In Cosmos DB you can decide where your data is available by replicating to the right location or choosing the regions, but the backup was always replicated to the paired region. Now that is fixed. But let's come to the issues. The biggest issue is how you restore. Today you have to file a support case, and that is definitely not very practical or the right thing to do. That is the first problem. The second issue is that even once you restore, the timing is not exact. When Cosmos DB takes the backup, the backup is across multiple partitions, and the partitions are unlikely to be captured at exactly the same instant. That's one challenge we are trying to work out. And the final part is the ability for you to use PowerShell, the CLI, or even ARM for that matter, if you go by that. You can use pretty much all of those mechanisms to do this. We want to make it completely self-service.

That makes sense. Not having to file a support ticket is hugely beneficial. And I guess what you're saying is that, because of how the snapshot was taken, it's not really instantaneous, so you could have inconsistent data across partitions with the older backup?

Yeah, there could be challenges. It is a best-effort kind of thing, where the data could be in a different state across the partitions. But with continuous backup, the data is backed up using our new technology underneath, which ensures the data is completely time-consistent across all the partitions.

I see. I've got a question from a viewer here: does this new backup
policy work with the data in the analytical store? If they restore to a point in the past, what's the state of the data contained in the analytical store?

That's a great question, and it comes at the right time. With Cosmos DB and the analytical store, when we restore the Cosmos DB backup we give you the restored data. If you want the state of the analytical store restored as well, we do have work that is upcoming. By the end of this June, if you're really interested, we can provide you a script that you can use. And in another month's time we will be providing PowerShell and CLI support for that, where you can just invoke an operation on the Cosmos DB account saying: whatever the state of the data in this particular collection or account is, please enable it for the analytical store. So yes, we are working on it.

Ah, okay, great. All right, I'm going to ask you a question that I get all the time: how long does it take to do a restore?

Yes, so as long as you get Mark some of the best drinks and the best ice cream to go with them, you can get it restored as soon as possible. No, I'm just kidding. The simple answer is that it depends on the size of the data. We do have customers who have tens to hundreds of terabytes. There is no SLA right now, but what we have seen is a customer restoring about 17 terabytes in about eight hours. What I would say is 45 minutes for a terabyte, up to about one and a half hours, which is the worst case that I have seen.

Okay, so 45 to 90 minutes for a terabyte of data to restore on average. Your mileage may vary.

Yes.

Okay, that's a good rule of thumb for people to keep in mind. Very cool.

Are there questions? Otherwise, what I'll do is walk you through a
few of the things. One is provisioning and what you need to take care of, and then I'll walk you through some of the restore things.

Yeah, go for it.

Okay, wonderful. On the screen that you see, I'm provisioning a new account. When you choose the backup policy for a given account, rather than periodic, which also has the redundancy options that we will not focus on today, let's just choose the continuous mode. We are going GA at the end of this month, or at the most a few days here and there, maybe the second week at the most, and then this preview tag will go away. But even today we support customers as if they are in production, and we accept all their support tickets and so on.

Okay. So if customers want a go-live license, so to speak, or whatever the cloud equivalent of that is, since we're not a packaged product, we'll support them as production even though it's tagged preview?

Yes. We have hundreds of customers using it in production today.

Oh wow, okay.

That part is something that we completely stand by and are proud of. The key part here is that there are a bunch of features that we want to line up. Today we are very confident about the SQL and MongoDB APIs in single-region write accounts, and the rest of the features that we are planning to support are documented in our FAQ.

Okay.

This is the only thing that you need to do; nothing else changes. Now the question would be what happens to existing accounts. If you have an existing account in periodic mode, Mark has a link that we will share: we are running a preview where you can apply to convert your periodic accounts to continuous mode. In a month's time you will be able to use PowerShell, the CLI, or the portal to convert periodic accounts to continuous mode.

I shared the link on the screen
there, and I'll leave that up a bit so people can jot it down or just take a screen cap of it, whichever you prefer.

Thank you very much, Mark. That is the preview we are running, and it will help our team as well as the customers. We will do the migration, so you can nominate your test or production accounts. But in a month's time you can do the migration yourself.

I'm guessing this is zero downtime, of course?

Yes.

Okay.

So now let's come to two scenarios. Once you have created the account and you have to restore, what would you do? I will walk you through our UX, which is pretty clean, and we will keep improving it. You choose the time, of course; all the dates shown in red are not selectable. You choose a particular date and a particular time, and the location that you want to restore into. Then you have a choice of restoring the complete account or a set of containers, depending on what you want to do. Choose the resource group and the target account. The restore will cost you; going forward that will be 15 cents per GB, and this is based on the US cost. And that's it; that's all you need to do. Let me just show you the PowerShell and CLI experiences. Some of our customers prefer only those, because they can check in the scripts and all that, and they don't use the UX. The format remains very straightforward. Let me just increase the size there a little bit. It's saying you need to give us the target resource group, the target account name, the source account name, the restore timestamp, and the location that you want to restore into. It's very similar for the CLI: you are giving us the target, the source, you are
giving us the resource group and the location. Those are the things that you need to provide.

Okay, got it.

So let me go back here. There is one more thing that I want to walk through. You can definitely do this, but there is another, more advanced scenario that we support. Let us say you are one of those customers who creates containers and drops them, and creates databases and drops them. What we also provide is a way to look at all the events. An event would have been recorded at a certain point in time, and you want to choose a time just around that event. You can do all those kinds of things.

So those are data plane operations I'm seeing in here?

Oh no, these are all control plane operations.

Okay.

And one of the things I think you in particular will appreciate on this front is the resource model. It's a very rich resource model that we have exposed. There is a backup policy. It provides the create mode, so it tells you whether the account was created by a restore or not, when it was restored, and what the restore source was, so you can literally form a chain of your restores. When a region was dropped, and which one: all those kinds of things are available here.

I see. Looks like I've got some ARM template samples I'm going to have to update in the future. Thanks, Govind.

I'm assuming that you are on top of these things all the time, and I heard that you are doing all the Bicep stuff.

I am. I'm right in the middle of it. I've got about a dozen or more PRs open in the ARM template repo right now. I ran the Bicep decompile against all my ARM template samples in there, and then did a build of them back into the ARM template, which basically has the hash for the Bicep file. They have to
round-trip; you have to make sure of that, anyway.

Yes, and that's the important thing. I am excited about Bicep because I think there is a whole set of customer asks around it, so I'm looking forward to all those things.

Me too. I don't know if ARM templates will ever go away. I kind of wish they would, because I really don't like authoring those things. Bicep is certainly less verbose.

I think you are one of the few people who is diligent about making sure that we have a complete way of interacting with the surface area of our product. If you remember, this is the simpler version: you can just say that this is what the backup policy looks like. But as you can imagine, and as you rightly say, when you are editing this and adding some stuff, if you make a simple mistake you have to go back, and it's not fun. I'm looking forward to Bicep.

Anyway, that was the event feed. We provide the event feed here in a much richer fashion in the UX, but you can always go back to PowerShell and the CLI; they also provide the complete event feed. You can enumerate all these events, then choose the right time and come in. I'll just show you a very simple way. You can get something like this, listing all the accounts that are present, the live accounts, and when they were created. We are displaying the databases here, but you can also display all the containers, saying this container was created at this point in time, and so on and so forth. I don't know how many people appreciate this: there was a long-standing ask from a lot of customers to show when an account was created and when a collection was created. Now we have this capability, thanks to the resource
model.

That's true. For folks that aren't aware, I get this question all the time because I'm the ARM guy. Somebody would call in or open a ticket and say, "Hey, someone in my company created this account. Can you tell me who and when?" And I'm like, I can't do anything; ARM doesn't record it. But now that you have this thing that's running against our log store, we do know when that happens. So that's cool, actually.

Yep. I hope some people like that and can build some sophisticated analytical tools for what they want to do there. That's one part that I definitely wanted to touch on today.

So I've got a question for you. Why only 30 days? I interact with a lot of customers that have regulatory requirements around data that they have to keep stored, and I have been telling them forever, basically, that they need to go and double-write that data into Blob Storage and archive it there. So why 30 days, and what's the story for folks that need long-term data retention? Can they use something like this?

Thanks, Mark. I think you and I face this question every day. We are completely listening to our customers; it's just a matter of time before we start prioritizing those asks. But first of all, let me address why just 30 days. Today we have taken this step to align with GDPR and the other kinds of retention asks that customers have, so that's why we are aligning with 30 days. There are challenges in going beyond 30 days. But for beyond 30 days, say one year, two years, and so on, we are working on a capability. It's completely fair to announce that we will have a private preview, and you should definitely reach out if you need that. But first we will expose that capability of long-term retention
on the Cassandra API, because that's where some of our major customers are landing, and then we will start exposing it for our SQL and MongoDB API customers. So it's definitely something that we're working on.

Okay, so that's coming down the road.

Definitely.

Another question from the same viewer: is point-in-time restore available for all APIs?

Yes, that's the hard one, so thanks for asking it. Today it is available for the SQL and MongoDB APIs, and we are trying to make it work for Cassandra, Table, and Gremlin. I think Table and Gremlin will land earlier if we get enough customers on that front. But today it is focused on SQL and MongoDB, and we are actively working on Cassandra and the other APIs too.

And I guess, as you said earlier, you've got hundreds of customers running on this. To get those others into preview and, of course, eventual GA, we need to get customers in there first to harden the service and make sure it works before we bring it into more of a public preview, and then of course eventually GA.

Yep, makes sense.

So what about RBAC? We have that backup operator role in our RBAC today that defines the actions for a backup restore, I guess it is. What's the RBAC story for continuous backup?

Great question. While designing this feature we talked to a lot of customers, and a lot of them didn't want the restore capability to be available to everybody. The owner of the account will definitely have the capability to restore, but you as owner can assign this role to other principals to perform the restore operation. I think that's the key part. And hopefully folks will have time to go through this: there are two
concepts here: the scope where you apply the restore permissions, and how you create that role. We do have a role. The permission that you really require is the restore action permission; the role is Cosmos Restore Operator. This restore operator role is what you require today for doing restores.

Okay, so this is a brand new role?

This is a brand new role, because our goal is that continuous backup becomes the future, where all the investments go. The Cosmos Backup Operator role was designed more to control who can see the restore capability itself. Because we want to keep these two features independent, we are not reusing that role. We have a completely new role.

Got it. So eventually we're going to deprecate periodic backup, is what I'm hearing?

Yeah, I think what I would say is yes, eventually.

I mean, we're Microsoft; we don't actually deprecate anything. You and I will be retired, maybe, before we get there.

Yes.

Got it. So this role can operate or exist without access to the data plane, is that correct?

Yes, completely independent. It is completely RP-driven; there is no data plane interaction.

Got it, okay. So brand new role, new actions.

Yes, new actions. And the key part here is that you can apply it to the account resource; this is where you need to apply it, under the subscription.

Got it. I had a question earlier from a viewer who was asking whether they can migrate, and I just want to put up this aka.ms link, because we do have a private preview for customers that are using periodic backup and want to migrate to continuous backup. We have that now in private preview. Are we going to have that for GA?

Yes, you're
completely right, Mark. Today we are doing the private preview, where you nominate your accounts and we do the migration for you. But in about a month's time we will be completely ready with PowerShell and the CLI as well as the portal, fingers crossed, so that you can do the migration yourself.

Got it. So I'm guessing that'll surface through the portal, and I'm guessing there will be a new cmdlet in PowerShell and a new CLI command for Bash.

Yes, completely.

Cool. Let's see, I have another question for you. You know we have the new encryption with customer-managed keys that we released; I guess it's now GA, I don't remember. Either way, how does this work if I've got customer-managed keys? And I'll even throw a curveball at you here. I'm guessing we encrypt the backup with the same key the customer is using. What if I rotate that key and then try to restore from a backup prior to that?

A lot of questions. First of all, the existing CMK capabilities that we have right now don't work with this particular feature, but we are targeting that for some time in the future. There is a different set of identities that the team is working on. Once those are in place, we should be able to get support for this feature out. That's the first part. The second part is what we do with the data and everything, and we will document the whole approach that we are taking. I was advised that, because this is going to get recorded and posted, I should wait for our official capabilities to be announced and then say a few things about it. But rest assured, the idea is that once it works with CMK as a feature, we will also make sure
that we document how we will take care of key rotation and everything. But if you really think about it, the thing that we are storing in Key Vault is the one that just encrypts the key itself. It does not encrypt the full data, because that would not be the right thing to do: we have customers who have tens of terabytes of data, and re-encrypting all the data every time would not make sense.

Got it, okay, good. I've put a link up for the docs that you're showing here, so if people want to take a look at all these docs that Govind has so lovingly authored, you can go and check them out.

Thank you very much. I think you are the leading star there, with all the ARM and the CLI stuff.

No, sir, I'm just a follower. Okay, great. Anything else you want to show us? What have we not covered today? I think I've pretty much hit all the questions I had about this. This is really exciting. I know customers have been wanting this for a long time. I've heard about the need for continuous backup and point-in-time restore since day one, since I joined the team. This is what modern databases should provide for their customers, so it totally makes sense. Very exciting.

Thank you very much, Mark. We are very excited to bring this feature to our customers. Most importantly, we would request customers to choose continuous backup rather than the default, which right now is periodic backup, because it provides the ability to restore to any point in time within the last 30 days. And it is completely self-service across PowerShell, the portal, and the CLI, or even ARM if you are a sucker for that. So we would request you to use this.

I'm a sucker for ARM, everybody knows. All right, well, Govind, thank you so much. Great show this week. Really excited about our
new continuous backup and point-in-time restore capabilities. Good luck bringing this to GA. I know customers are going to be pretty excited and want to sign up. I'll be curious to see how many sign-ups you get for your new preview to do the conversion from periodic to continuous.

Thank you very much.

You're welcome. Thank you, everyone, for joining us this week. Next week I've got a really special guest, a person on our team, Mei-Chin. She is an engineering manager, and she's building something really unique: a new compute layer within Cosmos DB that's being used for a bunch of really cool services. You may have heard of our new integrated cache, which is built on this. We've also got our new Apache Cassandra managed, what is it called, managed instance?

Yeah, Apache Cassandra managed instance.

Yes. There's another piece on there: it also provides the dedicated gateway for customers. If you're not familiar, Cosmos is a multi-tenant service, and what this does is provide a kind of single-tenant compute layer for you to do things like have an integrated cache, host Apache Cassandra, or provide this dedicated gateway. So I want to give you a behind-the-scenes look and have you meet one of the engineering managers responsible for building this brand new piece of Cosmos. It's very different. We'll find out what it takes to build services like this and how to operationalize them, and give you a behind-the-scenes look into how we build and run Cosmos DB. I hope you'll join us next week for that. And again, Govind, thank you so much for talking about continuous backup.

Thank you very much, and I hope to see a lot of folks on the next one.

Oh wait, hold on, I've got maybe some questions. Nope, just people thanking us. Oh, I do have a question: is
there any impact if we configured Cosmos DB with Private Link?

I don't think so. No, as covered at the beginning, it doesn't matter, because all we're doing is restoring just the data.

Yeah. Okay, well, there we go, got all the questions covered. Thank you again, everyone, and I hope to see you next week here on Azure Cosmos DB Live TV.

Sure, thank you.

Thank you.

[Music]
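
The rule-of-thumb numbers quoted in the episode (45 to 90 minutes per terabyte, 15 cents per GB restored, a 30-day restore window) can be turned into a quick back-of-the-envelope check. The following is only a sketch under stated assumptions: the function names are made up for illustration and are not part of any Azure SDK, a terabyte is taken as 1024 GB, and the figures are the informal ones from the conversation, which the speakers stress carry no SLA.

```python
from datetime import datetime, timedelta, timezone

# Figures quoted informally in the episode; none of them is an SLA.
MIN_MINUTES_PER_TB = 45       # typical case mentioned
MAX_MINUTES_PER_TB = 90       # worst case mentioned
COST_PER_GB_USD = 0.15        # restore charge mentioned
RETENTION = timedelta(days=30)
GB_PER_TB = 1024              # assumption: binary terabytes


def estimate_restore(size_tb: float) -> dict:
    """Hypothetical helper: rough duration range (hours) and cost (USD)."""
    return {
        "min_hours": size_tb * MIN_MINUTES_PER_TB / 60,
        "max_hours": size_tb * MAX_MINUTES_PER_TB / 60,
        "cost_usd": size_tb * GB_PER_TB * COST_PER_GB_USD,
    }


def within_restore_window(restore_at: datetime, now: datetime) -> bool:
    """Hypothetical helper: True if restore_at is in the past and at most 30 days old."""
    return timedelta(0) <= now - restore_at <= RETENTION


if __name__ == "__main__":
    now = datetime(2021, 6, 11, tzinfo=timezone.utc)
    print(estimate_restore(2))
    print(within_restore_window(now - timedelta(days=29), now))
    print(within_restore_window(now - timedelta(days=31), now))
```

Applied to the 17 TB example from the episode, this rule of thumb gives a range well above the roughly eight hours actually observed, which is a reminder of how rough the estimate is.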
Info
Channel: Azure Cosmos DB
Views: 393
Id: ggC24Pm7cME
Length: 41min 6sec (2466 seconds)
Published: Fri Jun 11 2021