Cohesity Anywhere: Driving to the Cloud with Sai Mukundan and Radhani Guturi

Video Statistics and Information

Captions
So, a quick level set: we saw so far, obviously, the vision, but also SpanFS, and then my colleague covered some of the elements of how we can support multi-protocol access. Taking the iceberg analogy, that only just skimmed the iceberg. What you're going to see now is the ability for Cohesity to work with the cloud, integrate with the cloud, as well as run our software in the cloud.

Broadly speaking, when we look at use cases, there are three buckets for what we have heard from customers. The first one: you saw the ability for data to be backed up onto Cohesity. When we take these virtual machines, customers are always interested in the ability to provision them, either on demand or on a policy basis, for testing purposes in the cloud. That's use case number one that we have heard from customers. The challenge here is obviously the fact that in the on-premises world you are dealing with one kind of format, let's say the VMDK format, and then, depending on the cloud vendor of your choice or preference, you're dealing with different formats: AMIs in AWS, VHDs in Azure, and Google has its own format. So use case number one is focused on how Cohesity can enable customers to take that data, without any software of ours really running in the cloud, and provide the ability to do test and dev. That's what we hear from customers, number one.

The second use case: we talked a lot about long-term retention, and Mohit alluded to that as part of the broader vision, the ability to leverage the likes of Azure Blob, AWS S3 and Glacier, and Google's storage class offerings for long-term retention. The idea here is that you want to keep a certain amount of data on premises, obviously, in order to be able to recover quickly, but then also seamlessly recover data that is in the cloud. One of the things you will see emerge across all of these, and I think Ray alluded to this fact earlier, is what it is that we index. The ability for us to index the VMs, and the files within those VMDKs, will also enable a lot of the search and recovery mechanisms as we walk through the demo, and I'm hoping not to run into issues with our own demo.

Then the last thing is the hybrid cloud scenario. This is the ability for Cohesity to take the same software that we run on premises, which we call the DataPlatform, and run it in the cloud; we call that the Cloud Edition. The idea is that customers are always saying, "this is all great on premises, I have vendors who can do that, but what is it that you can offer in the cloud?" They're looking for a solution that universally addresses this, both from an on-premises as well as a cloud perspective. So we'll talk about the Cloud Edition capability, and the idea here is again test and dev and disaster recovery, but then also protection of your in-cloud assets, in order to be a complete DR solution, including from a failback perspective.

So with that, let's dive into each one of these use cases, and we'll follow that up with some demos as well, and then a deep dive on the architecture. The first use case: you saw this aspect here about the VMs landing on the platform. The idea now is to enable those very same virtual machines to be spun up as Azure instances, or EC2 instances in the AWS case, and spin them up as VMs in the cloud.
Now the challenge here, obviously, is that step number two is the key: the ability to convert and then be able to deploy in the cloud. That is what we really enable for customers, from a "take your on-premises infrastructure and now run it in the cloud" perspective. For the purpose of this demo I'm going to show an approach which is on demand, so that we can actually walk through the process of what information we gather to enable this conversion and deployment. This can also be set as a policy; for instance, we have heard from customers, "I want to back it up, let's say, every day, and then keep the last three to four copies of my data in the cloud that I can quickly spin up as needed." That can all be set up through the policy controls that we have in our platform.

And you can convert the virtual machine data to whatever format runs on the cloud? Sorry, I missed the first part. Do you convert the virtual machine automatically from the local VMware environment to what could run in the cloud? Yeah, there are two parts to that conversion. Basically, you obviously have the OS and then you have the associated data. The conversion depends on which cloud vendor you're looking at, because there are certain tools that the cloud vendors provide that we leverage. In the case of, let's say, Azure, we can actually do the conversion ourselves, because the VMDK to VHD conversion is a lot more seamless; it's mostly an additive kind of difference. And because we have indexed the data (again, back to the indexing and the power of it), we know whether it's a Windows or a Linux machine, and we also identify which disk is the boot disk.
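(As a rough illustration of why the Azure-side conversion is comparatively simple: a fixed-format VHD is essentially the raw disk image with a 512-byte footer appended. The sketch below builds such a footer in Python for a flat raw image; it is not Cohesity's conversion code, the creator fields and the simplified CHS geometry are assumptions, and a real VMDK would first need its own container format handled.)

```python
import os
import struct
import time
import uuid

def vhd_fixed_footer(disk_size_bytes: int) -> bytes:
    """Build a 512-byte fixed-VHD footer for a raw disk image (illustrative sketch)."""
    # Timestamp is seconds since 2000-01-01 00:00:00 UTC per the VHD spec.
    timestamp = int(time.time()) - 946684800

    # Very simplified CHS geometry; the VHD spec defines a more involved algorithm.
    total_sectors = disk_size_bytes // 512
    sectors_per_track, heads = 63, 16
    cylinders = min(total_sectors // (sectors_per_track * heads), 65535)

    footer = struct.pack(
        ">8s I I Q I 4s I 4s Q Q H B B I I 16s B 427s",
        b"conectix",          # cookie
        2,                    # features (reserved bit always set)
        0x00010000,           # file format version
        0xFFFFFFFFFFFFFFFF,   # data offset (none for fixed disks)
        timestamp,            # creation time
        b"demo",              # creator application (illustrative)
        0x00010000,           # creator version
        b"Wi2k",              # creator host OS
        disk_size_bytes,      # original size
        disk_size_bytes,      # current size
        cylinders, heads, sectors_per_track,
        2,                    # disk type: 2 = fixed
        0,                    # checksum placeholder, patched below
        uuid.uuid4().bytes,   # unique ID
        0,                    # saved state
        b"\x00" * 427,        # reserved
    )
    # Checksum is the one's complement of the byte sum with the checksum field zeroed.
    checksum = (~sum(footer)) & 0xFFFFFFFF
    return footer[:64] + struct.pack(">I", checksum) + footer[68:]

def raw_to_fixed_vhd(raw_path: str, vhd_path: str) -> None:
    """Copy a raw image and append the footer; Azure additionally expects 1 MiB size alignment."""
    size = os.path.getsize(raw_path)
    with open(raw_path, "rb") as src, open(vhd_path, "wb") as dst:
        while chunk := src.read(1 << 20):
            dst.write(chunk)
        dst.write(vhd_fixed_footer(size))
```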
But thinking about AWS, there is kind of a matrix that determines whether the source machine can be launched in the cloud with this method, is that true? Yes, yes. So not every workload can be migrated this way, starting from, for example, a VM that is not running something compatible with an AMI in AWS. Yes, and that's kind of what I was getting at: in the Azure case we take care of the conversion ourselves; in the AWS case there are tools, and we actually call their import/export capability. We leverage that, which takes care of converting the underlying OS disk into an AMI, and then, because we know what the other data disks are, we can go ahead and attach them to those very same instances. So ultimately, net net, these VMs now have a life of their own in the cloud, and you have successfully taken them from an on-premises format to the cloud format. That's use case number one.

And in that case, have you mounted the data for the VMs on your Cohesity cloud platform? Not yet. In this scenario, all I have shown in this picture is the VMs running in the cloud. So where is the data for the VMs? The data actually gets copied, in the AWS case, to a temporary S3 bucket that gets registered, and then, after the conversion is done, those temporary images are deleted and the VM is spun up as an EC2 instance with attached EBS volumes. The equivalent happens in the Azure case as well: we spin up Azure compute and attach the standard disks, as Azure calls them.

And you convert the data from an S3 object to an EBS data structure at some point? Yeah, AWS itself provides import snapshot, which is different from import instance, and that can take a raw disk and convert it to an EBS snapshot. Once the snapshot is created, we can just attach it.
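(For the AWS path just described, a rough sketch of driving the VM Import/Export service with boto3 is below. This is a generic illustration, not Cohesity's implementation; the bucket, key, instance ID, and device names are placeholder assumptions.)

```python
import time
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def import_disk_as_snapshot(bucket: str, key: str, disk_format: str = "VMDK") -> str:
    """Ask AWS to convert a disk image staged in S3 into an EBS snapshot."""
    task = ec2.import_snapshot(
        Description="cloud spin-up staging (illustrative)",
        DiskContainer={
            "Description": key,
            "Format": disk_format,  # VMDK, VHD, or RAW
            "UserBucket": {"S3Bucket": bucket, "S3Key": key},
        },
    )
    task_id = task["ImportTaskId"]

    # Poll until the import task completes; as mentioned in the session, this can
    # take on the order of an hour or two for large disks.
    while True:
        detail = ec2.describe_import_snapshot_tasks(ImportTaskIds=[task_id])
        status = detail["ImportSnapshotTasks"][0]["SnapshotTaskDetail"]
        if status.get("Status") == "completed":
            return status["SnapshotId"]
        time.sleep(30)

def attach_snapshot(snapshot_id: str, instance_id: str, az: str, device: str = "/dev/sdf") -> None:
    """Create an EBS volume from the snapshot and attach it as a data disk."""
    vol = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=az)
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device=device)

# Example with placeholder names: stage the disk in a temporary bucket, import, attach.
# snap = import_disk_as_snapshot("my-temp-staging-bucket", "vm1-disk1.vmdk")
# attach_snapshot(snap, "i-0123456789abcdef0", "us-east-1a")
```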
The second use case I want to split into two parts: one is the retention and the ability to send data to the cloud in an optimized way, and then the recovery aspect. In this scenario, in this picture, you again have data landing on premises, you set a retention policy, and based on that policy the data also gets moved to cloud storage. What we are actually using is the likes of AWS S3 and Glacier, Azure Blob, and GCS regional, multi-regional, Nearline, and Coldline; those are the storage classes that really come into play from a long-term retention standpoint. Radhani will get more into the architecture in terms of how we actually dedupe this data, because, to Mohit's point earlier, the storage efficiencies that we offer on premises we maintain, and we bring customers that same ability in how we send and store the data in the cloud.

As an extension to this, obviously, it's great that the data is there, but you need to be able to recover it back on demand. The use case of recovering it back to the cluster where it came from is pretty straightforward, pretty seamless. But you can also end up in a scenario where you now want to spin the data up, whether it's virtual machines or other data, onto a net-new Cohesity platform; you might want to use it either as a seeding mechanism or just to go back to that test and dev use case on a net-new Cohesity. The idea then is that you can spin up one of our instances, whether a virtual instance or a physical box.

Regarding seeding, how are you doing that? Is there any mechanism, for example if the company has a lot of data and wants to import it into the cloud? Do we have to use standard mechanisms, for example from AWS, or how do you do that? We have seen two ways in which customers actually do that. One is for customers that have pretty good connectivity, say a Direct Connect with AWS or an ExpressRoute with Azure; then much of that seeding can actually be done over the WAN, given that they have a pretty good direct connection into the cloud providers' data centers. In cases where either that is not possible, or just the fact that the initial data size is pretty big, we have customers who use the Snowball capability from AWS, and as a matter of fact we are also working very closely with Azure, who have a similar offering coming GA pretty soon called the Azure Data Box, and we are working with that team as well in order to enable it. Google also has their own. The way it works is that everything we talked about in terms of getting the data up into the cloud actually goes into this seeding appliance, so to speak, on premises, so that's reasonably fast and can handle pretty high data sizes, and then those appliances are shipped off to AWS or Azure. They then hydrate the data and put it in the customer's storage account, and once it's available, we take advantage of events that we can monitor to see that the data is now available in their account, and from that point on we can just send the incrementals up into the cloud. Radhani will actually be covering the inner workings of how we use that as a reference and then subsequently provide the dedup and incremental capability.

For example, let's imagine we already have the data there. The data that you're getting into the Snowball device, is it already treated somehow by Cohesity, or is it raw data which hasn't been processed and deduped or something like that? No, it has already been processed. Okay, so the data that goes into the Snowball... Exactly. So when you hydrate it on the cloud provider side, it's already ready to be consumed and managed by Cohesity? Exactly, yeah, and thanks for asking the question, it's good for the rest of the audience.

Going back to where I was in terms of the recovery: we can recover it to a virtual appliance, to a new physical cluster, or spin up our software in the cloud, which is the Cloud Edition, and provide the same elements, like the granular approach to recovering the data. The key here is granular, and the ability to search and recover, because, first, it's hard to find the data. We all know we have a number of files and images that we create on our own personal machines, and pretty soon we have no idea what we should really be looking for, so that is the ease of use. But more importantly, the efficiency also matters, because the cloud vendors charge for data egress, so you want to be really clear about what you need to pull back, whether it's an individual VM, a set of virtual machines, or even down to the granularity of files. That's again the power in our platform and the technology behind it. So with that, we're going to get a little bit into the weeds in terms of the technology, and then we'll come back and do a demo.
Thanks, Sai. I'm Radhani, I'm a senior staff engineer at Cohesity, and I work in the cloud team. Sai has briefly talked about the long-term retention use case, and I'm going to spend some more time on this particular use case and explain how our platform efficiently dedupes the data before writing it to the cloud, and also how we support granular recovery from the cloud.

As we have seen earlier, customers come to our platform and create a policy that dictates how frequently we back up the data and how frequently we archive it to the cloud. As per that setting, Cohesity first backs up the primary workloads, and once the data lands on Cohesity, we can take those backup images and store them in the cloud as well. The data could be virtual machines, physical servers, databases, whatever we support from our platform. If this were tape, we would have simply taken these full images and written them out sequentially, but with cloud we can leverage our dedup technology to reduce the amount of storage we consume in the cloud, which in turn reduces the storage cost for our customers.

Let us say this is the first image, backup image A, that we are going to archive. Because this is the first-ever image, we have to seed its data, obviously, but while we are doing that, we dedupe this data and build an index cache locally. This index is where we keep track of all the data blocks that exist in the cloud; for each of those data blocks we keep some information, such as its fingerprint and its exact location in the cloud.

So for a backup from, say, a Veritas NetBackup or Backup Exec, are you looking at the format of the backup stream to determine what files were actually in there? It's not at a file level, it's actually at a data block level, so we don't need to look at what exactly is in the file. If it is a VMDK file, for example, it is just bits to us; within that VMDK file we carve out these dedup blocks, and for each of those blocks... Yeah, I guess I understand: you have the block itself hashed, and then you dedupe across multiple versions and so on. Yes, exactly. So we use this index to dedupe the data while archiving this particular image, as well as the other images that we are going to archive to the cloud. Once we determine which blocks are new and which don't yet exist, we transfer that data to the cloud. Because it is the first image, we call it our reference archive; the reason is that the images we archive subsequently will typically reuse most of the data blocks sent by this archive.

After we archive all the data for this image, we take this index and store it in the cloud as well. The idea is that even if the cluster which seeded this data becomes unavailable because of a disaster or some other reason, all we have to do is go through this index and rebuild the exact same state on any new Cohesity cluster. It could be a Virtual Edition, a Cloud Edition, or some other physical Cohesity appliance, and once we rebuild that state, we can allow the same level of access from that cluster as well. But you can only restore that back to another Cohesity? Yeah, only to another Cohesity; it could simply be a single-node Virtual Edition, even a temporary one.
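(A highly simplified sketch of the archive-side bookkeeping just described: blocks are fingerprinted, only fingerprints not already in the index are uploaded, and the index records each block's location in the cloud. The fixed block size, the object naming, and the use of fixed rather than variable-length chunking are assumptions for illustration, not Cohesity's actual format.)

```python
import hashlib
from typing import BinaryIO, Dict

BLOCK_SIZE = 64 * 1024  # illustrative fixed block size; real systems often chunk differently

class ArchiveIndex:
    """Maps block fingerprint -> location of that block in the cloud archive."""

    def __init__(self) -> None:
        self.blocks: Dict[str, str] = {}

    def archive_image(self, image: BinaryIO, image_name: str, upload) -> list:
        """Archive one backup image, uploading only blocks not already in the cloud.

        `upload(object_key, data)` is a stand-in for the cloud PUT. Returns the
        recipe (ordered block fingerprints) needed to reassemble the image later.
        """
        recipe = []
        block_no = 0
        while block := image.read(BLOCK_SIZE):
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:                 # new block: this is the only data sent
                key = f"{image_name}/blocks/{block_no:08d}"
                upload(key, block)
                self.blocks[fp] = key                 # remember where it lives in the cloud
            recipe.append(fp)                         # hit or miss, record the block order
            block_no += 1
        return recipe
```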
So is the bucket associated with one Cohesity DataPlatform, is that how it would work? I guess the question is, can you have multiple Cohesity DataPlatforms sending to the same S3 bucket or something like that? Yeah, that's possible. You can register the same bucket with multiple Cohesity boxes, but each Cohesity maintains its own prefix, or directory, so that we know which objects were written by which cluster; we have universally unique IDs that are generated for these, and we use those. Just to elaborate on that: since we have talked about AWS and Azure a lot, I'll take the example of GCS to give you the picture there as well. In the GCS case, what we have seen is that customers have one account, let's say for the entire company, and under that they have different projects. Let's assume they have five sites, and this is actually a real customer, without naming names I'm laying out the scenario for you: all of those sites talk to the same GCS account with different projects, and each Cohesity sends its data to that one account under its own project ID. That's the typical configuration that we see with most customers.

So the data portion and the index portion are separate objects? Yes, they are separate objects.

Now let's see how the other images are processed. Say image A has already been transferred and now we want to archive images B through K; the data that is different is depicted in different colors here. When we are archiving these images, we use the index that we built in the previous step, we determine which blocks are new and do not exist in the cloud, and we transfer only those. This way we significantly reduce the amount of storage consumed in the cloud, which reduces the long-term retention costs for our customers.

In addition to reducing the storage consumed, we also provide granular recoveries from the cloud. Our customers can recover data at various levels of granularity: they can recover an entire image, a particular virtual machine from an image, or even a single file from an image. All users have to do is go to the recovery page in our UI and start searching for what they want to recover, whether it's a file, a virtual machine, or an image. Once they search, we show all the results (we'll show that in the demo in a bit), and then they just create a recovery task. Once the recovery task is created, let's say the user is interested in recovering some data from this particular image C: what we do is read portions of this index into our local cache and use it to figure out exactly which blocks need to be downloaded in order to satisfy the user's request. Some of those blocks might already exist on the local system, so we don't need to bring those down. Once we determine which data blocks actually need to be downloaded, we download those blocks from the cloud, combine them with what we already have on the on-premises cluster, and return it to the user. Because we are downloading only the data blocks that don't exist locally and are required to satisfy the request, we reduce the amount of data egress from the cloud, which reduces the egress cost.
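(Continuing the earlier sketch, the restore path can be expressed as a set difference: of the blocks named in the image's recipe, fetch from the cloud only those whose fingerprints are not already present locally. Again an illustrative sketch under the same assumptions, not the actual recovery code.)

```python
def restore_image(recipe, index, local_blocks, download) -> bytes:
    """Reassemble an image from its block recipe, downloading only blocks missing locally.

    `recipe` is the ordered list of fingerprints produced at archive time,
    `index` is an ArchiveIndex mapping fingerprints to cloud object keys,
    `local_blocks` is a dict of fingerprint -> bytes already on the cluster,
    and `download(object_key)` is a stand-in for the cloud GET.
    """
    missing = {fp for fp in recipe if fp not in local_blocks}   # only these cost egress
    for fp in missing:
        local_blocks[fp] = download(index.blocks[fp])
    return b"".join(local_blocks[fp] for fp in recipe)
```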
Now let's talk about use case three. Actually, first, a question from the Twitterverse: how much time does converting and mounting take for a one-terabyte database? If it is Azure, it just depends on the network bandwidth, because the conversion is pretty quick, whereas in the case of AWS we depend on their import/export tool, and in our experience, depending on the size, the import can take anywhere between one and two hours, in addition to the transfer itself. Transfer speed depends on the size of the VM and on the network, and, like I mentioned, in the case of AWS there are some other elements because we are relying on their tool as well. Hopefully that's the color you were looking for.

So this use case: our platform allows seamless movement between on premises and the cloud. As you see, this is a hybrid cloud environment. Let's say our customers have a physical Cohesity cluster deployed in their on-premises data center, and they also have a Cloud Edition running in the cloud. Now imagine a client application which is reading and writing through a view, or volume, hosted on the on-premises cluster; they can simply set up a replication between this on-premises cluster and the cloud.

One second before you go through this animation, I have a question from Twitter again: we're being asked if you ever have to send a full backup to the cloud again with your solution. We actually provide flexibility to our users: they can choose how frequently they want to do periodic fulls, if they want them at all, and even when they do, we still dedupe, so it's up to our customers.

Going back to this: once the users set up the replication, Cohesity in the background seamlessly moves the data to our Cloud Edition. Let's say something now happens to this on-premises cluster, the data center becomes inaccessible because of a disaster or whatever; then the client application can simply redirect its requests to the Cloud Edition and seamlessly access the data that it had written on the on-premises cluster.

Can the replication to the cloud be scheduled, say, around peak times? This is all driven by our policy. In the policy they can choose when to do this replication, and they can define blackout windows if they don't want to replicate during certain periods. Can you throttle the bandwidth? Yeah, they can throttle the bandwidth.
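(As a generic illustration of the bandwidth throttling just mentioned, not Cohesity's implementation: a replication sender can meter its uploads with a simple token bucket so that transfers never exceed a configured rate. The names and defaults are assumptions.)

```python
import time

class TokenBucket:
    """Crude rate limiter: allow at most `rate_bytes_per_sec` of payload per second."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def consume(self, nbytes: int) -> None:
        """Block until `nbytes` worth of tokens are available, then spend them."""
        nbytes = min(nbytes, int(self.capacity))  # never wait for more than the bucket holds
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

def replicate(chunks, send, limiter: TokenBucket) -> None:
    """Send replication chunks, pacing them through the limiter."""
    for chunk in chunks:
        limiter.consume(len(chunk))
        send(chunk)  # stand-in for the actual transfer to the Cloud Edition
```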
And you're going directly to instances of your software running in EC2, let's say, or Azure, right? Right, they're running there. So in Azure or Amazon, for instance, how many nodes is the minimum, and what kind of instances? In AWS, three nodes is the minimum, and we use 4xlarge instances. Basically, if you equate it to what you have on the on-premises side, in general you have three components: compute, some metadata that you need storage for, and then the capacity tier. When we are picking sizes and figuring out instances, we are trying to optimize for what we need from our platform perspective, keeping in mind that the compute has a big bearing on what it costs the customer as well. That's the trade-off we are making: trying to keep it as close as possible to the on-premises experience in terms of the value our platform provides, while optimizing for cost.

Are you able, in between replications, to shut your system down and bring it back up in Amazon? Yes. And script that, can I automate it? Yes, yes. That's almost table stakes, again from the cost optimization standpoint. What we have actually seen is that we can do it our way, and in some cases customers have their own tools they use for monitoring instances periodically and shutting them on and off, so those can very well be used.

So now I'll pass it on to Sai, who will demo some of the use cases that we have talked about. Fingers crossed, guys. Thanks, Radhani. What we're going to do is showcase, as much as we can, the demos highlighting these various different use cases. The first one I'm going to showcase is the archive functionality; I'll basically break it down into three steps in terms of taking data on premises and moving it into the cloud.

Before we do that, this is just a quick dashboard of what our system looks like. You obviously saw this in the previous demo as well, but I like to break it down into three sets of tiles. The top one gives you a quick overview of the protection SLAs: what's good, what happened. The second tile, the middle one, is more about the system itself: health, any alerts, what's happening with the nodes present in the system, and so on. The last one is a quick view into the performance of the cluster itself.

Sorry, the "two of eight views protected," what does that mean? If you recall the earlier demo where my colleague talked about the ability to create views, which have different protocols, NFS, SMB, or S3 object, what this essentially implies is that there are eight views created in this system, of which I have set up a couple to also be protected. Protected can mean the view is backed up or replicated, for instance. If I actually show you that, you have all these various different views, those are the eight views we're talking about, and you'll see that some of them are protected and some are not. That's what it essentially means.

So back to the three steps for achieving the archive capability. Step number one: anything that is cloud related is considered an external target to Cohesity. In this case I have a couple of targets registered, Azure and AWS, but the process of registering a target is pretty straightforward: you give it a name, and then up here you have all the various different target types, Google, AWS, Azure, and depending on what you pick, the credentials vary. We talked about how in Google's case it's the project ID, back to the example I was sharing with you, Ray, and if you pick the AWS S3 example, then you're looking at the bucket name and so on.
Just to highlight one other interesting aspect: we also offer direct integration with Glacier. I think we are one of very few vendors who offer direct Glacier vault integration as well. So that is step number one, creating your external target.

The second step is the policy. I'm going to actually create one here, so let's give it a name: "Storage Field Day is cool." You guys agree with that, I assume? Yes. Here you have a couple of options. There is the backup section, in terms of how often you want it backed up; for the purpose of this, let's say we're backing it up every day and retaining it on premises for, say, ten days. Then there are a few other options on the backup side. One I can highlight, back to Joshua's question on blackout windows: you talked about it from the perspective of replication, but it applies very well on the backup side too. You could add a blackout window here that says, "I do not want it backed up between, say, 8 a.m. and 6 p.m." This would tie in nicely with the use case we heard earlier, where backups only happen at night, and during the day it's the Cohesity object store, in that example, that was being leveraged for storing data. So this is how you would enforce a policy like that. Then you have the archival section, which is where we now see the targets I registered. I'm going to pick one of them here and then select the frequency; typically this frequency is a lot lower than how often you do backups, so let's say once a month, and then retain it in the cloud for a somewhat longer period of time. And then I just create this policy, and there it is, the "SFD is cool" policy.
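(To make the pieces of such a policy concrete, here is a small, purely illustrative model of the settings just configured: backup frequency and retention, a blackout window, and an archival target with its own cadence and retention. The field names, the blackout check, and the sample retention values are assumptions for illustration, not Cohesity's API or schema.)

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import Optional

@dataclass
class BlackoutWindow:
    start: time   # e.g. 08:00; no backups start during this window
    end: time     # e.g. 18:00; assumes the window does not cross midnight

    def covers(self, when: datetime) -> bool:
        return self.start <= when.time() <= self.end

@dataclass
class ArchivalTarget:
    name: str            # a registered external target, e.g. an S3 bucket
    every_days: int      # archive cadence, e.g. 30 for roughly monthly
    retain_days: int     # cloud retention, typically longer than local

@dataclass
class ProtectionPolicy:
    name: str
    backup_every_hours: int    # e.g. 24 for daily backups
    local_retain_days: int     # e.g. 10 days kept on the cluster
    blackouts: list = field(default_factory=list)
    archival: Optional[ArchivalTarget] = None

    def may_start_backup(self, now: datetime) -> bool:
        return not any(b.covers(now) for b in self.blackouts)

# Roughly the policy built in the demo (values approximate what was shown):
sfd_policy = ProtectionPolicy(
    name="SFD is cool",
    backup_every_hours=24,
    local_retain_days=10,
    blackouts=[BlackoutWindow(time(8, 0), time(18, 0))],
    archival=ArchivalTarget(name="aws-s3-target", every_days=30, retain_days=365),
)
```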
The last step is the ability to create a job that leverages this policy. So I go into jobs, create a job, select a virtual kind of job, and I'm looking at backing up and archiving some virtual machines. The first thing is to select the policy we just created; as you can see, back up, retain ten days, and then archive to S3. Hit next, give it a name, sticking to the same theme, and we create a job. A view box is a construct where you can think of the physical cluster being logically separated depending on the datasets you have. For instance, you might have one which is heavily media, a lot of videos and images, which typically do not do very well: they compress, but they don't essentially dedupe. In this case I have one media view box where I have actually turned off deduplication, and another demo view box here that has deduplication on. I think there was a question earlier about at what point you specify the erasure coding and protection schemes; the view box level would be the answer, and the same view box is where we set up all these other attributes as well. So then hit next. I've actually registered a vCenter ahead of time, so it shows up here, and you will see the ease and simplicity again: I'm now able to come in and pull up all these various hierarchies, data centers, and clusters available from there, and start searching. I like to use the search functionality, so let's say I'm looking for this Windows 7 VM; I'm able to find it very quickly, rather than having to traverse the hierarchy and things like that. So quickly find that, add it to the cart, and then basically just create the job.

What's happening now is that I have the backup running, as you can see, and once this is done, the archive will also kick in, in this example, per the frequency you set. Typically, even though I showed you three steps, it really boils down to two in most customer environments, and the reason is that most of them already have policies that admins created in terms of what SLAs they want: how often the backup, replication, and archive happen. In that case perhaps even the target is already created, back to the example of the Google projects that we had, so really you're just creating a job at that point and picking an existing policy, and then you're up and running.

One additional thing I'll point out: in some customer scenarios, especially those that are more VMware heavy, there is vRealize Automation and Orchestration that they can leverage, and they use that to automate much of the lifecycle of the VM: when they deploy the VM, it can automatically be attached to a protection policy, and when the VMs are done and destroyed, they get removed from the job as well. Do you have vRA plugins that you've created, or do you have customers leveraging APIs? We actually have two ways in which customers do it. We provide a plugin that's available on the VMware Solution Exchange, and there are also third-party players who specialize in the vRealize area and have built a product, so to speak, on top of our plugins; for example, SovLabs is the one that comes to mind, they leverage our APIs and provide that same functionality through their vRealize integration. The other integration, sticking to the same VMware theme, is vRealize Operations: we have a large customer who manages Cohesity and every other product they have through vRealize Operations, which is more of a NOC essentially, and that integration is also available as a management pack.

So that's basically it; you can see it's going through, it's running, and then it will trigger the archival process as well. Now, tied into the archive is the whole recovery mechanism, and again the ease of use with which we can recover the data. For that we go into recovery, click on recover, and let's say we want to recover either a VM or a file. I'm actually going to show you more granular recovery and how we can recover an individual file. In this case, again, I'll start searching for some machines here; let's say I'm interested in something that matches a keyword, but I'm not going to stop there, it's still a lot of results, so I'm going to start applying some filters. Let's say I know the job, so I can just pick the job I want and add it as a filter, and now, from a lot of results, we have right away trimmed it down to a subset. And now I can just pick the file. You'll see that we start with what you want, rather than what point in time you want to go back to.
Because perhaps a version of it was deleted and you don't know which version has the file and which doesn't, you again start with what you want, apply those filters, and then pick the file you want. In this case I can choose whether to recover it from a local copy, which is this icon here, or from the cloud as well. So again, that flexibility and ease of use: this is the inventory, you can go and recover what you want. That was the use case that tied together the archive and the ability to recover the data.

All right, the next thing I wanted to show you is the use case around taking a VM that was backed up and spinning it up in the cloud. Let me bring that up; let me just see which cluster works and go with that. Again, I'm going to show you the more ad hoc approach so we can walk through the workflow, but you can just as easily attach it to a policy and keep it that way. There are really two steps involved. The first is to register the cloud as a source, and it depends on which cloud you are going to leverage: if it's Azure, we need the subscription details and the access key that goes with it; if it's AWS, you would input the resource name and the key associated with it; and the same kind of thing for any other cloud. I have registered an Azure source in this case.

Then I come in here, and again you will see the same theme: start searching for the things I want to convert. I'm going to pick that VM we had sent to this cluster, so let's say it's this Windows 7 VM. Here is where the magic plays out. I can rename the VM, give it a prefix and suffix and all that, but I can now see the Azure subscription show up as a source, and when I pick it, it allows me to select all the various options. These are the various resource groups I have in Azure; I pick one, then I pick a few more options, which would change depending on which cloud vendor you are talking to, then the storage account associated with it (this being our environment, we have a number of these; you would probably have a lot fewer in a real customer scenario), and then the virtual network and the subnet. And then this little pencil icon shows you the various points in time available and gives you the ability to select one. When I hit the finish button, the task gets kicked off, it goes through the conversion, and it becomes available in the Azure account, and once it's available you can log in and access it.

What I also wanted to highlight: I showed you just one VM in this case, but we don't limit you to one; you can take the entire job and move it over, and in that case we are doing a number of VMs at a time. One other thing: I think, Ray, you are the one who brought up the earlier question in response to my comment about how we index the data. You saw the ability of the indexing to show VMs and files and how we were able to do that. This was not planned, but I also want to show you the analytics workbench that Mohit alluded to, in terms of how we can leverage the same thing in order to do some analysis of data within files.
So we have this workbench, like he was referring to, and the workbench can be leveraged in one of two ways. You can either use some of the apps that we have already built for you, very simple ones such as compressing videos or finding a specific pattern in a particular file, or the platform provides the ability to create your own apps, using, I believe it's just JavaScript, and then those are available to you. For this purpose I'm going to leverage this pattern finder. So, find something like a Social Security number? Exactly, that was what I was going to try ahead of time, but I'm on the cluster that I should be on, so we'll do that live, how about that, and we'll see how it plays out. I actually have this pattern finder that was run some time back, and back to your question about Social Security numbers, interestingly, that was the pattern we were looking for here. If you look at the format of the data in this case, what I'm saying is I'm looking for something that has three digits, then two digits, and then four digits.

So let's see how we can actually run this app. Back to the view concept, let's say I want to pick certain files in a view. I don't know which view box I want, so I have the drop-down here, which is convenient; I select that, and I don't know the view information, but I think my colleague has created something that is SMB only, so I'm going to start searching, and thankfully he has. Then I say what the file type is, let's say text files, and I add this filter, and now I just put the pattern in there and give it a path where the results have to land, so let's say mapreduce and then results sfd cool. Then when I hit this Run button, it will go ahead and parse the text files, find that particular pattern, and put the results for me in that directory.

Is that something you can save and rerun multiple times, or do you have to do that every time you want to run it? You can actually save and rerun this; since we hadn't planned for this, I'm just doing it ad hoc. And that's a great question, because as we all know GDPR is coming, it's right around the corner, I think it's a couple of months away. In those cases you want the ability to have these things prepackaged and say, "I'm looking for these patterns." That's something we are actually working on pretty actively, not only from a GDPR standpoint, but, back to your point, it will be very useful in other cases to find certain data. And I think the other thing is that you might want to be able to import them, because I think there's going to be a huge market in people either crowdsourcing or selling search patterns like this. I think that's already music to our sales guys' ears: save and package this, and think of other ways in which we can go to market with it. So thanks for that feedback, we really appreciate it.
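(As a rough illustration of what a pattern-finder app like this does, Cohesity's apps run on its own platform, reportedly written in JavaScript; this standalone Python sketch only mirrors the idea: scan text files in a share for a Social Security number style pattern, three digits, two digits, four digits, and write the matches to a results directory. The paths and the exact regex are assumptions.)

```python
import re
from pathlib import Path

# Three digits, two digits, four digits, separated by dashes (SSN-style pattern).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def pattern_finder(view_root: str, results_dir: str, suffix: str = ".txt") -> int:
    """Scan text files under `view_root` and record the files/lines containing the pattern."""
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    hits = 0
    with (out / "results_sfd_cool.txt").open("w") as report:
        for path in Path(view_root).rglob(f"*{suffix}"):
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if SSN_PATTERN.search(line):
                    report.write(f"{path}:{lineno}\n")
                    hits += 1
    return hits

# Example with placeholder paths:
# pattern_finder("/mnt/smb_view", "/mnt/smb_view/mapreduce")
```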
Can you remove the file from the backup, or are you essentially just showing where it is, or that it exists, and we need to actually purge that data on our own? Right now this app, the way it's built, is to find all these files, but as part of some of the work going into meeting GDPR in particular, which will include those other elements as well, we will provide that second level of capability. One thing our platform already provides, if I can go back to the views: you will see that certain views here have this lock icon next to them, because one of the use cases we have seen is customers looking for WORM capability, the ability to say that once the data is there, it is not touched, write once read many. We actually engaged with Cohasset and have had our platform certified as SEC 17a-4(f) compliant, which essentially says you are WORM, and that's what this lock capability is doing: it says I have taken a particular view and put an expiration date on it, so that up until that time the data will not be modified on the platform.

Is there a way to set permissions around who can actually search? Yeah, that was my next question, and at what granularity would that be? That comes in with our RBAC functionality, where we are able to say who has access to certain things. I'm just going to try and see if we can provide visibility into that as well; I believe that's under admin, in access management. You would go in here; in this case I have just an admin, but there are roles that our platform comes predefined with: admin, operator, self-service, and the last one, the data security role, which ties in with the WORM functionality, because you want separation of duties. Somebody is creating data, but somebody else is the enforcement officer who decides when the data is no longer, say, under legal hold and the WORM lock can be released; that's really the data security role. So you can use these prepackaged roles, or you can create your own role, which essentially says: I create my "SFD cool" role that provides Joshua, in this case, since you are all cool, all the access, but you can create your own role.

And so the function that you invoked, that pattern-matching solution, is it associated with a role? I mean, how does the RBAC support the workbench pattern matching, for instance? In general, the way our RBAC works is that you provide access to certain functionality within the platform to different groups. And is the workbench part of the platform functionality or outside of it? It is very much part of the platform functionality. The way we have seen customers do it is that they typically have AD groups already created, so they map those Active Directory groups into the specific roles they want certain users, or sets of users, associated with on the Cohesity platform. But is it tied to functions, which is what I saw there? So I couldn't say something like, someone in this role can only see certain source data; it's just based on the actions they see, menu drop-downs or something? The way it works is, back to that concept of the view box and the views: we can actually filter on a view, exactly, and provide the ability to say that certain sets of users have access only to that. So you do that through the view. Exactly. Okay.
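(To make the access model just described a bit more concrete, here is a small, purely illustrative sketch of role-based access control with view-level scoping: roles bundle privileges, Active Directory groups map to roles, and a user's access to a view is checked against both. None of the names or structures here reflect Cohesity's actual implementation.)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Role:
    name: str
    privileges: frozenset                      # e.g. {"backup", "recover", "run_apps"}
    views: Optional[frozenset] = None          # None means access to all views

# Illustrative roles, loosely mirroring those mentioned in the session.
ADMIN = Role("admin", frozenset({"backup", "recover", "run_apps", "manage_worm"}))
DATA_SECURITY = Role("data_security", frozenset({"manage_worm"}))
SFD_COOL = Role("sfd_cool", frozenset({"recover", "run_apps"}), frozenset({"smb_view"}))

# Hypothetical mapping of Active Directory groups to roles.
AD_GROUP_ROLES = {"corp\\backup-admins": ADMIN, "corp\\sfd-delegates": SFD_COOL}

def allowed(user_groups: list, privilege: str, view: str) -> bool:
    """Return True if any of the user's AD groups grants `privilege` on `view`."""
    for group in user_groups:
        role = AD_GROUP_ROLES.get(group)
        if role and privilege in role.privileges and (role.views is None or view in role.views):
            return True
    return False

# Example: can a delegate run the pattern finder against the SMB view?
# allowed(["corp\\sfd-delegates"], "run_apps", "smb_view")  -> True
```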
Info
Channel: Tech Field Day
Views: 1,239
Rating: 5 out of 5
Keywords: Tech, Field, Day, Tech Field Day, TFD, Storage Field Day, SFD, Storage Field Day 15, SFD15, Cohesity, Sai Mukundan, Radhani Guturi
Id: _664AuihMlE
Length: 53min 19sec (3199 seconds)
Published: Mon Mar 12 2018