Data Warehousing Migrations: Lessons from Home Depot (Cloud Next '18)

Captions
[MUSIC PLAYING] TINO TERESHKO: Hello. Thank you for joining. My name is Tino Tereshko, and I'm a product manager on Google BigQuery. Did anybody see that keynote? The CTO of Go-JEK arrived on stage on a moped. That was amazing. Well, I'm joined onstage today with folks from Home Depot, Rick and Kevin. I am going to disappoint you guys. They're not going to make their arrival on a lawn mower. They're just going to walk onstage. Sorry. We didn't come prepared. But in all seriousness, this is a fantastic story. I'm looking forward to hearing Rick and Kevin talk about it, because one thing is to be a startup, to be born in the cloud and to build your infrastructure with bespoke requirements. The other is to have a complex, multinational organization with online, with mobile, with brick-and-mortar presence, and hundreds of thousands of SKUs and professional services, and many, many years of technology, innovation, and really smart engineers. And how do you move that into the modern world? [INAUDIBLE] Rick and Kevin are going to be talking about.

And so what we're going to talk about is the concept of a data warehouse. It's a really novel idea, right? We're going to take all our data, and we're going to put it into one place, and then we're going to correlate and analyze it, and we're going to be data driven. Awesome. But the reality is that data warehousing is really difficult. It's challenging. The complexity piles on as your business grows, especially, right? It gets really, really difficult to start getting business value out of technology, so much so that technology can stand in the way of your business, which is the last thing you want it to do. Technology should be an accelerator of your business. So it's really, with the advent of Cloud, and cloud-native technologies, and the scalability and high level of manageability, maybe, perhaps, it's time to rethink what a data warehouse really means.

So BigQuery is the world's only serverless data warehouse. Serverless, of course, is a buzzword. But what do we mean by that? We mean that we entirely abstract away hardware. We provide a very, very high level of manageability, automation, API first. We provide virtually unlimited scalability, low effort, low maintenance, to reach and maintain peak performance, and so on and so forth, having the ability to share data without moving the data around, bringing people to the data rather than the other way around. And we really have evolved as a service over the past six years. These are some of the things that we've delivered as features to our customers over the past years that really make it easier for folks to run a data warehouse on top of BigQuery.

Well, today, we announced a number of really, really interesting features. And I'll walk you guys through some of these. Of course, you've heard about declarative machine learning in BigQuery, which brings machine learning to the analyst. Folks don't need to know TensorFlow, or Keras, or any of these really, really complex technologies. They can just use plain old SQL to prototype and train machine learning models. Let's talk about one of those features-- clustering. In this particular example, I have a query that is selecting a date and user ID. Well, clustering increases data locality for high-cardinality fields so that if you apply clustering on your tables, you get vast improvements in cost, vast improvements in efficiency or performance. In some cases, the performance can be 10 or 100 times better than without clustering.
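To make the clustering example a little more concrete, here is a minimal sketch, with hypothetical project, dataset, table, and column names, of creating a date-partitioned table clustered on a high-cardinality user_id field and filtering on it with the google-cloud-bigquery Python client:

    # A minimal sketch (hypothetical names) of a partitioned, clustered table.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project

    table = bigquery.Table(
        "my-project.analytics.events",  # hypothetical table
        schema=[
            bigquery.SchemaField("event_date", "DATE"),
            bigquery.SchemaField("user_id", "STRING"),
            bigquery.SchemaField("payload", "STRING"),
        ],
    )
    table.time_partitioning = bigquery.TimePartitioning(field="event_date")
    table.clustering_fields = ["user_id"]  # cluster on the high-cardinality field
    client.create_table(table)

    # A query that filters on the clustering field scans far less data than it
    # would against an unclustered table.
    sql = """
        SELECT event_date, user_id
        FROM `my-project.analytics.events`
        WHERE user_id = @uid
    """
    job = client.query(
        sql,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("uid", "STRING", "12345")]
        ),
    )
    for row in job.result():
        print(row.event_date, row.user_id)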
And actually, at 3 o'clock, Jordan Tigani and Lloyd Tabb are going to be doing a live demonstration of clustering. You should not miss it.

We also announced a new BigQuery UI. So it's material design. It's in the Cloud Console. It's localized to all the different regions that don't necessarily speak English, with other great benefits-- for example, Data Studio Explore, which allows you to pop out your results of your queries into Data Studio for interactive pivoting and further analysis. One feature that we haven't really discussed much, but we've had for a while, and Home Depot will talk about it in a little bit more detail, is hierarchical reservations. So complex enterprises that need isolation and resource guarantees, that would like to have more control, more knobs in their enterprise data warehouse, are able to create these hierarchical reservation trees that really guarantee resources for specific purposes. But also, the beauty of this is that these aren't silos of resources. If a data science project is idle for some reason-- the data science team went on a vacation, and those resources are unused, transparently, seamlessly, down to subsecond latency, those resources become available for the rest of the organization to use. So as other teams continue to scale on BigQuery, the entire organization benefits. You get economies of scale.

And of course, one very challenging aspect of running a data warehouse is ingest. We hear this from customers all the time. How do I get data into my data warehouse? How do I get it so that this data is fresh and available right away, and it doesn't compete with my query capacity? And we've really invested heavily in our ingest capabilities. Our batch ingest is free. It's powerful. We have customers loading petabytes of data into BigQuery every single day without affecting their query performance even one bit. And of course, we have a streaming API on the other side, which gives you access to your data in real time. But enterprises heavily rely not just on native bespoke tools provided by cloud vendors, but on their partner ecosystem. So if you're a large organization that has a history of technology innovation, you are probably using lots and lots of partners, lots of vendors. And we will continue to invest in our partner ecosystem. And finally, we have lots of customers from all kinds of scales, all kinds of complexities, all kinds of industries and ranges. And they're all eager to discuss, to share their stories and their journeys with you guys here. And potentially, hopefully, next year, some of your logos will be up here as well. Well, without further ado, I'm going to welcome Rick and Kevin onstage to share their story with us. [APPLAUSE]

RICK RAMAKER: Good morning, everybody. Everyone enjoying the conference so far? And thanks to you, Tino, for kicking us off here and helping us put our presentation together. So my name's Rick Ramaker. I've been with Home Depot about seven years. And I'm part of the IT team that's responsible for data and analytics for the enterprise. So my team is responsible for all the data engineers that are moving all the data into the warehouse and making it available for consumption across all the different business areas. Home Depot is doing a lot of things on the Google Cloud. We were here last year. You heard us talk a little bit about our dot-com sites. We've got a lot of other groups using the Google Cloud as well.
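To illustrate the two ingest paths Tino describes above, free batch loads and the streaming API, here is a minimal sketch using the google-cloud-bigquery Python client; the bucket, dataset, and table names are hypothetical, not anything used in the talk.

    # A rough sketch (hypothetical names) of batch and streaming ingest.
    from google.cloud import bigquery

    client = bigquery.Client()

    # Batch load: reads files from Cloud Storage and does not consume query capacity.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/sales/2018-07-26/*.avro",
        "my-project.staging.sales_raw",
        job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO),
    )
    load_job.result()  # wait for the load to finish

    # Streaming insert: rows become queryable within seconds of arrival.
    errors = client.insert_rows_json(
        "my-project.staging.sales_stream",
        [{"order_id": "A-1001", "amount": 42.97, "ts": "2018-07-26T10:15:00Z"}],
    )
    if errors:
        print("streaming insert errors:", errors)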
Today, we're going to talk a little bit about our journey in the data and analytics space. So first slide-- so hopefully, everyone here knows Home Depot. So we're the number-one home improvement retailer. We got a bunch of cool stats on the slide about the size of our organization. If you haven't been to Home Depot lately, I encourage you to check it out. It has changed quite a bit. We got a lot of cool, and new, and innovative products on the floor. We're growing our services business. So if you're not a do-it-yourself-type person, and you want a do-it-for-me, we have that available as well. We are starting to put lockers in our stores. So if you buy online and pick up in the store, you can just get-- you can just pick up your products in the lockers. Lots of good things going on. We're expanding in the softline business. Lots of really cool things happening at the Home Depot. But in short, we're a big, we're a complex, and we're a very data-driven organization for a lot of our decision making, which kind of makes my job really, really cool-- most days. [LAUGHTER] So our plan is to talk a little bit about what led us to migrating our warehouse to the cloud. Kevin will jump on stage and talk through a lot of the technology and architecture decisions. And then we'll wrap up with some of the wins and the learnings along the way. So to kick it off here, a little bit about the Home Depot analytics landscape. So we run our business on the EDW, on our on-prem EDW today. So we have hourly sales reporting that needs to go out to the field. We have all of our supply chain forecasts for making sure all the right products make it to the stores. We have all of our marketing campaigns that go out to all our consumers. That comes off the data in the EDW. All our event reporting for Black Friday, red, white, and blue sales, all of the store performance scorecards for Monday morning, reporting at the stores-- all of this is coming off our EDW and needs to be there on time and be accurate every day. So we have very strict SLAs. And we have a wonderful business community that lets us know whenever we miss it and we don't have our data there that's accurate. So that's a little bit of our world. But it is very operational and part of what we do all day, every day. Our business community is also very invested in this space. We have a very-- self-service is very encouraged with our business teams. Our model is, we kind of have a two-fold model. We have more of our data consumers. And those are the folks that are running the stores and running the merchandising. They're typically using Excel. They're using MicroStrategy to get the data they need to do their business. And then we have the data analysts and data science community. They're utilizing Tableau, R, SAS, Python, all the different tools, to drive more of the ad hoc solutions on a regular basis. But one thing that we realized is that the demand was growing, and it wasn't going to stop. So we were having a hard time managing all that on our on-prem environment. We were told our on-prem environment was one of the busiest that's out there in the United States. And kind of proud of that, but it also presented some additional problems. So that led us to a decision point here. All right. That led us to a decision point for where we wanted to go going forward. So our on-prem solution had served us very well. It helped us get to a consolidated single EDW. We were finding a ton of value. But we were spending a lot of time on capacity management. 
And every time it came to refresh the hardware, there was a large cost outlay that we had to work through. And two to three years prior to this, we had to do a capacity expansion. And that was about five to six months' worth of planning. It was a three-day outage. There was a lot of work to make that all happen. And then we used up all that capacity in about a year to two years. And that capacity was gone. So we were now to the point we had to do another capacity expansion with another-- with all of that work ahead of us. And then on top of that, our business teams are saying, all the technology is expanding. How do we start doing more and more complex analytics in that space? So that's what led us to having to do something a little bit different, and really, a migration to modern analytics.

So we started taking a look at what that would look like to move to a modern analytics platform. And we knew we wanted a world-class platform that would let us use managed Python notebooks and get into machine learning. We wanted to make sure our analyst community stayed very self-sufficient. And we wanted to make sure we were solving for a lot of the challenges on the development side as well. So those were our objectives. So we did go through a pretty in-depth analysis of the different platforms that are available. And we did POCs with the-- you name them, we did it with them, for the most part. We also did a very detailed ROI. So yes, cost was a big piece of this. And we did do all our homework there, from an ROI perspective, to make sure we understood all the different cost differences. And we also spent a lot of time collaborating across all of the key stakeholders. Security was probably one of the biggest ones we worked with, along with our infrastructure teams, as well as a lot of the different business teams. And as a result of all of that analysis, obviously, that led us to select the Google Cloud Platform, and specifically BigQuery, to help us with this journey. So I'm going to turn it over to Kevin. And he's going to walk through some of the technology and architecture decisions that we made.

KEVIN SCHOLZ: Excellent. Thank you. [APPLAUSE] So as Rick said, this was a once-in-a-lifetime opportunity. If you look at large enterprises and legacy data warehouses, these are large investments, a lot of data, a lot of people basically relying on this. So if you're about to pick up and change everything that you've known and move it, we basically said, let's start with our wish list. What can we do? So we put together this wish list of items. And we categorized it for you. So what we decided to do was, if we were going to go through this effort, we were going to go all in. We weren't just going to pick up some stuff and move it, do some IaaS stuff, and try to move things in. We decided, if we're going to do this, we're going to modernize everything. So we right away said, we don't want a port. We knew this was a huge step. So if we're going to get to a modern platform, it required that level of trust and that level of a leap to get there. The next thing we did was, as Rick said, we run the business on this. This is every day. It's required. So we needed a date. A lot of agile practices, you try to work on iterations, and you deliver when you can. But we have real dates. We have real timelines. And we don't want to have people between systems for that long. So we chose a time. And we gave ourselves a couple of years to get there. This is not a minor undertaking.
But we knew, as part of that as well, that we needed to be agile in what we were doing. We were going to learn as we go. We also were working with the Google team on features that weren't available yet in BigQuery. And we were pushing in terms of things that we needed to run our business on the platform. So we knew part of that would be to be agile. And we would have to change along the way.

When we look at the people that were involved-- so we have not only our analysts, but our folks that run the platform-- we said, what do we want to do with them? So if we're going to make this big change, we have to first adopt new practices. So part of Home Depot's general change has been, we want to adopt full-stack teams, localized, very common. But in a data warehouse space, that's kind of a novel or a newer idea. It's happening more and more now. But a couple of years ago, it wasn't very prevalent. We also wanted to invest in all the newest technologies, so the cloud involved, as well as all the things surrounding that. And we wanted to make sure that no one was left behind. So we wanted to have a learning path for all of the folks that knew maybe the old set of tools or the old set of technologies to move over to the new platform and have the time to learn that platform.

When you look at the tech, we made some big decisions here. Our teams on prem that run our data-- in our data centers, all of our technologies-- they do a great job. But that's a large investment and a large amount of time. So we wanted to use as much managed services as we could. So as Rick said, when we evaluated different cloud offerings, that was a huge thing. We didn't want to have all the stress of the DevOps. It's hard enough to make sure the data is right, delivered on time, and with all the changing requirements. And the biggest takeaway from that is, we wanted to scale up and down faster. Rick showed you the chart of the CPU use, right? If we wanted to grow that very quickly, that's a multi-month effort-- large dollar amounts. With the cloud, you're able to scale it much quicker. So that was a huge win for us. One of the things we also decided to do was around ETL tools. So there's a lot of great tools out there-- Informatica, Ab Initio-- pick your favorite one. We decided, as part of this change, as well, that if we're going to be modernizing the platform, that we were going to modernize the way that we brought the data into the platform. So we took a much more developer-centric role. So today, where we are, most of the ETL and the data pipelines are code-driven, and they're developer driven, and they're using modern tools. And we don't really focus on some of those classic tools. There's nothing wrong with them. It's just the choice that we made.

So from this wish list, we basically said, OK, how can we achieve this? So I'll fast-forward to the answer. Everything on here, we did. So luckily, we had great support from our leadership team. And it drove us to an architecture that we'll walk you through. So we're going to take a little bit of time with this and walk you through all the different pieces of how we did this. So let's start with the capture part. So looking at this from left to right-- so down at the bottom, our OLTP sources. OLTP is our operational systems, where you place orders, sales through the front-end registers, all of the interactions with the company. All of those are going to remain in those operational systems. They may move over time, but they are where they are.
Today, we have thousands of databases, file sets all over the place, on prem, from customers, from partners, as well as streaming messages. We have a lot of things that are moving all the time. All these things have dependencies. And they're all scheduled. And they have tight timelines on them. So our on-prem scheduling is Tivoli Workload Scheduler. So one of our requirements is that we have to wait-- when a workload is done, we have to be able to pick that up from a dependency point of view and then be able to move it. For our main scheduling, though, within the cloud, we didn't want to bring that product out. So we use Jenkins as a way-- and we script that environment to basically act as that bridge into the way that we're moving. So here, you'll see Hadoop with a fancy name, data factory. That's our name. So one of the things we wanted to do with all the complexity in the data center, with all these databases, with all these queues and all these files moving around-- if you look at that surface area, there are a lot of systems, and firewalls, and a lot of things you have to jump through. So we didn't want to expose all of that, and all that complexity, and all that work each time a team needed to move a different data source out to the cloud. We have invested in Hadoop a long time ago. So we have a large on-prem Hadoop infrastructure. So the teams knew how to move data to that platform. It was also very well secured, isolated, segmented, and it scaled really well on prem. So we made the choice of using that platform as basically saying, get the data here. And all the teams knew how to do that. It's relatively easy within the company. If you can get it there, we can then create a factory or a pipeline to move it out. It's bidirectional. So that, basically, is hiding all of our complexity of our data center and all the complexity of those other systems. I'd say, when we first started, we had a lot of teams. We built, probably, four or five different versions of this. But we've centralized on one version, because the more we can have one way of doing it, it makes it easier for all those dependencies. And we don't have teams spending time on plumbing code. It can basically move a lot faster. So the capture side-- that's kind of the story. We are making use of that large infrastructure we have.

So our next part is really, hey, how do we get it out to the cloud? So there's many ways companies can peer with Google. We use our own way. It doesn't really matter the company we're going through. But we have large, 10G connections through multiple connection points from our data centers and our partners into Google. The first thing we do is, we basically wanted to get the data to the cloud, so a classic data lake, right? The buzzword around getting it there. But for us, what that really means is, source-similar data from those systems, in the format that it was, typically landed into a GCS storage bucket or into a BigQuery table in its raw format. So that's source-similar, minimal changes. So I'll give you a sample change. If it's coming out of an older mainframe system-- the EBCDIC to ASCII translations have been done, any character set adjustments. Those things, we take care of. But once it lands there, it's not been modified. It's basically as it was from that source system. So that's a relatively easy step. We've got that on-prem piece. We move it out. Now, we're really into the processing phase. We've got it there.
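As a rough sketch of the "source-similar" landing step described above, not necessarily how Home Depot's data factory implements it, here is one way a file could be written unchanged to a GCS landing bucket and then loaded into a raw BigQuery table; the bucket, dataset, table, and file names are all hypothetical.

    # A hypothetical source-similar landing: copy the file as-is, then load it
    # into a raw table with no remodeling beyond schema autodetection.
    from google.cloud import bigquery, storage

    storage_client = storage.Client()
    bucket = storage_client.bucket("hd-landing-zone")  # hypothetical bucket
    blob = bucket.blob("orders/2018-07-26/orders_000.csv")
    blob.upload_from_filename("/data/export/orders_000.csv")  # hypothetical export file

    bq = bigquery.Client()
    load = bq.load_table_from_uri(
        "gs://hd-landing-zone/orders/2018-07-26/orders_000.csv",
        "my-project.raw_orders.orders_20180726",  # hypothetical raw table
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,  # keep the source layout; transformations come later
        ),
    )
    load.result()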
So the next part is classic ETL or ELT transformations that you have to do on those data sets, as well as we have streaming data sets. So we have a lot of sources of sales, of inventory, of order stuff that's basically moving out of those source systems in real time. So we wanted that to flow into the system, as well, in real time. So we have all these different ways. Most of our processing that we do today, it's either-- a lot of it is BigQuery within SQL itself. We use Dataflow. We use some custom Java running in App Engine. We have a wide variety of ways. Again, with the full-stack development teams, they have some freedom to choose amongst those platforms based on what's the right use for their use case. So the processing is much more fluid. What we tell the teams is basically, start with BigQuery. If it works there, it's easy. It's very easy to do it in SQL. There's a lot less code to write. You can go a lot faster. If you need something more custom, then basically head that direction if you need to. For streaming, what we did is, on prem, we use a lot of legacy or older systems, like WebSphere MQ or ActiveMQ, different things. We also have some Kafka, as well, in the newer systems. We basically wrote a part that runs on our Hadoop platform that moves that out to Google Pub/Sub. So all of the streaming stuff ends up with all those payloads into the Pub/Sub messages. So from there, that's where a lot of the App Engine pieces pick those up. And then they begin to transfer them into BigQuery. So that's how the streaming works. But then we had two real big parts of that that we wanted to go into. First was what we call our building blocks. So Home Depot-- it's a complex enterprise. We have a lot of domains. So domains, orders, customers, finance, supply chain, merchandising-- there's a lot of different areas. So the building blocks for us are those independent areas of data that are arriving. So there's a large domain in a series of subdomains under those. But they're landing into those, basically, nonintegrated, domain-specific areas. But as they're landing, they're landing in BigQuery. And they're in optimized performance structures that they're able to read from very quickly. And they're able to join. They're still able to be used for any of the downstream processing. But they're nonintegrated with other parts of the different domains. So the next phase of that is, we really take what we call an ADS, Analytical Data Set. It's really working backwards from your analyst or your user in terms of what different domains do they want brought together. So think about it as either a materialized view or a view without actually materializing in terms of just a query set that's driving against the series of those. But most of the time, it's a materialized view. So we create another set of tables. We then bring all the different building block domains together. We'll collapse dimensions. We'll do things to make it very performant. We also use nested structures here. So we'll talk a little bit about how we use nested structures inside of BigQuery to do that. But the idea here is these ADSes, these Analytical Data Sets, and these building blocks are the direct things, the direct data sources, that our analysts are going to use to basically query the data sets and drive the downstream. So they have to be performant. Almost all, if not all of them, are basically in BigQuery today, with very few exceptions. So the next part of that is our use case. So how do our analysts use this? 
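Here is a hedged sketch of what materializing one of these Analytical Data Sets might look like, with hypothetical project, dataset, table, and column names; the ARRAY_AGG(STRUCT(...)) column is one way to produce the kind of nested, repeated structure mentioned above, and the SQL is submitted through the Python client.

    # A hypothetical ADS build: join two building-block tables, nest the order
    # lines inside each order row, and materialize the result as a new table.
    from google.cloud import bigquery

    client = bigquery.Client()

    ads_sql = """
    CREATE OR REPLACE TABLE `my-project.ads_sales.orders_ads`
    PARTITION BY order_date
    CLUSTER BY store_id AS
    SELECT
      o.order_id,
      o.order_date,
      o.store_id,
      ARRAY_AGG(STRUCT(l.sku, l.quantity, l.extended_price)) AS order_lines
    FROM `my-project.bb_orders.order_header` AS o
    JOIN `my-project.bb_orders.order_line` AS l
      ON l.order_id = o.order_id
    GROUP BY o.order_id, o.order_date, o.store_id
    """
    client.query(ads_sql).result()  # wait for the table to be (re)built

    # Downstream readers can then query the nested column directly, e.g.:
    #   SELECT order_id, line.sku
    #   FROM `my-project.ads_sales.orders_ads`, UNNEST(order_lines) AS line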
So we've kind of got two different camps. We've got our typical business analysts that are using more structured or reporting tools, standard BI things, Tableau. AtScale, if you're not familiar, is an OLAP cube that reads off of BigQuery. MicroStrategy, which is a more classic reporting, and SSIS, which is cubes using Microsoft technology. Our data scientists are doing much more active, newer things with ML, Datalabs. And we still have a lot of R and SAS in the environment as well. So they're basically reading out of these data sets. They have requests. And we want these to be self serve. So the Tableau teams-- once we get them out there, we have Tableau running at Google as well. So they are able to get instances running out there. They can connect to the data sets. They can then build self-service reports. If they need data or other components, they can look at the different data sets, pull from any of the values. They can read from them. And then if they don't have the data set, they'll work with our analyst teams. And we can help them get the new data sets brought in.

OK, so kind of a quick review of what products do we use? So we use GCS. We use Pub/Sub heavily, Dataflow, BigQuery. We do use Datastore. From our compute side, we use App Engine, GCE, and GKE as well, StackDriver for monitoring. Analytics, we're using Datalab, which is one of Google's products. And we're beta testing Datahub, Dataprep, and BigQuery UI, which was, I think, announced today. But we've been using that for a while. On the Home Depot side, we have written our own tools to basically supplement some of these things, to make it easier for our internal teams to use. We call them the Data Pipeline and Analytics Engine, some SQL editing tools. And we basically provided a data catalog to be able-- so that teams can find the data. We have a large set of data. One of the things that makes Home Depot complex is the wide number of businesses we do under one roof. If you think of paint and tool rental, there are large, complete businesses under our roof, outside of just the pick up the merchandise and go out the front. So those domains make that really complicated. So finding a lot of the data and connecting it together can be a challenge for the business. One small callout here-- when we first started looking at this, and we showed you in the earlier slide, we did have a large Hadoop platform. We do not use Dataproc. We basically don't have much of the Hadoop stuff. We rely on BigQuery for that use case. So we didn't really bring a lot of the Hadoop stuff forward. Nothing wrong with the product. It was just, hey, we said we're doing a leap. So we jumped onto BigQuery. We do have some of our data scientists in some of the teams that do use the product. There's nothing wrong with it. But our core EDW doesn't use it a lot.

OK, so what else did we do from an architecture point of view? So if you guys are familiar with any of the EDW things, we did do some changes here. So one thing I'll show you, our slot hierarchy. Tino showed you how that worked. We take credit for pushing a little bit to get that out there. And we'll show you our slot map and what it looks like. So we'll show you that in a second. The advantage of that is, what we've found is that it allows the teams within the company to have a fixed amount monthly that they're willing to-- x number of slots they can pay for. It can be dedicated to them when they need it. So if they need more, they can buy more.
And the best part with the hierarchy is, like he said, if they're not using it, other parts of the company can share. So everybody who goes into that platform or into that structure-- everybody wins, because you'll at least get what you paid for. And chances are, you'll get more. And we use that excess capacity amongst all the different teams all the time. So it allows you to plan for peak, but everybody wins.

Another part-- we made a big decision. In our older system, we used a lot of surrogate keys. So you're coming out of the operational systems. There's nothing wrong with surrogate keys. But we were putting surrogate keys on to every table. And then we had translations back to natural keys. So we made a decision when we moved over here that we were going to have everything in its natural key format. We have a couple of structures that still use surrogate keys, when you're combining disparate domains, possibly from different business units where the keys don't match very well. So they're not eliminated. But we've basically reduced them as a standard way that we went in. And we'll talk a little bit about nesting. So we do use nesting. So if you're not familiar, in BigQuery, you can use a nested structure. It allows you-- my favorite example-- think of an order with-- you bought three things. Those three line items can be all within one row as a set of repeating values. And you can query it within SQL. And it's one row that allows you to have more performance in terms of data elimination. And you can get right to that row. We also use nesting, in some cases, in those analytical data sets, to collapse some of the dimensions, or some of our master data tables, into those. So there might be org structure, or codes tables, or other things that can be present in the data set. So you don't have to do joins. So we use nested structures. And we collapse them into that row as well.

The other thing we do is-- the last one is really, Google projects and views. So we use Google projects-- as a company, we have a lot of Google projects. And we use them to our advantage. So a lot of our data sets, as they come in, we have data projects broken up by domain. That's where all of the ETL pipelines, all of the ingest, and all of that works. It's all contained. It's managed by IT. It's delivered out to those other systems. But it's a place that basically controls the data. And you saw, in some of those building blocks, they land there. Anyone can read them. So we grant access to them. But it's all there. But it isolates and separates the different teams so they don't step on each other. So we have about nine of those areas. And we can add a new one if we need as the company grows. We use views as well. So BigQuery provides views. But we use projects as views to allow us to control the access. So we can put different folks, in terms of IAM roles, or things, into different groups, into those projects. And we can control the access. There might be PII fields that they can or can't see, or other data sets where we can use those projects. So we make use of Google's-- just part of GCP projects and views, part of BigQuery, extensively to basically monitor it, so we don't have one big project with all of our data in it. It's just too much. So we make use of that.

So here's our hierarchy. It's a little bit crazy. So I'll walk you through it. So the way it works, if you're not familiar with hierarchy-- so there's two ways. In BigQuery, you can do pay as you query. It's a pretty standard way.
Or there's BigQuery slot structure. With this, it's a fixed dollar amount that you have per month. And you can basically lay out a series of hierarchies. So what we've done is, we have different parts of the company in different-- they're all off of a root node. So the way hierarchies work is, siblings share first. So anything that's across, if there are extra slots available, it'll share to siblings first. So if I go to the lowest part of the tree on the bottom left, for example, those two nodes in the bottom-left corner will share first. So if either one of them need it, they share amongst themselves before it would go up. And that works for every part of the tree. So what we have is all the different groups in the company, from our dot-com teams, to our engineering teams, to our security teams, that all have different slot allocations-- everybody's off of a root node, because everybody wins. There's no loss here. You always get a guaranteed minimum. And there's a good chance you're going to get more than you paid for. And that's what we found. And it works really well. So if you're familiar with Hadoop, it's very much like the capacity scheduler. It works the same way. But it's really valuable for us. It also, with a guaranteed minimum, allows you to control your SLAs and gives you more fine-grained control. So what we do then is, we attach projects into those different buckets. So like I said, Home Depot, we have a lot of projects. And we can attach them at different resource nodes. That resource node then grants them the amount of slots that they need. And if a team needs a different one, and they want to buy-- let's say we've got a general SLA bucket. That's where we put all of our workloads that have guaranteed SLA. If we have a team that has a project that needs a tighter SLA, and we decide we want to break that down and buy a little bit more slots, we could create another node, move that project over, and they can get that guarantee.

So I think, when the Google team saw this, they went, oh my. This is complex, but it works great. And a good story-- the day we turned this on was the best day we had in the EDW, because before that, we had basically everybody sharing one big pool. And it was-- you couldn't control the minimums. So everyone was competing with everyone. The day that we turned this on, it was the-- we had huge workloads. Everybody's jobs went through very cleanly. And it was painless.

So let's talk about the other big part, which anyone considering this type of move would hit just like we did. So security is a huge part of everything. I'm not really going to walk through the details of what we did. I'm going to give you our strategy and a strategy that you could use as you're considering this move as well. So almost every company is going to have a good, extensive set of security guidelines that are driven by your business. What we did is, we looked at the data classifications, the separation of duties. We looked at our DLP and exfiltration requirements. And we basically developed a strategy for how we were going to move to the cloud, what we were going to move to the cloud, and the mechanisms by which we secure it. A big part of that is, what we found in talking to a lot of other retailers, a lot of other folks considering this move, is a lot of the IT teams at more legacy-based companies, when they look at the cloud, this is a scary thing. All of a sudden, we're going to have all this data together in a large area. It's a big deal.
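One common BigQuery mechanism for the "projects as views" access control described earlier is an authorized view: a view in a separate consumer-facing project exposes only the columns a group should see, and that view is then authorized to read the underlying dataset. Here is a hedged sketch with hypothetical project, dataset, and column names, not a description of Home Depot's actual setup.

    # A hypothetical authorized-view setup: expose a PII-free view of an orders
    # table in a separate "consumer" project, then authorize that view to read
    # the source dataset so end users never need access to the raw table.
    from google.cloud import bigquery

    client = bigquery.Client()

    # 1. Create the restricted view in the consumer project.
    view = bigquery.Table("consumer-project.sales_views.orders_no_pii")
    view.view_query = """
        SELECT order_id, order_date, store_id, order_total
        FROM `data-project.bb_orders.order_header`  -- customer PII columns omitted
    """
    client.create_table(view)

    # 2. Authorize the view on the source dataset.
    source = client.get_dataset("data-project.bb_orders")
    entries = list(source.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role=None,
            entity_type="view",
            entity_id={
                "projectId": "consumer-project",
                "datasetId": "sales_views",
                "tableId": "orders_no_pii",
            },
        )
    )
    source.access_entries = entries
    client.update_dataset(source, ["access_entries"])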
So we built a partnership with a security team. And to tell you, having them use the platform as well is a huge plus. So it breaks down the barrier. They're able to do more stuff in terms of security footprint using the same toolset that we're using for this. And it basically worked really well. What we found also is, the adoption rate went up. The fear went down. The understanding really went up. Maybe "the fear went down" is the wrong way to put it. It's really the understanding of the way the technology works. But the one thing we did work through is, on prem, we have a lot of native tools that we use for security. So when you look at the cloud, we're basically looking at what the new tools are, basically, out at the cloud, and how to adopt those. So there might be, you have x tool on prem that you use, but a tool at Google might be equivalent or better. And we would look at those different tools. So think about the strategy that you would look at when you're considering a move like this. And work with your security teams on that. And our end of this was really better adoption, so faster adoption. But we still put a huge focus and spent a lot of time with our security teams. It's never out of focus. It's just a better approach to go towards that.

OK, so if I could give you some stuff, some ideas in terms of how you would start a migration similar to this-- so first, be agile, because what you think you're going to do, it's going to change. We've changed multiple times. And being agile has proved excellent in that case. Evaluate your tools. You're probably using existing tools today. Look at those tools. Bring the ones forward that make sense. Maybe adopt something new. I think our single entry point out of our data center was a huge plus for us. It made it a lot simpler. Keep your schema. So if you have an existing set of schemas, bring them over. Try them in BigQuery. Don't change anything. See if it works. Optimize if it doesn't. When we started, we kind of took an inverted way. I would recommend you just bring it over and move it. Try it. If it doesn't work, then worry about performance. There's lots of dials and things you can move. We did not start by copying the data. We actually tried to go back from source. We would recommend copying from your existing system instead. Use managed services. We love them, couldn't recommend them more highly. And work closely with your third-party companies. So a lot of the tool sets may or may not be ready for the cloud. So I'll bring up Rick to bring you through lessons learned and some of our performance wins. [APPLAUSE]

RICK RAMAKER: Thanks, Kevin. Now, we'll go back to some of the fun stuff. So what is the result of all of this great work that Kevin just walked through and the team went through? And I will say, in short, it is game changing. And all the additional enhancements that are coming in are making it an even better decision for us. So you see some of the performance wins we've seen here. Yeah, I stacked the deck here a little bit. These are our three best improvements that we have. Your mileage may vary a little bit on your actual performance you see. But these are real. We were seeing things that were running for 8, 9, 12 hours that are now running-- that are now completing in a matter of minutes. And we had a ton of workload that we couldn't even run on our on-prem environment, because we just didn't have the capacity there. And that was a cool experience, to be able to see that stuff get completed and be able to run.
We actually had a-- it even created a little bit of a problem for us, because a lot of the teams were so enthralled with all the new stuff they could go do, we had all kinds of folks working on new stuff on the cloud, and we had to get everybody-- don't forget, we have to migrate all of our stuff off the existing platform by the end of this year. So we got folks-- it was a good problem to have. So that's where we are from a performance perspective.

From a capacity management perspective, it's also been very game changing for us. I can tell you, we had an outage in the middle of the night. We ran out of memory on one of our VMs. In the old world, that would have been a lot of work to figure out how to get that fixed. In the new world, it was a matter of bringing down the VM, increasing the amount of memory, popping it back up-- all stuff you guys have probably seen and heard. But for us, that was a pretty cool experience. I'll also say, we have a major deployment going. Last week, even-- it was just last week or two weeks ago, and we were running a little bit behind in getting all of our historical data loaded into the cloud for this release. We worked with our Google team. They swung a bunch of slots over to us on about eight hours' notice-- so thank you for that-- and we used those slots to complete that migration on time. And we had all that data loaded in about a day and a half, versus how long we were planning on that. So from a capacity management perspective, it's been very helpful for us.

And then, from a delivery side, we are real, and live, and running on the cloud at a number of key areas today. We have all of our pro reporting-- so all of our-- we have a sales force out in the field working with a lot of our major contracting companies we work with. All of their reporting is available to them running on the cloud. All of our services business, where if you want to have someone do it for you versus doing it yourself, and all the-- they're doing a measure for you at your home-- all of their information is available to that team on the cloud. Our CEO Dashboard that goes out every week, with tons of metrics about running our business-- all of that is available and runs on the cloud. We have our sharing of data with all of our vendors that sell into Home Depot, and that we sell their products. All of their data, we pipe it out to 75 vendors today so we can collaborate on the data. All of that is available on the cloud as well. And we have all of our clickstream data from all of our websites, about 800 terabytes' worth of data, of clickstream data. All of that, also on the cloud. So this is real. This is happening. And we're using it every day for what we're doing.

Last thing I want to touch on is some of our learnings. And there are many of them. We could probably have a whole presentation on what not to do. But instead, we'll focus on some of our key learnings. I would say, one of our biggest ones was just recognizing the complexity of the change management of migrating to the cloud. So before we started this, our team, from an IT perspective, was your traditional data warehousing team. So we had a lot of ETL experts. We had a lot of PII experts. We had a lot of SQL experts that were all experts in the specific tools that we were using. And they were some of the best that we had. And we had data modeling teams. We had DBA teams. We had all the different teams that were more of the typical center of excellence models that many organizations utilized.
And when we decided to move to the cloud, that changed a lot, and a lot of it from a team perspective. So the team now-- and a lot of the folks are here in the room. They're awesome. They're Java, they're Python, they're SQL, they're full stack, they're SRE mindsets. And that was a major, major change to move that team from where we were to where we are today. And my advice on that one is, put the team first. So this is really, really hard for the team-- all these words, all these new technologies. Easy to say, really hard to execute it. So we did 10% days, gave people a chance to learn different things on the platforms and take their time to go and learn those. We did a ton of training for them in many different areas. You're going to make mistakes. That's OK. Learn from the mistakes and keep moving. And give the developers the opportunity to make their own choices. We didn't dictate, you must do things this way and that way. They tried a lot of different things. And we landed in a pretty good spot. I will say, you need to balance that sometimes, too. At one point, I think we had eight data pipelines built. And we realized that we probably don't need eight data pipelines. So we worked through it and got to a good spot on that. I'll also say, your productivity velocity also takes some time to develop. You're not going to be as productive as you were or want to be on day one. And that's OK. So take your time to get to the point where you have that velocity set for that team. And we didn't even bother putting together a road map on our deliveries until that velocity was set. But at that point, once you have a good velocity, definitely do get a roadmap in place. That really flips the switch from learning mode into, truly, delivery mode. And we did that about the end of last year. And now we're really executing against our roadmap. Second thing I would throw out from a learning is to realize that this change is just as big on all of your consumers of the data. So we spent a lot of time on the IT side, because we're the IT team, and we did a lot of work to get us through this. But at some point, all of our business partners also had equivalent amount of change. And there's a number of those folks in the audience with us here today. So we built out an analytics enablement team. It's a small team, probably about 3, 4, 5 people. But their full-time job all day every day was to help out that analyst community with this migration as well. So they helped out with making sure we're aligned on the naming standards, and how many projects do we need? And when do we hit the views versus when do we hit tables? What training is required? And just helping out with a lot of the overall communication. We're a big company. It's hard to catch everybody. I'm sure most of you guys are using a tool like Slack for your communication. That's been our number-one tool for communicating to folks. And it also helps all of the other people going through this journey to help each other, too. So we don't have to be the ones that answer all the questions. All of that's been a huge, huge help for us. And then find your early adopters and communicate those wins. We've got certain folks in the organization that were all in on this early. And they were great at helping move this journey forward. Last thing I'll call out to you is just, the new Google features that are coming are really game changing as well-- I mean, the improvements in the-- with clustering and partitioning has been super helpful. 
The join performance is improving all the time. We had to flatten some structures to get the performance early on. We have to do less and less of that today because of the join performance. So thanks, Tino, for all of that, and so forth-- the slot reservations and so forth. We're still partnering with them on a little bit more visibility on what's happening on the platform. That's probably the next thing we're trying to get some help with Google on. But for the most part, we're getting what we need.

So in closing, as I said, this technology works. It's really been game changing for what we're doing in our environment. We have buy-in across the organization. Everyone is all in on getting us migrated by the end of this year, end of fiscal year. Our partnership with Google has been fantastic. We've learned a lot from them. Hopefully, they learned a little bit from us. And I think we're set up for analytics for years to come. So step one, get your data in place. And then hopefully, next year we can talk about some really cool things we're doing on the ML side. [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 12,455
Rating: 4.8699188 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: r_dwkSrcVaI
Length: 46min 32sec (2792 seconds)
Published: Thu Jul 26 2018