AWS re:Invent 2019: [REPEAT 1] Managing your cloud financials as you scale on AWS (ENT204-R1)

Video Statistics and Information

Captions
Hi everyone, can you hear me? Nice, I just wanted to test my mic. My name is Abbie and I'm from AWS; joining me today are Patrick and Ilya from Lyft. This is a picture of me on stage at re:Invent 2017, when I was working at Expedia. That was when I told my story as an AWS customer and shared best practices that had helped Expedia on its cloud cost optimization journey. Today, however, I am a business development manager on the cloud economics team at AWS, and I'm really excited to be here to talk with you and share even more about cloud financial management. I don't know if anyone saw me walk in to that music, but that's just how excited I am.

Let me start by saying thank you so much for taking the time to attend this session. This session is focused on the fundamentals of cloud financial management. You'll learn how to enable data visibility within your organization and how to build a cost-conscious culture. Patrick and Ilya from Lyft will also share how they have leveraged data to achieve results, how they make that data available within their organization, and how this visibility has influenced actions while they grow their business with AWS. Please note that this is a 200-level enterprise session and is not intended to be deeply technical; if you're interested in deep dives or anything truly technical, there are higher-level sessions available at re:Invent that might serve that purpose a little better.

We're going to cover three key takeaways in this session. Number one: making the connection between cloud usage data and cloud users influences actions, which then drive results. Number two: even when a deep understanding of cost and value is achieved, it is difficult to replicate across the organization without automation; manual optimization as you scale is simply not practical, so automate whenever possible. Number three: managing cloud costs is not a once-a-year activity, and it is not a project; it is to be treated as a continuous discipline. Instead of having reactive war-room situations (I remember a few of those from my days as a customer), be proactive by building out a process instead. Those are the three key takeaways we're going to cover here today.

So let me start: what differentiates your organization, whether you're a startup or an enterprise? Is it analyzing petabytes of data? Maybe it's delivering video content, or maybe it's building great mobile apps. The AWS cloud allows you to focus on those things when you leave the heavy lifting of the underlying infrastructure to AWS. And as you focus on the things that differentiate your organization, what are your goals as a company? Maybe you want to keep growing, you want a better experience for your users, and you do not want to slow down innovation. AWS can enable you to rebuild core applications or workloads in the cloud to help you continue that growth in your industry, whatever it may be. However, growth does impact costs. The question is: are you able to operate your workloads and achieve those goals in a cost-efficient manner?

Here's an example of a growth story and its cost impact. Let's say building great mobile apps is what differentiates your organization. You start cloud adoption, and things seem to be going really well, with a lot of excitement in the air. You start building, testing, developing, and doing proofs of concept. Then you see that your costs start to rise.
That yellow line is total cost rising. So you make your first reserved instance purchase to get some pricing discounts on your compute services, and you notice that the cost starts to drop. You go back to business as usual: it was a one-time thing, costs dropped, let's move on. But as you deploy more applications and features over time, that cost starts to rise again. Maybe you've seen this kind of story in your organization; maybe it's familiar to you. As costs rise, your leadership, maybe the CTO or the CFO, might start asking a lot of questions, and you probably don't want to be the one being asked those questions repeatedly.

With that scenario in mind, let's take a quick poll. I'm very curious: with a quick show of hands, what is the top current concern for your team or organization? Is it number one? Can I see a quick show of hands? Okay. Number two? Okay, maybe a quarter. Number three? Really. It's funny, these are the same poll results we got when we ran this session last night: about 75 percent of the audience said number three, twenty-something percent said number two, and a little bit number one. The point is that these are common statements customers make that typically trigger the need for optimizing costs. Many of these are reactive statements, but my hope is that as you leave this session today, you will be more proactive by building with cost optimization in mind. Your concerns are valid; you're in the right spot.

Thirty-five percent. Ten billion dollars. What do these numbers mean? Anyone want to take a crack at it? "Wasted spend": you're right on. Did you attend my session last night? Maybe someone told him. According to recent research measuring the amount of waste among enterprise public cloud users, 35 percent of cloud spend is wasted; this amounts to 10 billion dollars in just wasted spend, primarily due to what we call poor cloud financial management. Let me say that another way: poor cloud financial management is resulting in significant waste. It's a fact. Now, what does this number look like for your organization? You don't have to say anything, but think about it in the back of your mind: is it 5 percent for you? 17 percent? Maybe 69 percent? While I know it is not realistic to get that number to zero and completely eliminate waste, even a small decrease, maybe five or ten percent, in that level of waste can have a major impact on what IT can accomplish. For example, say your IT organization is facing a flat or decreasing budget in 2020; maybe someone in finance (you know them, I used to be one of them) has said "we're going to cut the budget by 22 percent next year." Improving your cost efficiency can make a real difference in your ability to meet business demands and deliver new services, by reallocating the savings you get from being more efficient in the cloud to those other important initiatives.

When we start thinking about excess cloud costs and how to address them, it is only natural to think about the root causes, the why. There may be a few causes at play, and that includes
things like pricing complexity and cost predictability, but there are two I would like to highlight today. I'll be the first one to tell you, I work at AWS and we have a lot of pricing complexity; it comes from giving customers a lot of flexibility and a lot of options, but within those choices comes complexity. We understand that. The first cause I want to highlight is the on-demand nature of cloud use. AWS offers on-demand provisioning capabilities that make it easy for anyone in your DevOps organization to spin up resources at any time, and often this results in uncontrolled cost, because usage in the cloud equals cost. It's not like the traditional data center, where everything was preset for the year. The reason many of you in the crowd decided to go to the cloud is that you want more agility, the flexibility to scale up and down, and you couldn't necessarily do that in a traditional data center. The second cause is cost awareness: when resources are provisioned, their owners often have little to no visibility into what their applications are costing in the cloud, and therefore they cannot take actions to mitigate waste. Historically, technology organizations have not been attuned to cost the way they are to things like security, performance, and scalability. The cloud therefore necessitates a new approach: CIOs and enterprise IT organizations have to start developing new processes to manage and optimize cloud costs, and this is where cloud financial management comes in.

What is cloud financial management? A few years ago we used to emphasize just that middle part, the green circle: cost optimization. We used to just say "go cost-optimize," and that is still very valid, but cloud financial management is more than just cost optimization. Cloud financial management, very simply, is a set of activities that enables technology and finance organizations to realize business value throughout their cloud journey, and a disciplined approach to it allows you to build a self-sustaining culture. So what do these four buckets mean? It's really very simple, four things to remember. First, the set of activities that allows you to see what is being spent in the cloud; under the first circle you have things like account and tagging strategy, which let you see what has been spent. Second, the set of activities that allows you to save money and eliminate waste; that's where you have things like matching capacity with demand and choosing the right pricing model: spot instances, reserved instances, and my new favorite, savings plans. Third, the activities that allow you to predict and plan what your cloud cost is going to be over the next six months, year, or five years; these are things like budgeting, forecasting, and proof-of-concept-based cost estimation. And last but not least, the activities that help you scale all of this within your organization and run all those processes, what we call cloud financial operations: things like making sure you have executive sponsorship that supports what you're trying to do from a cloud
financial management standpoint, investing in your people and governance, and choosing automation tools to help you scale. That's cloud financial management. Let me go back for a second: doesn't that framework look really cool? I think it does. But even though the framework is good and really simple, there's a big challenge. I've given two sessions since I got to re:Invent, and I've heard the same thing both times: how do we get people to actually take action? We know what we're supposed to be doing; for the most part we're supposed to be optimizing and eliminating waste. But how do you get hundreds or even thousands of people within your organization to actually take action? That is the biggest challenge. I'm here to tell you, from experience, from talking to a lot of customers, and even from my days leading the cloud finance team at Expedia, that it starts by building what we call a cost-aware culture. It involves a mindset shift, where you gradually but firmly establish the idea that everyone is responsible for cost and that there are financial consequences in everything they do.

You might ask: how do we do that? Well, a cost-aware culture starts with visibility. A guy who likes to say "I'll be back" once said these wise words (does anybody know that guy?): if I can see it and believe it, then I can achieve it. When we head out on a journey, we're asked to pay attention and look at where we're going; in the cloud, a similar concept applies. The faster you're moving and changing, and the more you have running in the cloud, the more important it becomes to establish commercial visibility and control. By visibility I mean how often costs are updated and seen, and at what level of detail. Here's a small set of questions your organization is probably asking, or at some point will ask, I can bet you that: Why did my spend go up in a particular region in September? How much does my project or application cost, or what will it cost? How many Amazon EC2 reserved instances or savings plans should we even be purchasing? Those are valid questions.

How do you get that cost visibility? You start by focusing relentlessly on clean data. Two structures are the foundations of clean data in the cloud: a granular multi-account structure via AWS Organizations, and a properly formed resource tagging strategy. This is one of the most important aspects of good cost management and can help answer the questions I showed you earlier. By doing this, your technology teams become aware of their direct contribution to the bottom line, and they can start becoming cost-aware stakeholders. Without a multi-account structure or tagging, the impact of cost visibility is significantly diminished, and it's difficult to know who is responsible for the spend and what it's for. Some crucial activities depend on this: if you want to right-size, you need to know the application's purpose and its owner to be able to make changes, and tagging can help with that. If you want to improve elasticity, tagging can specify which instances are non-production and at what times they should be switched on or off. If you look at the example tags on the slide, the green bars, those are cost allocation tags you could use; some really important common ones that a lot of customers use are cost center, application ID, project, and environment. Making sure you have that tagging strategy in place is important.
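To make a tagging strategy enforceable, many teams script a simple audit. Here is a minimal sketch using boto3, assuming the four tag keys above are your required keys (the key names are illustrative; this is a do-it-yourself sketch, not a packaged AWS tool):

```python
"""Minimal sketch: find EC2 instances missing required cost-allocation tags."""
import boto3

REQUIRED_TAGS = {"CostCenter", "ApplicationID", "Project", "Environment"}  # example keys

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")

for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tag_keys = {t["Key"] for t in instance.get("Tags", [])}
            missing = REQUIRED_TAGS - tag_keys
            if missing:
                print(f"{instance['InstanceId']} is missing tags: {sorted(missing)}")
```

A report like this, run on a schedule, is often the first step toward the "who is responsible for the spend" visibility described above.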
The next step is picking the right visibility tool for you. It doesn't matter whether you're a one-person shop or an organization as huge as, I don't know, amazon.com, with thousands of engineers: pick the visibility tool that enables commercial visibility and speed to the kind of insight you need. On the left here you have the monthly AWS invoice, which provides a view once a month and breaks down costs by service. Then you have the AWS Billing console, which provides a bit more information over several months, if you have access to it. Number three is AWS Cost Explorer, a common one: a native AWS tool that is free and able to show you cost at a very detailed level, and not only cost but also optimization recommendations. Over the past few months we have made a lot of improvements to Cost Explorer based on customer feedback. Finally, we have other options such as the AWS Cost and Usage Report, called the CUR if you're familiar with it, which has a line item for every resource, for every hour you use in the month. We have seen some customers take the CUR and go DIY, turning it into their own customizable, user-friendly data using tools like Tableau and Athena, writing their own queries and dashboards; there are also third-party companies that plug into that same CUR and provide recommendations and visualizations for customers. So the question to you is: on the scale from small and simple to large, complex, dynamic environments, where do you sit? My recommendation is always to start with Cost Explorer, because it's free, and you can scale up or down depending on what your organization needs. As you mature in your cost management journey, we see a lot of people move toward number four, because they want to start marrying internal metadata, like revenue information, with the billing data, and number four gets them there. To give a feel for the DIY route, a small sketch follows.
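Here is a minimal sketch of that DIY pattern: running one ad-hoc SQL query against a CUR table in Athena via boto3. The database, table, and column names follow the common CUR-to-Athena setup, and the S3 output location is a placeholder; all of these are assumptions, not a prescribed layout:

```python
"""Sketch: query the Cost and Usage Report through Athena to rank spend
by cost-center tag and service. Names below are assumptions."""
import boto3

athena = boto3.client("athena")

query = """
SELECT resource_tags_user_cost_center AS cost_center,
       product_product_name           AS service,
       SUM(line_item_unblended_cost)  AS cost
FROM cur_database.cur_table            -- hypothetical database/table names
WHERE line_item_usage_start_date >= DATE '2019-11-01'
GROUP BY 1, 2
ORDER BY cost DESC
LIMIT 20
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "cur_database"},                # assumption
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # assumption
)
```

Results land in S3, from which a dashboarding tool like Tableau can pick them up; that is the essence of the "customizable, user-friendly" route described above.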
Now you have picked a tool and you have a clean data foundation. What's next? You need to select metrics to get growth and efficiency insight. Consider a scenario: someone raises a red flag about a ten percent year-over-year increase in AWS spend. What insight can you glean from that statement, besides the fact that costs have gone up by ten percent? What does that even mean? It's just ten percent. Now consider if that same person said that, in addition to the ten percent increase in spend, searches increased by 50 percent and reservations grew by 30 percent. Now we're talking: would you still consider that ten percent a cost, or an investment? It's probably an investment, because your business drivers grew; at that point it's no longer simply a cost, depending on how you look at it. Value-based KPIs are really important because they add meaning to otherwise solitary AWS cost numbers. On the left we have the cost-based metrics, which you use to determine how efficient and well-optimized you are; things like percentage of RI coverage: you're at sixty percent, your goal is eighty percent, why did you drop to fifty-five percent this month? Those are valid metrics to start with. But as you grow and start telling the story of why AWS cost went up, you can move toward the metrics on the right, the value-based KPIs. Those are just examples: AWS cost per search, or it could be AWS cost per call for your organization, or AWS cost per transaction, per user, per booking; it depends on what your business is. Having that metric helps tie cost and usage to business value drivers and helps you rationalize changes in your AWS spend.

The first step is working together across organizations to select and agree on a standard set of KPIs. It doesn't have to be ten; it could be two, maybe starting with two here and one there. When we were at Expedia, bookings was one of the big drivers of cost, so cost per booking was a good one that worked. In some cases a KPI has meaning for one part of the organization but not another, and that should be a red flag that prompts you to go back to the drawing board and identify the metric you can all agree on.

So now you've got data and KPIs. However, data and KPIs are not useful unless someone is actually looking at them. The way to use cloud data in an organization is to connect the data and the KPIs to your cloud resource users and present that information in a meaningful way across multiple levels. Executives need something different from the DevOps teams; your CCoE, your Cloud Center of Excellence, probably needs something different from the finance team. Building customized dashboards with actionable insights tailored to each of these helps, and as a result of extending visibility to multiple stakeholders, people are much more judicious about their usage, and you can start to see better behavior across teams. To focus on Cost Explorer for a moment: over the past 12 months we've made a lot of improvements in this area. For instance, instead of just monthly usage data, we now have hourly usage data in Cost Explorer, which DevOps teams find really useful. As a simple illustration of a value-based KPI, a sketch follows.
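A minimal sketch of computing such a KPI, pulling daily cost from the Cost Explorer API and joining it with a hypothetical internal transactions feed (the dates and transaction counts are made up for illustration):

```python
"""Sketch: compute a daily 'AWS cost per transaction' KPI."""
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-11-01", "End": "2019-12-01"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)

# Daily business-driver counts would come from your own analytics store (assumption).
daily_transactions = {"2019-11-01": 1_200_000, "2019-11-02": 1_150_000}  # sample data

for day in resp["ResultsByTime"]:
    date = day["TimePeriod"]["Start"]
    cost = float(day["Total"]["UnblendedCost"]["Amount"])
    txns = daily_transactions.get(date)
    if txns:
        print(f"{date}: ${cost / txns:.6f} per transaction")
```

Swap "transaction" for whatever your business driver is: rides, bookings, searches, calls.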
Now, with the right data and visibility, you can start taking actions that eliminate waste and start saving. Remember, we started with waste, and we've worked our way through the journey of how to start taking action; it starts with visibility. AWS's approach to cost optimization is really about customer obsession. Let me remind you that helping customers save money is part of the AWS DNA, and we put a lot of time and resources into producing tools and solutions that help customers monitor and analyze their spend. There are a hundred-plus ways you can cost-optimize, and having the right data and visibility can help you prioritize and take action on them; you don't have to do everything at once.

To simplify, we've categorized those hundred cost levers into four buckets. Remember the metrics and KPIs we started with; these are the optimizations that will help you improve them. The first is treating cost as a non-functional requirement during design and architecture, just as you do for things like scalability and security. The second is leveraging the dynamic, predictive, and scheduled elasticity of AWS to turn off resources when they're not being used; EC2 and RDS instances, for instance, can both be turned off and on. That's really about maximizing your elasticity. Pricing models is another one. We know most workloads start off as on-demand, because you don't really know what's going to happen, and that's fine: on-demand is paying for what you need at a moment in time. But there are a few other pricing constructs that can give you more discounts based on how well you know and design your environment, and that includes spot, reserved instances, and savings plans; you can have the right mix based on the right types of workloads. Some of the data CFOs find useful in evaluating whether to buy reserved instances or savings plans includes the return and the upfront cash outlay; those are some of the data elements that help CFOs make that decision. Last but not least, look at your CPU utilization data, your RAM, and your storage to identify instances that can be right-sized, and right-size or eliminate what we call idle resources, resources that are not necessary; that includes EC2 instances and EBS snapshots and volumes. Assess their performance and see whether you need to scale down or just terminate them.

Those are the four areas of savings, but as I said, you can't actually do everything at once. Successful cost optimization requires a balanced approach and prioritization, and to prioritize you first need to determine the most critical areas to get the most value for your time investment. This is not going to do itself; effort needs to be put in, so you want to pick the levers that bring the most value, quantify the ROI, and decide which ones are worth prioritizing for your organization. Every journey is different; you just have to determine what this picture looks like for you. In this example, everything in yellow is what I would call the low-hanging fruit, because it has the most impact and the least complexity to implement. The gray items at the top right, things like spot and serverless, have a lot of impact as well, but they're more complex, because they probably require some architectural and technological effort. Maybe that's not the case for your organization; maybe the gray sits in the middle or at the bottom left for you. You just have to look at the effort each lever would take and the impact it would have, and prioritize based on that; even a back-of-the-envelope scoring like the sketch below can help.
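For illustration only, a tiny scoring sketch; every number here is a placeholder you would replace with your own savings and effort estimates:

```python
"""Sketch: rank cost levers by a rough ROI score (estimated savings / effort)."""
levers = [
    {"name": "Savings Plans",       "monthly_savings": 40_000, "effort_weeks": 1},
    {"name": "Right-sizing",        "monthly_savings": 25_000, "effort_weeks": 3},
    {"name": "Off-hours schedules", "monthly_savings": 10_000, "effort_weeks": 2},
    {"name": "Spot / serverless",   "monthly_savings": 50_000, "effort_weeks": 12},
]

for lever in sorted(levers, key=lambda l: l["monthly_savings"] / l["effort_weeks"], reverse=True):
    score = lever["monthly_savings"] / lever["effort_weeks"]
    print(f"{lever['name']:<22} ${lever['monthly_savings']:>6}/mo  {lever['effort_weeks']:>2} wk  score={score:,.0f}")
```

With these placeholder numbers, Savings Plans ranks first: high impact, low effort, which is exactly the quadrant logic described above.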
In this example, the new Savings Plans pricing model is a good choice if you have yet to drive significant optimization in your organization; it's one of the most impactful and simplest cost optimization levers to implement. Why is Savings Plans highlighted? I mentioned earlier that it's one of my new favorite things. What is it? It's very new; we launched it maybe four weeks ago. Savings Plans is a recently launched flexible pricing model that offers low prices on EC2 and Fargate usage in exchange for a commitment to a consistent amount of usage, measured in dollars per hour, for a one- or three-year period. AWS Cost Explorer will help you choose a savings plan and will guide you through the purchasing process. Unlike reserved instances, savings plans require less planning and management: they give you the same discounts with a little less overhead. And as I mentioned earlier, cost optimization requires a balanced approach: if you assume you're going to right-size 50 percent, then buy savings plans to cover the appropriate remainder. You have to balance it out; one size does not fit all.

If you're a company that uses only a small number of cloud resources, you might find you can perform these cost management activities manually, but as you scale up, that is simply not practical. Maybe today you tend to react, with manual processes and one-time cleanups; those might get you by, but if your goal, remember the third slide, is to grow from 1x to 10x, that will not scale. You need better cloud financial management, and you need to start preventing waste in a scalable manner. With scale you need automation. As mentioned earlier, there are many levers you can pull to optimize costs, and it's just not practical to do it all manually as you scale; instead, use automation to make it self-sustaining. This helps eliminate a lot of human error and puts repeatable procedures in place. AWS and some of our partners are always developing tools that enable you to manage spend independently and automatically take action to remediate problems.

Let me share some of the things we're seeing customers do with automation. New tools and resources can help you do two things: automate to save money, and automate for governance. Here are some examples. AWS Instance Scheduler enables you to configure custom start and stop schedules for EC2 and RDS instances. If you have instances you want to shut down after business hours or on weekends, dev, test, and other non-production instances, or instances with scheduled run times, you can leverage AWS Instance Scheduler; one customer was able to save $10,000 a month just with scheduling. That's a fact. (The sketch below shows the core idea.) Or maybe you have workloads that peak at the end of the month, where you want to ensure auto scaling is enabled to meet demand: with auto scaling you can scale up and down based on demand and usage, which translates to specific metrics such as CPU utilization or the number of requests, and you can leverage all the different pricing options I mentioned earlier in your auto scaling group for the most cost-effective scaling strategy. We also have the new AWS Systems Manager OpsCenter, where you can centrally identify things like EBS volumes that are no longer attached to instances, which you can then delete.
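AWS Instance Scheduler is the packaged solution; as a do-it-yourself illustration of the same scheduling idea, here is a minimal sketch that stops running instances tagged as non-production (the tag key and values are assumptions):

```python
"""Sketch: stop non-production EC2 instances outside business hours.
Run on a schedule, e.g. via an EventBridge cron rule."""
import boto3

ec2 = boto3.client("ec2")

resp = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Environment", "Values": ["dev", "test"]},  # assumed tag scheme
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

instance_ids = [
    i["InstanceId"] for r in resp["Reservations"] for i in r["Instances"]
]

if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print(f"Stopping {len(instance_ids)} non-production instances")
```

Note how this depends entirely on the tagging strategy discussed earlier: without an Environment tag, there is nothing safe to filter on.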
Then there's S3 Intelligent-Tiering; I really like this one. If the use case fits your S3 buckets, this is basically an automated way of making sure you are using the right S3 storage class. There are different classes of S3 pricing, from Standard, the more expensive, all the way down to infrequent access. By enabling S3 Intelligent-Tiering, it looks at your access patterns over the past 30 days, and if your buckets are suitable, it automatically moves objects that fit infrequent access down to the lower-priced tier, saving you money that way. You just have to enable it that one time. On the governance side, we have AWS Budgets, which is an automated way to get notifications, for instance when you've crossed a budget threshold, or when your RI coverage has gone down or up, based on how you set it up; it notifies you via email or SNS. These are just examples of the automation tools we're seeing more of our customers start to put in place; a minimal Budgets sketch follows.
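A minimal sketch of creating such a budget with an 80 percent alert via boto3; the account ID, budget amount, and email address are placeholders:

```python
"""Sketch: create a monthly cost budget that emails an alert at 80% of the limit."""
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="111122223333",  # placeholder account ID
    Budget={
        "BudgetName": "monthly-aws-spend",
        "BudgetLimit": {"Amount": "50000", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,          # percent of the budget limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "finops@example.com"}
            ],
        }
    ],
)
```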
Now let's take a step back and visualize what good cloud consumption would actually look like. On this graph there is no focus on aggregate cost, no "total we spent in October 2019," but rather on unit cost. Unit cost provides a measure of how efficient your cloud spend is with respect to your business; we talked about that earlier. More specifically, on the right side of this graph we see a steady increase in usage, the green bars, over time. That could be six months, a year, two years. On the left-hand side we see a unit cost, cost per transaction, that has remained roughly flat and is steadily declining; that is the yellow line. This decline implies that costs are decreasing, primarily due to optimization, while transactions are increasing, maybe due to volume growth, or a combination of both. The point is that it's not really about focusing on total cost; it's about picking the metric we talked about and using it to measure how efficient you are. With this ideal picture in mind, I'm going to turn it over to Patrick from Lyft, who is going to walk us through their journey and how they achieved similar results.

Thank you, Abbie. I'm Patrick, and my colleague Ilya is over here; he'll join us later. We're here today to talk about Lyft's cloud financial management journey. First, a little bit about us. We were founded in 2012, and our mission is to improve people's lives with the world's best transportation. We serve millions of daily trips and are available to about 95 percent of the US population, as well as a few cities in Canada. You probably know us for ride-sharing, but we're more than that: for example, in your Lyft app you can see public transit directions, and we also have our self-driving division called Level 5. Why are we on stage today? Because Lyft runs on AWS. When you use the Lyft app, requests are made to our microservices running on EC2; we store data in DynamoDB, S3, and other storage services; and we leverage many other parts of AWS to grow our business. In case you're curious about our bill, EC2, DynamoDB, and S3 make up about 85 percent of our monthly usage.

To give you a sense of our scaling and growth, here are some historical figures: in 2016 the number of rides we served tripled, and the following year it more than doubled. I'm showing these today because these are the years our cloud financial management journey really began. That growth, combined with a few technical challenges, led to the increased AWS spend that Abbie mentioned, and I'd like to share those challenges with you. First, we knew what our bill was (we were using some of the tools from Amazon), but we didn't know exactly which teams within Lyft were spending that money, due to inconsistent tagging of resources within the company. Next, we had little to no restrictions on spinning up new resources; while I think trusting engineers to do the right thing is the best way to go, we lacked the appropriate guardrails to detect and mitigate waste. And as our company grew, our engineering team grew and matured and began running more complex workloads; for example, our machine learning workloads run on more expensive EC2 instances like the P3 family.

So we had this increased spend, and we knew we had to do something about it. In 2017 we built a pretty simple, spreadsheet-based tool to tackle the problem: a Python script downloaded our AWS billing data, sliced up the spend by Lyft services and teams, and we made that data available to everybody within the company. The key metric we tracked was cost per ride; dividing AWS cost by the number of rides helped us understand our AWS usage and spend as our business evolved. For our engineering teams, this was really the first time they could tell how much they were spending on AWS, and just that initial visibility drove a good wave of right-sizing and cost reductions. This was a big success for us. One win, almost unintentional, was that showing teams ranked by their spend sort of gamified cost optimization. My favorite story is one of our internal infrastructure teams who, very surprisingly, saw their name in the top-10 list of spenders. Their manager didn't like it, they went and investigated, they were able to do some optimizations, they dropped off the list, and the manager was very happy. Within six months of launching this tool, Lyft-wide, we dropped our AWS cost per ride by 40 percent. Just a small tool and baseline visibility got us really high ROI. The warning I'll give you, if you do this sort of work in your organization, is that it might just be a one-time thing. This work also allowed us to understand our AWS footprint at a higher level, it laid the foundation for a team in this area of understanding our cost and usage, our capacity team, and, for me, the reason I'm here today is that this work helped me understand AWS that much more. If you want to learn more about AWS, I suggest looking at your bill and digging in from there.

While this was successful, we learned a lot and found some new problems too. First, the way we stored our data was a bit rigid, and we couldn't answer some of the questions, even the ones Abbie mentioned earlier. What we really wanted was fine-grained data against which we could run SQL queries and build custom dashboards; our solution was a pain to maintain, it couldn't support our growth, and we needed a stronger technical solution to this problem. A minimal sketch in the spirit of that early tool follows.
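To make the discussion concrete, here is a minimal sketch in the spirit of that early spreadsheet tool: slicing a billing export by a team tag and computing cost per ride. The file name, column names, and ride count are hypothetical, not Lyft's actual schema:

```python
"""Sketch: team-level spend leaderboard plus a cost-per-ride metric."""
import csv
from collections import defaultdict

team_cost = defaultdict(float)
total_cost = 0.0

with open("billing_export.csv") as f:            # hypothetical CUR-style export
    for row in csv.DictReader(f):
        cost = float(row["unblended_cost"])       # assumed column name
        team_cost[row.get("tag_team", "untagged")] += cost
        total_cost += cost

monthly_rides = 30_000_000                        # placeholder business metric
print(f"Cost per ride: ${total_cost / monthly_rides:.4f}")

# The 'leaderboard' that gamified optimization: teams ranked by spend.
for team, cost in sorted(team_cost.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{team:<20} ${cost:,.0f}")
```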
Another decision we made was to abstract away the savings from EC2 reserved instances. This made it easy to compare usage agnostic of RIs, because that data is not evenly attributed across all instances, but it led to some incorrect cost estimates when teams were saving money or estimating their spend; our team had to step in, because we hadn't built a self-service tool for it. And even though we had built this great tool, we hadn't developed a sustainable solution for tagging within Lyft. To tackle these technical challenges, we partnered with Lyft's data team to build a scalable, first-class solution to this problem, and I'll invite Ilya to talk about how we did it.

Thank you, Patrick. Hi everyone, I'm Ilya. My team has been helping to establish a solid data foundation for the next generation of the AWS cost management system at Lyft, and here is how we did it. We started by identifying three major stakeholder groups and their needs. Finance and leadership wanted accurate, high-level financial numbers, correctly attributed to services and teams. Developers wanted not only spend numbers but also usage data fairly attributed to them, as well as the ability to drill in as deep as possible to answer their questions. And finally the capacity team, which Patrick is part of, wanted all of the above, as well as some AWS-cost-management-specific things such as RI inventory management.

Once the requirements were clear, we proceeded to building the data solution, and we started with the foundation: the data sources. Two major data sources are provided to us by AWS directly. The first and foremost is the AWS Cost and Usage Report, something Abbie already mentioned today; this is the heart of any AWS cost management system, the most detailed spend and usage data you can get, at hourly and resource granularity. AWS also provides a set of APIs where we could get data on our inventory, on public pricing, and, very importantly, on utilization. On top of that we had internal data sources: our internal pricing, and teams and projects, basically the org chart, which was very handy for team allocation, as well as container logs; Lyft makes really good use of containers, and that piece was very important for us.

Once the data sources were defined, we proceeded to building the data pipeline, starting by ingesting the sources into our data warehouse. Even though this talk is not very technical, I still want to mention that we chose our main analytical data warehouse for the AWS cost management system. It runs on top of an S3 file system, and even though an AWS cost management system doesn't necessarily need that scale, it was still good to avoid a silo and to get company-wide support right away. Our major goal was that our Lyft version of the Cost and Usage Report could answer all the questions our stakeholders had, and today I want to focus on three major areas we had to tackle.

The first area was cost metrics. Here we decided to produce something we call "true cost." The idea is very simple: we wanted each line item in our version of the cost and usage report to show the actual amount of money we pay AWS. If you have ever dealt with the original Cost and Usage Report, you know that this is not the case out of the box, because you need to include different adjustments and discounts, and you need to attribute reserved instance fees, for example the upfront amortization fees for reservations. This needs to be tackled in your data pipeline; a small sketch of the amortization idea follows.
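A minimal sketch of that amortization step, under the simplifying assumption that an upfront fee is spread evenly across every hour of the reservation term; the rates are illustrative, not Lyft's numbers:

```python
"""Sketch: amortize an RI's upfront fee so each hour shows what was really paid."""
HOURS_PER_YEAR = 8760

def true_hourly_cost(upfront_fee: float, hourly_ri_rate: float, term_years: int = 1) -> float:
    """Amortized upfront fee per hour, plus the recurring hourly rate."""
    amortized_upfront = upfront_fee / (HOURS_PER_YEAR * term_years)
    return amortized_upfront + hourly_ri_rate

# e.g., a partial-upfront reservation: $1,000 upfront, $0.05/hour recurring
print(f"${true_hourly_cost(1000.0, 0.05):.4f} per hour")  # ~$0.1642
```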
The second big piece was team and project attribution. Even though we had pretty decent tagging, it still wasn't ideal; if you have ever dealt with tagging, you know you cannot get it perfect. That's why, in your data pipeline, you need to do some further processing on top of tags in order to correctly attribute the spend to teams and projects. And finally, we had to deal with container allocation. We didn't expect this piece to be very complex, but it turned out to be really complex. As I said, Lyft makes good use of containers, which means that on one EC2 machine several workloads can be running in containers, and those workloads can belong to different teams, not just different projects, and those teams want to see the spend attributed correctly to them. So you need to build in some kind of logic that distributes the EC2 cost across those containers, and that turned out to be not trivial at all; the sketch below shows the basic weighting idea.
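A minimal sketch of one plausible version of that logic, weighting each team's share of an instance-hour by CPU-seconds consumed; the data shapes and numbers are assumptions, not Lyft's implementation:

```python
"""Sketch: distribute one EC2 instance's hourly cost across the containers
(and hence teams) that ran on it, weighted by CPU-seconds."""
from collections import defaultdict

def allocate_instance_cost(instance_cost: float, container_usage: dict) -> dict:
    """container_usage maps team -> CPU-seconds used on this instance."""
    total = sum(container_usage.values())
    if total == 0:
        return {}
    return {team: instance_cost * secs / total for team, secs in container_usage.items()}

# One instance-hour shared by three teams' containers (sample numbers).
usage = {"payments": 5400, "maps": 2700, "growth": 900}
team_spend = defaultdict(float)
for team, cost in allocate_instance_cost(0.68, usage).items():
    team_spend[team] += cost
print(dict(team_spend))  # {'payments': 0.408, 'maps': 0.204, 'growth': 0.068}
```

Real pipelines must also handle idle capacity, memory-bound workloads, and instances shared across hours, which is part of why this piece was "not trivial at all."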
Once all three areas were tackled, we derived our own version of the cost and usage report that was able to answer all the questions our stakeholders had. However, stakeholders still couldn't do anything with it, because that CUR sat inside the data warehouse: you had to query it with SQL and know where it was. That's when we proceeded to building the reporting layer, where we provided different sets of dashboards: high-level for leadership and finance, and deep dives for developers and the capacity team. Here's what they look like. This is an example of a deep-dive dashboard where, for example, an engineering manager can go in and see the AWS spend for his org and his team, by project, and on the left a bunch of filters where he can drill in and answer different questions, such as where the data transfer cost is coming from, or what his EC2 fleet looks like. And here is an example of the high-level view for leadership: at the top we provide the major KPIs, like total AWS spend to date and cost per ride, something Patrick and Abbie already mentioned and an important KPI for us, and at the bottom the executive can see the daily spend by org; basically, they can find out who the biggest offenders are.

So in the end, that's what we got for our AWS cost management system. It might look pretty complex at this point, but the framework we were following is actually pretty simple. It's a data-driven framework, where you go from data to information to insight, and you don't have to be Lyft to do it; you can make good use of tools of the different calibers Abbie mentioned today. To summarize how to build a good AWS cost management system: first, know your data sources well; know what is out there. Second, pay a lot of attention to cost metrics: find the true cost you actually pay AWS. Then allocate that cost correctly to teams and projects, and mind shared resources; if you're using containers, that might be difficult, but you have to get it right in your data pipeline. And finally, deliver all those findings to the stakeholders. This all looks really great; however, even if you do all of that, magic doesn't happen by itself. Stakeholders can get the data, but unless they act, the savings are not coming. So now I'm handing back to Patrick, so he can tell you how our teams at Lyft achieved really nice results by acting on the insights we delivered to them.

Thank you. Using these insights, we were able to make a huge impact on our AWS spend. As Abbie mentioned, automation is crucial, and we leveraged it to optimize. For example, our capacity team built an automated tool to understand where low-utilization resources were, and we used that to delete instances and reduce waste. Using our dashboards, teams on their own could now identify high-spend areas and make architectural changes to their applications to reduce spend. For reserved instances, we built additional automation that facilitated not only smarter decision-making but quicker decisions. And organizationally, this allowed us to grow the scope of our capacity team to build more automation, and we also founded an efficiency team specifically to drive cost reductions. These teams helped us move from a reactive mindset to a more proactive one for cloud financial management, and they enabled better collaboration with engineering teams across the company and even with other functions like finance and accounting.

I'd like to share some of the results with you. First, our reduction of waste. For EC2, using that automated tool I referenced, we identified low-CPU-utilization instances, reached out to their owners, and were able to delete a large quantity of them. For DynamoDB, we looked at our top 25 most expensive tables and found many opportunities to right-size both provisioned throughput and storage; enabling DynamoDB auto scaling to provide some elasticity also had a big impact here. Our dashboards really started to empower teams to do this work on their own. On the left here we have a chart where our data platform team was looking at some of their workloads and noticed they had a lot of cross-availability-zone data transfer spend; they realized they didn't need to separate their clusters across availability zones, so they co-located the clusters where appropriate and were able to cut that cost to zero, which I think is the best type of cost optimization you can do. On the right, teams have been using our dashboards to understand the cost impact of containerization, of migrating to Kubernetes: in green you can see an old workload running outside Kubernetes, then the new workload spinning up in Kubernetes, and finally the old workload being removed, realizing a net spend reduction of 50 percent.

This last result is a different kind. While we added automation to save more money with reserved instances, we also saved time for our teams and other engineers. Previously, our RI process was very hacky: we only had an EC2 process, and we relied on a bunch of charts and one-off scripts to do the work. Now, thanks to our team, we have email reports for all reservation-capable AWS products, and they tell us every week which buys to make and which conversions to do. We base these determinations on a metric we call potential savings per week: basically, we look at the past, see what the perfect number of RIs would have been, and make decisions based on that. Now that this work is automated, our teams have more time to build more tools and save money in other areas. The sketch below gives the flavor of that metric.
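A toy sketch of a potential-savings-style calculation: given a week of hourly usage for one instance family, find the reservation count that would have minimized cost. The rates and usage are placeholders, and this ignores real-world RI details like upfront fees and instance-size flexibility:

```python
"""Sketch: hindsight-optimal reservation count for one instance family."""
def best_reservation_count(hourly_usage: list, od_rate: float, ri_rate: float):
    """Try each candidate reservation count; reserved capacity is paid for
    every hour, overflow runs on-demand. Return the cheapest (count, cost)."""
    best = (0, sum(hourly_usage) * od_rate)  # baseline: everything on-demand
    for r in range(1, max(hourly_usage) + 1):
        cost = r * len(hourly_usage) * ri_rate + sum(
            max(u - r, 0) for u in hourly_usage
        ) * od_rate
        if cost < best[1]:
            best = (r, cost)
    return best

usage = [8, 10, 12, 9, 11, 10, 10] * 24        # toy week of hourly instance counts
r, cost = best_reservation_count(usage, od_rate=0.10, ri_rate=0.06)
baseline = sum(usage) * 0.10
print(f"Best RI count: {r}, potential weekly savings: ${baseline - cost:.2f}")
```

Comparing that hindsight optimum to what you actually bought yields the "potential savings per week" signal that drives the weekly buy and conversion recommendations.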
Even though we've had a lot of success here, I'm sorry to inform you that this work is never over; we have to keep doing it. For our data set, we need to add more resource utilization data to further right-size and understand our workloads, and we need to attribute more of the shared platform usage back to teams. For our tools, it's all about automation: we need to add alerting and anomaly detection, and automate some of our RI conversions. And finally, as I said, we just need to keep doing this work; it gets more difficult over time, not easier. Thanks so much for having us, and I'll hand it back to Abbie.

Thank you so much, Patrick and Ilya; those are really great successes. I'm going to ask the question I asked at the beginning: based on what we just heard, who can operate workloads in a cost-efficient manner? With a show of hands... oh, I like that guy in the middle. Everyone in this room can do that. We've done it, and I've seen a lot of customers do it; it doesn't matter what size you are. So, as a call to action, and as you've just heard from Patrick and Ilya in the Lyft journey, here are a few takeaways that I hope stick. Enabling broad data visibility helps drive the right decisions. Start small; you don't have to go big all at once, and continuously build improvement processes as you go on your cloud financial management journey. Then define an owner to drive corrective actions; it doesn't have to be a whole team to start, it can be one dedicated person who owns looking at this and making changes. Automate; we've talked about this. And last but not least, I always like to say: clean up that lab environment. You know what I'm talking about. Clean it up.

Thank you so much for coming. We have some other cloud economics discussions going on at re:Invent; feel free to take a photo of this slide if you want to attend those sessions as well. We also have a session survey in the mobile app. Let me go back for a second, I think some people are still taking photos, so I'll leave the slide up, but please make sure to take the survey. We found out yesterday that the survey doesn't actually indicate what one or five means: one means a really bad rating and five means a really good rating, so I just wanted to clarify that. Thank you so much for coming. We're going to be here for the next four minutes or so, so if you have any questions, please feel free to walk up and talk to us. Thank you.
Info
Channel: AWS Events
Views: 5,499
Rating: 4.9436622 out of 5
Keywords: re:Invent 2019, Amazon, AWS re:Invent, ENT204-R1, Enterprise, Lyft Inc., Lyft, AWS Budgets, AWS Organizations, AWS Cost Explorer
Id: ChupgIbZr5Q
Length: 55min 14sec (3314 seconds)
Published: Wed Dec 04 2019