Scaling YouTube's Backend: The Vitess Trade-offs - @Scale 2014 - Data

Captions
I'd like to welcome Sugu from YouTube, who is going to talk to us about Vitess. Sugu tells me that you can think of it as being similar to mcrouter for memcached, but for MySQL. So let's hear all about it.

Hi there. This is the last session of the day, so after this we get to go party, which is good. I have a small confession: I don't know exactly how long this talk is going to be. I think it will be about 45 minutes, so I'll try to finish around that time, but we may go a little over or under.

My name is Sugu, and I've been at YouTube since around 2006. For the last four years I've been working on Vitess, which my team and I are extremely excited about, and hopefully by the time I'm done I expect all of you to be excited too, about what we've done and how you can get to use it. To step back: the gopher in the corner means something. I don't know how many of you recognize it. One person, two, three, okay, a handful. Vitess was written in Go, and our entire team loves Go as a language, so I do have one slide later about Go and what it did for us.

So what is Vitess? It is basically a scalable storage solution based on MySQL, mainly meant for online applications. If you're writing a web or mobile service, you will need a backend, and if your service becomes even moderately popular, you have to figure out ways to scale it. We at YouTube faced the same challenges and figured out ways to solve them, with Vitess being part of the solution. I'm going to talk about how we went about doing that. Vitess is now fully used at YouTube and it is open source, which means that if you face problems similar to the ones we faced, you can most likely also benefit from using it.

When you build your first website, the first thing you do is bring up a database and some web servers and start using them. Eventually you open your site up, people start using it, the database starts filling up with data, and after a while you realize: oh my god, there's interesting stuff in this database, I need to make sure I don't lose it. So typically you take your web servers down every once in a while to take backups, and eventually that obviously runs out of steam, so you need to go to the next phase. This is all scaling 101, so I'm not spending too much time here.

The first thing you do is set up a master and a couple of replicas. As soon as you do that, you get a breath of fresh air: a huge durability improvement, because there are two replicas keeping up with the master, and a huge uptime improvement, because you no longer need to take down the master to take backups. If the master fails or dies, you fail over to a replica as the new master and keep chugging along. Then you look at it and say: well, there are two replicas sitting there doing nothing, might as well send some real traffic over there. So you distribute your load and scale reads that way. But of course, the moment you do that, you have to think about what you are giving up: as soon as you direct your reads to a replica, you are going to be reading stale data, and your app may or may not have been written to handle stale data, which means you may have to go and refactor some things.
The way we solved it at YouTube is that we categorized reads into two kinds. One is the replica read: say you just want to show a video. You don't really care that the view count is completely up to date; somebody wants to see a video, just give it to them. Those reads always go to a replica: read whatever is there, display it, and we're done. But there are other cases where people want up-to-date information: I update my profile, hit submit, and the first thing I do is hit refresh to see whether my information got updated. We identified those workflows and said those reads should go back to the master. So we have two categories of reads, and the YouTube app was rewritten to go either to the replicas or to the master depending on how up to date the data has to be.

This is actually good flexibility, because it helps you sidestep the CAP theorem, which I think somebody else mentioned today, where you can only get consistency or availability and not both. In our case we effectively get both. We're playing with the laws of physics here, but it works: if you update your profile and immediately hit refresh, you will see up-to-date data. We made some trade-offs here, and I'll talk about all the trade-offs we make as we scale, because those are trade-offs you will also be making when you scale your database.

This architecture can take you a long way. We survived on it at YouTube for a long time, until our write QPS started catching up with us. And it's actually not the master that couldn't keep up with the writes, it's the replication: as soon as your write QPS exceeds a certain amount, the replicas just cannot keep up, and as the previous speaker mentioned, that's because of the single-threadedness of the replication stream. There is a one-time answer to that called primecache, written by my good friend Paul Tuckfield, who is not here today; he left to work at a social company. His observation was that the replication thread is single threaded and disk bound. So he wrote a tool that reads the relay log ahead of the replication stream, looks at all the WHERE clauses, and preloads those rows into the cache before the replication stream replays them. That means the replication operations that used to be disk bound suddenly become memory bound. We got a huge increase in replication throughput and survived for a while with that.

But after a while that also ran out of steam. Your app may be different: you may not saturate on write QPS, but you are going to saturate on storage capacity. Eventually your database is going to be so big that it won't fit on one disk. When these things happen, you have to take the next step.
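Before moving on, here is a minimal sketch of the replica-versus-master read routing described above. This is an illustration only, with hypothetical type and helper names; it is not the YouTube client code.

```go
package main

import "fmt"

// Freshness describes how up-to-date a read must be.
type Freshness int

const (
	StaleOK     Freshness = iota // e.g. video view counts: any replica will do
	MustBeFresh                  // e.g. read-your-own-writes after a profile update
)

// DB is a stand-in for a connection to either the master or a replica.
type DB struct{ name string }

type Cluster struct {
	master   *DB
	replicas []*DB
	next     int
}

// Pick routes a read to the master only when the caller needs fresh data;
// everything else is spread round-robin across replicas.
func (c *Cluster) Pick(f Freshness) *DB {
	if f == MustBeFresh || len(c.replicas) == 0 {
		return c.master
	}
	db := c.replicas[c.next%len(c.replicas)]
	c.next++
	return db
}

func main() {
	c := &Cluster{
		master:   &DB{"master"},
		replicas: []*DB{{"replica-1"}, {"replica-2"}},
	}
	fmt.Println("video page read goes to:", c.Pick(StaleOK).name)
	fmt.Println("post-update profile read goes to:", c.Pick(MustBeFresh).name)
}
```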
The next step is obviously to break the database into multiple parts. When you break up a database, you break it up two ways. One is to break it vertically, which means you identify a group of tables, say "this is a logical database," and migrate it out into a separate database. Or you shard it; I think we have heard enough about sharding today that I don't have to explain what it is. When you scale a database, you end up doing both, and I'll talk about how Vitess supports these things. Actually, the first person who did this for us, Santosh, is in this room; he also went to work for a social company.

When you shard, you lose some things. One thing you lose is the ability to do ACID transactions, because your database is now split into multiple parts: if you write something that affects more than one database, you may have transaction issues, and we had to rewrite our application to handle that. You can't do joins as freely as you used to. And the client slowly becomes more complex over time: in the previous slide you saw that the client had to figure out whether to send reads to a replica or the master; now the client also has to figure out, based on the value in the WHERE clause, which shard a query should go to, and, based on the name of the table, which database to go to. There is a third complexity, which is the cross-shard index: some queries don't shard well, meaning the key for the query cuts across shards. For those you create a cross-shard index, and the client has to keep it up to date. At this point we have a really, really fat client (a sketch of this bookkeeping follows below). But if you get here, you can actually go for many years with this setup.

You can keep going until that phrase gets mentioned: disaster recovery. If you are in a situation where your company is talking about disaster recovery, it's a very good time to update your resume, because the stuff is about to hit the fan. The way you handle disaster recovery is that you replicate your data across data centers into multiple cells. It ends up looking like the earlier picture, but way more complicated, with a lot more verticals and a lot more horizontals. By this time, depending on how big your database gets, you probably have a thousand or two thousand servers, and imagine each of those servers was manually set up by a person; you cannot do that anymore with this many servers. There are some things that improve: for example, you can build a data center in Japan, which means Japanese customers get low latency because the data is right there. But it's mostly painful, mostly problems. We've had issues where somebody brings up a replica and points it at the wrong master; sometimes, when you do a failover, replicas get ahead of the master and you get alternate futures, parallel universes, and we had to call on Doctor Who to fix our space-time continuum issues. At this point, manual resharding is pretty much out of the question; the problem is just too complex.

And it doesn't stop there. Right around this time, the number of engineers at YouTube grew to a few hundred, and what do a few hundred engineers do? They write queries. And when a few hundred engineers write queries, every once in a while each engineer messes up, only maybe once a year, but do the math and see what happens: every couple of weeks the database goes down because some toxic query does a full table scan, or an index scan where the index has a million entries, because you'd think: who would ever upload more than 25 videos?
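As a rough illustration of the fat-client bookkeeping mentioned above, here is a hedged sketch of shard selection plus a cross-shard index. The hashing scheme and names are hypothetical, not YouTube's actual client code; the real system uses range-based sharding, described later.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numShards = 16

// shardFor picks a shard from the sharding key found in the WHERE clause
// (here, a user ID) using a simple hash.
func shardFor(userID string) int {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return int(h.Sum32()) % numShards
}

// crossShardIndex maps a non-sharding key (e.g. a video ID) to the sharding
// key that owns the row, so queries keyed by video ID can still find a shard.
// The client has to keep this mapping up to date on every insert.
var crossShardIndex = map[string]string{}

func insertVideo(videoID, ownerUserID string) {
	shard := shardFor(ownerUserID)
	crossShardIndex[videoID] = ownerUserID // the second write the app must not forget
	fmt.Printf("INSERT video %s into shard %d\n", videoID, shard)
}

func lookupVideo(videoID string) {
	owner, ok := crossShardIndex[videoID]
	if !ok {
		fmt.Println("video not found in cross-shard index")
		return
	}
	fmt.Printf("SELECT video %s from shard %d\n", videoID, shardFor(owner))
}

func main() {
	insertVideo("v123", "user42")
	lookupVideo("v123")
}
```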
So that's where Vitess comes in. This was in 2010. There were two of us, my good friend Mike, who is actually in the audience here, and myself. We said: we are completely in reactive mode, completely on the receiving end of things; we need to do something to leap ahead of all these problems, get on top of them, and stay ahead. So we took ourselves out of the mainstream YouTube operational and scalability work and started building Vitess. Our idea was to take the system from where it was back to something as close as possible to the starting point: we wanted to restore the YouTube app to a state where it thinks it is just interacting with one database.

If you look at it, there are a few problems we are trying to solve. One is taking control of all the servers: keeping track of them, who the masters are, who replicates from whom, and maintaining all those relationships. The second is the abstraction layer that lets you view this entire thing as a single database, a single data store. And the third, the one that crept up on us, is how we protect ourselves from bad queries.

For the first component, workflow and topology management, the way we decided to solve it is with a lock server. In the open source project we use ZooKeeper (internally at Google we obviously use Chubby), and this is nicely componentized behind a clean abstraction layer: if you don't like ZooKeeper, you can throw it away and bring in something else like etcd. We are even thinking about a file-based topology, because if your installation is small you don't really need a lock server; if you can write a file and distribute it to your hosts, that's good enough. There is also a server that maintains this data, which means that when you do things like reparenting or bringing replicas up and down, there is a server that does the work for you; I'll talk more about that. There are command-line tools meant for automation, so you can write scripts to do maintenance work. On the serving side, there is VTTablet, which is a proxy that sits in front of every MySQL instance, and the query router, VTGate, which gives you the abstraction over Vitess.

So what information do we store in the lock server? There are actually two lock servers: a global one and a local one per data center. The global one holds central information that all servers share. We store keyspace information there. What is a keyspace? A keyspace is basically a logical sharded database; since the name "database" was already taken, we had to invent a new name. It is a logical database that represents a group of databases, and you access them as one unit. We also store the shard graph: since we do range-based sharding, this records all the key ranges and which database each range maps to, and that information lives in the global lock server.
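As a rough illustration, the keyspace and shard-graph records just described might be modeled like this. The field names here are hypothetical and do not reflect Vitess's actual topology schema.

```go
package main

import "fmt"

// KeyRange is a half-open range of keyspace IDs; an empty bound means "unbounded".
type KeyRange struct {
	Start, End string
}

// Shard records where one slice of the keyspace lives and who its master is.
type Shard struct {
	Range      KeyRange
	MasterAddr string
	Cells      []string // data centers that have replicas of this shard
}

// Keyspace is the "logical sharded database": a group of physical databases
// that the application addresses as one unit.
type Keyspace struct {
	Name   string
	Shards []Shard
}

func main() {
	users := Keyspace{
		Name: "user_keyspace",
		Shards: []Shard{
			{Range: KeyRange{"", "80"}, MasterAddr: "db-0001:3306", Cells: []string{"us-east", "asia"}},
			{Range: KeyRange{"80", ""}, MasterAddr: "db-0002:3306", Cells: []string{"us-east", "asia"}},
		},
	}
	for _, s := range users.Shards {
		fmt.Printf("%s [%s-%s) master=%s cells=%v\n",
			users.Name, s.Range.Start, s.Range.End, s.MasterAddr, s.Cells)
	}
}
```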
Also in the global lock server is the list of master databases that are currently accepting write traffic, and the list of data centers this information is replicated to. Then in each data center there is a local lock server with information about that cell: which replicas exist and what categories they belong to, because once you are serving a lot of traffic you have specialized servers; some you use for OLTP and some you use for MapReduce. We have lots of those separations in Vitess, and while I may not have time to go into the details, they exist and Vitess can help you configure them. The local topology also records which servers are up and ready to accept traffic and which are undergoing maintenance, maybe taking a backup, and so on.

Our goal is to automate things that would otherwise take a lot of time. For example, reparenting: I don't know how many of you have done a reparent, but at YouTube a reparent now takes about 30 seconds. You say "this master is bad, we need to reparent," you issue a reparent command, and in 30 seconds the new master is up and running. We are experimenting with a 5-second reparent; I'll let you know how that goes when we try it in production.

Another goal is to automate resharding. Say you have n shards and you are running out of capacity, so you decide to shard by 4x. You should be able to just issue a command: we have 16 shards, split them into 64. Ideally it says, "OK, I'm doing all the work needed; come back tomorrow and check on me." You come back tomorrow, it says your shards are all ready, are you ready to make the switch? You say yes, it stops the masters on the 16 shards, brings up the masters on the 64 shards, and now you are fully resharded. We are very close: we are canarying this operation at YouTube as we speak, so we are really confident and excited about these features.

This is roughly what the big picture of Vitess looks like. You have the lock server. There is vtctld (the d stands for daemon), which is an HTTP server: you can go to vtctld, get the full list of your servers and who is related to whom, navigate that, and issue commands such as "do a reparent" or "do a reshard," plus a whole bunch of other common maintenance operations that we use for YouTube right now. vtctl is the command-line tool we use for scripting. On the other side is the application: the application talks to VTGate. It just sends a query to VTGate, and VTGate figures out where to route that query based on what kind of query it is. All of these tools have SQL parsers, so they can understand your query and know what to do with it. Eventually VTGate sends the query to a VTTablet, and VTTablet sends it to MySQL.
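Going back to range-based sharding and the 16-to-64 split: here is a toy model of why splitting ranges does not change the routing logic. This is a sketch under assumed names, not the real resharding workflow.

```go
package main

import (
	"fmt"
	"sort"
)

// A shard owns all keyspace IDs in [Start, End). Resharding only adds more
// boundaries; the lookup logic below never changes.
type shard struct {
	Start, End uint64
	Name       string
}

func resolve(shards []shard, keyspaceID uint64) shard {
	i := sort.Search(len(shards), func(i int) bool { return keyspaceID < shards[i].End })
	return shards[i]
}

// split replaces every shard with `factor` equal sub-ranges, mimicking a
// "split N shards into 4N" style reshard in miniature.
func split(shards []shard, factor uint64) []shard {
	var out []shard
	for _, s := range shards {
		step := (s.End - s.Start) / factor
		for i := uint64(0); i < factor; i++ {
			end := s.Start + (i+1)*step
			if i == factor-1 {
				end = s.End
			}
			out = append(out, shard{s.Start + i*step, end, fmt.Sprintf("%s.%d", s.Name, i)})
		}
	}
	return out
}

func main() {
	shards := []shard{{0, 1 << 63, "-80"}, {1 << 63, ^uint64(0), "80-"}}
	fmt.Println("before split:", resolve(shards, 12345).Name)
	shards = split(shards, 4)
	fmt.Println("after split: ", resolve(shards, 12345).Name)
}
```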
You may wonder: why VTTablet? What does VTTablet do? Why can't you just send the query straight to MySQL? VTTablet takes the query from the app and gives it to MySQL, but that's obviously not all it does. The first reason we wrote VTTablet was that the cost of a MySQL connection is huge, approximately two megabytes per connection, and at YouTube we have tens of thousands of connections; last I looked we had about 60,000. If you send 60,000 connections straight into MySQL, it is guaranteed to die; ours actually started dying at around 5,000 connections, and that's when we launched VTTablet. Connection pooling was the first problem we solved.

The other thing: remember the developers I talked about? This is for them. If they write a bad query, how do you detect it, and how do you handle it? The problem with MySQL is that it is bound to the relational contract, which means that if it thinks it can answer a query, it will do whatever it takes to answer that query, at the cost of all other queries. If you send a few queries that do full table scans, all your other queries are going to suffer; it will bring the entire database down. So how do we protect ourselves, even from MySQL, when these tough queries come in and cause site issues? VTTablet is a mediator: it looks at every one of your queries and makes a decision about what to do with it. For example, if a query comes in without a LIMIT clause, do you know how many rows it is going to return? VTTablet adds a limit clause, looks at how many rows came back, and if it is more than you should be fetching, it returns an error saying you are not allowed to run this query. If a query really, really causes problems, we can blacklist it.

I don't know how many of you have struggled with MySQL: it runs fine, and every once in a while it starts to slow down and you have no idea why. The only tool you have is to log into MySQL and run SHOW PROCESSLIST, and if you are lucky you may find the problematic query and do something about it, but it's a shoot-in-the-dark situation. VTTablet helps you in those areas too.

VTTablet also does things beyond query serving. It does tablet management work: for example, if you want to initiate a backup, VTTablet is the one that actually performs the backup for you. And we are living in a very heterogeneous world where no single tool is enough for everything; sometimes you want to export the data from MySQL into Hadoop or Bigtable and run MapReduces on it. So VTTablet gives you a subscription service where you can say, "give me all the updates going into MySQL," and you can use those updates to keep a different database up to date and run other analysis on it. It provides these streaming services.

This next slide is a deep dive; I'll skip some items since I've already talked about connection pooling. The awesome thing about connection pooling is that when we fail over a master at YouTube, as soon as the new master comes up, all the web servers that have been throwing frontend 500s flood VTTablet, and within a few seconds VTTablet has accepted all the connections and is serving queries at full steam, with no thundering-herd issue. That is thanks to the Go runtime, because they have written such a good network stack.
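The row-limit protection described above might look roughly like this. It is a hedged sketch; the limit value and helper names are made up and this is not VTTablet's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const maxRows = 10000

var errRowLimit = errors.New("row count exceeded: add a LIMIT clause or fix the query")

// enforceLimit appends a LIMIT of maxRows+1 when the query has none, so we can
// tell the difference between "exactly maxRows rows" and "more than allowed".
func enforceLimit(query string) string {
	if strings.Contains(strings.ToUpper(query), " LIMIT ") {
		return query
	}
	return fmt.Sprintf("%s LIMIT %d", query, maxRows+1)
}

// checkResult rejects the result instead of silently streaming a huge scan
// back to the app; rows is the number of rows MySQL actually returned.
func checkResult(rows int) error {
	if rows > maxRows {
		return errRowLimit
	}
	return nil
}

func main() {
	q := enforceLimit("SELECT id FROM videos WHERE user_id = 42")
	fmt.Println("rewritten:", q)
	fmt.Println("result check:", checkResult(10001))
}
```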
Then there is result reuse, or query consolidation, which addresses a very common problem with web services: one particular item or one particular user suddenly becomes extremely popular, maybe because they are on the home page or they are trending on Twitter. When that happens, all the frontends end up sending the same query for the same row, or the same set of rows, to MySQL. If you sent those directly to MySQL, it would execute each one independently; it has no idea that there are five thousand pending requests for the exact same query. What VTTablet does is: if it receives a query that is already executing, it holds off, and when the first query returns, it hands that same result to everyone who asked for that information. This used to be a big problem at YouTube, and we have never heard of it since we launched this feature.

Inside VTTablet we also have a SQL parser, which means we can understand queries. Not only can we do smart things with them today, but if in the future we see other things we need to do with these queries, we can, because we fully understand them. It is not the full MySQL syntax, which means there is a trade-off: we don't allow anything and everything, but we allow pretty much all of the common queries. Just to give you a contrast, the Vitess query parser is about a thousand lines of code and the MySQL parser is about fifteen thousand, but it is a very judicious approximation of what MySQL does: almost every query you commonly send to MySQL will work with VTTablet.
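Going back to query consolidation, here is a minimal sketch of the idea: identical in-flight queries share one execution. This is a simplified in-memory illustration, not VTTablet's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// consolidator makes identical in-flight queries share one execution:
// the first caller runs the query, everyone else waits for its result.
type consolidator struct {
	mu       sync.Mutex
	inFlight map[string]*call
}

type call struct {
	done   chan struct{}
	result string
}

func newConsolidator() *consolidator {
	return &consolidator{inFlight: map[string]*call{}}
}

func (c *consolidator) Do(query string, run func() string) string {
	c.mu.Lock()
	if existing, ok := c.inFlight[query]; ok {
		c.mu.Unlock()
		<-existing.done // wait for the leader's result instead of hitting MySQL
		return existing.result
	}
	cl := &call{done: make(chan struct{})}
	c.inFlight[query] = cl
	c.mu.Unlock()

	cl.result = run()
	close(cl.done)

	c.mu.Lock()
	delete(c.inFlight, query)
	c.mu.Unlock()
	return cl.result
}

func main() {
	c := newConsolidator()
	var hits int
	var wg sync.WaitGroup
	for i := 0; i < 5000; i++ { // 5000 frontends asking for the same hot row
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Do("SELECT * FROM videos WHERE id = 'hot'", func() string {
				hits++ // only a leader reaches "MySQL"; leaders are serialized per query
				time.Sleep(10 * time.Millisecond)
				return "row data"
			})
		}()
	}
	wg.Wait()
	fmt.Println("queries that reached MySQL:", hits)
}
```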
The row cache is something I'd like to talk about, because everyone has been talking about caches all day, so I need to do a sales pitch for the awesome stuff we did with ours. Before going into it, I need to describe a problem with MySQL. MySQL has a buffer cache that reads 16K blocks at a time, so if you have 25 rows in a 16K block and you access one row, the entire block goes into the buffer cache. That's fine if you are doing a full table scan, because everything ends up in the cache and you then fetch the rest of the rows from it. But that is not the usual pattern for a typical online app; the usual pattern is random access: you fetch one row here, one row there, and what you are effectively doing is thrashing the buffer cache. That is really the reason memcache exists; if MySQL were efficient about this, there would probably be no need for memcache. Anyway, that's a different subject.

So Vitess has its own row cache, which is actually just memcached. When you do a select, and it is a primary-key fetch, Vitess loads that row into the row cache and all subsequent reads are served out of it. That is no rocket science; we all do that. But it does a few other smart things that a typical caching system doesn't. One is that if you send an update, it invalidates the entry. But what if you are on a replica? On a replica there are no updates, only read traffic. So Vitess subscribes to its own MySQL server as a replica, watches the replication stream, and uses it to invalidate the row cache. What that means is that where a typical cache is only kept up to date by expiry (you say, OK, I can tolerate about one hour of staleness, so I'll set an expiry), in Vitess a row cache entry never expires, because it is always up to date. For primary-key-based fetches it is consistent.

There are a few other smart things Vitess can do. Say you send a query that is not a primary-key fetch but an index lookup. Vitess analyzes your query, knows the schema, and knows MySQL is going to do an index scan on it. So it rewrites the query: it does only the index scan against the index, fetches the primary keys, goes to the row cache and asks which of these it already has, takes those, and whatever is left over is all it sends to MySQL. If that same query comes back later, your row cache is pretty much fully primed; the only things you end up doing are the index scan followed by a row cache fetch. These are things we are very excited about. It is challenging to keep the cache up to date, because you have to handle race conditions: say I go to the row cache, the item is not there, I fetch it from the database and try to write it to the row cache; what if the row changed in the middle? There are a few flows we have to take care of, and we believe we have handled them well. We prototyped this in sanity-check mode, ran it for weeks and weeks, and there has been no inconsistency, so we are getting ready to turn it on for full traffic.

There are other failsafes, all features meant to protect us when a bad query hits the database. Sometimes people start transactions, then make long-running RPC calls, and those transactions remain open for a long time; Vitess tracks them and says, "this transaction has been open too long, I am going to kill it." Same thing with queries: if a query runs too long, it gets killed. Vitess also limits the maximum number of concurrent transactions, because MySQL becomes very, very unhappy when the number of transactions hits 1024. Ever since we did these things, all those problems have gone away. Query blacklisting: strangely, we have put in so many defenses that we haven't had to blacklist a query yet, but the day will come, and the feature is there. The DML annotations are for the binlog stream; I'll skip that.
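Here is a sketch of the row-cache pattern described above: serve primary-key hits from the cache, send only the misses to MySQL, and prime the cache for the next identical query. The types are hypothetical stand-ins, not the actual implementation.

```go
package main

import "fmt"

// rowCache stands in for memcached: primary key -> row.
type rowCache map[string]string

// fetchRows serves what it can from the row cache and sends only the misses
// to MySQL, priming the cache so the next identical query is nearly free.
func fetchRows(cache rowCache, pks []string, mysqlFetch func([]string) map[string]string) map[string]string {
	result := map[string]string{}
	var misses []string
	for _, pk := range pks {
		if row, ok := cache[pk]; ok {
			result[pk] = row
		} else {
			misses = append(misses, pk)
		}
	}
	if len(misses) > 0 {
		for pk, row := range mysqlFetch(misses) {
			cache[pk] = row // primed for the next time this query shows up
			result[pk] = row
		}
	}
	return result
}

func main() {
	cache := rowCache{"pk1": "cached row 1"}
	fromMySQL := func(pks []string) map[string]string {
		fmt.Println("MySQL asked only for:", pks)
		out := map[string]string{}
		for _, pk := range pks {
			out[pk] = "db row " + pk
		}
		return out
	}
	// Pretend an index scan for "WHERE user_id = 42" returned these primary keys.
	rows := fetchRows(cache, []string{"pk1", "pk2", "pk3"}, fromMySQL)
	fmt.Println("rows returned:", len(rows))
}
```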
Stats is another awesome feature. Our SREs love it, because when there is an issue you look at the graphs and you can almost always figure out what is going wrong, because we have roll-ups by query and by table. The way Vitess accepts a query, it is not an arbitrary MySQL query string; it is a rolled-up query with bind variables, so Vitess groups stats for each of those unique queries. That means if a particular query starts to behave badly, it shows up in VTTablet's list of badly performing queries, and the same information is rolled up by table, so if a particular table becomes hot it immediately shows up in the graphs. This is something we use on a daily basis. For example, there was a time when the master latency suddenly shot up: it used to be only three or four milliseconds and suddenly it was around 50 milliseconds, and people were panicking about what to do. We brought up the per-table query graph and found it was just one query on one table taking ten seconds per lookup, which made the average look really bad. So, no need to panic; you just go talk to the developer and get that fixed. In the worst case, even if those stats don't help, we have something called the verbose stream log: you hit VTTablet on an HTTP endpoint and it streams every query as it executes, how long it took, whether it returned any errors, what plan it used, how many rows it returned; all of that is available for you to troubleshoot.

The next slide I am going to skip because it is a summary of everything I have explained, but it shows how all the different VTTablet pieces work together.

VTGate is also easy to explain: it does query routing, which means you just send a query to VTGate and it figures out how and where to send it. It is actually not that easy to write, but it is a very critical piece of Vitess. It is basically the icing on the cake, because with VTGate your application has the opportunity to be restored to its original database-agnostic glory. It is not done yet; it is the last feature that will make Vitess more or less feature complete. That doesn't mean you can't use Vitess now; it just means VTGate will make the story a lot better.

So yes, Vitess is production-ready; it runs really, really well. It has served YouTube traffic since 2011. One of our higher-QPS servers can accept up to 60,000 connections, and actually does, and we believe it can handle an order of magnitude more; we just haven't tested it. It is a big increase in MySQL serving capacity, and the stats and monitoring are something our SREs love. The row cache is getting ready to launch; ironically, I said the same thing about two years ago, but this time it's for real.

And yes, Vitess is written in Go. Many of you may not even have heard of Go; how can you do something crazy like this? When we started writing Vitess in Go we had lots of doubts. Not anymore; we are really excited about it, and it is not a toy anymore. This graph shows Go's adoption: it is on the rise and trending up, and we are very productive with it.

This is somewhat of a marketing slide; it talks about where Vitess stands. Basically, Vitess moved away from the plain relational database but didn't go all the way to NoSQL. That means you don't get everything you get with NoSQL databases, but you keep other things that are good about relational databases, like indexes and joins, and you get some transaction facilities that you don't have at all in NoSQL databases.

Hot off the press: just last week we got Vitess working with MariaDB, and we also made it much easier to install. I don't know how many of you have tried to download Vitess in the past. It was a nightmare; you just couldn't get it to build. Only five people in this world could build Vitess, and they all work at YouTube. But now you should be able to download and build it.
The main reason it was so hard was that it also required you to download Google's internal version of MySQL and build it, and that's not easy; and not just Google's version of MySQL, it also required you to build the right version of libssl. Forget it. All of that has been removed, so today you can download Vitess and build it: it is a pure Go binary, you download MariaDB independently, and we got it working against that. You're welcome to try it; that's why it is easier to install now. We are also very close to supporting MySQL 5.6, and after the news I heard today, I'm excited that we are also going to be "web scale," because the big showstopper was GTID, and I just heard the announcement that GTID support is coming very soon. We are getting serious about releasing this as a Docker component, which should make it even easier for you to download and use. We are behind on documentation, but we are getting there, and we are also getting excited about cloud support and supporting other lock servers like etcd.

We also have some crazy ideas. These may or may not work, but we think that if we pull them off, they will make the system really, really good.

One is P2P replication, which is combining replication with peer-to-peer. With GTIDs, a replica doesn't really have to be tied to a master anymore. The only thing the replica needs to know is what its next transaction is; it doesn't care where it comes from, and anybody that has it can hand it over. So we are thinking that by infusing a little bit of the peer-to-peer protocol, we can free up all the slaves and say: well, slave, this is your database, you start at this GTID, and then you go fish for your next transaction; ask anyone, and if anyone gives it to you, take it, apply it, and go look for the next one. What that should give us is zero config: when you bring up a replica, you don't have to say "point it at this master" or "point it at that replica." The replica, by its very nature, knows what it is and should be able to find the stream it is looking for. Typically we expect it will find one source to replicate from and stay with it, but if that source goes away, it will find the next source itself and replicate from that. A whole bunch of maintenance tasks would just go away. These are all crazy ideas and they may or may not work, but if we make them work, it will be awesome.

The other one is Paxos. There has been a lot of research on Paxos, and typically a Paxos transaction completes in something like 100 milliseconds at the long tail. 100 milliseconds for a normal transaction is very expensive, but what if we used Paxos to elect a new master? That would mean that if a master goes down, MySQL should be able to elect a new master within a hundred milliseconds, so the five seconds we were talking about earlier now looks like an eternity. And if you combine Paxos with P2P replication, it is pretty much zero config: if a quorum of MySQL servers decides to elect a new master, all the replicas will automatically find the new master and start replicating from that point. So I think those two features will marry well.
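As a toy illustration of the "go fish for your next transaction" idea (purely speculative, as the talk says), here is a sketch where a replica pulls the next transaction by sequence number from whichever peer has it. The peer interface and types are invented for the example.

```go
package main

import "fmt"

// txn is one replicated transaction, identified by its GTID-like sequence number.
type txn struct {
	seq  int64
	stmt string
}

// peer is anything that can hand over a transaction by sequence number; with
// GTIDs a replica no longer cares which server the transaction comes from.
type peer interface {
	fetch(seq int64) (txn, bool)
}

type memPeer struct{ log map[int64]txn }

func (p memPeer) fetch(seq int64) (txn, bool) { t, ok := p.log[seq]; return t, ok }

// replicate pulls the next transaction from whichever peer has it and applies
// it, which is the essence of the zero-config idea described above.
func replicate(start int64, peers []peer, apply func(txn)) int64 {
	seq := start
	for {
		found := false
		for _, p := range peers {
			if t, ok := p.fetch(seq); ok {
				apply(t)
				seq++
				found = true
				break
			}
		}
		if !found {
			return seq // caught up with everything the peers have
		}
	}
}

func main() {
	a := memPeer{log: map[int64]txn{1: {1, "INSERT ..."}, 2: {2, "UPDATE ..."}}}
	b := memPeer{log: map[int64]txn{3: {3, "DELETE ..."}}} // a different peer has txn 3
	next := replicate(1, []peer{a, b}, func(t txn) { fmt.Println("applied", t.seq, t.stmt) })
	fmt.Println("next sequence to look for:", next)
}
```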
Parallel replication is something we are excited about, but I don't know whether it will be needed, given all the new features MySQL keeps adding. The reason we think we can do better replication than MySQL is that we are a sharded database: we know the keyspace IDs that are changing, which means that just by looking at the transactions we already know which ones are isolated from each other, and we think we can replay replication very quickly, running as many threads as we want. We will probably explore this if we find that the newer versions of MySQL can't keep up on replication.

There is other stuff too. Since we have the query log stream, we think we can just dump it onto a MapReduce back end and run analysis on it. And there is per-row data protection: with all the new concerns about snooping and unauthorized access, there are schemes where only a logged-in user can unlock a particular row. The user has a cookie, and that cookie is the only thing that can unlock their rows, so it is an end-to-end guarantee that only that user is allowed to see their data. If another user comes in accidentally, their cookie will not be able to unlock the data. That gives you low-level protection against data getting cross-linked, or even against unauthorized access to data. I think this is going to become very important in the coming years, and we are seriously looking at doing it.
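Going back to parallel replication, here is a minimal sketch of the idea that a sharded database can replay writes to different keyspace IDs concurrently while keeping writes to the same row in order. The worker scheme here is an assumption for illustration, not the speaker's design.

```go
package main

import (
	"fmt"
	"sync"
)

// txn carries the keyspace ID of the row it touches; in a sharded database the
// binlog can be annotated with this, which is what makes the trick possible.
type txn struct {
	keyspaceID uint64
	stmt       string
}

const workers = 4

// applyParallel fans transactions out to workers by keyspace ID, so writes to
// different rows replay concurrently while writes to the same row stay ordered.
func applyParallel(txns []txn, apply func(txn)) {
	chans := make([]chan txn, workers)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan txn, 16)
		wg.Add(1)
		go func(in chan txn) {
			defer wg.Done()
			for t := range in {
				apply(t)
			}
		}(chans[i])
	}
	for _, t := range txns {
		chans[t.keyspaceID%workers] <- t // same keyspace ID -> same worker -> same order
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
}

func main() {
	txns := []txn{
		{10, "UPDATE user 10"}, {11, "UPDATE user 11"},
		{10, "UPDATE user 10 again"}, {12, "UPDATE user 12"},
	}
	var mu sync.Mutex
	applyParallel(txns, func(t txn) {
		mu.Lock()
		fmt.Println("replayed:", t.stmt)
		mu.Unlock()
	})
}
```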
About the team: it used to be just the two of us, and now we are a full team. We are six SWEs; our SWE TL is not here today, and our SRE TL is actually right here. Maybe I should put you on the spot: all the Vitess people, stand up and wave. There's Mike, there's Brad, there's Adam, and there are a few others who are shy. That's about all I have to say. Wow, perfect timing. If you have any questions, go ahead.

Q: Definitely a very exciting project. Several questions came to my mind, but I'll limit it to one. The VT server has metadata about all the data structures. Do you do DDLs, ALTER TABLEs and so on, bypassing your VT server, or do they go through Vitess?

A: At YouTube we actually bypass it today, and we have safeguards around that, but we are retooling those to go through Vitess, because there are things we have to react to when a DDL hits: when a table changes, we have to invalidate the row cache, for example. There's also something I didn't talk about: we have a schema rollout plan. Sometimes you just apply the schema; sometimes you have to play the shell game, where you apply the schema to a replica, make that replica the master, and then apply the schema everywhere else, so you can keep uptime, especially for busy tables. So yes, Vitess does allow DDLs. All right, let's go party.

Thank you again, everyone, for joining us today. We heard a lot of terrific talks, both in the data track and in the web and mobile track. A bunch of people came up and asked about slides and videos from the talks: we will be posting the videos on YouTube and sharing them on the Facebook group, so if you're interested, please subscribe to the Facebook group and you'll get the videos; we already have the first couple of them up on the group. We hope that today is the beginning of a conversation among all of us on how best we scale and how we can learn from each other. With that, I don't want to hold anyone back from the happy hour; the happy hour is starting now, and you can grab your t-shirts on the way out. Thank you.
Info
Channel: @Scale
Views: 11,342
Rating: 4.9424462 out of 5
Keywords: atscale, atscale2014, facebook
Id: 5yDO-tmIoXY
Length: 46min 2sec (2762 seconds)
Published: Mon Sep 22 2014