Scaling Facebook Live Videos to a Billion Users

Captions
Good afternoon, thanks for coming to the talk. How are you? Do you like the conference so far? That's weak, that's weak, more energy. Do you like the conference so far? Now we are talking. I know all of you are waiting for lunch, so we'll get through this quickly. Let's get to the important questions first: what's the lunch menu? OK, OK. All right, so let's begin.

I'm Sachin, and I lead the video infra team at Facebook. The mission for video infra is to make videos and Live successful. Let's look at what we actually work on. We build the infra behind the Facebook video platform, and we work on a number of large projects; a few examples here. We try to make our video uploads really fast and reliable; everybody likes fast, everybody likes reliable, straightforward. We cluster similar videos together so we can show them in related videos and rank them better. We have built a distributed video encoding platform where we take a large video file, break it up into chunks, process them separately, encode them in parallel, and then put it back together. This results in very low latency in transcoding videos; again, everybody loves fast, not a problem. And we build the infra for Facebook Live, which we will talk a lot about. Raise your hand if you have not used Facebook Live. No lunch for you guys. OK, all right, it's super easy to use. We'll go through what Live is and you can try it out, maybe even while the talk is going on; that's fine too.

Let's talk scale. 1.23 billion people access Facebook each day. Not in a month, not in a year: every day, 1.23 billion. Let's put that in perspective: the world has about 7 billion people, and many of them don't have access to the Internet. So of all the people that have access to the Internet, a very large number access Facebook each day. We consider this a huge responsibility. We have to make sure these people get the best experience possible, and we figure out ways to allow them to connect to other people across the world in new and interesting ways. That is where Live comes in. Facebook Live is a live streaming platform that enables people to broadcast live video streams using just the Facebook mobile app, iOS or Android. We also have desktop apps, and you can go live using the Live API if you are a professional using your own encoder. This enables people to share everyday moments with their friends and followers, it allows celebrities to reach out to their fans directly, and it also encourages citizen journalism. Ever since we launched Live to the public, the uptake has been excellent, and we love that.

So why is Live different from just uploading a normal video? The key difference is that if you are uploading a video, it could be well crafted, it could be photoshopped. With Live there is none of that: you are sharing the moment as it is happening, and that authenticity is what differentiates Live from just uploading a normal video. The second thing is that Live is inherently social. There is a lot of interaction; people are commenting and asking questions while the live stream is going on. This way the commenters and viewers can actually change the direction of the live stream; they can ask you to go to something else while the stream is going on, and that social nature makes this powerful as well. So the interactivity, the authenticity, and sharing the moment as it is happening are the key things that make Live special and different from normal videos.

So let's look at the history: where did we start and where are we now? It all started in April 2015, in a hackathon.
For those of you that are not familiar, hackathons at Facebook are almost a religious event. People from different teams and disciplines get together to build something new, to hack on things and to be creative. We generally ask people not to spend the hackathon doing something related to their day job; otherwise, what's the point, right? In this hackathon a group of us got together and we wanted to build a live streaming back end. This is a photo of two or three software engineers, one production engineer, a TPM and so on. The hack was called "hack under the stars"; it was held on the rooftop garden of Facebook's headquarters in Menlo Park, California. We worked on it for a couple of days and one or two nights. The first live stream that went through our system is very special to us, and you can tell, being the geeks we are: we live streamed a clock. Nothing interesting, just the clock. Now why the clock? We wanted to measure the end-to-end latency: if you live stream a clock and then view it, you can see the difference, and that gives you the lag. That's what we do. Remember how people keep talking about the first phone call ever made, where the conversation was "Mr. Watson, please come here, I need you"? Well, if Live ever becomes that big, remember: 10:03 p.m. and end-to-end latencies, that is what people were looking at on the first live stream ever.

So we started with the hackathon. A few months later we launched it to celebrities. Dwayne Johnson, the Hollywood action hero, was a launch partner, and he gave us a really good start. From the hackathon to launching it to celebrities using the Mentions app was four months, so we moved really quickly. Since then we have had many memorable live streams, and I'll share a bunch of them throughout the presentation. There have also been a few that were a little bit haunting, like the one from the comedian Ricky Gervais where he went live from his bathtub. I did not need to see that, come on, right? OK, so Mentions happened, then we launched it to users. You will see about 20 folks here; these are our engineers, production engineers, designers and so on. We launched it to users in December 2015, and the three screenshots you see are the initial version of Live; it's substantially the same even now. So again: from the hackathon, where we had nothing, starting from zero, to launching it to all users was eight months. And everything we launch in the main app has to work at scale, has to work for those 1.23 billion users who want to use it every single day. This was a tremendous experience and we loved it.

That brings us to: why Live? Why did we build this in the first place? Three big reasons. First, engagement. Most things we do relate to whether people are going to like this, whether they would use it, and whether it adds value to their lives. When we launched Live to users it was clear that we had created something magical and powerful; the engagement was really good, the uptake was good, and so we knew we had something on our hands and it was worth investing more time and energy into. The second reason was public profiles. We wanted celebrities and public figures to have a way to reach out to their fans directly. So here we have Vin Diesel, another Hollywood action hero; you start seeing a pattern, I like explosions, action, that kind of thing. Vin Diesel goes live fairly frequently, which is good: he can reach out to his fans, and his live streams are pretty engaging.
We have folks like Oprah who go live, and comedians like Kevin Hart use the platform. President Obama, when he was in office, used to use Live; now I think he's busy surfing in the Caribbean somewhere, so we don't see live streams from him anymore. And the third reason, something that goes to the core of Facebook's mission, is connecting the world. We care a lot about making the world more open and connected, giving people a voice, and sharing different perspectives that are hard to come by in any other medium. Here's an example of a person live streaming what refugees have to go through to get to a better place, to have a better home, to lead a better life. Usually we only get a perspective like this through the lens of a media house, but here you have just a normal person who can give you that perspective. I think this is very powerful, and this was the third reason why we decided to do Live.

All right, we have talked about what Live is, why we did it, and the history of it. Now let's get into the guts of the infra. We'll go through eight different sections: the high-level architecture, and then deeper into each area. We'll talk about the scaling challenges. I will quickly breeze through protocols and codecs, because I know most folks here don't spend their days and nights thinking about protocols, video sizes and resolutions. Then we'll talk about stream ingestion, stream processing and playback. We will go through a couple of reliability challenges and some interesting solutions that we came up with, and then we'll quickly touch on new features that are being launched and the lessons that we learned throughout this. And again, we have live streams throughout the presentation.

Let's talk high-level architecture. It all starts with the broadcast client. Let's say I want to start a live stream from here: my phone will create an RTMPS stream. RTMP stands for Real-Time Messaging Protocol, and the S is for secure. It will connect to a PoP, a point of presence, which is basically a rack of hosts that Facebook owns. The PoP will then forward the connection to a full data center, where encoding can happen: we create different bitrates and different resolutions for a given stream. From there it goes out to different PoPs, and from there to the playback clients. So if I am doing a live stream from here, I will connect to a PoP close to me; if people in India and the US are viewing, they will connect to the PoPs closest to them, and we make sure the full pipe keeps working with super low latency. Latency is actually very important, and I will go into why soon. Given the network bandwidth required for showing all these live streams to the world, we can't live with just our own CDN; we also depend on third-party CDNs to make this work.

All right, resource usage. Compute is a primary resource: we require CPU for encoding and decoding streams and for analysis of media. The amount of compute required depends on the number of streams being processed, the number of encodings we create, the bitrates, the resolutions and so on. We require memory for encoding and decoding streams; again it depends on the bitrates and resolutions, and more encodings means more memory. Once a live stream is done we give people the option to store it as a normal video, so they can watch it over and over again and share it with their near and dear ones. If they choose that option then we store it in our long-term storage; if they delete it, it goes away from the storage. And the network: the network required for uploads is not significant, but the network bandwidth required for playing back a given stream is significant.
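To make that ingest versus playback asymmetry concrete, here is a rough back-of-envelope sketch in Python. All of the stream counts, viewer counts and bitrates below are made-up assumptions for illustration, not Facebook figures.

```python
# Rough back-of-envelope estimate of ingest vs. playback bandwidth.
# All numbers below are illustrative assumptions, not real Facebook figures.

INGEST_BITRATE_MBPS = 2.0      # one broadcaster uploading roughly 720x720 video
PLAYBACK_BITRATE_MBPS = 1.5    # average bitrate served to a viewer

def ingest_bandwidth(concurrent_streams: int) -> float:
    """Total upload bandwidth in Mbps: one RTMPS connection per broadcaster."""
    return concurrent_streams * INGEST_BITRATE_MBPS

def playback_bandwidth(concurrent_streams: int, avg_viewers_per_stream: int) -> float:
    """Total egress bandwidth in Mbps: every viewer pulls their own stream copy."""
    return concurrent_streams * avg_viewers_per_stream * PLAYBACK_BITRATE_MBPS

if __name__ == "__main__":
    streams, viewers = 100_000, 50   # hypothetical load
    print(f"ingest:   {ingest_bandwidth(streams) / 1e6:.2f} Tbps")
    print(f"playback: {playback_bandwidth(streams, viewers) / 1e6:.2f} Tbps")
```

Even with these toy numbers the egress side dwarfs the ingest side by the average fan-out factor, which is why the playback path leans so heavily on caching and CDNs.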
All right, let's look at three scaling challenges that we ran into and how we solved them. I've simplified the architecture into three broad pieces: the broadcast clients, which is ingest; the encoding servers, which is the processing of live streams; and the playback clients.

Challenge number one was how to deal with the number of concurrent incoming streams. Initially it looked to be a tricky challenge, but then we realized it has a nice pattern: during the day we get a spike, and towards the evening or night it goes down. Because the pattern is predictable, we can plan resources accordingly. We still have to make some key decisions correctly: we have to pick an ingest protocol that works across different kinds of networks, cellular, Wi-Fi and so on; it should also be something that can take advantage of Facebook's network infrastructure that is spread through the world; and we have to plan network capacity and server-side encoding capacity for all these streams.

The second challenge was the total number of viewers of all streams together. It turns out this also has a predictable pattern. It is slightly less predictable than the previous one, but there is still a reasonable pattern we can work around. The things we had to pick correctly were the delivery protocol, which again had to take advantage of Facebook's network infra (the protocols can be different on the ingestion side and on the delivery side); the network capacity requirements here are huge, so we have to deal with that with CDNs and so on; and caching is extremely important.

At this point we were thinking everything seems to be going well: everything has a nice little pattern, we can plan resources, everything works. And then it got real, right? This is how Live goes every single time. And I know some people don't like nerdy jokes, so here's a non-nerdy one for you. The third problem, the one that caused us to think a lot, is the maximum number of viewers of a single stream. Now why is this hard? It is unpredictable, and unpredictability is the bane of engineering: if you can't predict something you can't plan for it, and it becomes really hard to solve. Live streams can go viral without any heads-up; it is hard for us to know which stream is going to go viral, at what point, and why. Because we can't answer that question, solving this problem became kind of hard, and we had to deal with caching and stream distribution to solve it reasonably. About 10 or 15 slides down we will talk about the architecture on the playback side, and then I'll explain how we dealt with this kind of problem.

So if I were to summarize how Facebook Live video is different compared to just normal video, these are the four challenges. For the Netflixes of the world, you already know your content ahead of time, so you can cache it wherever your CDN and PoPs are; for live content there is no pre-caching, because the content is getting generated right then, so you cannot pre-cache. Predicting the number of viewers of any stream is not possible: for premium content, if Netflix is going to launch House of Cards, they know roughly how many people will watch it based on last year; we cannot, because random streams become popular and we cannot predict which stream will go viral and when. And so planning for these live events and scaling resources based on that is problematic.
Then there are the concurrent spikes we talked about, which are hard to predict. Those are the four reasons why Facebook Live video is very different and harder to solve.

Let's quickly talk through protocols and codecs. There were many requirements, but these four turned out to be the most important. Time to production: we wanted to go from zero, just the hackathon, to production in four months, and to all users in eight months. This was a very tight timeline, so time to production was very important. It meant we had to take advantage of everything Facebook has to offer, the networking infra, the common libraries, everything; you can't build a full system from scratch and scale it to a billion users in eight months, it's just not possible. End-to-end latency matters a lot for live streaming: once you buffer and batch things up, interactivity is gone. We would like the latency to be sub 30 seconds for every stream, ideally in single-digit seconds. With that kind of goal you cannot have too many buffers throughout your pipeline, because every buffer adds a few seconds of latency and that adds up very quickly. If you have latency then it is no longer live; we actually want to be even faster than network TV. And because this was going to be part of Facebook's apps, there was a limit on how much we could add to the application size: the budget given to us was less than 500 KB.

So let's look at the protocols that we considered and which one we picked. We looked at WebRTC, which is based on UDP, which meant it was not compatible with Facebook's network infra, which is tuned for TCP, so we eliminated that. HTTP upload would have resulted in horrendous end-to-end latency; not going to work. We thought about building our own custom protocol, optimized for live, built for live, but time to production was a problem: we would have to write our own client libraries and server libraries, tune them well, and then launch them in all the apps. It was unlikely this would have worked in the timeframe we were given. We looked at some proprietary protocols built for use cases like these, but the library size was a couple of megabytes, beyond our budget of 500 KB, so we couldn't pick those; they couldn't satisfy all four requirements. RTMP, the Real-Time Messaging Protocol, is built for video streaming, so it had the right latency characteristics. It is widely used in the industry, so there were client and server libraries we could reuse, which meant time to production was short. RTMP works over TCP, so it is compatible with Facebook's network infra, and the library size was about 100 kilobytes, well within budget. So great, this worked out; we picked RTMP and it has worked really well.

The four encoding properties that people usually need to care about when doing something like this are aspect ratio, resolution, and the audio and video codecs. The product requirement for aspect ratio was 1:1, which was very straightforward. On resolution, we started with 720 by 720. As we launched to users all over the world, we realized people may not have enough bandwidth to upload this kind of content continuously, and we didn't want people to have jitters or the connection to drop, because that spoils the experience for viewers. So we started supporting a second resolution, 400 by 400: if you don't have enough bandwidth, drop down to the lower resolution and bitrate. The audio and video codecs were the standard AAC and H.264; these are industry standards, nothing special there. Cool.
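Those encoding settings can be written down as a small sketch. The bitrate numbers below are illustrative assumptions, not the values Facebook actually uses; the two resolutions and the codecs are the ones named in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rendition:
    width: int
    height: int
    video_bitrate_kbps: int        # assumed values, not the real encoder settings
    video_codec: str = "h264"
    audio_codec: str = "aac"

# The 1:1 aspect-ratio ladder described in the talk: a 720x720 primary
# rendition and a 400x400 fallback for low-bandwidth broadcasters/viewers.
LADDER = [
    Rendition(720, 720, video_bitrate_kbps=2000),
    Rendition(400, 400, video_bitrate_kbps=500),
]

def pick_rendition(available_kbps: int) -> Rendition:
    """Pick the best rendition that fits the available bandwidth,
    falling back to the smallest one otherwise."""
    for r in LADDER:               # ordered best-first
        if available_kbps >= r.video_bitrate_kbps:
            return r
    return LADDER[-1]
```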
Time for a live stream. This was an interesting live stream of a SpaceX launch. It was awesome to see this on the platform, with a lot of people tuning in and commenting. There is something beautiful about rocket launches; I don't know what it is, but it definitely gets the nerd in me super excited every time I see one. Maybe it's the explosions, I don't know, fire and action, maybe that's the interesting part. When this was happening we were all hoping it wouldn't explode, and it's not just the rocket, it's the live streaming back end as well; we wanted to make sure it all worked fine. So this went well, this was cool. I'll show you a different stream later on where we were hoping things would explode.

OK, let's look at stream ingestion and processing. I briefly mentioned PoPs before. A PoP has several racks of hosts, which have two responsibilities. One is to terminate incoming connections from clients and then pass these connections over Facebook's network, which is much more reliable; this way the round-trip time is significantly lower compared to people connecting from wherever they are all the way to our DC. The second is the caching boxes, where we can cache a bunch of these streams for the playback side, not on ingestion. Data centers are much bigger compared to PoPs and have many more functions; we will look at which functions we end up using for live streaming a couple of slides down.

When somebody wants to do a live stream, they will create a connection to a PoP. Before they create the connection, they use an out-of-band API to get three things: a stream ID, a security token, and a URI. The stream ID is important for consistent hashing, and I will talk about that later on. The security token, it's obvious why we need that. The URI gets resolved against Facebook's network infra, we load balance it, and then the client knows which DC or which PoP to talk to. So in this case you see it connecting to a PoP. As soon as the connection is made, the PoP forwards the connection to a DC. This is where the stream ID comes in: we need to map a given stream to a particular data center. Which data center it picks is not super relevant; the important part is that it picks something such that things are balanced well, and then it keeps going to that DC for that stream, so we do not have jitter and lag. This is a flattened topology of what we just saw: broadcast client connects to a PoP, which connects to a DC.

Now let's get into the details of each one of these. There's going to be some cool animation here, by the way; you can see I get excited about these things. All right, let's go inside a PoP. As I said, it has two types of hosts: the Proxygen hosts, which are responsible for terminating incoming connections, and the big-cache hosts, responsible for caching. On the ingestion side caching is not relevant, so we can ignore it for now. The broadcast client creates a connection with a random Proxygen host in the PoP closest to it. The Proxygen host has scriptable logic which we can write to determine what to do with a given stream or connection; in the case of live streams, the logic is: send this live stream to an appropriate DC, a data center, based on load balancing. And that is what it does. Now let's look inside the data center and see what happens there; that is where the real processing happens on the ingestion side. There are three types of hosts: the Proxygen hosts and the big-cache hosts, which have the same functions as the two types in the PoP, and the encoding hosts, which are the owners of a given stream.
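Before going inside the encoding hosts, here is a minimal sketch of the broadcast setup flow just described: an out-of-band call that hands back a stream ID, a token and an ingest URI, after which the client opens its RTMPS connection. The field names, the URI format and the endpoint are hypothetical stand-ins, not Facebook's actual API.

```python
import secrets
from dataclasses import dataclass

@dataclass
class BroadcastSession:
    stream_id: int       # later used for consistent hashing onto an encoding host
    token: str           # proves the client is allowed to publish this stream
    ingest_uri: str      # resolved via load balancing to the closest PoP

def create_broadcast_session() -> BroadcastSession:
    """Stand-in for the out-of-band API call described in the talk.
    The shape of this response is illustrative, not Facebook's real API."""
    stream_id = secrets.randbits(63)
    return BroadcastSession(
        stream_id=stream_id,
        token=secrets.token_urlsafe(16),
        ingest_uri=f"rtmps://live-ingest.example.com/{stream_id}",  # hypothetical host
    )

# The client would then open an RTMPS connection to session.ingest_uri; the PoP
# terminates it and forwards the stream over Facebook's backbone to a data
# center chosen by load balancing.
session = create_broadcast_session()
print(session.ingest_uri)
```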
We do a bunch of processing on the encoding hosts. We will ignore the big-cache hosts for now; caching is not relevant on the ingestion side. The broadcast client creates a connection, the PoP forwards the connection to a random Proxygen host in that data center, and the Proxygen host does consistent hashing on the stream ID and sends it to an appropriate encoding host within the DC. When we started, we were actually doing this mapping based on the source IP, and then we realized that with NATs and all that fanciness we would end up mapping a lot of streams to the same machine, which was not cool. So we moved to the stream ID, and that worked out much better.

This mapping is important for a variety of reasons. If a client loses its connection because it moved from cell to Wi-Fi or the other way around, we don't want the stream to have jitters. In this case the connection goes away, the broadcast client creates a new connection over the new network but uses the same stream ID, and so it gets mapped to the same host. Viewers of the stream would not even realize that all this happened behind the covers, which is exactly how it should be. This logic also allows us to deal with planned and unplanned outages; at our scale, both happen every day, there is just no way around it. In this case, let's say we lose the encoding host. The end-to-end connection breaks. The broadcast client creates the connection again to the Proxygen host, the Proxygen host realizes that the encoding host that was supposed to be responsible for this stream is down and it can no longer talk to it, and it uses consistent hashing to find the next host that should take care of the stream and starts sending the data to that host. Depending on how long it took for us to realize the host was down, there may be a small jitter, but in most cases you wouldn't even notice that all this happened behind the covers, which is great. This works out well for everybody: it works out for developers, we don't get called for on-call issues, and it works out for users, they get a smooth stream. Awesome. So that's what happens at the DC; that is the ingestion and processing side of things.

Let's look at the encoding hosts and what they do. Encoding hosts do five things. First, they authenticate a stream: they make sure a stream is proper, formatted correctly and so on. They associate themselves with that stream, so now they are the owner, the source of truth for it. This is important because, when we talk about playback, the playback clients and the PoPs need to know which host to fetch data from, so the encoding hosts are the ones that tie the ingestion side and the playback side together. The encoding hosts create several encodings: whatever quality we get, we create multiple encodings, some at lower quality with a lower bitrate, so we can show the live stream without jitters to people who don't have enough connectivity. They create the playback output, which is DASH, and we'll talk about DASH in the next section. And they store the media: we talked about how live streams become normal videos, and those get stored. The encoding hosts are responsible for converting them into videos and then putting them out into long-term storage.
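The stream-to-host mapping and failover behaviour described above can be illustrated with a minimal, generic consistent-hash ring. The host names, replica count and hash function here are assumptions for illustration; the talk only states that the mapping is consistent hashing on the stream ID.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: streams map to encoding hosts by stream ID,
    and when a host is removed only its streams move to the next host on the ring."""

    def __init__(self, hosts, replicas=100):
        self._ring = []                          # sorted list of (point, host)
        for host in hosts:
            for i in range(replicas):
                self._ring.append((self._hash(f"{host}:{i}"), host))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def host_for(self, stream_id: int) -> str:
        points = [p for p, _ in self._ring]
        idx = bisect.bisect(points, self._hash(str(stream_id))) % len(self._ring)
        return self._ring[idx][1]

    def remove_host(self, host: str) -> None:
        """Simulates a host going down: its streams fall through to the next host."""
        self._ring = [(p, h) for p, h in self._ring if h != host]

ring = ConsistentHashRing(["encoder-1", "encoder-2", "encoder-3"])  # hypothetical hosts
print(ring.host_for(1234567890))   # the same stream ID always lands on the same host
ring.remove_host("encoder-2")      # planned or unplanned outage
print(ring.host_for(1234567890))   # remapped only if it was on encoder-2
```

The reconnect case falls out for free: a client that comes back with the same stream ID hashes to the same host, as long as that host is still in the ring.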
Time for a live stream. This one, we wanted things to explode, and here's why. This was a live stream; it should show up, hopefully with no error, I will click a button and it will show up. Two employees of BuzzFeed decided to find the breaking point of a watermelon by putting rubber bands around it. They went on for 45 minutes, put on 690 rubber bands, and then it exploded. Here it is, you'll see the explosion in action. See? Then they proceeded to eat it, which is fine. So we have different kinds of live streams: there are rocket launches and there are watermelons exploding. Who's to say one is more important than the other, right? They're all good engagement, it works out well.

All right, let's talk about the playback side: MPEG-DASH. So what is DASH? DASH is a streaming protocol over HTTP, super simple actually. It consists of two types of files: a manifest file and media files. The manifest file is a table of contents; it just points to media files. The server creates one-second segments: as the live stream is being created, for each second we end up creating a segment and updating the manifest to point to the new segment. As the stream goes on, the manifest keeps growing, so we keep a rolling time window: if a stream goes on for several hours, it is not useful to send the entire manifest to all clients the entire time, because this is supposed to be live. The client refreshes the manifest periodically using a standard HTTP GET, and when it sees new entries in the manifest it fetches the corresponding media files, again via HTTP GET. So it's a straightforward protocol. It also allows different bitrates for the same stream, synced appropriately, which again is important when your bandwidth is varying, which happens a lot.

More animations now, from the playback side. Let's say this playback client is somewhere in India, and we are going to use the blue square, which is at the encoding host, as the source of truth for the file being requested; it is the freshest copy. The playback client connects to its closest PoP and says: give me the manifest file for this stream. Let's assume this is the first time anybody has requested the manifest for this stream, so the only place it is available is on the encoding host. The Proxygen host checks with its local big-cache host to see if it is available; the big-cache host says no, there's nothing there. The Proxygen host then makes a call to the data center that is responsible for this stream and connects to a random Proxygen host there. There it checks locally again with the big-cache host, and again finds there is nothing in the cache, because this is the first time somebody is fetching this manifest. So it knows which encoding host to go to based on the consistently hashed stream ID, goes to the encoding host, gets back the manifest, populates the cache, and sends it back to the Proxygen host in the PoP, which in turn populates its own cache and then sends it out. Now, all of this may seem like a lot of work; why are we populating all these caches? This becomes relevant when another playback client connects to the same PoP: the Proxygen host talks to its local big cache, finds the file, and sends it back, without even hitting the data center. Remember, we talked about a stream with a large number of viewers, where we couldn't predict which stream it would be. Using this two-layer caching architecture we can solve that problem reasonably, because the first time somebody connects to a PoP we fetch the data from the DC, and everybody else who connects to the same PoP no longer needs to go to the DC; they get everything they need right from the PoP. And so you can scale out PoPs and DCs separately and have separate caching layers in both of those places.
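Before looking at a second viewer in another region, here is a minimal sketch of the two-layer cache lookup just described, where each tier populates itself on the way back from a miss. The class and method names are illustrative, not Facebook's actual cache code.

```python
class CacheLayer:
    """One caching tier (a PoP big-cache or a DC big-cache) with a fallback origin."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin            # the next tier, or the encoding host at the end
        self._store = {}

    def get(self, key):
        if key in self._store:          # cache hit: the request never leaves this tier
            return self._store[key]
        value = self.origin.get(key)    # miss: ask the next tier / origin
        self._store[key] = value        # populate on the way back
        return value

class EncodingHost:
    """Source of truth for a stream's manifest and segments."""
    def get(self, key):
        return f"<contents of {key}>"

# pop cache -> dc cache -> encoding host, as in the playback walkthrough above
encoder = EncodingHost()
dc_cache = CacheLayer("dc", origin=encoder)
pop_india = CacheLayer("pop-india", origin=dc_cache)
pop_us = CacheLayer("pop-us", origin=dc_cache)

pop_india.get("stream-42/manifest.mpd")   # misses twice, reaches the encoding host
pop_india.get("stream-42/manifest.mpd")   # served entirely from the PoP
pop_us.get("stream-42/manifest.mpd")      # misses at the US PoP, hits the DC cache
```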
Now let's see what happens if somebody in the US wants to see this live stream, so they connect to a different PoP. This PoP doesn't have the manifest file yet, so it goes through the same motions: it checks with its big-cache host, which doesn't have it, goes to the DC, checks the big cache there, finds it, fantastic, returns it, caches it locally, and sends it out. So roughly, the number of requests that the DC gets is equal to the number of PoPs, because once a PoP has it, you are done; that PoP takes care of the manifest file itself. Now, we also need to update the manifest file periodically, and there are two ways to do this. One is a small time-to-live, a TTL, which is common jargon with caches: the manifest falls out of the cache in, say, three seconds, and the next person coming in goes to the DC and fetches the updated manifest file. Or we can use HTTP push and push the updates out to the respective PoPs as required. The second one is better than the first, but maybe slightly more complicated; I think Todd gave a good talk on async yesterday, and the push mechanism ties into that kind of thing.

All right, let's talk about reliability challenges. A quick reminder: this was the original architecture we talked about. There is a big problem we had to solve, and it was a network problem on the ingestion side: if we can't get the stream ingested, nobody can see that stream, so ingestion problems cause a bad user experience across the board, and we had to handle the case when the network is bad. Now, if the network is just gone, if there is no connectivity, there is nothing we can do; the stream is gone and people will go find something else to watch. But if the connectivity is degraded, if the bandwidth is lower than it was before, then we can do interesting things. We use something called adaptive bitrate. Adaptive bitrate is typically used on the playback side: when you are watching videos and your bandwidth goes down, you start seeing lower quality video. We apply the same concept on the ingestion side: if your connectivity goes down or your bandwidth drops, we can send in a lower quality stream. This is not something we like to do, because who likes lower quality videos, but the choice is either the stream breaks or you see a lower quality stream, so we have to make a compromise.

When I talk about connectivity problems, people typically think of areas like these, where you don't know what's going on. You'll be surprised to hear that there are connectivity problems in very interesting parts of the world, like the White House. Terrible Wi-Fi. We could not believe it; they tried to go live a couple of times and it wouldn't work. We never thought that Wi-Fi would be a problem for the White House, but well, now we know. So, for dealing with network problems, three different solutions. We talked about the adaptive upload bitrate. We can handle temporary connectivity loss by buffering on the client; this is one of those times where we are OK with buffering, to avoid cutting the stream off. And in the worst case, where there is just not enough bandwidth to send a picture stream over, we can do audio-only broadcasts: where what you are saying is more important than what people are seeing, audio-only broadcasts work fine.
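A minimal sketch of that ingest-side degradation logic might look like the following. The threshold values are assumptions for illustration, not the ones Facebook actually uses.

```python
from enum import Enum

class UploadMode(Enum):
    FULL_QUALITY = "720x720 video + audio"
    LOW_QUALITY = "400x400 video + audio"
    AUDIO_ONLY = "audio only"

# Thresholds are illustrative assumptions, not Facebook's real values.
FULL_KBPS, LOW_KBPS = 1500, 400

def choose_upload_mode(measured_kbps: float) -> UploadMode:
    """Adaptive ingest: degrade gracefully instead of dropping the broadcast."""
    if measured_kbps >= FULL_KBPS:
        return UploadMode.FULL_QUALITY
    if measured_kbps >= LOW_KBPS:
        return UploadMode.LOW_QUALITY
    return UploadMode.AUDIO_ONLY

# During a temporary connectivity loss the client would additionally keep
# encoded frames in a local buffer and flush them once the connection returns,
# rather than cutting the stream off.
print(choose_upload_mode(250))   # -> UploadMode.AUDIO_ONLY
```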
Cool, let's talk about a second problem: the thundering herd. Everybody loves thundering herds, and there are more explosions here. Let's say a particular stream is popular, so a bunch of playback clients are asking for that stream, and again we only have the blue dot in the DC at this point. The Proxygen hosts all make connections to the one big-cache host that is responsible for the stream. We don't allow all of those requests to go to the DC right away; we block them using a cache-blocking timeout. This timeout is important because there is no point in sending 50 requests over to the DC right away: we can send one, and the rest can wait for the response to that one. This works well if the response arrives in time. Sometimes the response doesn't show up in time, the timeout expires, and this unleashes a thundering herd: everybody who was waiting for the cache to be filled now goes to the DC, and guess what happens. Yes, fire. The way around this is tuning the cache-blocking timeouts. If the timeout is too high, then losing even a single request results in stuttering of the stream, more jitter; if it is too low, we get frequent thundering herds. So tuning this is a little bit of black magic, a little bit of art, a little bit of science, but we have gotten to a reasonable place over the years.
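Here is a minimal sketch of that cache-blocking idea: the first miss for a key becomes the leader and goes to the origin, followers wait up to the blocking timeout for the fill, and if the timeout expires they all fall through, which is exactly the thundering herd the talk describes. The class, its API and the timeout value are hypothetical, not Facebook's cache implementation.

```python
import threading

class BlockingCache:
    """Sketch of cache-blocking: one request fills the cache, followers wait
    up to `block_timeout` seconds; on expiry they all hit the origin (the herd)."""

    def __init__(self, fetch_from_origin, block_timeout=1.0):
        self._fetch = fetch_from_origin
        self._timeout = block_timeout
        self._data = {}
        self._inflight = {}              # key -> Event set when the fill lands
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._data:
                return self._data[key]
            event = self._inflight.get(key)
            if event is None:            # we are the leader for this key
                event = self._inflight[key] = threading.Event()
                leader = True
            else:
                leader = False
        if leader:
            value = self._fetch(key)     # the single request sent to the DC / origin
            with self._lock:
                self._data[key] = value
                self._inflight.pop(key, None)
            event.set()
            return value
        if event.wait(self._timeout):    # follower: wait for the leader's fill
            return self._data[key]
        return self._fetch(key)          # timeout expired: the herd hits the origin

cache = BlockingCache(lambda k: f"manifest for {k}", block_timeout=1.0)
print(cache.get("stream-42"))
```

The tuning trade-off from the talk shows up directly in `block_timeout`: set it high and one slow fill stalls every waiting viewer; set it low and the followers stampede the origin.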
All right, let's talk about new features; this is exciting. Broadcasting live is one experience, but wouldn't it be cool to add somebody else, to add a viewer into the stream to interact with you? Imagine Vin Diesel goes live and I get to join that broadcast and ask him a question directly; that would be fantastic. This feature is being rolled out now, where a broadcaster can add a viewer and then interact with them in real time. The interesting challenge as an engineer is that now these two people are talking in real time: you can't have seconds of delay, it has to be milliseconds for them to have a reasonable conversation while everybody else is watching it. So those were the interesting challenges we had to look at. The video tab is out to some people at this point; the key challenge there is that we have to pre-buffer several different video streams and push them out to the clients. The video tab lets people watch top live streams, recommended live streams, related videos, and a lot of content from their friends and followers, so buffering all of that and pushing it out was an interesting problem to solve. We also allow casting of live streams to different screens, like your TV; Facebook recently launched a Facebook video app for TV, for Apple TV, Samsung TV and Amazon Fire TV, so now you can watch interesting content there. And the holy grail of immersion is 360 Live. This is where you can transport a person into a completely different experience as it is happening, live. Super exciting. The hard problem is that as if normal Live at 720 by 720 was not hard enough, now you have to do a 360 view of that entire thing, so tuning our entire stack for super-high-bandwidth streams like these was a challenge. Now, all the people who said they haven't tried Live yet, I hope you are repenting; you can try it now and see how cool it is.

OK, let's look at some lessons learned. Large services can grow from small beginnings; sometimes it is not possible to predict whether something will become successful or big, and you have to start somewhere. It is better to actually write some code than to keep discussing forever what the ideal architecture should be. You have to adapt your service to the network environment you have to operate in: this means bad connectivity, bad networks, people moving from Wi-Fi to cell and so on; dealing with all of those aspects is important to make a product successful. Reliability and scalability have to be built into the design; you cannot add them later. This usually requires more time than people like to budget for, but I think it is very important. Hot spots and thundering herds are my best friends at this point; they happen in every component and I have to deal with them anyway, and all of you will have to as well. The design has to accommodate planned and unplanned outages; if you don't, things will keep going down, on-call will be hell, and you will not be able to get any work done, so dealing with this is very important. You have to make compromises to ship large projects. We had to compromise a lot in terms of what protocol we use, what quality we support and so on, but these are important compromises to be able to launch and have a reasonable service; it doesn't feel good to make compromises, but it is the right thing to do. And keep the architecture flexible for future releases: doing things like a broadcaster adding a viewer can be pretty tricky if the architecture is not right, and flexibility is very important when you're working with product teams that move way faster than the infrastructure team can. Cool, these are some of the sources; a bunch of people helped out with the slides, the two images for "got real" and so on. All right, thank you; time for questions.

OK, so we have a few minutes before lunch for questions. One thing that Sachin didn't have in his talk, which I asked during a preview, was how they deal with porn. With what? Porn. Porn? Do you want to answer the question here? I like action movies, not porn. So, porn has to be dealt with carefully. There are a bunch of things: if people report porn, then we have humans looking at it to take it down, and there are also some automated flows to take down porn. It is a combination of those two, but it's a hard problem. All right, no pun intended. Oh my god. Yes, anybody else, any questions? We'll take a few questions and then it's lunch.

Hi, thank you, it was a very entertaining talk. My question is: as far as I understand, most viewers of a live stream are more likely to connect to a PoP rather than the data center, because they are most likely to be co-located with the live stream presenter. In that case it seems that latency is not that big of a deal, because you would rarely connect to a data center, so on average your latency shouldn't be that much of a problem. Is that correct?

Actually, latency is still a problem because, as I said, if I am broadcasting from here, people could be in different parts of the world. Engineers like to believe that everything travels at the speed of light, but when you send it over 15 different networks, different peering arrangements and so on, a lot of latency gets added. Also, we have to encode these streams to account for the bandwidth available on the viewer side, and that encoding requires a buffer to do a reasonable job and create good enough quality at a lower bitrate, so that adds latency. And the different protocols I talked about, DASH, and other protocols like HLS and so on, each have their own requirements on how much buffer they need before they start playing. So there are several factors that go into it; latency actually is a problem, and we have to tune it really well to have a good experience.
Thank you.

Hi, thanks for the talk. First of all, a couple of questions on the back end, on the data center: do you use containers on the back end, and how do you deal with storage, how do you provision it, how do you keep up with it?

So we use Linux containers, and on storage we have our own systems that we have talked about publicly, called Haystack and Everstore. Everstore stores all the photos and videos that we have, so it's a dedicated system for these kinds of use cases.

I also wanted to ask: did you guys consider using the QUIC protocol instead of TCP for uploads or streaming?

We are considering QUIC now. We didn't consider it then because our network infra is tuned for TCP, and since time to production was a key requirement, we went with whatever would get us into production sooner. But now that we have a bit more headroom, if you will, we are going to try QUIC, reliable UDP and a bunch of those things.

Hi, you've been talking about seconds of latency in your streaming, but towards the end you talked about a new feature where you have person-to-person conversations, where you need to be down to milliseconds. How are you tackling that?

To tackle some of those things we have to look at different protocols; RTMP won't do because it can't get down to 200 milliseconds or so, so we are looking at a variety. We already have support for real-time calling on Messenger, which has video calling and voice calling, and we can use a similar stack for people talking in real time.

OK. Hi, how do you differentiate between streams by the number of people watching them? There are live streams which are watched by, I don't know, thousands of people, and some are watched by two or three. Do you still put those manifest files into the caches, those big-cache hosts that manage all of that?

Yes. It depends: if you have one viewer, then a cache is useless, but we can't predict that you will have only one viewer, so what I showed was a simplification where we put everything in the cache. If you have two viewers, hopefully they are in the same PoP, so caching in that form makes sense; if you have one, then yes, the cache is pretty useless. OK, thanks.

You released Live first only for celebrities and then for the rest of the public; was that done for performance reasons or for marketing reasons? Sorry? Earlier, the first part: you released Live first for celebrities and then for the rest of the public; was that done for performance reasons or for marketing reasons?

So we wanted to figure out whether this was something we should invest in more, that is one. And two, it is good to try things out and see how they work, to evaluate our assumptions in production. If we launch to all users, there is a much larger problem to deal with and mistakes get amplified. With celebrities it is a small set, but they still end up testing the limits of our infra, because celebrity streams typically have a large number of viewers, so we get all the testing we need in production instead of just doing our own shadow testing and so on.

Hi, so when you guys started, what was the team size, and as the idea became reality, how did the team grow?

So initially, as I showed you in the first picture, it was basically a very small set of people that built the infrastructure, and it was just those people who launched it out on Mentions.
Once we figured out that this was working well and we needed to invest a bit more to launch it for all users, we added a few more people. I'm not entirely sure we have talked about team sizes before, but teams at Facebook are typically very small. This morning I think we saw how teams with more than twenty people fail miserably; we have taken that to heart. We typically have very small teams, a close set of people who work together and deliver things, and if we need more people with different skill sets we form virtual teams: a couple of people from product, a couple of designers, a few infra engineers, and they sit together for a few months and hammer out the required code.

Did you consider using a third-party media server, at least during your prototyping phase, before you developed your own?

So most things in the industry don't work at Facebook scale. It is OK for a prototype, but we know we would have to throw that out anyway before we put it into production, and we typically build our own. Even when we end up using something that is open source, like MySQL, which we use a lot, we make a lot of changes so it can scale and operate at our level.

Out of curiosity, for your prototype did you use one, and can you tell us which one?

We just built our own.

Thank you. I'm curious, you say you encode things into multiple different formats for different devices: do you do that on the fly, creating a new stream for each set that needs it, or do you do all the different formats ahead of time just in case you need them, in which case surely that wastes a lot of CPU?

So it is a mix of things. We can make some assumptions about what formats we are going to need, but I think that is one area where we need to invest a lot more to save CPU. You're right, efficiency is starting to become a problem, and we are looking into that.

OK, great questions everybody. I think it's time for lunch. If you have further questions, please come up to the stage and ask them. Thank you very much.
Info
Channel: InfoQ
Views: 56,124
Keywords: Architecture & Design, Performance & Scalability, QCon, QCon London, Streaming video, Facebook, Facebook live, Case Study
Id: IO4teCbHvZw
Length: 51min 31sec (3091 seconds)
Published: Fri Jul 21 2017