System Design Interview Walkthrough: Design Twitter

Captions
Let's design Twitter together. Really quickly before we jump in: if you want to try this exact question, along with dozens of others, for yourself, head over to hellointerview.com. There you'll be able to answer questions directly on an interactive, AI-powered whiteboard, and you'll get instant feedback on your design from an AI trained by myself and my co-founder, a former FAANG hiring manager. There's no faster way to ramp up for your upcoming system design interviews, so I highly recommend you go over there and give it a try.

Designing a micro-blogging service like Twitter consistently ranks as one of the top five system design interview questions asked at the world's biggest companies, like Facebook, Amazon, Netflix, and Google. As a former staff engineer at Meta, I conducted hundreds of interviews, and over that period of time I saw exactly what it takes for candidates to stand out. As we go, I'll call out those subtle dos and don'ts. My goal is to make you feel as prepared as possible come interview day. With that, let's stop wasting time and jump right in.

In any system design interview that you do, the very first step is to make sure that you understand the requirements of the system. There are two main goals in this section. First, it's to make sure that you and your interviewer are on the same page before you start designing. Second, it's for you to get a clear understanding of the features that you're going to support in your design; you're going to come back and reference this list time and time again throughout the remainder of the interview. We break the requirements into two categories. The first is what we call functional requirements: the core features that users expect from the system, for example being able to create tweets. Next we have the non-functional requirements: the system's quality attributes, so security, reliability, performance, all of that fits into the non-functional requirements.

For a service like Twitter, the functional requirements would be things like this: users should be able to create an account and log in; users should be able to create, edit, and delete tweets; they should be able to follow other users; they should be able to view the tweets of the people they're following on their home page, in other words the timeline; they should be able to like, reply to, and retweet other people's tweets; and users should be able to search for tweets. For the non-functional requirements: the system should be able to scale to support hundreds of millions of daily active users, and as such it should handle a high volume of tweets created and read every day; it should be highly available, ensuring the system can be accessed at any time of day with 99.99% uptime; the system should ensure the security and privacy of user data; and importantly, it should have incredibly low latency, so users aren't left waiting for tweets to load.

Keep in mind, for the functional requirements in particular, this is far from a complete list. The real Twitter has DMs, complex integrity systems, ad networks; the list goes on and on. But the reality is they had 17 years to build it, and we have 45 minutes in an interview. So in your interview you want to make sure that you pick the core requirements and then touch base with your interviewer to make sure you agree on the features you're going to include in your system. Important pro tip here: do not spend more than five minutes on this section.
Spending too much time here is one of the largest mistakes I see candidates make time and time again. The reality is that while the requirements are incredibly important, they are not the most interesting part of the interview, so get your requirements down and move on to the far more interesting bits.

All right, let's start designing this thing. It's always most logical to work left to right across the request lifecycle when you're designing on a whiteboard, so the very first thing we're going to add is our clients. We're going to include both a web app (our website) and a mobile application (iOS and Android). From here, requests sent from the client hit our load balancer. What's a load balancer? Load balancers distribute incoming traffic across multiple servers, and they're key to scalability in just about every single system design interview that you're going to do. When we talk about load balancers, there are two relevant points of discussion worth bringing up in your interview. The first is the routing algorithm. This determines how the load balancer distributes incoming requests to the backend servers. Common algorithms include round robin, which rotates requests evenly across all servers; least connections, which sends requests to the server that has the fewest current connections; and IP hash, which routes based on the IP address of the client, ensuring that the same IP gets the same server for each request. Second is the layer in the OSI model that the load balancer operates at. Load balancers can operate at different layers; the most common are layer 4, the transport layer (TCP), and layer 7, the application layer (HTTP and HTTPS). Layer 7 load balancers are interesting because they can make routing decisions based on the actual content of the request, such as the URL or the headers.

Okay, so for our Twitter-like system, for our routing algorithm we're going to opt for round robin. It's simple, it's fair, we have no need for persistent connections, and our backend servers are going to be stateless, so this works great. And our load balancer is going to be a layer 7 load balancer. This means that it operates at the application layer, so we can make content-based routing decisions for feature rollouts and support more complex traffic management as our system scales.

All right, but what are our load balancers distributing traffic to? That's going to be a fleet of servers serving as our API Gateway. We're using an API Gateway here because we've opted for a microservice architecture. This just means that our service is broken down into smaller, independent services that can communicate with one another, and each one of these services is strictly responsible for a specific piece of functionality within the larger system. The API Gateway's responsibility is to take the incoming requests from the load balancer and forward them on to the appropriate service. It's worth mentioning that these three components, the client, the load balancer, and the API Gateway, are going to be present in roughly 90% of the system design interviews that you do, so when in doubt, this is usually a pretty good place to start.
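To make that routing choice a little more concrete, here's a minimal sketch of round-robin selection in Python, assuming a hypothetical pool of gateway hosts; a real load balancer (NGINX, an ELB, and so on) implements this for you, so this is purely illustrative.

    from itertools import cycle

    # Hypothetical pool of stateless API Gateway hosts behind the load balancer.
    BACKENDS = ["gateway-1:8080", "gateway-2:8080", "gateway-3:8080"]
    _rotation = cycle(BACKENDS)

    def pick_backend() -> str:
        # Round robin: each request simply goes to the next host in the rotation.
        return next(_rotation)

    # Six requests get spread evenly across the three hosts.
    print([pick_backend() for _ in range(6)])

Because our gateway servers are stateless, it doesn't matter which host a given request lands on, which is exactly why such a simple rotation is good enough here.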
The next thing for us to do is to lay out the key services. Typically this matches really closely with the system's core functional requirements that we came up with earlier, so let me get those back up on the screen and we'll walk through them one by one. First, users should be able to create, edit, and delete tweets. Great, that's easy: let's add a tweet CRUD service to handle exactly that (CRUD just stands for create, read, update, and delete). Users should also be able to like, retweet, and reply to tweets. Likes and retweets are just updates to existing tweets, so those will be handled within our tweet CRUD service, but for replies let's create a separate service. This allows us to scale replies independently, which will be particularly crucial for things like viral tweets. Then let's add a search service to handle searching for tweets, a timeline service to handle timeline creation, and lastly a profile service to handle account creation and profile management. The profile service will also be responsible for managing followers and the social graph.

The next step is for us to go service by service and ensure that each one is equipped to fulfill its requirements. Let's start with our tweet CRUD service. Again, this needs to be able to create, edit, and delete tweets. It also needs to support tweet likes and retweets, and it needs to handle the metadata associated with each tweet: things like timestamps, user IDs, and any media attachments. In terms of the non-functional requirements, this service needs to be designed to handle really high throughput, because tweets, likes, and retweets are some of the most frequent operations on Twitter, particularly reads, so let's keep that in mind while designing as well.

Let's talk about how we're going to store the tweets. Twitter actually uses an internal NoSQL database that they've named Manhattan, and similarly we'll use a NoSQL document DB, but we'll go with an open-source alternative like MongoDB. This is a solid choice because of MongoDB's performance characteristics: it's really good for applications that require rapid and frequent read and write operations, just like our Twitter app. It's also a good choice because we won't have any complex joins on tweets; when a user requests a tweet, we just need to return that document. The tweet document itself is just JSON, key-value pairs that store all the important information about that tweet. A tweet document might look something like the sketch below: it has who created the tweet, the contents of the tweet, metadata like timestamps, hashtags, and mentions, even the location, and of course we'll include the media, retweets, likes, things like that, all directly on the document.
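Here's a rough sketch of what such a tweet document might look like, written as a Python dict for consistency with the other snippets; the field names and values are illustrative, not what Twitter actually uses.

    # Illustrative shape of a tweet document in our MongoDB collection.
    tweet_doc = {
        "_id": "tweet_18273645",                  # document ID, doubles as the tweet ID
        "user_id": "user_42",                     # who created the tweet
        "content": "Designing Twitter is fun!",
        "created_at": "2023-09-12T16:05:00Z",
        "hashtags": ["systemdesign"],
        "mentions": ["@alice"],                   # hypothetical mentioned handle
        "location": {"lat": 37.77, "lon": -122.42},
        "media": ["s3://tweet-media/abc123.jpg"], # references into blob storage
        "like_count": 0,
        "retweet_count": 0,
    }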
While we've stored references to the attached media directly in the tweet document, we need to store the media itself somewhere. For this we're going to use an object store like Amazon S3, and there's a good reason why media is almost universally stored in blob storage systems like S3: they're built to handle vast amounts of unstructured data, and they ensure that media retrieval is fast and seamless no matter the scale. So if you're in an interview and you're dealing with media, there's a really high chance that you should be storing it in blob storage, and Amazon S3 is a great choice.

Great, so our write path is starting to become super clear. Users send a write (POST) request, our load balancer takes that request and sends it to our API Gateway, our API Gateway forwards it along to our tweet CRUD service, and then the tweet CRUD service uploads any media to S3 and creates a new document in our document store with the contents of that tweet. Updates and edits follow a very similar path. There's one more thing to consider here: we want to make sure that users can't flood our system with write requests. This would usually just be bot activity and would degrade the quality of the service we're trying to provide, so let's make sure that doesn't happen and add a rate limiter along the write path to limit the number of tweets that any given user can create over a period of time. After that, we'll just add our arrows, and our write path is looking good.

Now let's move over and focus on our read path. Our main concern here is that things are lightning quick, so let's add a cache on the read path that caches our popular tweets. This cache, which can be either Redis or Memcached, ensures that frequently accessed tweets are served rapidly and don't need to hit our database every single time. Last thing here: to further optimize content delivery, especially for our global users, let's also add a CDN to distribute and serve the static content, media assets, and even frequently accessed tweets. This puts content closer to the user's geographic location, which reduces latency and ensures that people all around the world have a snappy user experience while browsing our site. We'll just add our last handful of arrows here, and just like that, the read path looks great.

All right, let's keep working our way down our list of services, moving on now to our reply CRUD service. For writes, we're going to store replies in a separate document store. Now I know what you might be thinking: why don't we just store replies directly within the tweet document? To be honest, this isn't a terrible idea, particularly if we had a really small service, but there's a list of compelling reasons to keep them separate. The first is scalability: popular tweets can garner thousands of replies, so storing them separately not only ensures that the tweet document doesn't become too large and unwieldy, but more importantly it allows us to scale the reply service entirely independently. The second reason is performance: by keeping replies separate, we can fetch a tweet without necessarily loading all of its replies, which makes reads for a tweet really fast. If you've ever used Twitter, you've seen exactly this: you load a tweet and see maybe the first dozen, 20, or 50 replies, but it's not until you scroll lower that you load the remainder. Then, just like with tweets, we'll add a rate limiter on the write path to limit the negative impact that bots can have on the service. We'll add our arrows, and just like that, the write path for our replies looks good.

When it comes to reading these replies, we're going to mix things up a little bit, because instead of letting users read replies separately (which, if you think about it, doesn't make much sense), we're going to bundle them right in with the tweets. This means that when you pull up a tweet, you'll automatically see its replies, or at least a portion of them. To make this snappy, our reply database needs to be indexed by tweet ID, so every time we go to grab a tweet, we're also swiftly fetching and caching its replies from the replies DB. A reply document looks really similar to the tweet document, the notable addition being that tweet ID field, which we built the index on; there's a sketch of one below.
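Here's a rough sketch of what a reply document might look like, again as a Python dict with illustrative field names; the important piece is the tweet_id field that our index is built on.

    # Illustrative shape of a reply document; tweet_id is the indexed field
    # that lets us fetch all replies for a given tweet quickly.
    reply_doc = {
        "_id": "reply_99887766",
        "tweet_id": "tweet_18273645",         # the tweet this reply belongs to
        "user_id": "user_7",                  # who wrote the reply
        "content": "Great walkthrough!",
        "created_at": "2023-09-12T16:10:00Z",
        "like_count": 0,
    }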
Awesome. Taking a step back, we can now create, edit, retweet, like, and reply to tweets. Our Twitter app is starting to shape up really nicely, and at this point we have a pretty usable product.

Continuing to move down our list of services, the next thing we get to is search. Twitter search, like many of these components, is really complex and could be an interview question all on its own. That said, we'll take a simplified approach and just handle the basics here. We need searches to be lightning fast, so while the simplest approach would be to iterate through the tweets in our document DB to find tweets that match the search terms, we'll instead add a separate text-based index with Elasticsearch. We'll create inverted indexes on tweet content, username, and hashtags within Elasticsearch, which will allow us to quickly search for specific tweets, profiles, or trending topics with minimal latency. Elasticsearch is a distributed search engine designed to handle really large data sets with low latency and high reliability, so it's particularly useful for applications that require full-text search, like our Twitter app. If you're in an interview and you need full-text search, Elasticsearch is likely a really strong bet. Data is going to make its way into Elasticsearch via what's called change data capture, or CDC. This process captures and tracks changes in the tweet data, ensuring that our Elasticsearch index remains up to date with the latest information from our tweet document store. So as updates are made to the tweet DB, those changes are automatically propagated to Elasticsearch.

Next up is the timeline service, and this is where things tend to get really fun. In an interview about a social media site, whether it's design Facebook or design Twitter, you're often going to spend a good amount of the interview on this exact service. For our purposes, we're going to simplify things a bit and generate a timeline that's just made up of posts from the people you follow; this means we don't have to focus on a recommendation system at all. The simplest way to construct a timeline is by doing what's often referred to as fan-out on read: when a user requests their timeline, we query the list of accounts they follow, fetch all of the tweets from those accounts, sort by time, and return the timeline. As you can tell, this is pretty slow and expensive, so it would fail to satisfy our non-functional requirements around low-latency retrievals. Instead, we can leverage the power of asynchronous operations by updating followers' timelines whenever a user creates a new tweet. This strategy is termed fan-out on write, and here's how it works. Every newly created tweet is promptly placed on a message queue. This queue acts as a buffer; it ensures that we're not immediately overwhelming subsequent operations, especially since so many tweets are often created at the exact same time. Workers dedicated to our fan-out service are always on the lookout, pulling these tweets off the message queue one after another. For each tweet they grab, they run a query to fetch the list of followers for that tweet's author. Once we have the list of followers, the magic happens: each follower has a timeline cache, a quick-access store that holds the most recent tweets from the people they follow, and our worker's job is to update this cache. It takes the new tweet and prepends it to the start of each follower's timeline cache. This makes sure that the next time any of these followers check their timeline, the new tweet is right there at the top, waiting for them.
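To make the fan-out on write flow concrete, here's a minimal sketch of what one of those fan-out workers might do, assuming a Redis list per user as the timeline cache; get_followers is a hypothetical stand-in for a call into our profile service, and the queue consumer itself is left as a comment.

    import redis

    r = redis.Redis(host="timeline-cache", port=6379)
    TIMELINE_LENGTH = 800  # keep only the most recent N tweet IDs per user

    def get_followers(author_id: str) -> list[str]:
        # Placeholder: in reality this queries the profile service / social graph.
        return ["user_1", "user_2"]

    def fan_out(tweet_id: str, author_id: str) -> None:
        # Prepend the new tweet ID to each follower's timeline cache and trim
        # the list so caches don't grow without bound.
        for follower_id in get_followers(author_id):
            key = f"timeline:{follower_id}"
            r.lpush(key, tweet_id)
            r.ltrim(key, 0, TIMELINE_LENGTH - 1)

    # A worker loops forever, pulling (tweet_id, author_id) pairs off the
    # message queue and calling fan_out for each one.

Reading a timeline then becomes a single cache lookup of that per-user list, which is exactly the read-speed trade-off described next.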
This fan-out on write approach, while more write intensive, ensures that reading our users' timelines is lightning fast, because the data is already prepared and just waiting in that cache. It's a trade-off that prioritizes read speed, which matches our non-functional requirements from earlier, so it's great. There is one catch, though, and that's users with millions of followers, like Elon or Drake. The sheer volume of updates we'd need to make to all of their followers' caches could overwhelm the system. So for these, I don't know, call them mega influencers, we need a hybrid approach. While fan-out on write works well for the average user, for these high-profile accounts we lean towards fan-out on read. This means that instead of immediately updating every follower's timeline when a celebrity tweets, we wait: when a follower of Elon or Drake checks their timeline, the system fetches the latest tweets from the mega account and merges them into the user's cached feed.

All right, we're almost there; all we have left is the profile service. As a quick reminder, this needs to support creating and managing both user profiles and followers. We'll opt for a simple SQL database for the user data. This will be traditional relational rows with columns for things like username, email, bio, and all that good stuff. SQL is a good choice for Twitter's user data because it allows for efficient querying on structured user attributes, ensures data integrity with ACID transactions, and can handle complex joins and aggregations for user analytics. For the follower connections, we're diving into a graph DB. Why? Well, graph databases are tailor-made for mapping out networks, especially social ties. Plus, as our app grows, this graph becomes a gold mine for advanced features like recommendation systems, suggesting new followers, and keeping things safe with our integrity systems. We're also going to spin up a separate service for auth, which our profile service will lean on for authentication and authorization. Why separate this out? Authentication and access management are critical, specialized tasks; by isolating them, we ensure tighter security, more focused maintenance, and most importantly the flexibility to integrate with other services or third-party systems in the future without tangling up our main user profile logic. Oops, it looks like we missed one edge from our CDN to our client, so let me go back and add that, and then, boom, there it is: a complete system design for a service like Twitter.

But we can't celebrate just yet: no system design interview is complete without touching on security, monitoring, and testing, so let's spend a moment on each of those. Let's start by briefly touching on security. There are four main considerations here. The first is authentication and authorization. This is to make sure that every request comes from a legitimate user and that users have the right permissions to perform the requested action; in our system this is all handled by the auth service under the profile service. Second is data encryption. All user data, both at rest and in transit, should be encrypted. We use HTTPS for encrypting data in transit, ensuring that the data exchanged between our clients and servers remains confidential and tamper-proof. For data at rest, most modern databases have native support for encryption, so this is usually as simple as flipping a switch. Next is rate limiting. Remember, we already discussed rate limiting for tweets and replies specifically, but it's also crucial for preventing DDoS attacks. For this we do IP rate limiting within our API Gateway, which means that requests from a single IP address are limited to a certain number within a specified time frame, preventing any single user or bot from overwhelming our system.
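As a rough illustration of that idea, here's a minimal sketch of a fixed-window rate limiter, assuming the same kind of Redis store and an illustrative limit of 100 requests per minute per IP; a production gateway would typically use a more refined scheme such as a sliding window or token bucket.

    import time
    import redis

    r = redis.Redis(host="rate-limit-store", port=6379)
    LIMIT = 100          # max requests per window per IP (illustrative)
    WINDOW_SECONDS = 60  # fixed window length

    def allow_request(client_ip: str) -> bool:
        # The key includes the current window, so counters reset each minute.
        window = int(time.time()) // WINDOW_SECONDS
        key = f"ratelimit:{client_ip}:{window}"
        count = r.incr(key)
        if count == 1:
            r.expire(key, WINDOW_SECONDS)  # clean up old windows automatically
        return count <= LIMIT

The same pattern, keyed by user ID instead of IP, is what the per-user rate limiters on the tweet and reply write paths would look like.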
Lastly, we need to consider input validation. On our clients we'll validate and sanitize any user input to prevent things like SQL injection, cross-site scripting, and other malicious attacks, and typically we'll want to protect against these both on the client and on the server.

Now let's move on to monitoring. There are three key aspects to consider. The first is system health checks. It's crucial to keep an eye on the health of our services: if any component, like our tweet CRUD service or our profile service, goes down or becomes unresponsive, we need to know immediately. For this we can utilize tools like Prometheus and integrate with Grafana so that we have a nice visual dashboard; these tools will monitor our services and provide real-time metrics on their performance. Second is logging. Every action, from a tweet being posted to a user logging in, should be logged. This aids in debugging and in tracking potential security threats. In our design we'll use an ELK stack, which stands for Elasticsearch, Logstash, and Kibana: Elasticsearch will store our logs, Logstash will process them, and Kibana will provide a visual interface to analyze them. All our services, including the API Gateway and the databases, will send logs to this central system. To avoid having to draw out all those lines, I'm going to simply add a general ELK component to our design, and we'll have to use our imagination to understand that every service has an edge to this component. Last up is real-time alerts. If there's a sudden surge in traffic or multiple failed login attempts, we need to be notified. Integrating something like Alertmanager or PagerDuty with the Prometheus setup we talked about earlier can send notifications over email, Slack, or any other channel, so that we're immediately notified and can quickly respond when something bad happens.
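As a small illustration of the health-check and metrics side of this, here's a minimal sketch using the open-source prometheus_client library for Python; the metric names and port are illustrative, and in practice each service would expose an endpoint like this for Prometheus to scrape.

    from prometheus_client import Counter, Histogram, start_http_server

    # Illustrative metrics a service like the tweet CRUD service might expose.
    REQUESTS = Counter("tweet_requests_total", "Total requests handled",
                       ["endpoint", "status"])
    LATENCY = Histogram("tweet_request_latency_seconds", "Request latency in seconds")

    @LATENCY.time()
    def handle_create_tweet() -> None:
        # ... real handler logic would go here ...
        REQUESTS.labels(endpoint="create_tweet", status="200").inc()

    if __name__ == "__main__":
        start_http_server(9100)  # Prometheus scrapes metrics from this port
        # The service's actual web server would run alongside this.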
Last but not least, let's briefly talk about testing. First up is load testing: before introducing any new feature, we need to see how our system, especially services like the tweet service or the profile service, holds up under the new pressure. This allows us to pinpoint bottlenecks and potential failures. Second is automated testing. Given our microservice architecture, it's particularly important that our services integrate seamlessly. Every time there's a code change, our CI (continuous integration) tools, such as Jenkins or GitHub Actions, will automatically run both unit tests, which check individual components, and integration tests, which ensure that the services communicate effectively. Lastly, we need to test our backup and recovery. Our data is invaluable, so regular backups are non-negotiable, and we'll need to periodically test our recovery process to ensure that, in the unlikely event of a system failure, we can restore our Twitter clone's operation swiftly.

And that's it: we just walked through the design of a simplified Twitter. We talked about microservices and scaling, we talked about handling timeline generation, data storage, and all the many intricacies that come along with them, and we touched on security, monitoring, and testing. At the end of the day, I think we put together a pretty robust system that would do excellently in a system design interview. Now, if you want to try this interview for yourself, and I strongly recommend that you do, head over to hellointerview.com, where you can try this interview, or dozens of others, on an AI-powered interactive whiteboard and get instant feedback from an AI trained by myself and my co-founder, a former FAANG hiring manager. All right, thanks so much for watching, everybody. If you have questions, leave them in the comments and I'll be answering them as quickly as I can.
Info
Channel: Hello Interview - Tech Interview Preparation
Views: 24,441
Id: Nfa-uUHuFHg
Length: 23min 4sec (1384 seconds)
Published: Tue Sep 12 2023