gRPC & iOS at Lyft

Captions
Okay, hi everyone. My name is Michael Rebello, and I'm an iOS engineer here at Lyft on our mobile infrastructure team. I was laughing about this earlier, actually: a lot of this talk is about how to stop using JSON, so make up your own minds, I guess. I'm going to talk a bit about Lyft's networking stack on mobile as it is today, how we've started rethinking where we can make improvements, and where we're taking it in the future.

To give you a high-level overview of where we are today, I'll give you a side-by-side of iOS and Android. We currently define our endpoints in YAML and Swagger, and this lives inside one GitHub repository, which is basically a container for information on all of our client-facing endpoints. That includes metadata you can send in a request, the types of responses we can send back, and any errors that might be expected from certain endpoints; I'll come back to that a little later. As you might expect, we currently use HTTP as our transport layer, and for the most part we send JSON over it. In most cases we gzip it, so we get some benefit from that level of encoding.

Where iOS and Android really differ today is the mapping layer, where we take the data we get from the server and convert it into a model on the client. On iOS we currently use a framework we've open sourced called Mapper, which takes dictionaries of JSON data and maps them into strongly typed Swift structs that you then pass around throughout the codebase. Android took a slightly different approach: they depend on that YAML and Swagger to generate data transfer objects, which are strongly typed models closely tied to the YAML and used only for mapping from the network layer into something strongly typed on the client. They then handwrite their own models on top of those, transforming them however they need, and use those elsewhere in the codebase day to day. That abstracts out the separation of concerns between the data representation on the server and how we represent it on the client.

Some of the trade-offs I'd like to point out: first, HTTP/1 is really well supported (pretty much anything you encounter will support it), but it has some limitations that I'll get into a bit later. Second, YAML and Swagger are pretty simple to write, but they don't act as a single source of truth. What I mean by that is that iOS, as in my previous explanation, doesn't actually respect the YAML and Swagger definitions we have. That's a problem, because iOS and Android can fall out of sync: iOS engineers use the definitions more as a point of reference, whereas Android is strongly typed to them. The other downside is that our servers don't really respect the YAML either; they also use it as a point of reference. So of the three places that use this data, only one actually depends on it being correct. The third point is that JSON mapping is really easy to iterate on (adding a new field to your model is basically one line of code), but it's also pretty fragile, and that's the part I want to focus on.
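For context, here's a minimal sketch of what a Mapper model looks like, based on the examples in the library's public README (github.com/lyft/mapper); the `User` type and its fields are illustrative, not Lyft's actual models:

```swift
import Mapper

struct User: Mappable {
    let id: String
    let photoURL: URL?

    // Mapper hands this throwing initializer the JSON dictionary; if a
    // required field is missing or has the wrong type, mapping fails.
    init(map: Mapper) throws {
        try id = map.from("id")
        photoURL = map.optionalFrom("avatar_url")
    }
}

// Returns nil at runtime if "id" is absent or mistyped: exactly the kind of
// silent failure described in the incidents below.
let user = User.from(["id": "123"] as NSDictionary)
```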
We've actually had incidents in the past where the server sends unexpected data and we don't find out until runtime, because there's no real way to catch it at compile time: we fail to decode the models and just get nothing back. We've also had the reverse, where the server sends data iOS isn't expecting (maybe a server engineer made a typo), and we get the same result. These things can happen on both the client and the server, but either way we end up in the same place, and it breaks the mobile clients.

That got us thinking about where we could improve this structure. What could we do with our current networking stack on iOS and Android to make it more reliable and less error-prone? The first thing that comes to mind is switching to strong types that are consistent across both mobile platforms as well as the server. As we approached this, we came across protocol buffers, commonly referred to as protobufs. If you're not familiar with them (I wasn't until recently), Google's very wordy explanation is "a language-neutral, platform-neutral, extensible mechanism for serializing structured data." That's a huge mouthful you probably have to reread to understand, but essentially it means protobuf lets you define a model, which you can think of as a struct, in one place, and then easily translate it into multiple languages. The tooling provides that functionality under the hood: you define something once, and you can generate Swift files, or Java, or Python, or Go, or whatever languages you happen to be using in your project.

The way that works is you define something like a message. The whole file looks very similar to a Swift struct, which will come into play a little later. Within the message we might have our own custom type called a ride type, plus a couple of system types that are just doubles, for latitude and longitude. You can ignore the numbers on each field; those are implementation details of how protobuf stores unique keys. If you take that protobuf definition and generate Swift from it, you end up with a struct, something very similar to what you initially typed, and you can get the same representation in Java or Python, again all provided for you through the tooling. You can take it one step further and prefix the generated types with something like DTO, for data transfer object, to give us a little more parity with Android. A quick FYI on how we get from a protobuf file to Swift or Java: there's a command-line tool called protoc. You run protoc, tell it the language you want it to output (which is basically the name of the plugin), and hand it the path to your proto file. For example, `protoc --swift_out=. location.proto` gives you `location.pb.swift`.

Some of the key wins you get from protobufs right off the bat: one, it's strongly typed, so you're not passing JSON around anywhere. Two, it provides a great amount of compression if you send the proto representation over the wire. And three, it provides code generation for numerous languages: you have one definition, and you can generate each language's representation from it.
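Here's a sketch of the kind of message just described; the names and enum values are illustrative assumptions, not Lyft's actual schema:

```proto
syntax = "proto3";

// A custom type referenced by the message below.
enum RideType {
  STANDARD = 0;
  SHARED = 1;
}

message DriverLocationRequest {
  RideType ride_type = 1;      // our own custom type
  double pickup_latitude = 2;  // the numbers are unique wire-format keys
  double pickup_longitude = 3; // for each field, not values
}
```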
So great, we have strong types, but these types still only live on the clients. This doesn't solve the problem I mentioned earlier, where the clients and the servers aren't speaking the same language. So how do we actually use them? Protobuf is more of a wire format; we still need some sort of transport layer to send these things between services. We could use HTTP and JSON, and that would work fine; protobuf is pretty extensible, so that's a perfectly acceptable approach. However, we were looking for something more, and in our case that meant something that might support streaming. If we're going to redo our networking layer and build something to set us up for the future, we want to get to the best state we can.

An example of where Lyft might use streaming is getting the driver's most recent location. Say you request a ride and you're sitting there waiting, watching the driver's car move along the map. What's happening is that every few seconds we ask the server, "Where's the driver's car? Has it moved?" and then we draw the response on the map. As you can imagine, if we're waiting a few seconds between each network call, we're adding latency: the driver might already be there and you won't find out until maybe five seconds later, or even longer in a really low-connectivity zone. If we can support streaming, that paradigm shifts, and the server tells the client when the driver moved instead of the client asking. We get the information much, much quicker, and we avoid the extra latency of making requests for things that may not have changed.

So, the things we investigated: first is RPC, which stands for remote procedure call. Ignoring the name, the easier way to think about this is that you're calling a function on a remote service. It's a shift away from the idea of "I'm making an API call, waiting for it to happen, and then doing something with the response when it comes back." Instead, you should think of it as calling another function somewhere else in your codebase, and in this case you're calling it using protobuf messages, those strong types we generated earlier. To illustrate, say the client wants to request new information about the driver's location. The client initiates a request by calling a function, an RPC stub, with that information. Under the hood, RPC packs that struct into a protobuf and passes it on to our servers, and the server unpacks that same protobuf data, but this time decodes it into a Go class, or a Python class, or whatever language the service happens to be written in. From that point it has the exact same representation the client knows about, so they're able to speak the same language. The server goes and does what it needs to do, returns the new representation back to RPC, which does the same encoding in reverse, and the client gets back a Swift struct. Throughout all of this there's no JSON encoding and no manual mapping; all of it happens under the hood. The really powerful thing is that each system speaks the same language: you have one file that defines both the function names and the models, and you can look at that file, or at any platform's implementation of it, and have a pretty good understanding of how the whole thing works together. An RPC "stub" definition looks very similar to a Swift class: you define a service, which is like a class, and inside it you define an rpc, which turns into a function when you generate Swift or Python or something else. In this case it's a function called GetLocation that takes a driver location request message (a Swift struct on the client) and returns a response.
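Continuing the hypothetical names from the earlier message sketch, that service definition might look like this:

```proto
// Each rpc becomes a method on the generated client in every target language.
service DriverLocationService {
  rpc GetLocation (DriverLocationRequest) returns (DriverLocationResponse);
}

message DriverLocationResponse {
  double latitude = 1;
  double longitude = 2;
}
```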
I'll get into how that translates to Swift in a minute, but first we needed to figure out what options we actually had. RPC is a protocol, but we need something to implement that functionality for us. Two of the big contenders are gRPC and Thrift, and we opted for gRPC, which is basically Google's RPC implementation. The way Google implemented it, they wrote most of the functionality in C and built it into a library they call the gRPC C core. Once they had that, they built out many other language implementations that wrap the C core but expose public interfaces in their native tongue. For example, you can have an Objective-C library that exposes Objective-C APIs but under the hood is still talking to the C core and depending on it. So they're able to make changes in one place, and most of the languages get the benefits with little to no changes. This is one of the reasons we went with gRPC: one, right out of the box it supports a ton of languages; two, Google is heavily backing it; and three, there's a great support system and it's regularly updated. If you look at the repositories, there are a ton of commits, and they have very strict release cycles: betas every few weeks, production releases every six weeks, I think, and calls you can join every two weeks that are just "here's the state of gRPC, here's where we're going." It's actually really nice.

One thing you might notice on this slide, though, is that there's no mention of Swift. That became a little problematic as we thought about integrating this. With a little more investigation we found that there actually is a Swift repo, but Google has explicitly marked it as experimental, which is a little concerning when you're thinking about shipping something to production. So we had a few options. The first was to just use the Objective-C version of gRPC, which would work, but our entire codebase is in Swift, so adding a new Objective-C dependency isn't great from a tooling perspective, and we'd also be forced to generate Objective-C files instead of Swift files from our protobufs. That introduces more and more Objective-C code as we migrate our endpoints over. The second part of that is that more Objective-C code moves us away from our all-Swift paradigm, introduces longer compile times, and means more context switching for engineers who have to jump between languages on a daily basis. That really wasn't something we wanted, so we were pretty much on the path to just trying Swift. That said, we reached out to Google, and they were actually really excited about having us try it out. Their team is really great, and they're super happy to work with us and with other people in this community. So we took it and started implementing it in our app. We ran into a few things, because it is experimental, but we upstreamed fixes for all of them as we built it out. There are a ton of people contributing to it, it's a pretty well-supported repository, and there's definitely a path forward to it becoming an officially supported gRPC language, which is pretty exciting.

Taking a step back: when we actually use gRPC with the services mentioned before, what does it look like in Swift? If we take another look at the previous protobuf definition of that service with GetLocation and run the command-line tool on it, we end up with a generated Swift class, a service client, that exposes a function called getLocation. Again, it takes that same model representation we saw before, and similarly to what you'd expect from URLSession natively in Swift, it gives you back a completion closure. But instead of just a Data or an error, it gives you a strongly typed response along with a call result.
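Using the hypothetical service above, the generated client and a call site might look roughly like this; this is a sketch modeled on the grpc-swift surface of the time, not its exact output:

```swift
import SwiftGRPC

// Generated: one client class per service, one method per rpc.
let client = DriverLocationServiceClient(address: "api.example.com", secure: true)

// Requests and responses are generated Swift structs, never JSON dictionaries.
var request = DriverLocationRequest()
request.pickupLatitude = 37.7749
request.pickupLongitude = -122.4194

// The completion hands back a typed response plus a CallResult,
// rather than Data and Error.
_ = try? client.getLocation(request) { response, callResult in
    if let response = response {
        print("Driver is at \(response.latitude), \(response.longitude)")
    } else {
        print("Call failed: \(callResult.statusMessage ?? "unknown")")
    }
}
```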
To recap a couple of key benefits of gRPC: first, it utilizes protobuf. It's actually designed to work with protobuf, so it works really, really well: it gives you those strong types on the wire and encodes them into really small payload sizes compared to JSON. Second, under the hood gRPC uses HTTP/2, which maintains one single connection, so you don't have to reconnect between endpoints and you end up with faster requests. HTTP/2 also supports streaming, which meets our earlier requirement, which is really nice. Furthermore, gRPC takes advantage of that and provides first-class support for streaming. You can define a unary endpoint, which is more or less the classic HTTP request (you post a request, wait for a response, and get that response back), or you can implement streams. Bidirectional in this case means you can keep sending data up continually, and the server can keep sending data back to you. This is much, much faster, and it lets you send data back and forth without having to call a function or start a new request each time.

When we implemented gRPC, we took a few steps to make our lives a little easier, and the first was introducing the concept of validation on our end. Take a look back at the very first protobuf message I introduced: at the bottom we have a double value for pickup latitude. Thinking about latitudes, it's a double value, but there's a very finite range of valid latitudes; it has to be between -90 and +90 (and longitude between -180 and +180), and anything outside that is not an actual valid coordinate. Natively, protobuf doesn't really provide support for guarding against this; you only find out whenever you happen to check. So we added the concept of validation rules, which is open sourced on GitHub by Lyft as protoc-gen-validate. We can add these validation rules to the double and, for example, enforce that it stays within the valid range. Essentially, when we run the generators on these files, that lets us generate Swift code or Python code that performs those validations in each language, enforcing that these values stay within range. There are a ton of different validation rules; this is just an example of one of the more primitive ones.
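As a sketch, annotating the earlier hypothetical message with protoc-gen-validate rules would look roughly like this:

```proto
syntax = "proto3";

import "validate/validate.proto";

message DriverLocationRequest {
  // Generated code rejects values outside the valid coordinate ranges.
  double pickup_latitude = 2 [(validate.rules).double = {gte: -90, lte: 90}];
  double pickup_longitude = 3 [(validate.rules).double = {gte: -180, lte: 180}];
}
```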
The second piece we built on top of gRPC and protobuf was a little bit of automation. Normally you'd take a proto file, run the command-line tool on it, and end up with a set of Swift files, and maybe some Go files and some Python files, which works totally fine. But as we grow and add more endpoints, the number of files builds up. For one, it's more for us to compile every time a developer builds the app. And with that many files, it's hard to find things, and you have to remember not to edit them, because they're all generated and any modifications just get overwritten the next time they're regenerated. So we decided to pre-compile sets of these into modules, which we can then import from elsewhere in the codebase.

That leads to the last change we built on top of protobuf and gRPC: abstraction. As I showed before, you could use the generated code that comes straight from gRPC and protobuf, but we decided to add some wrappers on top to make our lives a little easier. In addition to the .grpc.swift and .pb.swift files, we also generate these wrapper files. Essentially, they import our dependencies and wrap all of the call sites with some conveniences of our own. A wrapped getLocation function still takes that same request model, but under the hood it handles try/catch, some error handling, and retry logic for you, and then it exposes that function publicly, so at a normal engineer's call site you can just import the module and call the function directly. This has two big wins from our perspective. First, you no longer have to import SwiftGRPC or SwiftProtobuf directly from most call sites throughout the app; all of that is contained in these precompiled modules. That hides away the implementation details of what's happening under the hood, so we could actually swap gRPC out for something else, and as long as we keep the public interfaces the same, none of the call sites have to change. Second, it reinforces the concept that RPC pushes: you're calling into another function in your codebase, not specifically making an API request. And as you can see, it really doesn't look like you're making an API request; aside from the words "request" and maybe "response," you're basically calling into something and getting a closure called back.
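Here's a minimal sketch of what one of those wrappers might look like; the names, module layout, and retry policy are assumptions for illustration, not Lyft's actual generated code:

```swift
// Lives inside a precompiled module; call sites import only this module
// and never touch SwiftGRPC or SwiftProtobuf directly.
import SwiftGRPC

public enum LocationError: Error {
    case failed(statusMessage: String?)
}

private let client = DriverLocationServiceClient(address: "api.example.com", secure: true)

// The public surface reads like an ordinary function call, not an API request.
public func getLocation(
    _ request: DriverLocationRequest,
    retriesRemaining: Int = 2,
    completion: @escaping (DriverLocationResponse?, LocationError?) -> Void
) {
    do {
        // try/catch, error handling, and retries are hidden from the call site.
        _ = try client.getLocation(request) { response, callResult in
            if let response = response {
                completion(response, nil)
            } else if retriesRemaining > 0 {
                getLocation(request, retriesRemaining: retriesRemaining - 1, completion: completion)
            } else {
                completion(nil, .failed(statusMessage: callResult.statusMessage))
            }
        }
    } catch {
        completion(nil, .failed(statusMessage: String(describing: error)))
    }
}
```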
Going forward, some of the things we're planning to do with gRPC and our new networking layer: first, we're currently piloting this rollout on our analytics. We picked analytics over some of our other endpoints for two reasons. One, analytics is one of our highest-traffic endpoints from the client (we send a lot of data up), so it's good for load testing. And two, it's really easy to shadow analytics data: we can send our analytics over HTTP as well as over gRPC and compare how the two behave, comparing the SLAs of how long these events take to reach the server and comparing the payload sizes. And it's not a huge deal if issues arise, because we're just shadowing the gRPC traffic, not using it in production yet. The next step is rolling this out to new endpoints, and eventually all of our endpoints. The biggest milestone after that will be starting to use the streaming functionality of gRPC. That allows what I mentioned before: shifting to a world of push instead of pull, where the server hands the client information as it becomes available, as opposed to the client sending information up and asking if something changed. That not only makes a better experience for our users, because it reduces the latency of the information they get and is always more real-time, but it also saves a lot of network, saves battery, all those types of things. I'd highly recommend checking out grpc.io; it has a lot of great information, and the documentation is really good if you're curious about what gRPC or protobufs are. And I'd highly recommend protobuf even if you're going to use it over HTTP; it's still an awesome system. That's about it. Time for questions.

Q: I was wondering what impact this had on other ways you interact with that code. One of the things people like about JSON (I don't know if it's a great thing, but people do like it) is that it's sort of human-readable, and all this super-compressed binary stuff isn't as much. And one of the things that's maybe not so great about JSON is that it has no support for versioning, which I seem to recall gRPC does have. Could you say a word about whether you found you needed to adopt new tools to work with this, or needed fewer tools?

A: Those are actually great questions. On the first one, we were talking about this just before: we have an in-house tool that more or less acts as a proxy, so we're able to inspect network traffic coming from our iOS simulators and see what the client is sending the server. That breaks when it comes to gRPC, because of what you said: it's highly encoded, and you basically need the protobuf model to be able to decode it. So we are going to have to build some new tooling that lets us decode that traffic. We haven't built it yet, because we haven't rolled gRPC out to all of our endpoints, but it's definitely something we'll need to do. Sorry, what was the second part?

Q: Schema versioning, and whether you're changing the way you do that.

A: Protobuf supports versioning pretty well. Earlier in the slides you could see the integer values assigned to each field; essentially, once those numbers ship to production, they're reserved. And clients always have default values: in proto3, the current version of protobuf, all fields are optional and provide default values for you, so new clients can ship without a field being sent to them and will just supply the defaults. And messages are backwards compatible in that you're never allowed to actually delete a field: you can remove the field and then reserve the original key that was assigned to it. So you're basically always guaranteed the safety of these models, because of how protobuf is structured and how it decodes those messages.
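For illustration, safely removing a field from the earlier hypothetical message would look something like this:

```proto
message DriverLocationRequest {
  reserved 1;            // the wire key of the removed ride_type field
  reserved "ride_type";  // optionally reserve the old name as well

  double pickup_latitude = 2;
  double pickup_longitude = 3;
}
```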
Next question.

Q: The code example you gave of making the API call with protocol buffers doesn't really look that different from a regular REST/JSON API call when you actually look at the function you're calling. Specifically with streaming, I'm wondering: does that same interface still work? And if not, how do you have to change it and think about calling something like getLocation differently?

A: That's a good question; it does change a little bit with streaming. In this case, the example I gave was basically a unary call, a one-off. The way you'd handle streaming is by creating an instance of the driver location service, the service that handles the communication layer, and you'd likely hang on to that instance. When you're receiving data, the way gRPC handles it is that it keeps calling back the closure you provide: the completion closure can get called multiple times, once for each piece of data, and it gives you a different state within that result. That's how data coming down is handled, and data going up is somewhat similar: you basically call into the service again and hand it more data. So it is a little different; it changes how you treat completion handlers. It's not really a completion, it's more of an update: "hey, here's new data."
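A rough sketch of that shape; the observeLocation method and its signature are hypothetical, not the actual SwiftGRPC streaming API:

```swift
final class DriverLocationObserver {
    // Hold on to the service instance for the lifetime of the stream.
    private let client = DriverLocationServiceClient(address: "api.example.com", secure: true)

    func startObserving() {
        var request = DriverLocationRequest()
        request.pickupLatitude = 37.7749
        request.pickupLongitude = -122.4194

        // Unlike a unary completion handler, this closure is invoked once per
        // message the server pushes down: an update, not a completion.
        client.observeLocation(request) { response, callResult in
            guard let response = response else { return }
            print("Driver moved to \(response.latitude), \(response.longitude)")
        }
    }
}
```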
Q: Hey, I had two questions. With gRPC, there's the C core, and I'm curious where the networking component is handled. Is that in the C core? Because different platforms differ to different extents.

A: The vast majority of it is handled in the C core. There have been talks, I know, about writing full implementations at a higher level; something that has been discussed on Swift gRPC's GitHub pages is basically replacing pieces of the networking stack with SwiftNIO or the new Network framework that was announced at WWDC. So they're definitely on top of that, and I think they're excited to use some of the new iOS-specific frameworks. At the moment, I think most languages try to maintain their connectivity to the C core, but it doesn't seem like they're opposed to deviating where it makes for a much better experience.

Q: Yeah, that makes sense, cool. The other question was around compression. Before, when you were comparing and contrasting, you mentioned gzip. I was curious whether there's a specific compression being used with gRPC, or whether, when you say compression, you more just mean that it's binary and therefore very compact.

A: Proto representations are by nature smaller, because they use those integer keys as opposed to JSON's string keys, so that makes them smaller. And then there are different encoding policies you can apply; one of them is actually gzip, so it can gzip the payloads, and there are a couple of other ones in there too.

Q: Hey, I was wondering if you also considered GraphQL along with Thrift and protobuf, and if so, why you went with protobuf.

A: At least, I wasn't involved in that particular conversation, and I don't think we really considered it. I think there are teams that use it a little bit, but that's mostly server-to-server. Client-to-server it's not super useful in our case, primarily because there's not a lot of dynamic querying that we really need to do for the Lyft use case. I have seen that we use it server-to-server for some things, but I don't think it was really considered much from the client perspective.

Q: Sorry, just a quick follow-up on the C core question. I was wondering if you've thought about how that impacts the ability to do networking in the background, or things that only the system can provide. For example, if you were uploading a very large video file, that's probably something you couldn't do through the C core if you need to use system APIs. And do you use it for large payloads like images or videos (I guess you wouldn't, for Lyft), or is it something you imagine confining to smaller data payloads?

A: I'm not a hundred percent sure on that one. I would assume, at the very least, that if we knew requests were still in flight, we could start our own background task, since we wrap it in Swift, and Swift gRPC does the same thing with the C core. But with respect to it currently handling that, I'm not 100% sure, to be honest. I'd have to check.

All right, thank you very much, Michael. [Applause]
Info
Channel: Swift Language User Group
Views: 4,016
Keywords: swift, protobuf, grpc, lyft
Id: Go3_72i8bjI
Length: 32min 1sec (1921 seconds)
Published: Sat Aug 18 2018