Best Practices for (Go) gRPC Services

Captions
Okay, all right. Hi, everybody. My name is Doug Fawley. I'm an engineer at Google, and I work on the Go language implementation of gRPC. Today I want to talk about some best practices for running gRPC services. The concepts should apply to pretty much all languages, but the code examples will be in Go, and some of the concepts will be Go-specific.

First of all, for those who don't know gRPC: the G actually does not stand for Google. It stands for a different word in every release we do, so in the first release it stood for "gRPC" (RPC being "remote procedure call"). It is a high-performance, open-source, standards-based, feature-rich RPC framework. That's a lot of buzzwords in one sentence; basically, it's a way of sending messages across networks, and it's an open-source version of the Stubby RPC system that we've been using internally at Google for the past fifteen or twenty years or so. It's under active development, and the latest release is version 1.3.

This is a typical hello-world-style example. On the left-hand side is what the proto file would contain: it describes the service and the RPCs it provides. Here you can see a math service defined, which has a division RPC; it accepts a request message and returns a response message, and those are defined below it. On the right-hand side is the actual implementation of our math service. There's a server type defined with a method on it, which implements what's required for this type to be registered as a gRPC math service, and the main function at the bottom shows how to listen on a TCP port, register the server, and start it up. This is pretty much all it takes for a simple hello-world-style service.
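The slides themselves aren't reproduced in this transcript, but a minimal sketch of that hello-world service in Go might look like the following. The `pb` import path, the `Div` method, and the message and registration names are assumptions standing in for whatever the generated proto code actually provides.

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	pb "example.com/math/proto" // hypothetical generated package for the Math service
)

// server implements the generated pb.MathServer interface.
type server struct{}

// Div implements the division RPC declared in the proto file.
func (s *server) Div(ctx context.Context, req *pb.DivRequest) (*pb.DivResponse, error) {
	if req.Divisor == 0 {
		return nil, status.Error(codes.InvalidArgument, "division by zero")
	}
	return &pb.DivResponse{Quotient: req.Dividend / req.Divisor}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterMathServer(s, &server{})
	log.Fatal(s.Serve(lis))
}
```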
But, you know, normally things aren't this simple. When you get to the real world, things get a lot more complicated, and your concerns become how to run your service reliably, securely, and with good performance, with correct error handling, and so on. These are things that are often left out of the getting-started documentation. So what I wanted to talk about today was how to do some of these things, and what best practices you can follow in your services to resolve some of these issues and cover the gaps between the hello-world example and a real-world deployment. To top it all off, when you get to the real world, the complexity almost never looks like the standard "I have one client that talks to one service" setup that the hello-world example presumes. It ends up being something more complicated, like the picture at the bottom, where you have a bunch of services all talking to each other, and some of these concerns become even more important in those kinds of deployments.

So here is a summary of all the things I want to touch on today. I apologize if some of the transitions aren't smooth; it's just going to be a list of things that I'll run through. Some of them will probably be obvious to many of you here, but hopefully there will also be some things that many of you haven't really considered before.

With that, I'd like to start with API design, because I think that if you start off with a poor API, you can really run into trouble trying to implement something reliable on top of it. So it's a good idea to start with the API design and really make sure you're following best practices there before you move on and start implementing.

The first concept for API design is idempotency. This means that it is safe to retry RPCs without knowing whether or not they've completed. The reason is that, as a client talking to a service, you don't necessarily know whether the operation you're attempting was committed before an error occurred. So you want to make sure your RPC API lets clients proceed even when they don't know for sure whether that happened, and allows them to retry calls after any type of failure. For example, say you're transferring money and you have an RPC to initiate that. The example request here provides some basic information: a sender, a recipient, and an amount of money to transfer. This is bad, because if the RPC fails with something like a deadline-exceeded error, you don't know whether or not the money has been sent. As a client you could retry, but that might result in multiple transactions happening, which we obviously want to avoid. We can resolve the issue by adding a timestamp to the operation, or a globally unique ID, or some kind of transaction token that the server can use to determine whether the operation has already been performed, and then skip it the second time if it turns out to be redundant. It's also important that your response be identical whether this is the first call or a subsequent call, because you don't want your clients to implement special-case logic to distinguish a first response from a repeated one. Always return the exact same response message.
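As a sketch of that fix, with hypothetical Go types standing in for the generated proto messages: the client picks a unique request ID, and the server remembers completed operations, so a retry returns the identical response instead of moving the money twice.

```go
import (
	"context"
	"sync"
)

// Hypothetical stand-ins for the generated request/response messages.
type TransferRequest struct {
	Sender, Recipient string
	AmountCents       int64
	RequestID         string // unique token chosen by the client; makes retries safe
}

type TransferResponse struct {
	TransactionID string
}

type bank struct {
	mu   sync.Mutex
	done map[string]*TransferResponse // completed transfers, keyed by RequestID
}

func newBank() *bank { return &bank{done: make(map[string]*TransferResponse)} }

func (b *bank) Transfer(ctx context.Context, req *TransferRequest) (*TransferResponse, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	// Redundant retry: return the exact same response as the first call,
	// without performing the transfer a second time.
	if resp, ok := b.done[req.RequestID]; ok {
		return resp, nil
	}
	// ... actually move the money here ...
	resp := &TransferResponse{TransactionID: req.RequestID}
	b.done[req.RequestID] = resp
	return resp, nil
}
```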
Another issue you have to think about when you're designing your API is performance. Repeated fields can be a particular problem here. Repeated fields are essentially lists that have no bound on their size in protocol buffers, and if you have one of these in your request, your service is allowing someone to specify a potentially unlimited amount of work for it to perform, which is obviously something you don't want. On a team I worked on before, we had an RPC that offered the ability to batch multiple database transactions into a single call, as a convenience. We expected most users might send 20 or 50 operations, but we had one user who thought 5,000 would be reasonable, and that leads to a lot of problems down the line. So it's important to set limits on these things, check them when you're validating incoming requests, and error out early if you see something you don't want to handle. Conversely, wherever you have a repeated field you want to send back in a response, stop and think about what it represents, and make sure you're not going to return a potentially unbounded amount of data. If you are, make sure your API supports pagination, so that users can issue repeated queries and get a section of the results back each time; alternatively, you can use a streaming RPC where the results are sent back one at a time.
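For example, a paginated variant of such an API might be shaped like this (hypothetical message shapes, written as Go structs for brevity):

```go
// Hypothetical stand-ins for the generated proto messages.
type Record struct{ ID, Body string }

type ListRecordsRequest struct {
	Query     string
	PageSize  int32  // the server should clamp this to its own maximum
	PageToken string // empty on the first call
}

type ListRecordsResponse struct {
	Records       []*Record
	NextPageToken string // empty when there are no further results
}
```

The client loops, passing each `NextPageToken` back in as the next `PageToken`, so the server never has to buffer an unbounded result set in one response.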
Another performance issue to think about is long-running operations. For anything that takes on the order of, say, a couple of tens of seconds, I would recommend not doing it in a synchronous RPC, because the longer an operation takes to complete, the more likely it is to fail at some point: the machine goes down, the network connection gets dropped, or any number of other things. When you find yourself wanting to do that much work in an RPC handler, consider returning from the RPC as quickly as possible, performing the operation in the background, and delivering the results through some mechanism other than the RPC response. You can use a callback, or an email, or publish a message to Pub/Sub; or you can provide a tracking token in your response message, and the client can then call a different RPC in your service to check on the status of that operation. But pay attention to those long-running operations.
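A minimal sketch of the tracking-token approach, with hypothetical message types. One Go-specific detail worth noting: the background goroutine must not use the incoming context, because that context is canceled as soon as the handler returns.

```go
import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Hypothetical stand-ins for the generated proto messages.
type StartRequest struct{}
type StartResponse struct{ OperationID string }
type StatusRequest struct{ OperationID string }
type StatusResponse struct{ State string } // e.g. "RUNNING", "DONE", "FAILED"

type opServer struct {
	mu  sync.Mutex
	ops map[string]string // operation ID -> state
}

func newOpServer() *opServer { return &opServer{ops: make(map[string]string)} }

// StartWork returns immediately with a token; the real work happens in
// the background and is polled via CheckWork.
func (s *opServer) StartWork(ctx context.Context, req *StartRequest) (*StartResponse, error) {
	id := fmt.Sprintf("op-%d", time.Now().UnixNano())
	s.mu.Lock()
	s.ops[id] = "RUNNING"
	s.mu.Unlock()

	go func() {
		// doExpensiveWork() ... (hypothetical); deliberately not using ctx,
		// which dies when this handler returns.
		s.mu.Lock()
		s.ops[id] = "DONE"
		s.mu.Unlock()
	}()
	return &StartResponse{OperationID: id}, nil
}

func (s *opServer) CheckWork(ctx context.Context, req *StatusRequest) (*StatusResponse, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return &StatusResponse{State: s.ops[req.OperationID]}, nil
}
```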
Still on API design: default values are important to consider. With protocol buffers, if someone doesn't specify a setting for a field, that field falls back to the default value, the zero value for its type: an empty string, a zero integer, and for enums, the zero entry in the enum definition. You can take advantage of that by making the zero value some kind of "unset" or "unknown" entry, which can be an indication to you that the user didn't set the field when you wanted them to make a decision about what it should be. It does make sense, though, to specify default behavior that is the most common behavior. As an example, if your operation is a database query, you might make the sort ordering an enum and make the default there ascending; that's a reasonable thing to do. Another reason you might pick less common default behavior is to maintain backward compatibility. If your service already has an RPC and you want to add a new field that offers new functionality to the user, you obviously need to make sure existing users aren't broken when you add it, so make sure the default value matches the old behavior. Even if you're running the full stack yourself, and you own the clients and the services, unless you intend to have planned downtime, you basically can't restart your clients and services all at the exact same moment. So when you're doing this: add the new functionality, do a rolling reload of your servers, and only then begin to reload your clients. And just be careful when you're introducing new features like that, because inevitably you will need to roll back your servers when you find a bug, and keep in mind that if you've already started releasing clients that use those features, you're going to be broken.

The last thing on API design I'm going to talk about is errors. In gRPC, errors are a first-class concept. What that means is that for any RPC you send out, the response will come back either as the payload data, a message of the type you defined for that RPC, or as an error. The error includes a status code, which is canonical and shared across all the different languages that implement gRPC, plus a status message, which is an arbitrary string you can set; and recently we added the ability to attach arbitrary structured data to the errors you send back and check it on the client side. It's best to use this functionality and not try to return error information in the response payload, because if the RPC itself doesn't fail but the result message says there was a failure, your client logic becomes more complicated: you have to check whether the RPC succeeded, and then, if it did, check whether the response indicates that the operation you were attempting failed. That can get really complicated, so it's best to keep it simple and use the built-in error system.

Along with error handling, it's a good idea to avoid batching multiple separate operations into a single RPC. The reasoning is that error handling becomes complicated in these situations too: if you're performing multiple database transactions, for instance, in a single call, you now need a way of communicating back to the caller which operations succeeded, which ones failed, and which ones you maybe didn't even attempt. That can get really hairy, especially if you don't manage to send the result back because the deadline was exceeded; now the client has to figure all of this out. In these cases I would recommend a streaming RPC, where each request message indicates a single operation you want to perform and each response to that particular message indicates a single result. That's one place where maybe you do want to put errors into the payload, because errors are per-stream: for a streaming RPC, you would otherwise have to terminate the entire stream to indicate that an error happened. The other option is to use multiple calls instead. Initiating an RPC call is fairly lightweight, so it should be fine to tell your users to perform multiple calls when they want to perform multiple operations; you can have multiple concurrent RPCs in flight. That keeps error handling simple.

On the topic of error handling, the requisite Douglas Adams reference: don't panic. Basically, any kind of panic means that your server has crashed and you're not serving traffic anymore. This should be fairly obvious, but especially in languages other than Go, it's pretty common to throw in assertions or checks that will crash your application if you violate them. I've run into this situation a few times in the past; it's remarkably prevalent. So even if you're not doing these things in your own application, look at the libraries you're using, try to figure out whether they have checks like that in them, and avoid those at all costs. Panic is appropriate in certain cases: if you detect that there's been memory corruption, or you run out of memory, or imminent data corruption in your database is about to happen, then sure, go ahead and panic if that's your only way out of the problem. But otherwise, it's always best to return an error. And just a word of warning: nil pointer dereferences are effectively panics just like this in Go. With protos it's pretty common to have nested messages, which can be nil if they're not set. The getters on proto messages handle nil receivers just fine and don't panic, but if you forget to use them somewhere in your code, you can end up with crashes, and that causes lots of problems.

Another thing to talk about for error handling is what to do when services or libraries you depend on return an error. Don't follow the anti-pattern of "if err != nil, return err" from your RPC handlers. You really want to convert all of your errors into something your users can make sense of, and you want to make sure you set the status code in that error to something that gives the user an indication of whether they should retry the RPC or give up and file a bug against you. At the bottom of the slide there's an example of how to do this when making another gRPC call from your gRPC service. There's a FromError function you can call to make sure that the result you got back from that RPC call is the right type of error; if not, you can return an Internal error to your caller, because that's not expected to happen. Then you actually want to look at the error code that came back from the service you called and translate it appropriately. The example takes an invalid-argument error, which means that when my service made its outgoing call, it gave illegal arguments to that service. That's not the fault of the user who called me, so I translate it into an Internal error when I return it. The point there is that it's my fault, not theirs.
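The slide isn't captured in the transcript, but a reconstruction of that pattern as a small helper, using the real `status` and `codes` packages from grpc-go, might look like this. The specific code mappings beyond the InvalidArgument case are illustrative.

```go
import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// translateBackendError converts a non-nil error from an outgoing gRPC
// call into an error appropriate to return to our own caller.
func translateBackendError(err error) error {
	st, ok := status.FromError(err)
	if !ok {
		// Not a gRPC status error at all; that's unexpected.
		return status.Error(codes.Internal, "unexpected error from backend")
	}
	switch st.Code() {
	case codes.InvalidArgument:
		// We built the backend request ourselves, so an invalid argument
		// is our bug, not our caller's fault.
		return status.Error(codes.Internal, "internal error")
	case codes.Unavailable:
		// Transient; our caller may reasonably retry us.
		return status.Error(codes.Unavailable, "backend unavailable; please retry")
	default:
		return status.Errorf(codes.Internal, "backend call failed: %v", st.Message())
	}
}
```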
Okay, switching gears a little bit: deadlines. Deadlines are really important. A deadline is how a client communicates to its service when it needs an answer by, and, assuming your context forwards the deadline correctly, it allows the client, the service, and all of the other services down the chain to stop at the same time if they can't meet that deadline. There are a couple of ways to set a timeout: you can set it as a duration in the future, as in the example on the left, or as an absolute time, as in the example on the right. That information gets set into the context object, which is part of the gRPC API used for clients and services; you attach the timeout to the context, pass the context into your outgoing RPC call, and it gets received on the server side.

Servers care a lot about deadlines too. If the deadline is not long enough, you, as a service, can know to abort early and not attempt the operation, so it's a good idea to check deadlines and make sure you have enough time before you start performing an operation. Conversely, if you have no deadline at all, that's a very bad thing: it means the RPC can stall you indefinitely, which could be a problem. And if the deadline is just too long, it means you're tying up resources for longer than you might intend. Here's an example of how, in Go, to retrieve the deadline from the incoming context in a handler. The deadline is always returned as an absolute time, and it's carried and propagated in the context in that form, but you can check how far in the future it is and error out if there's not enough time, or if too much time was specified.
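A sketch of both sides, reusing the hypothetical Math service names from earlier; `context.WithTimeout`, `context.WithDeadline`, and `ctx.Deadline()` are the real APIs, while the threshold values and status codes chosen for the rejections are assumptions.

```go
import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	pb "example.com/math/proto" // hypothetical generated package
)

// Client side: attach a timeout (relative) or a deadline (absolute) to the
// context; gRPC propagates it to the server along with the call.
func callDiv(client pb.MathClient, req *pb.DivRequest) (*pb.DivResponse, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	// Equivalent, with an absolute time:
	//   ctx, cancel := context.WithDeadline(context.Background(),
	//       time.Now().Add(300*time.Millisecond))
	defer cancel()
	return client.Div(ctx, req)
}

// Server side: inspect the incoming deadline and error out early if there
// is not enough time, too much time, or no deadline at all.
func (s *server) Div(ctx context.Context, req *pb.DivRequest) (*pb.DivResponse, error) {
	deadline, ok := ctx.Deadline()
	if !ok {
		return nil, status.Error(codes.InvalidArgument, "a deadline is required")
	}
	if left := time.Until(deadline); left < 10*time.Millisecond {
		return nil, status.Error(codes.DeadlineExceeded, "not enough time to complete the operation")
	} else if left > time.Minute {
		return nil, status.Error(codes.InvalidArgument, "deadline too far in the future")
	}
	// ... perform the operation ...
	return &pb.DivResponse{}, nil
}
```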
Now, as a service that makes calls to other services as part of an interconnected system, you often need to propagate the deadline the client gave you when making your outbound calls, and you may be making many outbound calls, so you need some kind of strategy for specifying a deadline on each of them. The first, most obvious thing you can do is simply reuse the context exactly as it was given to you when making outbound calls. This is simple, it works, and for each call it lets everybody know what your client's deadline was. But it does have one drawback, which, depending on your use case, could be small or big: you never terminate early, even when you're sure the operation isn't going to complete. If, in the process of performing the first call, you use up almost all of your time, and you know you won't have enough time left to make calls two and three, you could in theory return early and give up at that point; but this implementation keeps going, because the deadline hasn't been reached yet, so you perform the second call, and maybe that one times out. So it's not ideal, but it is convenient.

Another choice I've seen employed is to measure the performance of all the services you're calling and then set a deadline appropriate to each one of those calls. In this example, if the first call usually completes within 100 milliseconds, you apply a 100-millisecond timeout to the first call, and then 50 and 100 milliseconds to the subsequent ones. This is also fairly easy to implement, and it gives you the benefit of returning early when one of the calls is taking a long time and there isn't enough time left to finish the operation. The problem is that it may cancel too early. Even if you're using something like the 99th-percentile latency for those deadlines, 99 percent across three different calls means you're now going to fail your RPC about three percent of the time, so you have to use really conservative numbers if you go with this approach.

The third method, which in my opinion is the most ideal, but probably also one of the hardest to implement, is to work backwards from how much remaining work you have to do: determine what has to happen after the first operation completes, and set the first call's deadline accordingly, and then do the same when you perform the second call. So for the first call, we take whatever the client's deadline is and subtract 150 milliseconds, which is the time needed for the two subsequent calls on a good day, and use that as the deadline of the first call. This lets your first operation take as long as it needs, as long as it returns in time; assuming the other services are performing as expected, everything will work fine. The problem, pretty obviously, is that this can be hard to maintain: if you want to reorder calls or add something in the middle, you have to go update all the places where you set the other deadlines.

Okay, rate limiting. Rate limiting is a potential problem for pretty much all services. If you're getting too many requests from the same user, they can deny your ability to handle requests from other users, so to keep things fair we like to apply rate limits. The way you do that in gRPC-Go is to take advantage of a hook we have called the tap handler. It executes basically right after a request makes it to the server, before the payload is even decoded, and, depending on its size, possibly before it has even been received entirely. This is the best place for a rate-limiting kind of check, because it prevents the server from using too many resources just to perform the rate limiting. A combined sketch of the server and client sides follows below. I've used the x/time/rate package, which provides a Limiter type that is a pretty simple but effective way of doing this. The context provided as a parameter to your tap handler contains information about the user and their credentials and so on, so you can extract that from there (I haven't shown that), and then you can just apply a rate limiter for each user that calls your service. Obviously you'd want to protect the map with an RWMutex or something like that, because maps aren't thread-safe. You assign a new rate limiter if one doesn't exist, and then you just check the Allow method: if the user is over their allowable rate, it returns false and you can return an error immediately to the caller.

On the other side, as a responsible client, you want to implement your own rate limiting, and it should match whatever the server's enforcement is, so keep that in mind. It's just as easy to do rate limiting on the client side as on the server side, and you can use the exact same x/time/rate package for it, in fact the same Limiter type, initialized in the exact same way. The difference is that instead of calling the Allow method, which determines whether or not you may proceed, you call the Wait method, which blocks; and note that Wait takes a context, which allows the wait to end early if the context's deadline is exceeded. I recommend doing this when you're making outgoing calls.
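Here is a sketch of both halves using the real `golang.org/x/time/rate` and `google.golang.org/grpc/tap` packages. How you identify the user from the context, and which rates and status code you pick, are assumptions.

```go
import (
	"context"
	"sync"

	"golang.org/x/time/rate"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

// userFromContext is a placeholder; real code would extract the caller's
// identity from the credentials carried in the context.
func userFromContext(ctx context.Context) string { return "anonymous" }

type userLimits struct {
	mu       sync.RWMutex
	limiters map[string]*rate.Limiter
}

func newUserLimits() *userLimits {
	return &userLimits{limiters: make(map[string]*rate.Limiter)}
}

// handle runs before the request payload is decoded (or possibly even fully
// received), so rejected calls cost the server very little.
func (u *userLimits) handle(ctx context.Context, info *tap.Info) (context.Context, error) {
	user := userFromContext(ctx)
	u.mu.Lock()
	lim, ok := u.limiters[user]
	if !ok {
		lim = rate.NewLimiter(rate.Limit(100), 10) // 100 RPCs/sec, bursts of 10
		u.limiters[user] = lim
	}
	u.mu.Unlock()
	if !lim.Allow() {
		return nil, status.Error(codes.ResourceExhausted, "rate limit exceeded")
	}
	return ctx, nil
}

// Installed at server construction time:
//   s := grpc.NewServer(grpc.InTapHandle(limits.handle))

// Client side: same package, same Limiter, but Wait blocks until the
// limiter permits the call or the context's deadline expires.
func callWithLimit(ctx context.Context, lim *rate.Limiter, doCall func(context.Context) error) error {
	if err := lim.Wait(ctx); err != nil {
		return err // deadline expired while waiting for our turn
	}
	return doCall(ctx)
}
```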
Another thing I wanted to talk about was retries. gRPC does not currently support retries, but there is an approved RFC, so this will be implemented in the near future. The way you'll set it up is through the service config, a feature we have in gRPC that enables services to communicate to clients how they should contact the server and what sorts of parameters they should use; one of those configuration knobs will be how retries work. The retry support that will be added isn't only plain retries, which are sequential calls: if an error comes back matching a certain error code, wait a certain amount of time and perform the call again; that's your standard retry. It will also support concurrent hedged requests, where we'll actually be able to send the same request to multiple servers at the same time, take whichever result comes back first, and cancel the outbound calls to the other servers. Until then, though, you have to implement retries yourself, and you can do that with either some kind of wrapper library or with interceptors, which are also a good choice.

One note: it's a good idea to use the contexts your clients give you verbatim when you're doing these operations. On a previous team we tried something where we took the deadline and split it into three parts, doing retries in case things stalled, and that turned out to be a really bad idea, because if you abort an operation early, you don't know; it might have been just about to return. If you take your ten-second deadline, split it up into three three-second calls, and the operation was just going to take four seconds for whatever reason, now it can never finish; whereas if you had used the full ten-second deadline, you would have been done much faster. Transient errors will either be retried automatically by gRPC for you (in certain cases that happens with connection errors), or, for the other transient error cases, you just retry those yourself when you get those errors. But don't mess around with the deadlines, would be my advice.

Here's an example that I used on a previous team for wrapping client calls; it does a lot of the things I was just talking about. It employs a rate limiter, it employs retries, and it also deals with wrapping errors when they come back. The way it works is that it's a single function that you can configure per service you're connecting to, and you call it whenever you want to make any type of RPC to that particular service. At the bottom is an example of how it actually works. First you declare your response in the scope in which you want to use it. Then you call this function and pass it a closure that performs the outgoing call using the context as it's given, assigns the result to the variable in the outer scope, and returns any error that came back. The closure runs inside the wrapper function, and if there's an error result that needs to be retried, that happens automatically; then the error gets converted to the right type of error and returned back to your function. Afterwards, you can use the result as though it had come directly from the RPC call itself. This was pretty handy for us. Another way to implement this in gRPC is to use interceptors. I don't have an example of that, but it's actually a pretty good idea, I think, and it avoids some of the mechanics around passing the name of the RPC and doing this wrapping; it sort of takes care of all that for you.
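The original slide's code isn't in the transcript, but a rough sketch of that wrapper pattern might look like the following, with the retryable-code check and backoff simplified, and reusing the `translateBackendError` helper sketched earlier. Note that it passes the caller's context through verbatim: no deadline games.

```go
import (
	"context"
	"time"

	"golang.org/x/time/rate"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// call runs one RPC attempt via fn, applying a client-side rate limit,
// retrying transient failures, and converting errors on the way out.
func call(ctx context.Context, lim *rate.Limiter, fn func(context.Context) error) error {
	for {
		if err := lim.Wait(ctx); err != nil {
			return status.Error(codes.DeadlineExceeded, "deadline expired before the call could be sent")
		}
		err := fn(ctx)
		if err == nil {
			return nil
		}
		if st, ok := status.FromError(err); ok && st.Code() == codes.Unavailable {
			time.Sleep(10 * time.Millisecond) // real code would back off exponentially
			continue                          // transient; retry with the original context
		}
		return translateBackendError(err) // the helper sketched earlier
	}
}
```

Usage then looks like this, with the response declared in the outer scope as described (hypothetical `client`, `req`, and `lim`):

```go
var resp *pb.DivResponse
err := call(ctx, lim, func(ctx context.Context) error {
	var err error
	resp, err = client.Div(ctx, req)
	return err
})
```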
Okay, here's something that's more specific to Go. There are no limits on the number of goroutines that a server can have outstanding at any point in time: every time a request comes in, gRPC will handle it and fire off a new goroutine for your application to serve it. This can be a problem under heavy load, because with no limits comes unbounded memory use. There are a few ways of mitigating the problem. The first is to set limits: limits on the number of outstanding connections you can have at any point in time, and on how many concurrent streams, that is, in-flight RPCs, you can have on each one of those connections. You have to limit the connections on the listener itself, which you can do by wrapping it with the x/net/netutil functionality; gRPC then does have a way of setting a limit on the number of streams you can have on any one connection. The product of those two numbers is how many outstanding RPCs you're now limited to.

Another option is to utilize the tap handler that we talked about for rate limiting; that works pretty well, and you can put any number of things into it. You don't have to limit it to the number of RPCs in flight: you could also look at the memory you have available, or anything else you want to use for health-type checking, and error out early, before things start using too many resources. The last option is probably the ideal one, but it takes a lot of effort to set up and requires a lot of coordination around it: you can use health reporting to have your servers advertise how much memory they have available, and then have load balancers that monitor the health of all your servers and redirect traffic appropriately.

It's not just a large number of requests that can run you out of memory, though; a single very large request can do it too. There are no inherent limits on the size of a proto message, and I believe there are no built-in defaults in gRPC enforcing a limit on how big a message it will accept. That can be a problem, because just in the process of receiving a message you could run out of memory. So it's advisable to set a limit on the received message size, which you can do with an option when you create your server, as in the sketch below; I recommend figuring out a number that works for your service and setting it. And on the way out, even small requests can result in very large responses. Consider an RPC where the request is a database query, and an enterprising user wants the entire database, so they do a "select *": now you've got to fill in your response message with the entire contents of your database, which will probably run you out of memory before you can return. And even if you can fill it all in and send it back, now the client crashes, because maybe your server has more memory than their client does. This goes back to the API design issue, though, and how hard it is to fix things when they start out broken: in this case, you should support pagination for the RPC and limit the number of results, or use a streaming mechanism.
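Putting those knobs together in one place, with made-up numbers; `netutil.LimitListener` and the two server options are real APIs in `golang.org/x/net/netutil` and grpc-go.

```go
import (
	"log"
	"net"

	"golang.org/x/net/netutil"
	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	// At most 100 open connections; further Accepts block until one closes.
	lis = netutil.LimitListener(lis, 100)

	s := grpc.NewServer(
		// At most 50 in-flight RPCs per connection, so at most
		// 100 * 50 = 5,000 outstanding RPCs in total.
		grpc.MaxConcurrentStreams(50),
		// Refuse any received message larger than 4 MiB.
		grpc.MaxRecvMsgSize(4*1024*1024),
	)
	// ... register services ...
	log.Fatal(s.Serve(lis))
}
```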
A couple more random topics. Logging: when I first started writing services at Google, it seemed like a good idea to just log errors and think, hey, maybe I'll notice that one day. Nobody reads logs. It should have been obvious, but it wasn't to me, so I'm here to spread the word: if you have an error and you want to find out about it before your user reports the problem to you, don't log it; use monitoring for that. The caveat here is that logs actually can be really useful if you post-process them and generate alerts or metrics from them. But if you're not doing that, logs are great for debugging once you've detected a problem; they can help you track it down, but they're not going to help you notice the problem. So just be careful what you use your logs for. Instead, use monitoring, and I think this is really important: use monitoring extensively, have metrics for everything, add custom metrics, set up alerts. It's a really good idea. We eventually had metrics for everything on my previous team; any kind of unexpected error or anything like that, throw a metric on it, put some monitoring on it, and then you can find out about your problems before your users do.

We had a pattern that helped us remember to capture metrics for everything, and it let us capture all the RPC latencies and the other things that were really important to be able to measure. Basically, what we did was embed our server object inside another struct called a serverCall, which would also carry information about that particular call. In our server handlers, we would implement the actual handler logic on this type, and the outer server handler would instantiate one of these, make sure to defer a "done" function that would record latency metrics and things like that, and then call the real handler. The real handler was then free to use an "info" method, which was an easy, convenient way of recording what was going on for that RPC, with the context of the rest of the RPC carried along with it: the name of the RPC, how long it had been running at that time, and things like that. It's a lot of code, I know, but the idea is fairly simple.
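Reconstructed from that description, applied to the hypothetical Div handler from earlier, the shape of the pattern is something like this; `recordLatency` and the other names are invented for illustration.

```go
import (
	"context"
	"log"
	"time"
)

// recordLatency is a placeholder; real code would export to a metrics system.
func recordLatency(method string, d time.Duration) {}

// serverCall embeds the real server and carries per-call information.
type serverCall struct {
	*server // the actual service implementation and its state
	method  string
	start   time.Time
}

// done is deferred by every outer handler to record latency metrics.
func (c *serverCall) done() {
	recordLatency(c.method, time.Since(c.start))
}

// info records an event with the call's context attached for free.
func (c *serverCall) info(msg string) {
	log.Printf("%s (%v elapsed): %s", c.method, time.Since(c.start), msg)
}

// The handler gRPC invokes: wrap, defer the metrics, delegate.
func (s *server) Div(ctx context.Context, req *pb.DivRequest) (*pb.DivResponse, error) {
	c := &serverCall{server: s, method: "Div", start: time.Now()}
	defer c.done()
	return c.div(ctx, req)
}

// The real handler logic lives on serverCall so it can use info.
func (c *serverCall) div(ctx context.Context, req *pb.DivRequest) (*pb.DivResponse, error) {
	c.info("starting division")
	// ... actual work ...
	return &pb.DivResponse{}, nil
}
```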
So that's basically it. This is how you can get in touch with us and find out more: you can follow @grpcio on Twitter, visit us at grpc.io, subscribe to the Google group, and check out the repo and come contribute. We have a little bit of time for questions; give me just a moment and I'll cue up your question.

Q: Hi. gRPC — is it implemented on secure sockets? UDP, TCP, anything?
A: It's built on top of HTTP/2.
Q: Okay. Would that be an issue for rapid communication? Any scalability issue? I mean, there could be too many connections between the two sides. If you have two ends talking gRPC, is it one TCP connection?
A: It can reasonably support, you know, thousands of connections at a time. And if you're doing local-machine-to-local-machine communication, it does work for that also, but you would use it between binaries, not within a process; you wouldn't really want to use it between threads.
Q: So if multiple threads are initiating RPCs, they'll be funneled through the one pipe?
A: In terms of, if you had multiple services running on the same host, you could implement those as separate gRPC services and do communication between them that way. For communication within a process, I would just use Go channels or something like that instead, I think.
Q: [A follow-up about whether the communication is process-local.]
A: Right, right. In that case, where you have something that is a service, with a request and a response, then yes, that would make sense.
Q: And performance would be determined by the kernel's socket implementation for TCP/IP, right?
A: Right, with some overhead for the HTTP/2 implementation, and for serializing and deserializing your messages; but you're going to pay those anyway, presumably.
Q: Thank you. [Next question:] Is proto the only way to use gRPC?
A: Proto isn't the only way; you can basically send arbitrary binary blobs over gRPC, but the way that's easiest to get started with is to use protobufs.
Q: And versioning — you get that only if you're using protobufs, right?
A: Yes, that's true. There's no other versioning system that we support now; you'd have to do that manually.
Q: Okay, all right, thank you.
A: Thanks very much, everyone.
Info
Channel: CoreOS
Views: 32,496
Id: Z_yD7YPL2oE
Length: 45min 2sec (2702 seconds)
Published: Tue Jun 06 2017