REST APIs for Microservices? Beware!

Reddit Comments

I was working at a Logistics company that processed millions of orders a day. Messages came in via FTP and were processed against multiple remote methods. Each method kicked off billing events for each process. If there were 10 steps to ingest a message to shipping and final billing, we had very long running transactions with a lot of failures. Often our failure rate was 20% which required the process to "restart" at the last failure point. This was a mess.

So we moved to microservices that monitored Azure service bus with transaction support enabled and durability. This allowed us to roll back in the event of a problem and if processes were running slow or offline, the durability ensured the messages were still there when the services restarted or caught up.

Sure, 80% of the time messages flowed freely through the service bus without problems, but that other 20% now rolls back: a log is generated and a queue is set up to reprocess those messages, or fail them and notify someone.

It takes discipline to set up the service bus correctly and to ensure the microservices are participating in the transactions, but we cut our failure rate down to 0.01% when we did this.

Just another suggestion. I'm sure you could do this with RabbitMQ or NServiceBus as well if you aren't on Azure, whether you're in AWS or self-hosting.

▲ 15 · u/katghoti · Mar 17, 2021
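The workflow described in the comment above (a durable queue, per-message transactional processing, retries, and a dead-letter path for repeated failures) can be sketched in plain Python. This is an illustration only: the queue, handler, and retry limit are hypothetical stand-ins for what a real service bus such as Azure Service Bus provides natively (peek-lock, abandon/complete, dead-letter queues).

```python
import queue

def process_with_retries(messages, handler, max_attempts=3):
    """Consume messages from a durable queue. On failure, re-queue the
    message until max_attempts is reached, then dead-letter it for
    manual review. A message only leaves the queue for good once it
    has been processed successfully (roughly peek-lock semantics)."""
    work = queue.Queue()
    for m in messages:
        work.put(m)
    attempts = {}          # per-message failure count (messages must be hashable)
    processed, dead_letter = [], []
    while not work.empty():
        msg = work.get()
        try:
            handler(msg)                 # the "transaction": all-or-nothing
            processed.append(msg)        # complete: remove from the queue
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] < max_attempts:
                work.put(msg)            # abandon: message stays durable, retried later
            else:
                dead_letter.append(msg)  # fail: log it and notify someone

    return processed, dead_letter

def flaky_handler(msg):
    """Hypothetical handler that cannot process one poison message."""
    if msg == "poison":
        raise ValueError("cannot process")

processed, dead = process_with_retries(["a", "poison", "b"], flaky_handler)
```

Here `processed` ends up as `["a", "b"]` and `dead` as `["poison"]`: good messages flow through, and the poison message is retried three times before being dead-lettered rather than blocking the whole pipeline.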

How do I get an orchestrator when I am not using Docker or Kubernetes?

Can I use RabbitMQ or the Azure event bus?

Also, should we send the full payload, or just the ID and let the API query the DB?

▲ 7 · u/AbPSlayer · Mar 17, 2021

I wonder what would have happened if I made this

▲ 1 · u/holoedregq · Mar 18, 2021
Captions
If you're starting a new project, or moving an existing project to a microservices architecture, how are you primarily communicating between services? Are you using a RESTful HTTP API, or maybe gRPC? Before you commit to that, I'm going to explain the complexities you'll have to deal with in terms of latency and availability when using request/response in a distributed system. Hey everybody, it's Derek Comartin from codeopinion.com. If you recently subscribed to my channel, thank you, I really appreciate the support. If you're new to my channel, I post videos on software architecture and design, so if you're into those topics, make sure to subscribe.

First, let's compare this to a monolith and how this would typically work. A monolith is a single process, likely with a single database, regardless of whether you have boundaries within it. Say we're creating an order within Sales, and it interacts with the Catalog; maybe Billing has to create an invoice record, and the Warehouse portion has to update the quantity on hand for what we're about to ship. All of that happens within a single transaction, so if there's a failure at any point in the interaction between Billing, Warehouse, and Sales, it's atomic: either it all works or none of it does. We don't have to worry about data inconsistencies from only a portion of the process succeeding; it either succeeds together or fails together.

The first issue once we move to a distributed system is that each service has its own database, so we can't use a single transaction to handle this. For example, we call the Catalog to get some information, then make a synchronous call, say over HTTP, to Billing: create this invoice for this particular order. It returns 200 OK; it worked. Then we call the Warehouse to create the shipping label and allocate the quantity we're requesting for the products on our order. What happens if that fails? Maybe the service isn't available, or there's a 500 error and the request couldn't be processed. Now we need to go back to Billing and tell it to undo the invoice we created, because something else went wrong. But what happens if that call fails too? Now we have an invoice for inventory we never allocated, and we're in a really inconsistent state. Since we don't have a single transaction, we need a distributed transaction. One solution is a transaction coordinator that can do a two-phase commit. There is another solution, but before I get to it, I'm going to show another problem.

The second issue we need to deal with is latency. We have a client making a call to Sales, and the Sales service uses HTTP to call the Warehouse to allocate product for the order we're creating. Do you know how long the actual timeout is? In a perfect world, the Warehouse responds in a couple hundred milliseconds: happy path, all good, we just know the Warehouse does what it does. But do you always set the timeout when you're making HTTP calls? Do you know what the default is for the C# HttpClient? If you look up HttpClient.Timeout in the docs, you may be surprised to find the default value is 100 seconds. So when Sales calls the Warehouse, if the Warehouse has performance or availability issues where it's accepting requests but taking a long time to process them, you could wait up to 100 seconds before you get a TaskCanceledException because of the timeout.

It can get worse. If the Sales service is the part you're working on, you just know you need to make a call to the Warehouse; you have no idea what else is going on behind the scenes. You can't see a stack trace that shows it; you're not in-process. What happens if the Warehouse calls Billing, and, just like before, that works fine on the happy path, and then Billing also needs to call the Catalog? From Sales's point of view, you have no idea all of this is occurring, and now you have a call chain over HTTP from service to service to service. How does that add up in latency? If the call from Billing to the Catalog takes 200 milliseconds, and the call from the Warehouse to Billing takes 200 milliseconds, we're at a 400 millisecond round trip so far. Add, say, another 200 milliseconds and we're at 600 milliseconds just to get from the client through Sales. From Sales's point of view, all we did was call the Warehouse. So we decide to add a timeout on our request from Sales to the Warehouse. The problem is, the Warehouse might not actually be the problem, and we can't know that, because we can't see beyond the Warehouse. Suppose the Catalog is the one having terrible performance, and the Billing-to-Catalog call is taking 700 milliseconds, with the Warehouse-to-Billing hop adding another 200. With a 500 millisecond timeout, we're going to bail out, and it's not really the Warehouse's fault: the real issue is farther down the chain, at the Catalog.

If you're creating services and using request/response to communicate between them, to me you've really turned this into a distributed monolith, and in a lot of cases, in my opinion, that's worse. You've taken the same interactions you had in a monolith and moved them from in-process to out-of-process over the network. Say Sales calls the Warehouse, the Warehouse calls Billing, and everything calls the Catalog. What happens if the Catalog has an issue and you have no recourse? The reality is that everything's on fire; nothing works. If one service is down, all the services are down.

One solution to these problems, which stem directly from request/response, is to get rid of request/response and move to asynchronous messaging: a message broker, messaging with events, and orchestration, to remove the temporal coupling and embrace the asynchrony of the whole thing. I've mentioned this before in some of my videos; check the videos related to my saga video, since this is what some of it relates to. When you place an order, we send that to the message broker. An orchestrator deals with our long-running process, and all of this is asynchronous; there's no request/response happening here. We send messages to the broker, and the broker deals with the other services; we never go from service to service directly. On the "bill order" message, Billing creates the invoice, then sends a separate event back that our orchestrator in Sales handles, saying it was actually billed. Then we go to create our shipping label in the Warehouse and allocate our product. Maybe the Warehouse isn't available, or there's an issue there; that's no concern to us in Sales. That message just gets stacked up, along with any other messages that come through, and once the Warehouse is back online and processing messages quickly, it consumes everything in its queue.

So, is asynchronous messaging the golden hammer that solves all problems? No, of course not; it's trade-offs like everything else. But it does solve the problems I outlined. When you're doing request/response between services, dealing with the latency that might occur, the availability issues that will arise, and how you handle all of that can get very complex; you need to break away from the temporal coupling of request/response and embrace asynchrony. Another thing to look at is boundaries. If you have services making synchronous calls to other services, for example to get data, you may want to check whether your boundaries are actually correct. You want a service to be the authority over a set of capabilities, with the data it owns behind it. If you have to go out to another service to get data, maybe you have boundaries that aren't aligned correctly.

The last thing I want to touch on is the reality that you will need to do request/response in certain situations. Say my client is some sales service and I need to convert one currency to another. We have a primary service, a currency exchange, where I ask: I have one US dollar, how much is that right now in Canadian? That rate fluctuates and changes, so it can't be messaging; we need a response immediately. Whether we're worried about that service being unavailable, or being slow relative to how quickly we expect it to return, you have a couple of options. First: if there's a failure, can you still process your request without that result? Can you be resilient to that service being down? If you can, great. If you can't, then you need the expectation that if the primary service is down, you can immediately go to some other service that can give you the result. Always have a fallback: if something fails, immediately go to the fallback; if you have a timeout set up with a reasonable expectation, say you expect it to be done in 500 milliseconds, and it isn't, immediately go to the secondary service. And then, as I have in quotes here, possibly "failover": you don't want every request to check the primary, find it's still down, and only then go to the secondary. Have failover in place so you keep going directly to the secondary until you know the primary is back up, and then you can flip back over to it. When you absolutely have to do request/response, and it's critical that you actually get a value from that service, make sure you have fallbacks.

If you're going to go the request/response route for communicating between services, realize the issues you're going to face with latency and availability, especially for long-running processes or workflows where you need to communicate with multiple services; if anything goes wrong, you need either a distributed transaction with two-phase commit, or your own complexity in sorting things out when failures occur. To me, services should be autonomous, and I much prefer using messaging and a message broker to communicate between my services. If you're interested in more on that, check out my videos on sagas, event orchestration, and choreography. If you have any thoughts or questions, leave a comment. If you enjoyed this video, give it a thumbs up, and of course, if you haven't done so already, subscribe for more videos on software architecture and design. Thanks!
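The call-chain latency math from the video can be made concrete with a small sketch. The topology and per-hop latencies below are the hypothetical numbers from the video (Sales calls Warehouse, which calls Billing, which calls the slow Catalog); the point is that a synchronous caller observes the *sum* of everything downstream, so its timeout fires blaming the wrong service.

```python
def call(service, graph, latencies):
    """Synchronously 'call' a service: the latency the caller observes
    is the service's own work plus every downstream call it makes,
    none of which is visible to the original caller."""
    total = latencies[service]
    for dep in graph.get(service, []):
        total += call(dep, graph, latencies)
    return total

# Hypothetical topology from the video: Sales -> Warehouse -> Billing -> Catalog
graph = {"sales": ["warehouse"], "warehouse": ["billing"], "billing": ["catalog"]}
latencies = {"sales": 0, "warehouse": 100, "billing": 200, "catalog": 700}

observed = call("warehouse", graph, latencies)  # what Sales sees: 100 + 200 + 700 = 1000 ms
timeout_ms = 500
blames_warehouse = observed > timeout_ms        # True: Sales times out on the Warehouse,
                                                # even though the Catalog is the slow one
```

With a 500 ms timeout, Sales bails out on its Warehouse call even though the Warehouse itself only adds 100 ms; the 700 ms is hidden two hops away.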
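The broker-and-orchestrator flow the video describes can be sketched with an in-memory queue. This is a toy stand-in for a real broker such as RabbitMQ or Azure Service Bus; the topic names ("bill-order", "order-billed", "create-shipping-label") are hypothetical. The key behavior it demonstrates is durability: a message for an offline consumer simply waits in the queue until that consumer comes back.

```python
from collections import defaultdict, deque

class Broker:
    """Tiny in-memory stand-in for a durable message broker."""
    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, topic, message):
        self.queues[topic].append(message)

    def consume(self, topic):
        """Drain and return everything currently queued on a topic."""
        drained = list(self.queues[topic])
        self.queues[topic].clear()
        return drained

broker = Broker()

# The orchestrator in Sales reacts to events; it never calls services directly.
broker.publish("bill-order", {"order": 1})

# Billing processes its command and emits an event back to the orchestrator.
for cmd in broker.consume("bill-order"):
    broker.publish("order-billed", cmd)

# The orchestrator sees "order-billed" and issues the Warehouse command...
for evt in broker.consume("order-billed"):
    broker.publish("create-shipping-label", evt)

# ...but the Warehouse is offline. That's no concern to Sales: the message
# just sits durably in the queue. When the Warehouse comes back online,
# it catches up on everything that stacked up while it was down.
backlog = broker.consume("create-shipping-label")
```

No step above waits on any other service being up; each one only talks to the broker, which is the temporal decoupling the video is advocating.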
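The fallback-with-sticky-failover pattern from the currency-exchange example can be sketched as follows. The provider functions and the exchange rate are made up for illustration; the shape of the idea is what matters: try the primary, fall back to the secondary on failure, and then keep using the secondary (rather than re-checking the primary on every request) until something, such as a background health check, marks the primary healthy again.

```python
class ExchangeClient:
    """Request/response with a fallback and sticky failover."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.primary_healthy = True

    def get_rate(self, frm, to):
        if self.primary_healthy:
            try:
                return self.primary(frm, to)
            except Exception:
                self.primary_healthy = False  # fail over: stop retrying the primary
        return self.secondary(frm, to)        # fallback path

    def mark_primary_up(self):
        # e.g. called by a background health check once the primary recovers
        self.primary_healthy = True

# Hypothetical providers: the primary is down, the secondary works.
calls = []

def primary_rate(frm, to):
    calls.append("primary")
    raise ConnectionError("primary exchange service is down")

def secondary_rate(frm, to):
    return 1.35  # made-up USD -> CAD rate

client = ExchangeClient(primary_rate, secondary_rate)
rate1 = client.get_rate("USD", "CAD")  # tries primary, falls back to secondary
rate2 = client.get_rate("USD", "CAD")  # failover is sticky: primary is skipped
```

Both calls return the secondary's rate, but the primary is only attempted once; without the sticky flag, every request would pay the cost (potentially a long timeout) of re-discovering that the primary is down.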
Info
Channel: CodeOpinion
Views: 30,377
Keywords: REST APIs for Microservices, software architecture, software design, cqrs, design patterns, software architect, asp.net, soa, microservices, message queues, kafka, event bus, event driven architecture, azure service bus, rabbitmq, distributed transactions, service bus, mass transit, message queue, message queuing, messaging patterns, service oriented architecture, microservice architecture, domain-driven design, enterprise service bus, rest api, rest apis, http api, Swagger
Id: _4gyR6CBkUE
Length: 11min 48sec (708 seconds)
Published: Wed Mar 17 2021