Vladimir Dejanovic - Protocol buffers and Microservices - VOXXED DAYS BRISTOL 2017

Video Statistics and Information

Captions
Okay, so welcome to my talk about protocol buffers and microservices. So, let's meet: my name is Vladimir Dejanovic, I'm the founder and leader of a user group, and for my day job, the one I use to pay my bills, I work as a chapter lead at ING. For those of you who don't know, ING is a huge bank, and it's one of those banks which is a front-runner when it comes to technology and IT, in the banking world of course.

So what's our agenda for today? First we'll talk about the time before protocol buffers: what problems we had, how we solved them, how we dealt with them. Then I'll talk about protocol buffers, what they are, what you can use them for and what you should use them for. After that I'll try to answer the most important question of them all: should you even care about protocol buffers, do you even need them? At the end there will be some time for questions, so if you have any questions during the talk, please hold them until the end, or you can always find me online, or I'll be around afterwards.

So, in the old days, in the beginning, there was a message, right? As soon as we have a system A and a system B and they need to talk in some way, we need to send some messages over, and to send a message we of course need some format, some way so that both sides understand one another. We need some kind of protocol, so that we are one hundred percent sure that system A and system B speak the same language. The result of this, of course, was that everybody started writing their own formats and protocols. Some were good, some were bad, but there were a lot of them.

Then, if we fast forward to 1996, one very important event happened: I started going to high school, so now you know roughly how old I am. Another important thing happened as well, and that is XML, or Extensible Markup Language for those of you who like long names. Why is XML so important? Because of one very important thing: it's both human readable and machine readable. The protocols before that were mostly either human readable, and then they performed badly with computers, or very good for computers, but when you needed to look at them during debugging or in some logs you wanted to cry. With XML at least that wasn't the case, and XML really took off like a rocket; it was adopted very widely. Of course it helped that some big companies said this is going to be a standard, that some of the bigger players said it's going to be a standard, and XML even went to places where it shouldn't have gone, but that's another story.

Why XML flew so high is also very easy to see from this simple example. If we take any random person around this venue who is not attending Voxxed Days, they will also be able to understand this message. They will understand: okay, we have some attendee, his name is John Doe, and he's attending something at this address. That is very powerful when you need to look at a message, like I said, during debugging, when you're figuring out what went wrong.
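The actual slide isn't captured in the transcript; as a rough, hedged reconstruction, the attendee message being described might look something like this (the element names and address details are assumptions, not the exact slide content):

```xml
<attendee>
  <name>John Doe</name>
  <event>Voxxed Days Bristol</event>
  <address>
    <street>Some Street 1</street>
    <city>Bristol</city>
  </address>
</attendee>
```

An XSD describing this structure is the "schema definition" the talk turns to next: it's what lets both systems validate such messages and generate their boilerplate classes.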
The other reason why XML was adopted so widely, and why it's still used a lot, is schema definitions. This is the schema definition for the message from before, and why this is so important is that in this way we can guarantee that both system A and system B understand one another, that they all speak the same language. We could use schemas to validate the messages, to make sure that they are correct, and we could also use schemas to generate the boilerplate code, and we all love writing boilerplate, right? So this was a very, very good feature of XML.

And then people said: yeah, XML is perfect, right? Well, no. If we go back to this simple example, we see that there is a lot of redundant information here. How much? If we count the characters, without spaces we send 159 characters, but we're interested in only 68 of them in this case. So XML messages, as we all know, start to be bloated and explode very fast, especially if you send a lot of data, and if you use fully descriptive names like I used here, then they can be extremely huge. And this isn't a good thing.

So, if we fast forward to roughly 2000, again a very important event happened: I went to college. Yeah, and the other important thing that happened is that JSON was born; well, Crockford would probably say it was there a long time ago, but let's not go into that discussion. If you like long names, this is one of the long names that JSON is known by, although some people say it's not true, but again, let's not go there. Why was this important? Again, because it's both human readable and machine readable, exactly the same as XML. And as we see nowadays, JSON is everywhere, absolutely everywhere; I wouldn't be surprised if my fridge started talking JSON to me. One of the reasons why it became so popular is of course JavaScript, because that's where it came from, and the web is now huge. But also because of another thing: it's very developer friendly. If you look at some code in debug mode and you look at some objects, you can really simply just put them in JSON and you will not lose a single piece of information; it's exactly the same. And a lot of systems and databases did exactly this.

Again, if we take the same example, and we again take a random person who is not attending Voxxed Days, they'll still be able to understand this message. And if we look at how much data we send in this case, we send 90 characters instead of 159, so it's much less data, and when there are a lot of messages being sent back and forth, this really is an important thing.

So we can say: yeah, JSON is perfect, right? JSON is the best, because it's human readable, it's everywhere, so we should use it. All right, well, there's one big problem with JSON: JSON doesn't have a schema. And before some of you start saying that's not true, let me just point out this: there is JSON Schema, but it's not official, it's still a draft, it's still a work in progress. There are libraries out there that can give you the boilerplate code, that can validate the messages and everything, but they are not a standard, so even if you use them, there is no guarantee that they will keep working in the future. If you want more information, you can find it here.
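Again the slide itself isn't in the transcript; a hedged sketch of the JSON version of the same attendee message (field names assumed) would be something like:

```json
{
  "attendee": {
    "name": "John Doe",
    "event": "Voxxed Days Bristol",
    "address": { "street": "Some Street 1", "city": "Bristol" }
  }
}
```

It carries the same information with far less markup around it, which is the kind of difference behind the 159-versus-90-character comparison; the exact counts depend on the slide's actual field names.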
So, if we fast forward some more time, we come to the microservices era, or, as people outside of IT call it, the present time. What's interesting about this present time is that monoliths are out, monoliths are bad, right, we don't want them; microservice everything, everything is a microservice, Docker, we have them everywhere. Also, if you want your talk to be accepted at a conference, you just need to put "microservice" in the title; yeah, now you know why I'm here. But what does this mean? Well, in a nutshell it means this: in the past we had one huge system and this system did some stuff, but now with microservices we have a lot of services which, in theory, should do exactly the same thing. There are a lot of differences that this shift actually brought, but I'll stay on just one which is relevant to this talk: in the past, where we had one monolith, we had internal calls to classes, to functions, to procedures; now we have a lot of network calls, so now everything is on the network. And if we go back to the beginning of the talk, what does that mean? Network calls mean messages, which means formats, right? So which format should be used? People are very predictable, we always do the same thing: let's look at what we already know. And what do we already know? XML and JSON. So the question is, should we use XML or should we use JSON?

Let's dive in a little bit. As we said, XML is very secure, because we can validate the messages, and when we use XSDs we can make sure that everybody is talking the same language. Also, if the microservices are written in different languages, we can use the XSDs to generate the boilerplate code. So it's very, very good if you use XML, right? And with JSON? Well, we keep our fingers crossed; we hope that all will be well, that all the services will behave, that everybody is going to make sure that all edge cases are handled correctly, that in case they get an invalid message, or a message which shouldn't be there, they behave well and the whole system will not crash. So, one-nil for XML, right? But, as we said, XML has a potentially huge size, and if we have a lot of services talking to one another, that means a lot of messages, a lot of traffic, so a lot of data is going to be sent back and forth. And it's not only a problem on the network, it's also a problem for the CPU, because we need to encode and decode, and we also need memory for parsing. So it's a huge cost on everything. With JSON, well, we will definitely have a smaller size, but JSON can also get extremely big. So in the case of a high-traffic financial system you would probably use XML with schemas, because here, if something goes wrong, people go to jail and people lose houses, so you want security. But if you have, for example, a simple, normal website where there is a lot of traffic, but if something goes wrong people just get annoyed a little bit, then you would probably use JSON, because it's going to be simpler and faster.

But let's look at the new contender, the new kid on the block: protocol buffers. Protocol buffers came from Google, and the initial version of protocol buffers was developed in 2001, but it was only for internal use. The first public release that we could also use is version 2, and it came in 2008. In the meantime there is also version 3 of protocol buffers; during this talk I'll mostly talk about version 2, but I'll also point out some differences with version 3.

So what are protocol buffers? Well, let's first see what Google says. This is the longer definition, and when I first saw it I was a little bit confused, and the part that confused me most was this part: "think XML". Why was I confused? Well, I did the story backwards: I first started working with protocol buffers, writing software and designing the code and working with it, and only then did I actually read the definition. What I already knew is that with protocol buffers you have a schema, actually proto files, that you use to generate the messages and the boilerplate code, and which are used to validate the messages; but the messages themselves, the protocol itself, is binary.
And I found out, in a painful way, that when you debug things and something goes wrong and you use a binary format, it's not really nice. When I think XML, I think human readable, something where you can very easily see what's going on, what goes in and what comes out. So that's why it was so confusing for me: it was the complete opposite of XML. But then I realised one very simple thing, and that is that in the situation we already mentioned, where we have a lot of services talking with one another, potentially in different programming languages, if you have schemas then you have the security that everybody is speaking the same language, that all the messages are valid and you can double-check them. And if we look at this image and forget about microservices for a second, it just means there are a lot of servers talking with one another. If we go back into the past, in 2001 Google had a lot of servers; they were not microservices, but there were a lot of servers. So in that case they needed security, they needed some way to validate the messages, to make sure that everybody was talking the same language, and a way to generate the boilerplate code. They approached this problem by having the proto files, and the problem XML has, the bloating of the messages, they solved by creating a very clever binary protocol where you can encode and decode messages very fast and they end up kind of zipped. So that's why they said "think XML, but smaller and faster". Now it makes sense, right?

So, in order to actually start working with protocol buffers, the first thing you need to do is download the protocol buffer compiler, then you need to write your proto file, and then you need to run something like this on the command line to actually generate the code; in this case I'm generating Java code.

So let's look at the proto file. This is an example of a proto file which will generate exactly the same message that we already saw with XML and with JSON, and let's go through it line by line. At the beginning, like I said, you first need to define which version you're using; in this case I'm using proto2. If you want protocol buffers version 3, you just put proto3 here, but be warned: they are not compatible, so if you have a file which works with proto2 and you just change the number to 3, you will get a lot of errors. That's also my main reason for sticking with proto2 in this talk. As we already said, we use proto files to generate code in many languages: you can generate code in C#, in Java, in Go, in Python, in many languages. And we all know that when you generate a lot of code there can be collisions, there can be problems, so it's a good idea to define a package where we want the code to be generated, so it will not collide with anything else. Then this next part can look like overkill, because we already have the package defined above, so why this part? Well, this part here is Java specific. With the package statement we say okay, all the generated code will be in this package, but if you want to have some specific things for certain languages, you can add something like this; in this case I said okay, for Java I want this specific package, and I want all the code to be generated in this class. Then we go on with the message Attendee, where we say okay, this is going to be a message, and then this is the first field of the message. I assume that you all know what "required" means, even if you have probably never seen protocol buffers before.
And here is the first difference between protocol buffers version 2 and version 3: proto3 doesn't have "required" any more, it's been removed. The first time I saw that I was confused, because, again, we said we want security, we want to make sure that our code is forward and also backward compatible, and "required" is very good for that, because then we know that this field will always be there, that it will be a valid message. So why remove it? Well, it turns out that once we make something required, it has to stay required until the end of time; you will never be able to remove it. So if you see in the future that you made a mistake and you added a field that shouldn't be there, you can't remove it any more. What Google said is that they did a study and looked at how much good "required" brings and how much harm, and in their use cases it turned out to be more harmful than useful; that's why they removed it.

Then, of course, we define the type. There are some predefined types, like strings, integers, booleans, things like that, which you can use out of the box, but you can also define your own types, as we'll see later. Then we put the name of the field, and take a look here: it's written like first_name, with an underscore. For some languages, like Python and some others, that's common practice, but in Java we use camel case, and here is where the protocol buffer compiler comes to the rescue: it knows which language we want to generate the code for, so it will take names like this and convert them to the correct convention for the language at hand. So when you write your proto files, write the names in this way, and the protocol buffer compiler will generate the code in the correct style for your language.

Then we have something cryptic here, this number one. This is actually the tag, and this is the part where protocol buffers shine, where they show their clever side, because these tags are used for encoding and decoding messages very fast, for making sure that all the fields are there, and also for checking the validity of the message. This tag has to be unique, so it is unique for this field forever; once you assign a certain tag you should never reuse it, and if you remove a field you should never reuse its tag number either. Also, tag numbers from 1 to 15 take one byte to encode, so they should be used for the more frequent fields, and 16 and above for the less frequent fields.

Here, of course, we have another required string field, and then here we actually define a message inside the message. Why do this? Well, as we remember, we had the attendee, and he was attending something at some address, and that was a sub-object in the XML and the JSON. In this case I am defining a specific field type that I will be using later in the message, and in this way I'm hiding it from the outside world: this Address type will not be usable outside of Attendee, only inside Attendee can we use it. If I took this block of code and put it outside, then I could reuse it elsewhere. So again, in this clever way, we can hide things we don't want to share with other people. Then here we say okay, optional, and you can guess what optional means: in this case we have an optional address, and its tag is 3. Again, "optional" exists in proto2 but it doesn't exist in proto3, because in proto3 all fields are optional by default, so you don't write "required" and you don't write "optional"; those keywords are removed and everything is optional by default.
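The proto file itself isn't reproduced in the transcript; a hedged sketch of what the file being walked through might roughly look like (the package names, the second field and the Address fields are assumptions) is:

```proto
// Rough reconstruction, not the actual slide.
// Code could be generated with something like: protoc --java_out=src/main/java attendee.proto
syntax = "proto2";

package voxxed.bristol;

// Java-specific options mentioned in the talk (names assumed).
option java_package = "com.voxxed.bristol";
option java_outer_classname = "VoxxedBristol";

message Attendee {
  // Tags 1-15 take one byte to encode, so use them for frequent fields.
  required string name  = 1;
  required string event = 2;   // the exact second field is unclear in the transcript

  // Nested type: only usable inside Attendee, hidden from the outside world.
  message Address {
    optional string street = 1;
    optional string city   = 2;
  }

  optional Address address = 3;
}
```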
So, some things that you need to know about the generated code. In the generated code we get the messages as classes, and they are immutable: we can't change them, we can only create them, we can't set values on them afterwards. Again, when you first read this you might think it's not very useful, right, what can I do with something I can't set? Well, for every class we also get a builder, and the builders are actually what you use to create the messages. On a builder you can set values and read values, you have getters and setters, read and write, and then when you call build you get the message, and that message is immutable. In order to read data there are two options, for Java of course: parseFrom which takes a byte array, or parseFrom which takes an input stream, and that's the only way that you can actually read a protocol buffer message. If you want to write something, you get toByteArray and writeTo, which again takes an output stream. The builders, as I said, are how you create things, and as a result you get the message.

So, backward compatibility: the do's and don'ts. As we already said, we can't change the tag number of any existing field, because if we change it, we are going to break everybody who is using code generated from the old proto files; they are going to see, say, tag number two, go look for a field which shouldn't be there, and say this message is incorrect. So once we assign a tag number we can't change it. Also, we can't add or delete required fields, because, as we said, once something is required, it's required until the end of time. What we can do is delete optional or repeated fields, because they're optional, so it's not a problem if they are missing. We can also add new optional or repeated fields, but in that case we need to make sure that we use new tag numbers; we can't reuse the old ones, and that's very important.

Okay, so let's go for a little demo. I have some code here which is already generated for exactly the same proto file that you just saw, and what I have is a very simple Spring and Jersey application with some endpoints. If we go here, we can very easily see we have an endpoint for XML, an endpoint for JSON, an endpoint for text, and also one endpoint for protocol buffers. In this case, what we do is create builders; as we said, on a builder we set the values, here for the address, then I set the values on the attendee. As you can see, we say setAddress and pass address.build(); that build() on the address builder will generate the Address message, which we can then pass to the setter on the attendee builder. Then we actually build the attendee message, we set the response type, and we write the message back. If you call this endpoint, we get some text. If you call the endpoint for JSON, we get the same thing as JSON. If we hit the XML endpoint, we get XML, and we can all read this, right? It's very simple: even if we don't know what the type of the message is, if we just hit the URL we can see the data very easily. But then, what happens if we hit the protocol buffer endpoint? Yeah, not very useful, right? If you see something like this, you wouldn't know what to do with it or how to get from this back to the proto file itself.
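As a rough illustration of the builder API being described (the class and field names assume code generated from a proto2 file like the sketch above, so this is not the speaker's actual demo code):

```java
// A minimal sketch, assuming protoc generated the outer class VoxxedBristol
// with a nested Attendee message as in the hypothetical proto file above.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.voxxed.bristol.VoxxedBristol.Attendee;

public class AttendeeDemo {
    public static void main(String[] args) throws IOException {
        // Messages are immutable; builders are the only way to create them.
        Attendee.Address address = Attendee.Address.newBuilder()
                .setStreet("Some Street 1")
                .setCity("Bristol")
                .build();

        Attendee attendee = Attendee.newBuilder()
                .setName("John Doe")
                .setEvent("Voxxed Days Bristol")
                .setAddress(address)   // takes the already-built Address message
                .build();

        // Writing: toByteArray() or writeTo(OutputStream).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        attendee.writeTo(out);

        // Reading: parseFrom(byte[]) or parseFrom(InputStream).
        Attendee parsed = Attendee.parseFrom(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(parsed.getName() + " @ " + parsed.getAddress().getCity());
    }
}
```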
And that's, like I said, why I was confused the first time, when they say "think XML", because I can't understand this output, but when I see XML I can understand it. Also, if you go and open the file which is generated by the protocol buffer compiler and look at it, you'll probably be scared, because it looks crazy; there are a lot of things in there which are not very useful to a reader. But then again, this file is not really intended for reading. If you want to read it you can, but the idea is that you just generate the file and use it; you're not going to change anything in there, and there's even a warning at the top of the file: DO NOT EDIT. So yeah, I would follow that advice.

So, conclusions. As we saw, JSON and XML are very human readable and human friendly, and this is a very important thing, especially when you're creating something and something goes wrong; like I said, even if you don't know anything about what type of messages you are going to receive, you can just hit endpoints, poke around, see how it behaves, and then build your code around that. With protocol buffers you really, really need the proto files; without them you're going to be in a lot of pain, and you don't want to go there. Also, as we said, protocol buffers are a binary format, so it's very compact, it's very fast, it's very good for size, and also for CPU usage, memory, networking, things like that. But then again, JSON is absolutely everywhere; everything understands JSON today, and not everything understands protocol buffers.

So if somebody asked me, should I use protocol buffers or JSON, I would say: in the case where you're already using JSON and you're happy with it, stick with it, because JSON, like I said, is understood by everybody, all the systems support it, you can very easily figure out what's going on as a human, and the size issue isn't that big a problem with JSON. If you go with the raw format it's much larger than protocol buffers, but if you zip it, then protocol buffers still win, but not by that big a margin. So then it comes back to the question of how many messages you actually send. If you're sending gazillions of messages, then yeah, it would probably be useful to move to protocol buffers anyway, but if you're not sending an enormous amount of data, then maybe adding protocol buffers would just add complexity which you don't need, while with JSON at least you know every system will support it, every system will know what to do, even with zipped JSON, out of the box; you don't need to do anything magical. Also, one more thing which is very important here: like we already said, for the front end JSON is king, right, JavaScript knows how to deal with JSON. There is also a way to do protocol buffers in JavaScript; how good it is, I don't know, I didn't try it. But what I saw is that most people keep at least the human-facing parts of the application in JSON or XML, don't expose protocol buffers, and use protocol buffers for internal things inside their own network.

Again, protocol buffers versus XML: in most cases I would say you can go with protocol buffers, because, as we saw, the main benefits that you get from XML, the schema validation and the boilerplate code generation, you get all of that from protocol buffers too, and you don't get the problem from XML that the data can just explode.
Protocol buffer messages are going to be much smaller, so it's going to be much better. But again, think about where it's customer facing: if the customers don't understand protocol buffers, if you don't want to expose proto files to them, and if they need to be able to hit your API in a way they can understand, then you can maybe just keep that layer in XML and, again, keep protocol buffers for the inside. Also, this comes from Google themselves: protocol buffers are not always a good replacement for XML. In the case where you want to send data which is text based and has markup in it, protocol buffers will probably not be a good choice; in that case you're probably much better off staying with XML. Well, that's about it, so, any questions?

[Audience] Hi, I was just wondering: if you're starting from XML or JSON, would you consider using EXI instead of protocol buffers?

Sure, but... I know that there is Apache Thrift, and that Apache Thrift solves similar problems to protocol buffers; I know there are FlatBuffers and some other protocols, but again, I don't have experience with them. I use XML, JSON and protocol buffers, so I don't know about it.

[Audience] It's a W3C standard for transferring XML as compressed binary, and there's one for JSON too.

All right, okay, thank you.

[Audience] Do you have any experience with the differences between protocol buffers and something like Cap'n Proto? I mean, there are some other variants too.

No. Like I said, I read some articles, but that's not real-life experience, because you can go on a website and read something, and people say okay, yeah, we're better, we are faster; and when you first read about protocol buffers it's all cool, it's all perfect, Google is using it, it just solves all your problems, and then when you start writing some code you end up in some weird places, you really start hitting them. So that's why I really can't give any real comparison. Also, at least for Apache Thrift, what I saw is that it's built from a different angle, because that one is built for remote calls, so there are multiple protocols, multiple ways that you can call remote things on different machines, while protocol buffers, from what I saw, started as a simple message protocol, solving the problems of XML and just making sure that at least it doesn't bloat the data that much, and then started to look into other directions. For example, I also read that FlatBuffers really concentrate only on the messaging part and on compressing the data, but again, real-life experience I have only with JSON, XML and protocol buffers.

[Audience] I was kind of surprised, too, when you showed the output of the binary protocol, that the text was visible. I sort of assumed that it would be encoded more efficiently than just, you know, standard text, either compression or some kind of encoding.

Oh, I know; maybe something happened there because I just hit it with a simple browser, but again, as far as I saw, they didn't really do a lot of crazy things; they did some clever things. What I also read is that they started with the initial idea of actually using the structures from C and just encoding them in a smart way, so it's compact but also very fast to go through.
You don't have to unzip all the data or create a billion objects; you can go through the protocol very fast and very simply, and we saw what's in it. That's actually why they use the tags, that's why the tags are so important: if you remove or replace them you can crash the whole protocol. So it's not completely cryptic, but it's not really user friendly either, especially if something goes wrong.

[Audience] Just a quick question. One, what are the expectations in terms of performance, say over XML, that one would expect right off the bat? And two, are protocol buffer libraries available for C++?

It was initially developed for C++, so C++ is actually the starting point, and that's also why they started from C-style structures; so if you're on C++ you're in a perfect place. And yeah, I tried to find some benchmarks, but everybody does their benchmarks in a different way; what I saw is that in most cases protocol buffers are usually one of the best. Some companies claim that they have better protocols, but it definitely beats the performance of XML, especially unzipped XML.

[Audience] Can you write the messages themselves, or the boilerplate code, by hand?

Okay, so the question is, can you write the messages by hand, because with JSON and XML you can write messages by hand; can you do that with protocol buffers too? My answer would be: I have no clue. I would guess that it is possible, but it's going to be extremely difficult, extremely painful, and something that I would never try to do. You can do a lot of crazy things, but in the end you are dealing with a binary protocol, so you would somehow need to get the ones and zeros and the tags right, and you would really, really need to understand the format very well. But then the question becomes: is your assumption that you have a proto file, or that you don't have a proto file and you don't use the protocol buffer compiler but would like to use your own compiler? That is actually one of the painful moments that I had with protocol buffers, because I needed to write the client and the server, and I really needed to write the client because I just wanted to test that the service was working well, and then something didn't work and I was moving back and forth. At that time I didn't find a simple solution, but maybe in the meantime there is one, because it really is moving fast; when I first looked at protocol buffers there was no JavaScript support, and now there is JavaScript support. So, any more questions?

[Audience question, not captured in the captions.]

I don't know if you mean something like GitHub, or maybe the way it's done with XML? What we did in our product is we just put them in with the source code, like I did here; I added it as source code, so whoever wants it, it's there. But you can always put it on some Confluence page, an intranet, I don't know. In most cases you use protocol buffers for internal communication inside the company, so usually, the same way that you would already distribute the XML schemas, you would do exactly the same thing, maybe just have one clear place with all the proto files, one source. That's one way that you can do it.
But I don't know of any standard way off the top of my head; if there is one, I don't know about it. So, any more questions? Okay. Okay, thank you. [Applause]
Info
Channel: HBB
Views: 2,886
Keywords: VDB17, Voxxed, Bristol, microservices, Protocol Buffers
Id: oDIkgNchrVY
Length: 40min 9sec (2409 seconds)
Published: Fri Apr 07 2017