Compatibility Regression Testing for Apache Pinot

Captions
Host: Hey everyone, thank you all for joining for yet another Apache Pinot meetup. We have an awesome topic today, compatibility regression testing for Apache Pinot, with two amazing speakers. They're going to be presenting together, with one doing the presentation and the other a live demo. We love live demos. Our first speaker is Ming Liang. He's a software engineer on LinkedIn's Pinot team, where he supports Pinot use cases at LinkedIn and tackles various platform challenges, mostly focusing on end-to-end tests, metrics, and provisioning. Previously he worked on cluster management as a software engineer at VMware, and he's also a code contributor to various open source projects like Kubernetes and etcd. Sharayu Gandhi is our other speaker today. She recently joined LinkedIn's Pinot team as a software engineer and has big data experience with technologies such as Spark, Kafka, NoSQL databases, real-time analytics, and cloud. Her work in Pinot currently involves solving platform challenges like metrics, provisioning, the testing framework, and raw data compression support. So I'm going to bring them both in. Welcome, Sharayu and Ming! Hope everyone is doing well. Feel free to chat in the chat window, and if you have any questions during the presentation, ask them in the chat and we'll get to them at the end during Q&A. If you are watching this on demand, you can also join the Pinot Slack channel, where you can find Sharayu and Ming and other folks who are knowledgeable about the topic; I'll post a link to the Slack URL in the chat as well. Oh, and if you like the video, give it a thumbs up and make sure you subscribe to the channel; I always forget that for some reason. All right, cool. Sharayu and Ming, I'm going to let you take it away; I'll put myself backstage and see you during the Q&A.

Sharayu: Good morning everyone, and thank you for joining today. My colleague Ming and I are going to talk about compatibility regression testing for Apache Pinot. In this presentation I'll cover the motivation for designing this test framework, its design and architecture, a holistic view of the test framework and the test suite, and how you can build your own test suite. Ming will cover the demo, and we'll also discuss future work and opportunities in this project.

Let's start with the motivation behind creating this test framework. As we know, Pinot consists of controllers, brokers, and servers. Sometimes Pinot version upgrades or rollbacks can lead to incompatibility issues. This results in downtime or service disruption, where users are not able to query their data and components are not able to interact. To mitigate this problem, we have designed a test framework that users of Pinot can use to verify whether there are any incompatibility issues during version upgrades.

Moving on to the architecture: the regression tests in the framework are done for both offline and real-time tables. Data can be ingested using one set of versions of the components (controllers, brokers, and servers), and data that was ingested previously can still be queried. We'll discuss this more in the design section.
Also, in OSS we provide a sample test suite. It includes support for offline data push via CSV data, and for real time we provide a Kafka setup. On the real-time side we also make sure to leave some rows behind; this is to test re-consuming segments after a restart. There is also support for custom configurations for controllers, brokers, and servers. Say you are trying to push a new commit to Apache Pinot: you can test your commit using this sample test suite against any commit present in GitHub, or against any previous release.

Moving on to the design section. As you can see in the diagram, the diamond boxes represent the sets of operations to be performed after upgrades of the different versions of the Pinot components. To explain this a bit: say you upgraded your controller and you want to verify certain results against this upgraded controller. You can upgrade the controller, push your data, and verify the current results. Now say you have upgraded your broker; you want to test both the data pushed after upgrading the broker and the data that was ingested during the controller upgrade. In this way you can verify all your results. This ensures that all the different version combinations of the Pinot components are tested together for compatibility issues.

Currently, in the GitHub workflow, compatibility testing is part of every PR: every PR is tested for compatibility against the previous commit as well as against the previous release. Along with this, we definitely support and encourage you to build your own test suite. That test suite is associated with each installation of Pinot, and you do not have to write any code for it; you get all the sample YAML files from the sample test suite, and your suite can run in your specific environment, so all your custom configurations and all your secrets stay with you. If you want any extra operations added to the test suite, please file an issue and we'll be happy to help.

Now let's discuss the test suite itself. This is the sample test suite currently present in OSS; we'll get into more detail when Ming shows you the demo. It consists of two parts. The first is configuration, which holds your data, queries, query results, table configs, and other information. The second is the YAML files, which describe the sets of operations to be performed against the upgraded or downgraded versions of the different components. For example, with a tableOp you can create or delete a real-time or offline table; with a segmentOp you can upload your offline data; with a streamOp you can publish your real-time data; and with a queryOp you can run and verify your queries (a sketch of these operation entries appears below).

To give you a holistic picture of how the test framework works: in this diagram, the right section represents our Pinot components. These components run at different versions, so there are controllers, brokers, and servers in their different upgraded states. We also have support for Kafka, and Pinot data for both offline and real-time tables. The test suite operations are the ones I just mentioned in the YAML files, where you can create, delete, and test against different versions of the controller, broker, and server, and the test driver is used to run them all. With that, I want to hand over to Ming to show you how this works in the current GitHub workflow.
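To make those operation types concrete, here is a rough sketch of one stage file from the sample suite that ships in the Pinot repo under compatibility-verifier/sample-test-suite. The keys and file names below follow the sample suite at the time of the talk, so treat them as approximate rather than authoritative:

    # Sketch of a stage file such as pre-controller-upgrade.yaml
    # (keys mirror the sample suite; exact names may vary by version)
    description: Operations run before the controller is upgraded
    operations:
      - type: tableOp
        description: Create an offline table
        op: CREATE
        schemaFileName: FeatureTest1-schema.json
        tableConfigFileName: feature-test-1.json
      - type: segmentOp
        description: Build a segment from CSV input and upload it
        op: UPLOAD
        inputDataFileName: FeatureTest1-data-00.csv
        tableConfigFileName: feature-test-1.json
        segmentName: FeatureTest1_Segment1
      - type: streamOp
        description: Produce rows into the Kafka topic of a real-time table
        op: PRODUCE
        streamConfigFileName: feature-test-2-realtime-stream-config.json
      - type: queryOp
        description: Run queries and compare against stored results
        queryFileName: feature-test-1-queries.queries
        expectedResultsFileName: feature-test-1-query-results.results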
Host: Ming, do you want to share your screen? Okay, hold on, he's going to share his screen in a moment. That was awesome, Sharayu, thank you for taking the time to go over all of this. Ming, let me know when you're ready to bring your screen in. Like I said, for those watching, if you have any questions in the meantime, feel free to ask. We also have another person from the LinkedIn Pinot team here who will be helping answer questions: Subbu, who presented yesterday on optimizing performance in Apache Pinot. He'll be joining us for the Q&A at the end as well. Let me know when you're ready.

Ming: Yeah, I think I'm good.

Host: Great.

Ming: Okay, so first let's go through a successful, positive case. As you know, compatibility regression testing is now enabled on GitHub, so for every pull request we run the compatibility regression tests against the commit's previous release and its previous commit. I randomly picked a PR, and you can see that compatibility regression testing is enabled on it. Let's click into the log. You can see that it first checks out your current commit; this test runs against release 0.7.1, so it checks out both your current commit and the previous release and builds them. By the way, because building is pretty time-consuming, we do some optimization: we build the previous release and the current commit in parallel.

After building is complete, we run the tests against the default test suite; users can provide their own customized test suite too. First it starts up the cluster using the components built from the old release: it starts Zookeeper, waits for Zookeeper to be ready, then starts the controller, the broker, and the server. After that, it creates a table, pushes some segments, and runs some queries against that table. As well as the offline table, it also creates a real-time table: it creates a Kafka topic, pushes some stream data, and runs some queries. Then it upgrades the controller, waits for the controller to be ready, pushes some segments and some stream data, and runs some queries against the table. Then it upgrades the broker, waits for the broker to be ready, pushes segments and runs queries, and then upgrades the server. After that, we try the downgrade path: downgrade the server, push some segments and run some queries; downgrade the broker and run some queries; and finally downgrade the controller and run some queries. If all the steps succeed, the test passes; otherwise it fails.

Now let me also show a negative case. For the negative case, assume that we bump up the forward index format from v3 to v4, where v3 is the current version we support. If we bump up the version, the new server can read both the v3 and v4 formats, but the old server can only read v3. So everything is fine when we upgrade, but it becomes incompatible when a server rolls back.
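For orientation, the rolling sequence just described maps onto one YAML stage file per step in the test suite. The file names below follow the sample suite's naming pattern as best I can reconstruct it, so treat them as approximate:

    start old cluster: Zookeeper, Kafka, controller, broker, server
        pre-controller-upgrade.yaml    # create tables, push segments/stream, query
    upgrade controller  -> pre-broker-upgrade.yaml
    upgrade broker      -> pre-server-upgrade.yaml
    upgrade server      -> post-server-upgrade.yaml
    rollback server     -> post-server-rollback.yaml
    rollback broker     -> post-broker-rollback.yaml
    rollback controller -> post-controller-rollback.yaml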
Let me show you how we changed the code. Here is the change I made for this negative-case demo. Basically, I added a new version; let's call it the demo version, which is 4. Previously the default version was 2 (we support version 3 as well, but version 2 is actually used). Now, as we bump up to version 4, the new server, which contains this commit, can understand the version 4 format. But when we downgrade, the old server will have issues when it needs to load segments with the index generated by the new server.

Since the build is time-consuming, I built everything beforehand, so now I can show you how it fails. Basically, you use the script to check out and build the releases. You can specify a work dir, which means the code will be checked out into that directory and the old release and the new release are built there. You can also specify commit numbers, for example the old commit hash and the new commit hash. Apart from a hash, you can also specify a release tag. If you don't specify the new commit hash, the current one is assumed; here, because I'm on the compat-demo branch, I don't need to specify the current commit. So I use this command to build the two releases. Actually, I already built the two releases into this directory, so now I can run the compatibility test against this work dir, and I can specify the test suite.

First, it uses the build from the old release to start Zookeeper, starts the old controller and waits for it to be ready, starts the broker and waits for it to be ready, and then tries to push some segments and run some queries. Then it tries to upgrade the controller: you can see that it starts the new controller and waits for it to be ready. Good, the controller is ready; it then pushes some segments and runs queries, then upgrades the broker and runs queries. Now it will try to upgrade the server.

Waiting for the server to be ready is a little time-consuming, so at the same time I can show you how to write your own custom suite. Basically, for each step you define a YAML file to specify what kinds of operations you want to conduct. For example, in the pre-controller-upgrade YAML, which runs before the controller is upgraded, we create a Kafka topic, create a table, upload some segments into the table, and at the end run some queries against the table. So for each step you define YAML files containing the operations you want to conduct, and you can provide your own table schema, your own table data, and your customized queries. The compatibility test is conducted in an end-to-end way: it verifies that you can create a table, push segments, and get the query results you expected.

Oh cool, the test has completed the server upgrade. You can see we upgraded the server, and after the server was upgraded we pushed some segments and ran some queries, and it worked well. Then it tried to downgrade the server, and after the downgrade we have issues: before the downgrade, the new server produced segments with the new index format, and the old server cannot understand that format. The query results are not as expected, so the test failed.
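The two scripts Ming ran live in the Pinot repo under compatibility-verifier/. Here is a rough sketch of the invocations; the script names are real, but the flag spellings are from memory and the placeholder commit hash is illustrative, so check each script's usage message for the exact options in your version:

    # Check out and build the two versions to compare (built in parallel).
    # -o and -n accept a commit hash or a release tag; the new commit
    # defaults to the currently checked-out one when omitted.
    compatibility-verifier/checkoutAndBuild.sh -w /tmp/compat-workdir \
        -o release-0.7.1 -n <newCommitHash>

    # Run the compatibility test against the two builds, using a test suite.
    compatibility-verifier/compCheck.sh -w /tmp/compat-workdir \
        -t compatibility-verifier/sample-test-suite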
So right now we only test in an end-to-end way: we basically verify creating a table, pushing segments, and running queries, and there are a lot of things we still need to do. For example, we currently don't test the minion component. The controller has APIs, for example to list all tables, fetch metadata for a table, list all segments of a table, et cetera, and we haven't tested those yet; we need to address them in the future. In terms of the queries themselves, we also have a lot to do: we haven't tested Bloom filters yet, we haven't tested no-dictionary columns, we haven't tested off-heap memory for consuming segments on the server, and so on. We have an issue tracking these to-dos, so if you are interested, contributions are welcome.
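For context, the controller endpoints mentioned as future work are of this flavor. A few examples against a local controller on the default port 9000, with paths as in recent Pinot releases and a hypothetical table name, shown purely as illustration:

    # List all tables known to the controller
    curl http://localhost:9000/tables

    # List all segments of a table
    curl http://localhost:9000/segments/myTable

    # Fetch metadata for a specific segment
    curl http://localhost:9000/segments/myTable/myTable_Segment1/metadata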
So that's it from us; next is the Q&A session, if you have any questions.

Host: Nice, awesome. Thank you guys so much. I'm going to add Subbu to the stream as well. Hey Subbu, thanks for joining; looks like you have a rockstar team over here. I'm not surprised, but that was really awesome. Thank you for taking the time to put this together; I know it's not easy to prepare presentations. Okay, we have a couple of questions that came in from Ken. First: are Pinot releases built each time? He said he missed the bit about an optimization to avoid the long build times.

Ming: Yes, currently the previous Pinot release is built every time, and we did some optimization there. Each time we need to compare the current commit against the previous release or the previous commit, so we build those two in parallel. But I think you point out a very interesting topic: for example, if we want to compare each commit against its previous release, then we could pre-build the previous release and cache it instead of building it for every PR. That would be a good optimization, though from a time point of view I don't think it would save too much, because we need to build the current commit anyway, and we can build the previous release at the same time.

Host: Awesome, thank you, great answer; more elaborate than I expected, but very insightful. Next: some of our tables have replication greater than one; does the framework support multiple server processes? Oh, I think you may be muted.

Subbu: Sorry about that. At this time, no, we don't support multiple server processes; it spawns one server and then does whatever it needs to do. Most things shouldn't depend on replication in terms of upgrade issues, like file formats and protocols, so you should be okay setting the replication to one for your tables and doing the testing that way.

Host: Got it, thank you; it's good to have you here to add your insight. Next question: if you upgrade just the controller, aren't there situations where Pinot requires the controller version to equal the broker version, or the broker and server versions, to function properly?

Subbu: The short answer is no, we don't require that the two versions be the same. What we do as a practice is to upgrade controllers, brokers, and servers, in that order. Minions can actually go in any order, but we usually upgrade them at the end. As long as you follow this order, and reverse it when you are rolling back, you should be good. We make sure the compatibility is there: a new controller version works with old broker versions, and similarly a new broker version and old server versions are okay to talk to each other.

Host: Awesome, cool. If anybody has any other questions, please ask now before the presentation ends; it's not like you can't reach these folks afterwards, but still. And also comment what you thought about the presentation. I'm also going to put the link to the Pinot Slack in here, in case you have questions while watching this on demand later. Ken had another comment: "I thought it was a rolling upgrade, where there were tests done in between each component's upgrade."

Subbu: Correct. You can specify tests to be done at each stage, like what Sharayu presented in the slides. You can say, here is the setup I need to do before the upgrade process is started; that's your pre-controller stage. And then yes, there is a rolling upgrade: first upgrade the controller and do something, then the broker and do something. The idea behind this is that in a live installation it is impossible to upgrade all components at the same time, and if you do, you're taking downtime. So what we do is upgrade a bunch of controllers, watch them for some time, and make sure everything is good. During this time, millions of queries are going through Pinot at LinkedIn, with the controllers running the new version and the brokers and servers running the old version. Then we do the same thing with the brokers, and there are hundreds of those, so obviously they cannot be done in one go: some brokers are running new and some are running old, and all of those combinations should work. But at no time do we want the server upgraded first and then the broker; that compatibility we don't guarantee in Pinot, simply because it's hard to guarantee upgrades in any arbitrary order. So we chose this order, and we follow it in our development process to make sure that new brokers can talk to old servers, but not necessarily vice versa.

Host: Interesting, okay, thank you. This is great, guys; thank you so much for taking the time to put together the presentation. Live demos are nerve-wracking, but they're so much fun, so if you want to do more live demos in the future, let me know and we'll bring it back. I think that's all the questions for now. This is going to be published to YouTube, so if anyone's watching on demand in the future, you're welcome to join the Pinot community Slack and ask any follow-up questions there.

Subbu: I wish to add that at LinkedIn we have our own pipeline with our own test suite. We have a CI pipeline where we know our deployed versions and we know the version that's out there in master right now, and we constantly run the compatibility tests across the deployed version and the current master, so that when something gets checked in we know whether it is going to be
still good against our deployment, and that happens every night at LinkedIn. So we have a good picture: we put in our own table configs, our own server settings, whatever configurations we use, along with any special settings that we have. That's what we encourage all of you to do, and all you need to do is change the YAML files to run your own tests.

Host: Wow. Did you want to add something, Sharayu?

Sharayu: Yeah, I just wanted to say that you can add your custom configurations and your secrets into those files and run everything in your own environment; we definitely encourage it.

Host: Oh, that's awesome. It's really nice to have insight into how you're doing everything over at LinkedIn, because it seems like you have the formula right. We've been bitten by it before, where you do an upgrade and then have to roll back for some unrelated reason, and the rollback doesn't work, and now you're stuck between two very hard places. It's unfortunate to have to learn from mistakes, but it's great that you're sharing them so other people can learn how to avoid those obstacles in the future. Well, thank you guys so much, and thank you everybody for joining and watching. Don't forget to give the video a thumbs up if you enjoyed the presentation, and hopefully we'll see you soon. Thank you, bye!
Info
Channel: StarTree
Views: 91
Rating: 5 out of 5
Id: a-STZry4VdA
Length: 33min 35sec (2015 seconds)
Published: Wed Aug 18 2021