How GitLab uses k6-- and how you can do it too, with Grant Young (k6 Office Hours #27)

Captions
Welcome to k6 Office Hours. I'm Nicole van der Hoeven, and we're doing something a little bit different this time: this episode is pre-recorded, so I'll be in the comments live, but one of our guests couldn't make it at the normal time and we still wanted to have them on. Today I'm not joined by my usual co-host, Emma Aronson. Instead I've brought in Mihail, who we've seen before. Who are you, Mihail?

I'm one of the k6 open source developers. I've been with k6 for about three years, and I'm glad to be back on Office Hours.

Yeah, a second time. Last time we had Ivan Skiba on, talking about how to make plugins, how to make extensions for k6. And our special guest today is Grant. Could you introduce yourself?

I don't know how special I am, but sure. My name is Grant Young. I'm a Staff Software Engineer in Test at GitLab, the DevOps platform.

Okay, so what do you do as a software engineer in test?

My particular role at GitLab is to help with the quality testing of performance, as well as actually testing GitLab at scale. A lot of that work involves figuring out how to set up GitLab at scale, because, like most software, it's comprised of various components, so we need to define what that looks like at different scales, and then we actually test that performance, among various other things. That's my main day-to-day.

So what do you think is the difference between a performance tester and a software engineer in test focusing on performance?

Quality roles are quite flexible. Usually you'll get people dabbling in different spaces, because companies need their quality teams to go into those spaces. Performance in the past has always been more of a side gig (talking several years ago, of course, not recently), and it's become more and more of an important aspect, so now there are more dedicated roles out there. My role is to manage the performance testing of GitLab; that's not my title, but it is the main part of my role, and then everything else that comes off that, like figuring out the right environments, the right tests, all that stuff. In most companies there isn't much dedicated resource for it. Hopefully that continues to change, because performance testing is its own discipline, really.

Yeah, I definitely see that. I come from the testing side, and I feel like the lines are blurring significantly between performance tester and even site reliability engineer. There's a bit of blurring there, and we're kind of existing in the intersection of several fields, but there isn't really a dedicated title for it, unfortunately.

I think we might be heading towards that in some spaces, for sure. The quality space in software has changed dramatically since I started around 10 years ago. Back in the old days it was: I click some buttons and manually report whether those buttons work or don't work. It's expanded massively since then into all these different disciplines, and things like performance testing, or security testing, require quite a lot of knowledge and discipline in that space. That starts to blur the lines further: a person who can test performance could probably also develop in that regard as well.
The lines are continuously blurring. Quality is an evolving space, so who knows where it's going to go next, really.

So, just to take a step back here, could you give us a brief spiel on GitLab? What is it, and what do you do?

You have to describe GitLab succinctly... We call it a DevOps platform. At its heart it's a source code management tool, but we've added all these other tools that we want to give developers in one place, so they can do the full development of software from start to finish: source control, CI, and various other tooling, in one place and in one product. They can go from start right to the end without having to use a multitude of tool chains and all the complexity that comes with that.

The reason we wanted to have you on Office Hours is that we heard GitLab started including load testing as part of GitLab, with k6. How did that happen?

It kind of grew out of the fact that when I was brought in, we needed to start really ramping up the performance testing of GitLab itself. This is the fun part of being GitLab: a lot of the stuff that we do for GitLab could also become a feature in GitLab, for every user to use with GitLab against their own software. It's a bit of a mind-melter of a statement, but that's the space we live in; it's kind of circular. First, we needed to ramp up performance testing efforts against GitLab itself, so we could start improving the performance of the product. We did a selection of which tools we wanted to use to help us on this journey, and k6 immediately popped up and showed itself to be a really decent performance tool that ticked a lot of boxes for us. We took it, added a small wrapper just for some specific GitLab stuff we needed and some additional features, and started really performance testing GitLab. That went really well, and then the question came up: can we now add this for users to use in GitLab, against their own software? It wasn't a direct copy and paste; it was a similar idea. So we looked at adding a CI feature where people can write a k6 test in their CI and run it, and then GitLab will report some key metrics from that test in the UI for easy consumption.

Not to make Mihail's head too big, but why did you end up choosing k6?

In the performance tool space right now there are quite a few tools out there, but some of them are quite old, or have lost maintenance, or aren't being actively developed anymore. Some of them are quite specific in what they want to do, and that's okay; they're just there to hit one endpoint and that's it. The thing that attracted us to k6 is that it's performant, it's based on Go, and its scripting language is JavaScript, which means you have a lot more flexibility in writing tests. Having a full language there to work with is quite a big feature compared to, say, working with a YAML file or something else along those lines. So k6 really did pop up, and the fact that it was being actively developed and is becoming quite a leading tool in the space helped too.

Mihail, I know that you were also, not too long ago, doing some pretty extensive tests on the performance of k6, weren't you?
I think I'm doing pretty well; mostly they're not that extensive. I'm constantly testing k6 and its performance, more or less, because we sometimes have had regressions, which have mostly been very easy to fix, but they do happen. k6 is in general very performant. I won't say there are no corner cases where we have extremely bad performance, which is mostly down to bad APIs in the current system, but that's probably going to change; we've fixed a lot of those and I'm pretty sure we're going to fix the current ones. In general I would say that k6 is performant, and we're trying to make it as performant as possible, because obviously it's very strange to use a performance tool that's too slow to do its job.

Do you think that a lot of what makes k6 performant is owed to Go, to the fact that it's built in Go?

I would definitely say that. I don't think we would have been able to get this performance, and this speed of development, in practically any other language. So yes, Go is definitely one of the reasons k6 is as performant as it currently is.

I also think it's the only performance or load testing tool that I know of that's written in Go, unless either of you has heard of anything else.

There are a couple that are mostly like... what was the Apache one? Apache has a very basic tool that basically hits one endpoint, and there are a couple of clones of that. Maybe they can hit a list of URLs, but without scripting; you literally give them six links and they hit them. I generally don't consider those tools for performance testing. I guess they are testing performance, but I don't really consider them to be in competition with k6. There is some user base that's actually okay with them, and they're basically something they can use, and I would argue that in practically all cases those other tools are going to be more performant, because they basically don't have to do anything else but the requests.

I think that's kind of what Grant was saying earlier: there are a lot of tools that just make the requests, but load testing or performance testing is way more than just being able to mechanically make the request. It's also creating a load profile around that; it's also think time and workload modeling and all the other things that a framework like k6 is. It's more than just a request sender: it's also a way to verify that the response you're getting is the correct one, and a way to schedule requests throughout an entire testing cycle. There's a lot more involved than just sending a request.

And I also don't know if any of those tools can, for example, put five items in a cart and then buy all five items; they just don't have any scripting. So those are definitely more performant than k6, and I know of a couple that are written in Go, but I don't remember, at least off the top of my head, any scriptable ones.
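The kind of scripted user journey Mihail mentions (adding items to a cart and then checking out) is straightforward to express in a k6 script. A minimal sketch, assuming a hypothetical shop API with /cart and /checkout endpoints; the URLs, payloads, and think times are made up for illustration:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,          // 10 concurrent virtual users
  duration: '1m',   // repeat the journey for one minute
};

const BASE_URL = 'https://shop.example.com'; // hypothetical store

export default function () {
  // Add five items to the cart, one request per item.
  for (let itemId = 1; itemId <= 5; itemId++) {
    const res = http.post(
      `${BASE_URL}/cart`,
      JSON.stringify({ itemId: itemId, quantity: 1 }),
      { headers: { 'Content-Type': 'application/json' } }
    );
    check(res, { 'item added': (r) => r.status === 200 });
    sleep(1); // think time between clicks
  }

  // Buy everything currently in the cart.
  const checkout = http.post(`${BASE_URL}/checkout`);
  check(checkout, { 'checkout succeeded': (r) => r.status === 200 });
  sleep(3); // user pauses before starting the next iteration
}
```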
So Grant, it sounded like you used k6 initially just internally, without any expectation of using it further; you were mainly using it to test GitLab itself, and then later on you decided to add it to your offerings. Why did you decide to do that?

The tool we built to test GitLab itself is the aptly named GitLab Performance Tool. That tool just grew: it got a little bit popular internally, and other people started using it for their own testing, and then it also started to get a little bit popular with some customers. It grew organically in that way, and then the question was asked, as it often is at GitLab: wouldn't it be good to have this in GitLab itself? It was an internal discussion about how this could be good, in a different way, within GitLab itself. That's kind of how it happened, really.

I think that's one of the benefits of eating your own dog food and using the thing that you're building, right? Because you are also a user of GitLab, and so when something makes your life easier when you're using it, there's a good chance it'll make other users' lives easier too.

Yeah, we love to dogfood at GitLab. It's difficult at times, obviously, and not everything applies. The k6 setup we use to test GitLab, say: we built a wrapper that we don't have in the GitLab feature itself, because that wrapper only really applies to GitLab testing. But the core of it, the k6 core, was something very viable to move in, so that's what we did.

So what is the difference between k6 and this wrapper? What are you adding in the wrapper?

The main thing we added: k6 is a good tool to run separate tests if you write out different test files, and the wrapper simply runs multiple tests sequentially. So we can have, say, 50 tests running, and the wrapper will collect all those individual test results and present them in a nice way at the end. Little things like that. There's also a whole bunch of GitLab-specific stuff: it checks the GitLab server and its version and detects whether a given test is valid for that version and can actually run, and stuff like that. Nothing crazy.

Okay, would you be able to show us how it looks?

I could do. I don't have it up, so bear with me; I'll get something up and show you in a second.

No worries, no pressure. Live demos are difficult, but I'm just personally curious. I've used GitLab before, but it was some time ago, and I've definitely never used the load testing part. When did that actually come out?

The load performance testing feature, where you can run it in GitLab, was added sometime last year. I don't have the specific date off the top of my head, maybe towards the end of last year, but I'll double check now. It was probably fully built around June or July last year. I could show you how our test tool, which tests a GitLab environment, works. Give me a second.

Sure, no problem. Mihail, I think you also talked to Grant... was it last year, or earlier this year?

No, we talked earlier this year. We've had a lot of back and forth in the issues, and maybe the community forum, maybe Slack, but that was last year. We talked maybe a month ago about exactly the internals they're using and what they're actually trying to do, and so on and so forth. We do plan on having some kind of k6 experience for running multiple tests and then reporting something nicer, but it's summer, and I don't even know yet what it's going to look like.
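Until k6 can run several test files in one go, a wrapper along the lines Grant describes can be approximated with a small script that invokes k6 once per file and aggregates the exported summaries. A rough sketch, assuming Node.js and k6's --summary-export flag; the test file names and the metrics picked out are illustrative, not GitLab's actual wrapper:

```javascript
// run-suite.js: run each k6 test sequentially and print a combined results table.
const { execFileSync } = require('child_process');
const fs = require('fs');

const tests = ['api_projects.js', 'api_issues.js', 'web_merge_requests.js']; // illustrative file names
const results = [];

for (const test of tests) {
  const summaryFile = `${test}.summary.json`;
  try {
    // --summary-export writes the end-of-test summary as JSON.
    execFileSync('k6', ['run', `--summary-export=${summaryFile}`, test], { stdio: 'inherit' });
  } catch (err) {
    // A non-zero exit (e.g. failed thresholds) shouldn't stop the rest of the suite.
  }
  if (!fs.existsSync(summaryFile)) {
    results.push({ test, error: 'no summary produced' });
    continue;
  }
  const summary = JSON.parse(fs.readFileSync(summaryFile, 'utf8'));
  results.push({
    test,
    rps: summary.metrics.http_reqs.rate.toFixed(1),
    p90_ms: summary.metrics.http_req_duration['p(90)'].toFixed(1),
    failed_rate: summary.metrics.http_req_failed ? summary.metrics.http_req_failed.value : 'n/a',
  });
}

console.table(results); // one row per test, printed once the whole suite is done
```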
So Grant, are you ready for me to add your screen? Okay, here you go.

Thanks.

Could you zoom in just a little bit, just in case people are watching on normal monitors or laptop screens?

Sure. Is that better?

Yes, thank you.

So this is a full run where we set up an environment in GitLab, about a 10,000-user-size environment, and we run various k6 tests against various endpoints and web pages. Our wrapper takes them all, runs them sequentially, and collects the results. This is the full output, which you'll be familiar with; this is pure k6 output at this point. And the bit at the end (because of the aspect ratio the table has been wrapped, but this is just a table) is where you see the specifics of how every test did, with some specific metrics: what RPS was achieved, average time to first byte, time to first byte at the 90th percentile, how many requests were successful, and the result. That's really the key thing our wrapper does: it packages it all up, because we need to come in every day and look at this, so we need to make it consumable at a glance.

Oh, that's great. Did you do this by using the handleSummary function that we released in, what is it, 0.29?

No, because this predates that. We do want to go back and look at the newest k6 features and switch over to them, because there's no point doing it ourselves if k6 does it; we just haven't had the chance yet. We know that's there and it's a really useful feature. I think I did ask, a bit early on, about changing the summary output.

Mihail, is that the way you would suggest, if someone were to do this now? Would handleSummary be the way to go?

It's probably going to make it a lot better. I don't remember whether you're currently using an output and then aggregating things from the output, or whether you're using thresholds and basically parsing the current summary, the text one that's up here.

We parse the current summary, yeah, that's how we currently do it.

I guess in that case you could even use just the JSON summary, which has more or less all the stuff that handleSummary gets, just serialized for you.

We do use that in places as well; we use the JSON output in various ways to collate the output. Once k6 gets the full ability to run multiple tests and has a summary of how all those tests ran, our wrapper would become pretty redundant, although we'll probably still need something to select which tests to run, because we need to poll the GitLab environment for various things. But in terms of the actual wrapper features, that would be the next big step, so that people would be able to do something like this.

I remember that we actually added the original JSON summary in part because you asked for it.

Yeah, I did ask about it, that's right. There wasn't a summary output before, just a debug-like output that was very hard to parse, but then you added the JSON output, which we use to improve this tool's output, and we definitely read it for some stuff. So thanks for that.
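For anyone wiring this up today, the handleSummary hook mentioned above lets a script replace or supplement the default end-of-test report. A minimal sketch; the output file name is arbitrary, and textSummary comes from the k6 jslib, which this example assumes is reachable:

```javascript
import http from 'k6/http';
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

export default function () {
  http.get('https://test.k6.io/'); // any test logic
}

// Called once at the end of the run with the full summary data structure.
export function handleSummary(data) {
  return {
    stdout: textSummary(data, { indent: ' ', enableColors: true }), // keep the usual console summary
    'summary.json': JSON.stringify(data, null, 2),                  // machine-readable copy for a wrapper or CI job to parse
  };
}
```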
So how would a user of GitLab add a k6 load test to fire off after a commit or a release or something like that? Do they also provide the k6 script, or do you just use a default one?

They should provide the k6 scripts, by design, because everyone's application is different. I think we do default to a very basic script, just because we needed something to run if you set up the job. It's like one of the example scripts that k6 themselves provide: it literally just hits an HTTP server a few times, and that's it. But we encourage people to dive in and write their own load test, because every application is different.

And do you find that, for CI/CD pipelines, these load tests are peak load tests with many users, or is it mainly a couple of users, just to verify the performance at a lower level?

It's completely up to the end user; that's why we designed it that way. If they want to do a huge load test, they can write the test to do that and change the settings to make it work. But most likely, for this feature, because it would usually be used with an MR (or pull request, in GitHub terms), it's usually specific to a feature area, something more focused, so a load test in that vein would usually be something specific to that feature. Again, it completely depends on how the user wants to test, whether it's a number of users or they just want to hit an endpoint with a bunch of RPS.

Okay. Do you want to talk us through the instructions for how to get it going?

Sure. It's an extension of GitLab CI, so we have to assume that people who want to do this will read up on GitLab CI first and how to set that up. Then we've got a whole bunch of specialized jobs that you can call in CI that will report in the GitLab UI in various ways. We already had a feature kind of like this with Browser Performance Testing, which is the other side of life, where we actually test the web page itself to see how it performs in a browser; that's been in GitLab for several years. That's kind of where this conversation came from: we thought we could extend this to back-end load testing, essentially, to see how the server performs. The configuration, as you can see here (the page itself is quite boring, of course): this is the YAML file for GitLab CI. You'll see that you include this template, which is what defines the specialized job; we already have the full template built into GitLab. You just need to pass in your file, as well as any options you want, as a variable. Then, depending on how you set up the job, whether it runs on an MR, or it's a manual task where you click a button to run it, or something else, every time it's configured to run, it runs, and GitLab takes the output of the test and shows various key aspects in the merge request UI, which I can show you now.
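As an aside on the "very basic" default script Grant mentions: the smallest useful k6 test looks roughly like the sketch below (this is an illustration of the idea, not GitLab's exact default). In the CI configuration described above, the included template is pointed at a file like this via a variable, with extra k6 options optionally passed through another variable; check the GitLab Load Performance Testing docs for the exact template and variable names.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// A deliberately tiny smoke test: one virtual user hitting one endpoint a few times.
export const options = {
  vus: 1,
  iterations: 10,
};

export default function () {
  const res = http.get('https://test.k6.io/'); // replace with the service the MR actually touches
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```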
Here, this is a merge request, and this is for both features I described: one is the browser side and one is the load side. It runs two jobs in the background, then goes off, collects their artifacts, and parses them to show in these widgets here. You'll see this load performance test widget shows several metrics that are relevant: the time to first byte p90 and p95, the RPS, and how many requests were successful. It also compares it to the last run; if it's the same branch, it's able to go back and check, and that's what this text up here is, "3 degraded, 1 same", which means the latest run has actually got worse in this case. But that's fine, that's good to know.

Is that based on the response time, or what degraded?

This is literally taking each value and comparing it to the previous run.

Oh okay, all three of them.

Including the checks; that's the one that'll be the same here. The checks show that 100% were successful, so that's what you want to see.

That's really cool. That's actually a feature I would love to see in k6, or probably k6 Cloud: the ability to compare between runs and say how much worse it was. In k6 Cloud you can manually compare two runs, but it would be nice to be able to compare to a baseline, even just in k6 OSS, though that's probably more difficult.

That was already built into GitLab, so we were able to piggyback on it. It's able to take different jobs on the same branch, look at the previous one, and see what's going on.

I think that one of the original things you could do with handleSummary is exactly this in k6, but you'd basically need to do it on your own: in handleSummary you can get the previous results somehow. Obviously, in the case of GitLab, you have some logistics for how to get the previous run, whatever that means, and then compare to it. In the case of k6 we don't know that, so until we decide on something, or decide to integrate it very tightly, we'd need some means of getting the previous run, and then decide what comparison we should do and what it means, basically.

And, slightly off topic here, what do you use for the browser performance test?

It's sitespeed.

I'm sorry, could you say that again? I coughed when you said it.

We use sitespeed for the browser side.

Okay, cool. And that's... there's no script for that, right?

sitespeed is a big bunch of different tooling that they've pieced together to handle browser testing, because that's quite complicated: you need to run a browser and have all these different things to measure the different metrics. In our experience, sitespeed is best run as a Docker container; otherwise you need to run each component manually, which is quite a heavy burden, whereas in Docker everything just runs, you point it at a page, and it fires out a whole bunch of different stats.

I really like this, because I love that it's both sides, the front end and the back end, and one of the reasons you should ideally do both is being shown right here: if you just looked at the browser side, the front-end side, you'd see "oh, 13 improved, 1 degraded", so it got better. But if you look at the back-end side, it's 3 degraded. So it tells a slightly different story, because they're measuring different things, and you kind of have to have both, because neither of them shows the complete picture.
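Circling back to the run-to-run comparison Mihail described: in plain k6 you can approximate it yourself with handleSummary, for example by loading a previous run's summary in the init context and comparing against it. A rough sketch, assuming a baseline.json from an earlier run sits next to the script (how it gets there, e.g. as a CI artifact, is up to the pipeline):

```javascript
import http from 'k6/http';

// open() only works in the init context; the baseline is a previous run's exported summary.
const baseline = JSON.parse(open('./baseline.json'));

export default function () {
  http.get('https://test.k6.io/');
}

export function handleSummary(data) {
  const current = data.metrics.http_req_duration.values['p(95)'];
  const previous = baseline.metrics.http_req_duration.values['p(95)'];
  const delta = (((current - previous) / previous) * 100).toFixed(1);
  const verdict = current > previous ? 'degraded' : 'improved or same';

  return {
    stdout: `p(95): ${current.toFixed(1)}ms vs baseline ${previous.toFixed(1)}ms (${delta}%, ${verdict})\n`,
    'summary.json': JSON.stringify(data), // can become the next run's baseline
  };
}
```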
This is why performance testing and these spaces are disciplines: they are two sides of the same coin, you could say, but they are completely different disciplines in themselves. A server failure, performance-wise, is completely different to a browser failure, performance-wise. In fact, browsers are usually a bit more complicated; servers are usually quite simple comparatively, in context, because with a server it's: okay, there's a bottleneck somewhere in our server code, and it's much more of a straight line to finding where things are. For browsers it's all about CSS and all these JavaScripts and paints; there are a lot of layers to a modern browser. But that's the way it goes.

This is really good, because once tests are tied to the development cycle, these tests fire off without manual intervention. That's the whole point, but I think it's really cool, and I'm starting to see performance tests in CI/CD pipelines too, because it seems like it wasn't too long ago that that wasn't the case.

No, and I think it's probably still early days for that, and the reason is that the performance testing itself is actually somewhat the easier part of the whole problem. For browser testing it's a little bit easier, because browser testing is all encapsulated, it's in a bubble, but for server testing you need to actually have a server to test against. Kind of like a scientific experiment, you need to make sure it's exactly the same server each time so that the results are comparable, and you need to figure out: what size of server do we need, what make-up of server do we need, should it even be able to handle the load we want to target, can it handle it? And then you also need test data as well. So in my experience, actually writing the tests themselves is a fraction of the effort.

Yeah, that's a good point. Was there something else that you wanted to show us, or should I remove your screen?

No, that's it. Feel free to switch back.

Just to see you both better. I think there's a lot to be said for adding performance testing, and you know how testing is normally broken up into two things, functional and non-functional. I don't really like that distinction, because really they're both functional, right? They both affect how a user will be able to use your site. If your site's down, technically that's supposedly not within the realm of functional testing, but I'd argue that it really should be.

Yeah, those are long-standing terms, and in the old days maybe they were a bit more appropriate, but these days I find myself using them less, because "functional" is too hard of a line for defining the tests, as you say, and they really do cross into their own functional pieces. There's a difference between something that's a bit slow but still workable ("okay, it's loading a bit slower, but it loads and I can use it") and something that doesn't load at all.
That very much crosses into the realm of: well, then it's literally not functional, it literally can't work. I'd say those terms probably need to be archived, shall we say, but they still linger a lot, because unfortunately they do capture something: when you see "functional", okay, that means the "click the button, does it work" kind of testing, and "non-functional" is the other stuff. But we need a few more terms, essentially, to show what each one really means.

The reason I brought it up is that I think breaking it up into functional and non-functional also gives the impression that you don't really need non-functional. In people's heads it can also mean mandatory and optional, which is not at all the case, or at least it shouldn't be.

It does unfortunately fall into that pattern, more often than not, and that's been the case for performance testing in the past as well. It has usually been relegated to a secondary test, or done after the fact, after the main functional testing is done. These days that doesn't really fly anymore; users rightfully want a performant application, and it should be treated as top tier.

I think there are two sides to it. One, there are a lot of testers or development teams that still don't have performance tests built into their pipelines, even if they have functional tests. But also, on the other end, there are a lot of performance testing tools that aren't nice to use within a CI/CD pipeline. I know that with k6 that's one of the things we want to achieve: we want it to be developer friendly, right?

Yeah, I think that was one of the primary goals, one of the things it was started with: the idea that we want people to run their performance tests practically all the time, preferably on each commit, or at least once every day. That requires very good automation, because otherwise someone has to run it manually, which is definitely not going to work. So integration with CI is a very common thing that we talk about and work on; at this point there are probably more blog posts about how to integrate k6 with different CI/CD solutions than about how to actually write k6 scripts, I'm pretty sure.

I think a lot of the value of the activity of performance testing, or the study of performance testing, is setting up a framework so that it's repeatable, because the traditional method of going in, fixing an issue, running a test, fixing more issues, and then leaving is not really sustainable. If you want to add value, you need to make it just part of the process, not one activity that happens at the end.

And that's basically the reason why we try to be very easy to integrate, and to be helpful, like returning proper exit codes when errors happen, so your pipeline will stop and tell you why it stopped, hopefully, so you can do something reasonable about it. Not just "well, we did not manage to start your test", which probably doesn't mean your application doesn't work; you should probably just try again instead.
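The pipeline-friendly behaviour Mihail describes mostly comes down to thresholds: if a threshold fails, k6 exits with a non-zero code, so the CI job fails and the pipeline stops. A small sketch, with limits that are arbitrary examples rather than recommendations:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '2m',
  thresholds: {
    // If these aren't met, the run fails and so does the CI job.
    http_req_failed: ['rate<0.01'],    // less than 1% of requests may fail
    http_req_duration: ['p(95)<500'],  // 95% of requests under 500ms
  },
};

export default function () {
  http.get('https://test.k6.io/');
  sleep(1);
}
```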
So I think GitLab is pretty clued in on that, because you were already using it internally before you rolled it out to GitLab users. Have you noticed any gotchas in setting it up?

The main challenge... it is easily integrable now with CI, which is the good thing, but the main challenge is then actually setting up the environment and having it work in a consistent way at scale. Doing that for every MR or every commit, at scale, for a large team, can be quite heavy, so the challenge for each team or each company is to figure out what an acceptable level of performance testing is. We decided on daily: we test each major nightly build of GitLab daily, because we already have that kind of daily cadence, and that works well for us. We run it overnight, come in in the morning, check the results, and make sure everything's fine, and that keeps us on track; we have obviously caught some performance slips in the past thanks to that. Even though the tooling, the guidance, and the knowledge are improving, there are still quite a few different ways to do this for different companies, so the challenge is just trying to figure out what works best for you.

I think you also mentioned earlier that consistency is key. I like what you said about how it's like a scientific experiment. In a scientific experiment you have a control group. Say it's an experiment on the effectiveness of a drug: you're not just going to get a bunch of people and administer the drug and that's it. That doesn't show you much, because you don't know how they would have been affected without the drug, or how their conditions would have changed without it. I kind of think the control group in scientific experiments is like a baseline in our world. You need some sort of baseline, so if you're not running regularly and you don't know how it was before you applied changes, then you're not really sure what changed, or what you can do to change it back or improve it.

An individual performance test doesn't actually tell you that much in isolation. It can tell you that an endpoint is performing badly now, in this case, but it might be performing better tomorrow, and there's a whole bunch of reasons why that could happen. That's why I call it a scientific experiment: you need to control the entire test from start to finish. You need the same environment, essentially; it needs to be clean; it can't be used by anyone else at the time, because someone else could be doing something that slows down the server for some unknown reason; and you need to control the test data to make sure it's exactly the same each time. It's a bit of an intensive process of removing every potential factor that could impact the performance test results, and once you've removed as much as you can, that's probably where you want to be, and that's where we are right now. It is the harder part of performance testing, for sure, to have that set up in a consistent way. Once you get there it's good, but it's then hard to be a little more flexible and say "oh, but can we test this other environment?" It's like: we can't test that, because it's not controlled and the results will be off.
So it's a challenge, but thankfully one that can be worked on; it just needs some time and effort.

When you talk about making sure that you're testing the same environment: I remember times when people would run a test in a pre-prod environment, like staging, or one specifically for performance testing, and then compare that with results from production. There might be some value to it, but you have to know you're not comparing apples to apples. Depending on how it's set up, depending on things like the test data, like you mentioned, or just the size and power of the servers underlying those environments, it could be that you can't compare them at all. That's a hard pill to swallow for small teams that might not even have a pre-prod environment.

Yeah, performance testing is a journey, and people typically go into those common areas where they think "oh, we can just compare these two; look, this is good, this is bad". But it's a journey where you have to take people along on the reasons why, actually, it looks the same and it quacks the same, but it's two different ducks, for lack of a better analogy. It really is apples and oranges. Even though it's the same software, the server make-up, the test data, what's actually happening on those environments at that particular time, all of that and more can affect a performance test and make the results substantially different. And that's bad data, and bad data can lead a whole team astray, down bad paths, trying to figure out "oh, there's a bottleneck in the database" when actually there's none, there's a different problem. So it is important to get that right, and it can be challenging; it's definitely the biggest challenge in performance testing. But once you get through that and have that consistent performance test, the data is golden and can help substantially. We've made multiple performance fixes in GitLab over the last couple of years, I can't even count them all; we've found good bottlenecks, good problems that we fixed. We're still on that journey, but hopefully our users and customers are seeing performance improvements.

That's the thing, it's like a race to the bottom, right? I don't know of any company that's ever said "we're done with performance testing, it is as performant as it's ever going to get". That's a cue to run, I think.

I guess you could say that for specific areas: we do have targets, and if an endpoint is performing to its target, then that's good. But the actual overall performance journey, you own that forever, really.

Yeah. I mean, I guess... go on.

I was going to add a little bit. One of the things I've noticed over the last three years I've worked on k6 (before that I didn't have much performance or load testing experience; I'd done it for other things I'd built, but not to the degree I currently work with it) is that a lot of the questions we get, and a lot of the problems we see our clients have, at least the ones I've seen personally, basically go in one of two directions.
They either don't know what exactly they should test, or they don't know, if they get the number 10, what 10 means: is 10 good, is 10 bad, and so on. In general, as the product, we're like: well, we can't really tell you how you should treat your data. If this endpoint will be called by a thousand people and it needs to answer very fast because it blocks, for example, the loading of your web page, it probably should be fast; that's the idea. But if it's something that someone will call and they can wait 10 seconds for the result, because it does some very complicated calculations, it's probably okay for it to take 10 seconds; that's not a problem. All those things are different, and the fact that one endpoint can handle, say, a thousand requests a second while another does two a second... it all depends on what you do. And there are things like: people say "our production is on Amazon Web Services and we have endpoints across the world, and we want to test that, but without actually having things across the world or actually testing them across the world, and have the same results as if we just had one server". In a lot of cases that's not really possible, because maybe, if your architecture is right, it will scale exactly the same, but in other cases it won't, because something will be saturated somewhere, and the calls from Europe to Japan, for example, will be really slow compared to your other calls, and so on.

Yeah, I still get asked that a lot: what response time should we be looking for? And you can't really answer that; there's no one answer for everyone. There are a few things you can do to try to figure out that number. You can look at studies, like the one Google did, saying that after three seconds people dramatically drop off. You can look at a competitor's site and measure their performance with one user (don't run load tests on other people's sites!), because the reality is, if you have a direct competitor and your site is really slow, people are probably going to go to that competitor's site, so maybe see how they're doing. Another thing you could do is look at the HTTP Archive, which is an archive of a lot of things, including response times of different web pages. You could look at ones in your industry in particular, because people have different expectations. If you're applying for a mortgage, you don't necessarily need an answer in one second, right? In fact, that might even be a warning flag, getting a decision in less than one second for something of that magnitude. But if it's an e-commerce site, maybe people want to be able to buy quickly, and in that case one second might be reasonable. It's hard to tell. But I also liked what you were saying, Grant: in the GitLab results you're not just looking at the response times and saying "that passed" or "that failed", because it's impossible to know, it depends on people's situations. What you're looking at is comparative response time, which I think is a lot healthier, because that's more useful for everybody.

Yeah, it can be difficult. I should say that the research on how quickly a certain endpoint should respond is limited.
There are some classic studies from the past where, if you're over a second, you'll probably start to get into not-so-great territory, because people start to respond negatively if it takes more than a second, and obviously the worse it gets, the worse the response. Since then there have been various targets thrown out there by different companies, like Google and everyone else. We've selected a target of 200 milliseconds for server endpoint response time. That doesn't mean you'll get the visible response in front of you by then, but it's about taking that part of the process and keeping it as lean as possible, because then there's the browser side, where it takes the data and has to render it, and that completely depends on the machine it's running on. There's a whole chain for a request to fully complete. So we do have criteria along the way, but we also have the comparative piece, mainly to detect any major slips.

If it's an internal test, you're with that company, so you know your architecture, you can talk to your business people, and you can all collectively decide on targets. But I imagine that for your users it's difficult to tell them whether their tests have passed or failed, because it's so dependent; there are so many variables that GitLab wouldn't know.

Yeah. What we can do: GitLab being a development tool, our main target users obviously want a speedy application and will be more vocal if it's not, so we've set an ambitious target that we're working towards, to get it as lean as it can be, to get out of the way essentially and let people get on with developing their code. But when we developed the feature in GitLab (and that's why I've said "it's up to the user" quite a few times), we left it open, because there are so many different ways that performance can be tackled by people with their software. Whatever targets they want, it completely depends on their own specifics: what they're looking to do with the software, who the target users are, and so on. Some software doesn't need a super quick response time for whatever it does. It completely depends, so that's why we leave it open to the users to make their own decisions.

Another awesome thing about having load tests in a CI/CD pipeline is that you get a lot of data points. What I've done in previous engagements, especially if you're targeting production, not necessarily with a full-blown load test but with any sort of performance test: if you have those data points, you can graph them on the same chart as conversions. So if you're an e-commerce site, you can pull out from your Google Analytics, or something similar, how many people actually checked out, and if you graph them together by time, you can actually see how performance affects conversions. That might help a company narrow it down and see "okay, this is clearly unacceptable for us, because conversions went down significantly", and you can see how sensitive your users are to performance. Sometimes it might be that they're not sensitive at all.
A good example is a student portal at a university: there's no competition. Students might not like it, but they're still going to use that portal because they don't have a choice, so they might not be as sensitive to response times. That's the sort of thing you can only do, though, if you have the data points.

Yeah, that's the golden data we talked about earlier; having that comparison piece is just vitally important. We do have a separate test where we run our test tool against the last five versions of GitLab every month, and then we can compare them like for like, to see if there are any major slips we've missed in our testing, as an additional layer. Once you've got everything else worked out, you can get into that space, and it can be immensely valuable.

So we've got a few more minutes, and up until this point it's been all very nice things to say about k6. What issues did you experience with k6, in scripting, or what features do you wish k6 had?

I think Mihail has already talked about this in our last chat, but the thing we had to tackle immediately is that we definitely want to run more than one test. That's the key thing I think k6 would massively benefit from: being able to pass it multiple tests, tell it to run them, and get a nice summary output at the end. That's what our wrapper does, and we look forward to k6 doing it itself, because that also impacts the feature in GitLab. Right now it's just pure native k6, so it can run one test, and that test can do whatever you can do in k6 today, whether that's with scenarios or other things, and that's an amazing start. But the next step would be: I want to run tests on different aspects of the application or the website, and that's definitely the big feature we're looking forward to. I appreciate that it's not easy, and we're not blocked by it, so we're just waiting for you guys to get on with it.

Waiting for Mihail. He's not the only one on the k6 OSS team, but he's the one that's here.

Well, we're not working on it yet, I guess also because there's a workaround, right? You can do it with scenarios; it'll just be a bit of a big script. And in a lot of cases you can write a small shell script, basically, to run the tests and then do some processing if that's required. The problem is that you want the processing done over all the tests, which is the actual problem in this particular case, I guess.

Yeah, I guess that's the ask, but I appreciate it's not easy, so fair enough. The other aspect we've had to work on a little bit (and there have been features in k6 that help us with it): we dabbled with more scenario-style tests, where a user does multiple actions in one test script, but then the summary output is an average of all those actions. So you see a high result, but you miss the fact that all the actions except one were good, and that one bad action impacted the full results at the end.
We get around that by doing tagging in the results, so when we see it happening, we can go into the actual test and look at the individual endpoints and how they performed, and that helps us. But any improvements in that space would be welcomed as well; I guess that's not easy.

I think multiple people have said that they expected the summary to be per scenario, which we haven't done. The whole scenario implementation, infamous for its pull request number of 4007, was, I think, finally pushed through; we got some of the stuff we wanted out of it, and I think we didn't get enough out of it, but that's a different matter. Unfortunately, I don't think we've even discussed having a per-scenario summary. It would probably be pretty big: if you have 10 scenarios, the summary is going to be quite big, and I'm not really certain people even run 10 scenarios currently. I have seen scripts with 50 scenarios in them, and I've tested scripts with a thousand, but that's another matter. Scenarios do technically tag their metrics: all metrics emitted by a scenario are tagged with the scenario name, so you can technically do it the same way you probably currently do it for specific endpoints, just looking at the scenario tag, which is another workaround. But this is kind of on the roadmap: we're currently working on redoing our metrics system, which will hopefully give us a way to do this without being very unperformant.

And per transaction too, Mihail; I want that too, not just at the scenario level.

When we fix the metrics system, when we get more performance...

Because I think that seeing an average response time for everything, when one's a POST and one's a GET, I don't even look at that, because they're so different. That's actually one of the reasons I use the cloud output.

My general point about the summary is that it's not meant for anyone who actually wants to see in detail how their performance is. If they have a problem on one endpoint, they probably should use an output and dig in there, or something like that. For small tests, I guess it's fine: if it's "I'm going to do 20 requests to this", 20 requests are not going to be that much, so I'm basically measuring how the system handles just 20 requests, what the response time is, and I have an average of something. But if you have to do a thousand of those, because you have a thousand endpoints, that's probably a good reason not to rely on the summary to give you the different response times. And in general, we've seen users where the response times are steady for some time, and then suddenly they get a lot worse, or a lot better, because the system scales up or gets saturated, or, for example, there's a memory issue, something like that. Things like this are really hard to show in the summary: we would basically have to be our own output, which makes k6 not only a load testing tool but also a metrics storage at some point, and then we'd have to draw graphs in the summary, which, given some issues we have with different command lines and shells, is also probably going to be a very fun task to tackle and fix for everybody. It just gets harder and harder the more you get into it, because if you do it just for yourself, you can do whatever you need: you can say "I care about these three endpoints" out of the thousands you have. But to do it for the whole test, we basically have to do it for every endpoint. I guess there are solutions; we just haven't gotten to them yet.
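The tagging approach Grant describes, and the scenario tag Mihail mentions, can be combined with k6's sub-metric thresholds so that individual endpoints or scenarios get their own line in the summary instead of being averaged together. A sketch with hypothetical endpoint names and target values:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '1m',
  thresholds: {
    // Sub-metric thresholds: each tag value gets its own entry in the end-of-test summary.
    'http_req_duration{name:list_projects}': ['p(95)<300'],
    'http_req_duration{name:create_issue}': ['p(95)<800'],
  },
};

export default function () {
  // Tagging requests lets their metrics be separated later, in thresholds, outputs, or the cloud UI.
  http.get('https://gitlab.example.com/api/v4/projects', { tags: { name: 'list_projects' } });
  http.post('https://gitlab.example.com/api/v4/projects/1/issues?title=test', null, {
    tags: { name: 'create_issue' },
  });
  sleep(1);
}
```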
So next time we have you on, Mihail, you'll have addressed Grant's wishlist. [Laughter] Grant, was there anything else? I'm mindful of the time, but was there anything you wanted to plug, either personally or for GitLab, or anything you want to say before we close up?

Nothing major, no. We're just working away diligently, trying to improve the performance of GitLab. So if you were watching and were down on GitLab's performance, please take another look; we hope you'll be impressed. It's a first-class concern now at GitLab, essentially: we're working very hard to improve performance, and we hope people get to see the effects, thanks to k6.

It's always nice to see a company that eats their own dog food, and it leads to nice gains, like the performance gains you've shown here. Thank you for coming on to Office Hours, and thanks for using k6 too; that's really cool. It was awesome to find out that a big name like GitLab uses us as well.

It's well deserved; we've been very happy with the tool overall.

Okay, well, before Mihail's head grows any bigger, we'll end it there. All right, thank you everybody for watching, and we'll see you next week. Bye!
Info
Channel: k6
Views: 248
Id: YTGkq0m1bYk
Length: 60min 33sec (3633 seconds)
Published: Fri Sep 10 2021