What’s in the Box?! A Tableau Server Deep Dive

Welcome in, everybody. There's plenty of room at the side, so keep streaming in. I'm actually going to get started, because there is a ton of stuff to cover today; if you're late, don't worry, you're only going to miss the intro, and it won't be that relevant. Right, welcome everybody, welcome to What's in the Box?! A Tableau Server Deep Dive. My name is Tom Christian, I'm a solutions consultant here at Tableau, and, purposely for this training, a lot of the content that's going to be in here is the type of thing that I talk about on a daily basis. A few housekeeping points: unfortunately, as I said, there's lots of content, so I'm going to save questions for the end; there should be a good 10-minute chunk. If you do have burning things that you need answering, feel free to ask them then, and I'll also stick around at the end for any particular specific questions. In terms of who I am and why I'm stood here in a tiger shirt talking to you all today, these are a few things about me. As you can see if you're close enough, I love stickers; I was going to give stickers away and then stuck them on my laptop instead. But mainly, as I said, this topic, Tableau Server, is something I talk about all the time, and more importantly it's something that I teach our own internal sales consulting team about all the time as well. So you're actually sat here getting the same training, in a condensed amount of time, that I would normally give to the people that do my job. I also consider this session really important because, just by show of hands, who here classifies themselves as a server admin? Who would like to be a server admin? OK. Who's here because it's a good opportunity for an after-lunch nap? Good, nobody put their hands up. So the main reason why I see this topic as being super important is the fact that you have all likely invested a large proportion of your time and money in Tableau Server, and it's best to get as much out of that investment as possible. So a lot of the
teaching that will be here today will fit that gap, to make sure that you know at least roughly the levers that you can pull and tweak to make sure that you're getting the most out of Tableau Server and understanding what's going on. I've made a few assumptions. Like I said, for most people in the room I'm targeting you as if you're a server admin; if you're not, and that freaks you out a little bit, don't worry, there are plenty of jokes. You should also classify yourself as intermediate. I don't know what that means; nobody seems to know what that means. (All that animation wasn't supposed to happen, but anyway.) You classify yourself as intermediate, and that can fit a wide range of people, from people that have used Tableau Server for years and years to people that have never touched it, so I'm going to try to ensure that there's enough learning to hit everybody's needs. That is my challenge. And mainly, I hope you're interested in learning and enjoy potentially relevant memes and pictures, because there will be plenty of them. Number one: Tableau Server. Tableau Server is a black-box installation; there it is, floating through space, in the clouds. Tableau Server has many different components, and up until today's keynote (if you were listening to François talk about the repository changes), we have tended not to let you customize it with your own parts, your own bits of hardware. So you almost have to trust us that it's going to do what we say it's going to do, which can be a very scary thing for anybody. It can leave you feeling a bit like this: what's in the box? What's in the box? You'll find out. But first, 2017. I went to find some relevant things that happened in 2017; look it up, nothing good happened, so I refused to put them on the slide. But you will see a picture of Tableau Server: Tableau Server in 2017 looks like this. It's quite nice; there are a couple of processes, and it's kind of familiar for those of you that have been around that long. Seeing Tableau Server like this makes me feel pretty happy; I'm
comfortable with all the components. And then let's fast forward to 2019, and it looks a bit like this. That list has expanded massively: we've included new processes, we've introduced new... features, that's the word, we've got new features, and we've also split things out into subcomponents. That allows you to monitor exactly what's going on, but it means you've got to keep more things in mind, and it can leave you feeling a bit like this. So if we look at what happened in 2018, we saw the cluster controller and TSM introduced, and the licensing service comes up as its own component now. In 2019 we have microservice containers, Prep Conductor, Ask Data, the Elastic Server. This list is growing, right? There's a lot to keep in mind. So today I'm going to take you through these processes and introduce you to their purpose: why are they there in the first place, and why should you even care? I'm going to point out any recent changes, as well as things that you really should just know, and we'll follow that set structure the whole time. Possibly the most important for all of you today is: what can you do to optimize and get the most out of these processes? The larger you grow, and the more complex your environment, the more you have to keep these things in mind. And most importantly, I want you to feel like happy Drake all over again. Right: Tableau Server again, this time with a deep dive of the processes there on the screen. This is going to be our guiding light. As I said, we're going to cover what does it do and how does it do it, and the best way to do this, in my mind, is to go through it through the eyes of a user journey. When a user comes to Tableau Server and starts interacting with things, they're going to start lighting up certain processes as they do those things. So I'm going to introduce you to our first user. Our user is no stranger to love; in fact, he knows the rules, and so do I. But does he know Tableau Server? Yes, our user is Rick, and Rick's going to go
to Tableau Server, and the very first thing he hits is going to be the Gateway. So, the Gateway: what does it do? When a user comes to Tableau Server, as I've just said, the way that they communicate with the server is actually through HTTP requests. Whether that's from the web browser or from Tableau Desktop, when they're communicating with and hitting the server, what they're actually doing is sending HTTP requests that hit the server and tell it what to do. So first things first, they're going to hit the Gateway, and the Gateway is going to route the request on to where it should go next; the Gateway is the thing that diverts all the traffic where it needs to go. Next up, if you have a multi-node environment, a Tableau Server cluster, then as soon as that traffic hits the server, the Gateway is going to determine where it should go next: this is going to go to node 1, this goes to node 2. The way that it does that is in a round-robin fashion: it's going to go 1, then 2, then 1, then 2, and if you've got three nodes, 1, 2, 3, in that order, up until the point that a node gets overloaded. Also, importantly, if you have a custom logo or some static files, it's actually the Gateway that serves those static files to the user in the browser. So again, in multi-node clusters with multiple Gateways, make sure your custom logo file is in the same place on each node, so that users get the same custom logo each time. Now, what should you be aware of? If you've been around Tableau Server for a while, and again if you have multi-node clusters, you've probably heard the recommendation to have an external load balancer of some sort. It's important to note that an external load balancer doesn't remove the need for multiple Gateways, because the Gateway will still determine where those requests go after they hit the server.
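As a rough mental model of the round-robin routing described above, here is a minimal Python sketch. It rests on some assumptions: the node names are hypothetical, marking a node "down" is a simplified stand-in for a node that is unavailable or overloaded, and this is not how the Gateway is actually implemented; it only shows the 1, 2, 3, 1, 2, 3 ordering and the skipping of a node that can't take traffic.

```python
from itertools import cycle


class RoundRobinRouter:
    """Toy model of round-robin request routing across Tableau Server nodes.

    Node names and health tracking are hypothetical; this is a sketch,
    not the Gateway's real implementation.
    """

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._order = cycle(self.nodes)  # endless 1, 2, 3, 1, 2, 3, ...

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def route(self):
        # Walk the cycle at most once per request, skipping unavailable nodes.
        for _ in range(len(self.nodes)):
            node = next(self._order)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    router = RoundRobinRouter(["node1", "node2", "node3"])
    print([router.route() for _ in range(6)])
    router.mark_down("node2")
    print([router.route() for _ in range(4)])
```

With all three nodes up, requests are handed out node1, node2, node3 in turn; once node2 is marked down it is simply skipped and the remaining nodes alternate.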
So a user comes to the server and logs in, and it's the Gateway that determines what the next step is going to be, regardless of which node it sits on. But an external load balancer ensures that if one of those nodes goes down, traffic still gets to a Gateway process that can pass it on. So if you're looking for a highly available Tableau Server, this is what you'll need to keep in mind: ensure that the load balancer is there. And, importantly, what else should you be aware of? The Gateway is required on any node that has VizQL or Vizportal, also known as the application server. So there is our Gateway. If you're interested in optimizing this further then, as I said, I'm a bit of a log whisperer: I tend to look for information in the logs, and there's a whole bunch of rich information there. For those of you that are a bit more pro, you can look into the log files and see the exact requests that are sent to the Gateway. So if, for example, your users are getting an error reaching the Gateway, that'll be captured in those log files; or if you think there's a network issue, you can likely find that out in the Gateway logs. One down. Rick's now got to Tableau Server, he's communicating with it, he's there. The next thing he needs to do is log in, and the process that manages the login is the application server, or Vizportal; it just has another name. So, Vizportal slash application server: what does it do, and what should you be aware of? The first thing it does is authentication and authorization. When Rick goes to the server, he hits the Gateway, and the next thing the Gateway is going to do is go, "I need to check if you're even allowed to be here," so it sends him to the application server. The application server gets his username from the repository and says, "OK Rick, you exist; are you allowed to access the server?" If you're using local authentication, that information will be in the repository; if you're using something like Active Directory, it's the application server that's going to go, "Hey Active Directory, is Rick even allowed to be here?" That's going to say yes, and we proceed.
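The login flow described above has two separate questions: authentication (are these credentials valid, checked either locally or against a directory) and authorization (is this identity allowed on the server at all). A minimal Python sketch of that decision, assuming hypothetical names throughout: `log_in`, the two credential-check callables, and the `allowed_users` set are illustrations, not Tableau Server APIs, and the "directory" is just a dictionary standing in for Active Directory.

```python
from typing import Callable, Set

# A credential check takes (username, password) and returns True if they are valid.
CredentialCheck = Callable[[str, str], bool]


def log_in(
    username: str,
    password: str,
    *,
    identity_store: str,
    repository_check: CredentialCheck,
    directory_check: CredentialCheck,
    allowed_users: Set[str],
) -> str:
    """Toy model of the application server's login decision (hypothetical names).

    identity_store is "local" (the server checks its own repository) or
    "active_directory" (the application server asks the directory instead).
    """
    # Authentication: does this identity exist, and are the credentials valid?
    if identity_store == "local":
        authenticated = repository_check(username, password)
    elif identity_store == "active_directory":
        authenticated = directory_check(username, password)
    else:
        raise ValueError(f"unknown identity store: {identity_store!r}")
    if not authenticated:
        return "rejected: invalid credentials"

    # Authorization: a valid identity still has to be allowed on the server.
    if username not in allowed_users:
        return "rejected: not allowed on this server"
    return "proceed"


if __name__ == "__main__":
    fake_directory = {"rick": "never-gonna-give-you-up"}  # stand-in for AD
    print(
        log_in(
            "rick",
            "never-gonna-give-you-up",
            identity_store="active_directory",
            repository_check=lambda u, p: False,
            directory_check=lambda u, p: fake_directory.get(u) == p,
            allowed_users={"rick"},
        )
    )
```

Keeping the two checks separate mirrors the point in the talk: passing the directory check is not enough on its own; the user also has to be permitted on the server.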
The other important point is that when you as a user are looking at the web UI, the interface showing all the different projects, it's the application server that's rendering all of that for you. So if there are any issues with just rendering that, there could potentially be an issue with the application server. It manages all of this web UI rendering, but it also handles the publishing process: if you're a user that publishes data sources, workbooks, any of these things, then when you go through that publish process it's the application server that goes, "OK, I need to divide this into the XML, the raw code behind it, then figure out the actual flat files and store them in the correct place." It manages all of that side. Now, what should you be aware of? First things first, the default is one per node. You could have two, but really that's only to provide additional redundancy within the same node; so if you're looking to ensure that nothing gets broken and nothing gets interrupted, maybe have two, but otherwise one is fine. It's actually not resource intensive, so you shouldn't ever see the application server spiking in terms of the number of requests or capturing all your CPU and memory. If that does happen, chances are you're using incredibly heavy REST API or tabcmd requests. Interesting story: a customer I've worked with quite closely was using tabcmd to generate over 10,000 images. Once a day they would run that script and download 10,000 images, a capture of every single screen and sheet, and that starts putting intensive load on the application server. Notably, though, it'll probably hit other things first, before the application server even starts taking up resources.
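The publishing step mentioned above, splitting an item into its workbook XML and its flat files, is easy to picture with a packaged workbook: a `.twbx` file is a zip archive that holds the `.twb` XML plus data files such as extracts. The sketch below uses Python's standard `zipfile` module to separate the two. The archive contents and file names are made up for the example, and this only illustrates the idea; it is not Tableau's publishing code.

```python
import io
import zipfile


def split_packaged_workbook(twbx_bytes: bytes):
    """Separate a packaged workbook into its workbook XML and its other files.

    Returns (xml_name, xml_text, data_file_names). Illustrative only.
    """
    with zipfile.ZipFile(io.BytesIO(twbx_bytes)) as archive:
        names = archive.namelist()
        twb_names = [n for n in names if n.lower().endswith(".twb")]
        if len(twb_names) != 1:
            raise ValueError(f"expected exactly one .twb inside, found {twb_names}")
        xml_name = twb_names[0]
        xml_text = archive.read(xml_name).decode("utf-8")
        # Everything else (extracts, images, and so on), skipping directory entries.
        data_files = sorted(n for n in names if n != xml_name and not n.endswith("/"))
    return xml_name, xml_text, data_files


def build_demo_twbx() -> bytes:
    """Build a tiny in-memory archive with hypothetical contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Sales.twb", "<?xml version='1.0' encoding='utf-8' ?><workbook/>")
        archive.writestr("Data/Extracts/sales.hyper", b"\x00placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    name, xml_text, data_files = split_packaged_workbook(build_demo_twbx())
    print(name, data_files)
```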
Right, so he's gone to the Gateway, he's logged in, he's got his credentials, and he's allowed to proceed. The next thing Rick needs to do is find content to access, and, as the name implies, Search & Browse is the part of Tableau Server that does that. You can think of Search & Browse as your Tableau Server's personal librarian: any time you type something into that top search bar, it's Search & Browse that goes, "Oh, you've searched for sales; here are some things relating to sales." Search & Browse is the index behind that as well, so, based on things within the repository, it ensures that, for example, the content that's viewed the most comes to the top of the results. That's how we make sure, within Tableau Server, that the content people find is as relevant as possible, and it's the Search & Browse service that manages that. Also worth noting: it's not required to be in the same place as the repository. Even though it communicates with the repository quite a bit, it doesn't need to be on the same node; it does, however, need to be on the same node as the Gateway and application server, and I'll show later on why that could be important for some configurations. I've mentioned the repository a few times now, so it's probably best that we cover it. Rick's logged on to the server, he's communicating with it, he's searched for some content and found it; now let's actually access that content, and he's going to use the repository to do that. The repository is, in my mind, a hugely powerful piece of Tableau Server that is sometimes underused by the people that have it sat there. It's actually a source of tons of data. Possibly too much technical information, but it's better to overshare than undershare in these scenarios: any time you have a workbook, the code behind that workbook is actually XML; a Tableau workbook is written in XML. When you publish that workbook or data source to Tableau Server, that XML will actually go and sit in the repository itself, waiting to be called upon the moment you need to access that workbook.
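To picture the Search & Browse behaviour described above (match the search term, then float the most-viewed content to the top), here is a toy Python model. The `ContentItem` fields and the view counts are hypothetical stand-ins for usage data held in the repository, and the real index does considerably more than a substring match.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentItem:
    name: str
    kind: str        # e.g. "workbook", "view", "datasource"
    view_count: int  # hypothetical stand-in for usage data from the repository


def search(items: List[ContentItem], query: str) -> List[str]:
    """Return names of matching items, most-viewed first (toy relevance model)."""
    q = query.casefold()
    hits = [item for item in items if q in item.name.casefold()]
    # Popularity first; name as a tie-breaker so results are deterministic.
    hits.sort(key=lambda item: (-item.view_count, item.name))
    return [item.name for item in hits]


if __name__ == "__main__":
    catalog = [
        ContentItem("Sales Overview", "workbook", 120),
        ContentItem("Regional Sales", "view", 900),
        ContentItem("HR Headcount", "workbook", 300),
    ]
    print(search(catalog, "sales"))
```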
The repository's also the source of really rich usage data. There are tons of examples out there on Tableau Public, and I've got personal views that I like to use when I'm trying to judge how performant a server is, but it's the repository that tells you things like: how long do requests take? What's the 90th percentile of request times; is that too high, are we spiking too often? How much content do I have, and what permissions are set on that content? All of that is stored within the repository. Just to reiterate: not only can you see which permissions users have been granted by accessing the Tableau repository, it also stores things like which groups users belong to, and whether they're even allowed to access content in the first place. The repository is the source of all of that information. Again, what should you be aware of? The default is just one repository, sat somewhere on the server. For high availability, or increased redundancy, you'd likely have two repositories, and that's it; you can't have more. It's just one active and one passive repository. It's not resource intensive, so the repository is likely never going to be the point at which your Tableau Server drains resources: the server writes things to it and reads things from it, but it's never really intensive. Tableau also automatically streams data from the active to the passive repository, continuously, all the time, so in the event that the node with the active repository falls over, the secondary repository will take its place.
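The repository questions raised above (how long do requests take, what's the 90th percentile) come down to simple arithmetic once you have per-request durations. A minimal sketch, assuming you've already pulled durations per view into a Python dictionary (the view name and numbers below are made up), using the nearest-rank percentile definition. Tableau's own admin views may compute percentiles differently.

```python
import math
from statistics import mean
from typing import Dict, List


def nearest_rank_percentile(values: List[float], pct: int) -> float:
    """Smallest value such that at least pct% of the values are <= it."""
    if not values:
        raise ValueError("no values")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def summarize(durations_by_view: Dict[str, List[float]]) -> Dict[str, dict]:
    """Per-view request count, mean and 90th/95th percentile durations (seconds)."""
    return {
        view: {
            "requests": len(durations),
            "mean": mean(durations),
            "p90": nearest_rank_percentile(durations, 90),
            "p95": nearest_rank_percentile(durations, 95),
        }
        for view, durations in durations_by_view.items()
    }


if __name__ == "__main__":
    # Hypothetical data: 18 requests at 20 s plus two very slow ones.
    sample = {"Exec Dashboard": [20.0] * 18 + [60.0, 61.0]}
    print(summarize(sample)["Exec Dashboard"])
```

On this made-up sample the mean is about 24 seconds while the 95th percentile is a full minute, which is exactly the kind of gap a mean on its own hides.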
Right: an example of how you can use the repository to improve your Tableau Server. You have default admin views on your server, and actually there's a session tomorrow dedicated to how you can get the most out of Tableau Server reporting, but you can also customize them. This is a great example of something we use internally to judge how well something's performing. Down here in the bottom left (for those of you on this side, it's all the way over that way), what we're looking for is basically big red bubbles, because big red bubbles in this case indicate lots of requests that are taking a long time for a particular piece of content; the fewer red bubbles, the better. More importantly, you can say things like, "Well, actually, this takes 24 seconds on average, based on a ton of hits," which is interesting, and not too bad; but then you go, "OK, but the 95th percentile is actually a minute." So this is a good example of where we identified issues that we needed to resolve. So that's custom admin views, and like I said, there's a session tomorrow focused solely on this topic that I highly recommend. Right, so Rick's gone through, he's got his content, we're ready to go; let's start looking at Data Server. Once we've got this content, the next thing we need to do is connect to the data behind it, and it's Data Server that does that. Data Server is core and key to governance within the platform: when you're looking at extracts and live connections on Tableau Server, that's Data Server in the background. There's only ever a need to have two Data Server processes; you don't need more than that. What it lets you do, first of all, is have multiple connections to the same workbook and the same data source, so it handles all the parallel connections to content. It also acts as a proxy for published data sources, ensuring that, when you're dragging and dropping things around, what's sent is in the language that the database understands. It also
enforces permissions: whether or not you're even allowed to see that data in the first place, it's Data Server that ensures you can do the thing that you're trying to do. And lastly, it also centralizes driver deployment. For example, if you're using a Hadoop-based data source through, let's say, Apache Drill, then once you've installed the Apache Drill driver, it ensures that driver is accessible across the server for everybody to use to communicate with that data source. Data Server does all of these hugely important things that ensure you're governing your server correctly. Another session to call out specifically around this: there's a session, again tomorrow, called Optimizing Your Data Sources for Enterprise Data Strategy, from about 2:30 to 3:30 p.m. I recommend looking at that if you want a deeper dive into what Data Server is actually doing behind the scenes. Right, so we've connected to our data source and established that connection; the next thing is the data engine. It's time to start querying this data source, and the query engine, for those of you that don't recognize it, is Hyper. Hyper is the German element of Tableau Server; it's the thing that's really cool. Again, for those of you that don't know, we acquired it out of Munich roughly two and a half years ago now, and what it does is create extracts, query them, and perform query federation: when you have cross-database joins, it performs them locally within a Hyper extract. The important thing to note about Hyper is that it's fast, very, very fast; the way it queries things is incredibly fast. But it also consumes as much CPU and memory as it can get hold of. So, who here has upgraded from a pre-Hyper version to a Hyper version? OK, a couple of people. Did you notice resource usage or spikes went up? A few nods. That was expected.
It freaks a lot of people out, but it was fully, completely expected: the way Hyper works is that it tries to get hold of all the CPU and memory it can, to ensure it processes those queries as quickly as it possibly can and is as fast and performant as possible. So that brings us to the next question: OK, fantastic, we've upgraded to Hyper, but what happens if we notice that some of the queries on our extracts take quite a long time to process, upwards of, let's say, five seconds a query? We've actually introduced a new deployment type called Hyper on a standalone server, which I will be referring to as HOSS, because that is a long name. There is documentation online, which calls it optimizing for query-heavy environments, but HOSS just sticks; it rolls off the tongue, there's something about it. The way HOSS works, effectively, as the name implies, is that you have the data engine in its own place: it sits on its own node, ready to compute extracts the moment they're accessed. It looks a bit like this, and you'll notice that the File Store is with it, but there's also a data engine on the other nodes. A bit of learning on this point: one of the things the data engine does, as I said, is manage query federation. That means that if your workbooks have multiple connections to different data sources (some to SQL, some to, again, Apache Drill, and Excel), the data engine brings that data into memory to manage the operation at runtime. So the data engine needs to stay on those other nodes to work with VizQL, but having the data engine and File Store live on nodes 2 and 3 in this scenario means that any extracts that are queried will be queried locally there. So what should you be aware of with this? First things first, Hyper is, as I said, built to consume as much CPU and memory as it can get hold of.
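The query federation idea above (pull rows from two different sources into memory and combine them at runtime) can be illustrated with a classic in-memory hash join. This is a swapped-in textbook technique chosen for clarity, not Hyper's actual algorithm, and the two row lists are hypothetical stand-ins for a SQL table and an Excel sheet.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def hash_join(left: Iterable[Dict], right: Iterable[Dict], key: str) -> List[Dict]:
    """Inner-join two row sets in memory on `key` (illustrative hash join)."""
    # Build phase: index the right-hand rows by their join key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: stream the left-hand rows and emit merged matches.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    sql_rows = [  # hypothetical rows from a SQL database
        {"region_id": 1, "sales": 100},
        {"region_id": 2, "sales": 250},
        {"region_id": 3, "sales": 75},
    ]
    excel_rows = [  # hypothetical rows from an Excel sheet
        {"region_id": 1, "region": "North"},
        {"region_id": 2, "region": "South"},
    ]
    for row in hash_join(sql_rows, excel_rows, "region_id"):
        print(row)
```

Because both sides have to be held (at least partly) in memory to be combined, this also hints at why a federating engine benefits from generous RAM.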
That's the way it queries things fast; it's what it's built to do. But it also scales up incredibly well, so if you're interested in using HOSS, consider putting it on larger nodes: the larger the node, the faster it'll run those queries. One thing you can actually do, again if you're a bit sad like me and like to look at log files, is notice that within the Hyper log files there's a spooling flag. Whenever that spooling flag is true, Hyper has run out of memory and has to start writing to disk. So if you see that flag being hit consistently, you're likely running out of memory and need to put more memory on your server. It's a nice way to make data-driven decisions about Tableau Server. You only ever need one data engine per node, never more than that; it recovers itself, that's the way it's built. But it has to exist in the same place as Backgrounder (to create extracts), VizQL (to handle federated queries), the application server, Data Server, and File Store, just so it can do its job. Right, plenty there so far; we're starting to light up the board as we go through this user journey. Now that we've started querying data, the next thing to point out, which I mentioned, is the File Store. As we saw before, the File Store lived in the same place as the data engine node. So what does it actually do, and why is it even there? Let's take this user scenario forward: we've got some content, a data source that we're going to publish to Tableau Server. When you publish it to the server, File Store is going to take that file and store it. No surprises there; it is a file store. But importantly, especially in highly available deployments where you have multiple file stores, it syncs across those different file stores, ensuring that those extracts and files live across the nodes and are ready to be accessed in the event that something happens.
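The syncing described above has a simple goal: every File Store node should end up holding every file. A minimal sketch of that check in Python, assuming you already have an inventory of file names per node (the node names, file names, and however you'd collect the inventory are hypothetical; this does not call any Tableau tooling).

```python
from typing import Dict, Set


def missing_files(file_stores: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each File Store node, the files held elsewhere but not on that node."""
    all_files: Set[str] = set().union(*file_stores.values())
    return {node: all_files - files for node, files in file_stores.items()}


def fully_synced(file_stores: Dict[str, Set[str]]) -> bool:
    """True when every node holds every file (nothing left to replicate)."""
    return not any(missing_files(file_stores).values())


if __name__ == "__main__":
    stores = {
        "node1": {"sales.hyper", "hr.hyper"},
        "node2": {"sales.hyper"},  # hr.hyper has not replicated here yet
        "node3": {"sales.hyper", "hr.hyper"},
    }
    print(missing_files(stores))
    print(fully_synced(stores))
```

A node whose copy is still incomplete is exactly the situation in which you don't want to pull the plug on it.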
So, again, in a fictional scenario where that middle node goes down, at least the same identical files are in the other file stores, ready to be queried. If you're like some of my customers and you've moved to a cloud environment and started spinning up and decommissioning nodes on, let's say, a twice-a-day basis, the important thing is to ensure that, as part of that process, you decommission the File Store. This is hugely important: if data sources are midway through streaming, the last thing you want to do is kill that and lose content. It needs to ensure that all the files exist everywhere, and then you can remove it. OK, so we've gone through our data journey. Rick's most of the way through his server processes at this point, and now it's time for him to load up his favorite dashboard, and to do that he's going to get a little bit of help from VizQL. When we talk about VizQL, it would almost be rude not to mention these two people: this is Pat Hanrahan and Chris Stolte, the inventors, the creators of VizQL. For those of you that are unaware, VizQL is a revolutionary language that translates visual actions into database language. As you drag and drop, as you build your dashboards and those reports that are on the server, it's VizQL that's turning all those visuals into some type of database language to be processed, and then turning the results back into some type of visual. In other words, it's VizQL that turns all those nice green and blue pills into SQL code, for example, or Apache Drill or Oracle queries. VizQL speaks multiple database languages fluently, and better than most of us, sadly. VizQL is the piece of the puzzle that makes all the pretty things that all of us here care about.
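To make the "pills into SQL" idea above concrete, here is a deliberately tiny Python sketch that turns a list of dimension pills and (aggregation, measure) pills into one ANSI-style aggregate query. Everything in it (the function names, the alias scheme, the short list of supported aggregations) is an illustrative assumption; real VizQL handles far more than this and generates dialect-specific SQL for each database.

```python
from typing import List, Tuple

ALLOWED_AGGREGATIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def quote(identifier: str) -> str:
    """ANSI-style identifier quoting, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def pills_to_sql(table: str, dimensions: List[str], measures: List[Tuple[str, str]]) -> str:
    """Translate 'dragged pills' into one aggregate SQL query (toy model, not VizQL).

    dimensions: fields to group by (typically the blue pills)
    measures:   (aggregation, field) pairs to aggregate (typically the green pills)
    """
    if not dimensions and not measures:
        raise ValueError("drag at least one pill")
    select_parts = [quote(d) for d in dimensions]
    for aggregation, field in measures:
        agg = aggregation.upper()
        if agg not in ALLOWED_AGGREGATIONS:
            raise ValueError(f"unsupported aggregation: {aggregation}")
        select_parts.append(f"{agg}({quote(field)}) AS {quote(agg.lower() + '_' + field)}")
    sql = f"SELECT {', '.join(select_parts)} FROM {quote(table)}"
    if dimensions:
        # One result row per combination of dimension values.
        sql += " GROUP BY " + ", ".join(quote(d) for d in dimensions)
    return sql


if __name__ == "__main__":
    print(pills_to_sql("orders", ["Region"], [("sum", "Sales")]))
```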
But also, with that in mind, that dashboard in the back there is something that, as part of 2019.2, I'm actually going to put on Tableau Public as soon as I get some data that I can share that's sensible. What it actually does is look within the VizQL logs: each one of those points at the top, for example, is a query that was processed by Tableau Server. You can also split it out to see the complexity level of a workbook, how many requests it was sending, and how popular that content is. I'm going to struggle to demo it from here, but the reason why it's 2019.2 and not earlier is that it uses show/hide containers, because they're awesome: you can change what you're looking at on an axis and then go back. So I look forward to seeing that on Public soon; if you don't already follow me on Twitter, that's where it's going to hit first. Right, what should you be aware of with VizQL? Typically it's two VizQL processes per node (weirdly, there's only one there; it's probably going to pop up later). Each process can roughly handle a throughput of 50 active users at any given time, and throughput is the key word here, because the more VizQL processes you have, the more throughput you get. (There are the notes; those were the things that were supposed to come up.) The count of VizQL processes can be increased and decreased without having to restart Tableau Server, so if you're interested in seeing what more throughput would give you, you can increase the number of processes without interrupting anything else that's going on at the same time. But, as I said, more VizQL processes means more throughput. Does that mean more speed? Yes? No? A few nods, a few shakes of the head. It doesn't necessarily mean more speed. In the event that you have lots and lots of users accessing Tableau Server, having more of them may allow you to handle that load better, but it doesn't mean requests are going to be handled faster; it just means they're not going to be queued as much.
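The throughput-versus-speed point above can be shown with a toy queue model. Assume (hypothetically) that every request arrives at the same moment and each takes a fixed number of seconds to process, and that a process handles one request at a time. Adding processes shortens the queue, so the last request finishes sooner, but each request still needs the same processing time. The sizing helper simply applies the roughly-50-active-users-per-process rule of thumb from the talk.

```python
import math
from typing import List


def completion_times(num_requests: int, service_seconds: float, workers: int) -> List[float]:
    """Finish time of each request if all arrive at t=0 (toy FIFO queue model).

    Each worker handles one request at a time; every request takes service_seconds.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [(i // workers + 1) * service_seconds for i in range(num_requests)]


def vizql_processes_needed(concurrent_users: int, users_per_process: int = 50) -> int:
    """Rule-of-thumb sizing: roughly 50 active users per VizQL process."""
    return max(1, math.ceil(concurrent_users / users_per_process))


if __name__ == "__main__":
    print(completion_times(4, 3.0, workers=1))
    print(completion_times(4, 3.0, workers=2))
    print(vizql_processes_needed(120))
```

With one process the four requests finish at 3, 6, 9 and 12 seconds; with two they finish at 3, 3, 6 and 6. The work per request is unchanged; only the waiting shrinks.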
It also requires the data engine to be installed on the same node, again for the query federation element we were talking about before. Again, pretty dashboard; you can do so much with the information that lies within the VizQL logs. Just as an example (difficult to see for those of you at the back; lots of zooming in... let's try that again; yeah, still very difficult to see), there's a k value called QP batch, and QP batch records every batch of queries that VizQL sent to be processed. So you can actually go through and see every single query that was compiled, how long it took to process, and which dashboard caused that query to happen. Some very neat stuff that you can do with the VizQL logs. So Rick's loaded up his dashboard, it performed in less than three seconds, he's super happy, he thanks his server admin for that, and then he goes, "Actually, I think this dashboard we were just looking at is relevant for one of my colleagues." So he reaches out to his colleague Bono and says, "Hey, you too should check this out." I'm sorry. Now, when Bono accesses the same dashboard, he's actually going to hit the cache server; this is where the cache server comes into play. The cache server handles these requests on behalf of the user: when they navigate to the same report, the same dashboard, if it's already been accessed, then the queries, the images, the tiles that were used to form that dashboard are all stored in the cache server, so some of that content can be accessed and preloaded. Importantly, that also holds true for Data Server: if you're web editing and people drag and drop the same fields in, they're also going to hit the cache server. The same goes for the application server: all those images, all the folders, all of that stuff is cached, ready for people to load quickly on the server, and Backgrounder works in the same way, so as much as can be cached, it can leverage.
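The core benefit of the cache server described above is that a second, identical request doesn't go back to the data source. A toy Python model of that behaviour, assuming a cache keyed on (data source, query text) and a hypothetical `run_query` backend callable; it says nothing about how Tableau's cache server is actually built, only what a hit saves.

```python
from typing import Callable, Dict, Tuple


class QueryCache:
    """Toy shared cache keyed by (data source, query text)."""

    def __init__(self, run_query: Callable[[str, str], object]):
        self._run_query = run_query  # hypothetical backend call
        self._store: Dict[Tuple[str, str], object] = {}
        self.hits = 0
        self.misses = 0

    def get(self, datasource: str, query: str):
        key = (datasource, query)
        if key in self._store:
            # Someone already ran this exact query: serve the stored result.
            self.hits += 1
            return self._store[key]
        # First time: run against the data source and remember the result.
        self.misses += 1
        result = self._run_query(datasource, query)
        self._store[key] = result
        return result


if __name__ == "__main__":
    cache = QueryCache(lambda ds, q: f"rows for {q}")
    cache.get("spotify", "SELECT top_tracks")  # Rick: miss, runs the query
    cache.get("spotify", "SELECT top_tracks")  # Bono: hit, served from cache
    print(cache.hits, cache.misses)
```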
Important things to know about the cache server: it's single-threaded, so, similar to VizQL, the more you have, the more throughput is available. It's not CPU intensive, so it's never going to cause resource issues on your server; it only relies on memory. An important thing we found through testing on Tableau Online is that, while the default is two, even for truly large environments (those of you in this room with upwards of, let's say, 10,000 to 20,000 users), you'll never really need more than six. It handles and cycles its cache very well, and we never found a need for more than six; even at six, you'll never use more than you actually need. We also found that location isn't important. For those of you that have been around Tableau for a while, if you look at older diagrams you'll typically see the cache server in a one-to-one ratio with VizQL Server, in the same place and with the same number of processes. We found that's actually not required; it's not really true. If you want to put all six cache servers in the same place, say on the same node as your Backgrounder, that's absolutely fine; it performs in exactly the same way. Something else you should know: let's take our fictional users, Rick and Bono, accessing this beautiful dashboard from Skylar Johnson showing his Spotify top tracks in 2016. Bono goes there first, Rick goes there second, and Rick hits the cache server because Bono's already loaded it. Fantastic. But what happens if we've got row-level security acting on this dashboard? Any guesses? We're going to bypass that cache. The reason why: in this fictional scenario, let's say Rick's got a genre of pop that he's allowed to see, and Bono has a genre of rock, or Irish rock, because there's no way they're really, truly rock. When they go to that dashboard, to ensure there's no possible way they can see each other's data, that data can't leak out, they're going to bypass things like the query cache; it just won't load for them.
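The row-level security behaviour described above can be sketched as a cache that refuses to share results for secured views: if what you see depends on who you are, a shared copy could leak one user's data to another, so those requests skip the cache entirely. A minimal Python model, assuming a hypothetical `render` callable and a set of view names flagged as using row-level security; this is an illustration of the rule, not Tableau's implementation.

```python
from typing import Callable, Dict, Set


class ViewCache:
    """Toy view cache that never shares results for row-level-secured views."""

    def __init__(self, render: Callable[[str, str], str], rls_views: Set[str]):
        self._render = render  # hypothetical: (view, user) -> rendered result
        self._rls_views = set(rls_views)
        self._store: Dict[str, str] = {}
        self.hits = 0

    def load(self, view: str, user: str) -> str:
        if view in self._rls_views:
            # Result depends on who is asking: never read or write a shared copy.
            return self._render(view, user)
        if view in self._store:
            self.hits += 1
            return self._store[view]
        result = self._render(view, user)
        self._store[view] = result
        return result


if __name__ == "__main__":
    genres = {"rick": "pop", "bono": "rock"}  # what each user may see

    def render(view: str, user: str) -> str:
        return f"{view}:{genres[user]}" if view == "Top Tracks" else f"{view}:all"

    cache = ViewCache(render, rls_views={"Top Tracks"})
    print(cache.load("Top Tracks", "bono"), cache.load("Top Tracks", "rick"))
```

Bono still sees rock and Rick still sees pop even though Bono loaded the view first; a cache that ignored row-level security would have handed Rick Bono's result.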
to see, or Irish rock, because there's no way they're really truly rock. Importantly, when they go to that dashboard, to ensure there's no possible way they can see the same data and no way data can leak out, they're going to bypass things like the query cache; their view just won't be served from it. If this is something you're interested in, including how to better optimize it, there's also a session called Can't Touch This on row-level security in Tableau, which I highly recommend attending specifically for this topic. Very quickly, one thing you could potentially do in this scenario, if you want to ensure there's a cache ready for users, is to use, for example, tabcmd to pre-generate or pre-load a dashboard so that it's hit first. Acting as each user, they've then already got a pre-loaded individual view and hit the cache. Right, so we have our content, we've queried some data sources, we've loaded some views, and it's starting to look quite blinding with all these lights. With that scenario done, we can also say farewell to Bono. For those of you with eagle eyes, you will have noticed that the dashboard we were looking at before was related to 2016, so we haven't run that extract in quite some time and it's probably due a refresh. We've mentioned the process that's going to do that several times; it's been there in the background. Hint: it's the Backgrounder. The Backgrounder is responsible for multiple things. It creates extracts. It's responsible for Prep flows, so for those of you starting to play around with Tableau Prep, it's the Backgrounder that runs those flows. It handles subscriptions: for all those users who have subscribed to their most wanted content, their top views, the Backgrounder triggers those subscriptions to be sent out. And it handles alerts too, so if you have
an alert set, a visual alert saying "send me a notification when this exceeds whatever value," it's the Backgrounder that periodically checks whether that threshold has been met. A note on that point: if, when you're using alerts, you set one to fire, say, any time the condition is true, and you have a live connection to that dashboard, it's going to check roughly every 15 minutes whether the criteria have been met. If you're using an extract, the check runs immediately once that extract is finished. So in this scenario, the Backgrounder creates your new extract, the extract finishes, and then it checks the alert and sends it if the condition has been met. That's a lot to remember, so just be aware that all of the things listed in the Tasks menu within Tableau Server are run and handled by the Backgrounder. A few things you should know: the default for the Backgrounder is actually two per node, and when you think about everything it's doing, two per node isn't a huge amount. Similar to VizQL (sorry, I forgot to mention this earlier, so I'll come back to it rather than leave it sitting in the ether), it can be increased and decreased without restarting Tableau Server. For those of you sat there thinking "oh, that's quite interesting," here's what this opens up: because you can increase and decrease the number of VizQL and Backgrounder processes, if you find, for example, that you have lots of users accessing views during the day and lots of extracts running overnight, you can increase the number of Backgrounders during the night, decrease the number of VizQL processes, and then reverse that in the morning. That ensures people have their requests processed as they're accessing views, while Backgrounder tasks handle all those extracts. Similar to VizQL, the more Backgrounder
processes you have, the more throughput you get: it can handle more tasks at once, so it can run more extract refreshes without queuing them. It doesn't necessarily mean more speed; it just means more things can happen at any given time. It also requires the data engine to be on the same node, again because it's the data engine that actually does all that querying and building, while the Backgrounder just handles the task. This next point is probably the most important for those of you who are fairly new to Tableau Server management: the Backgrounder is one of those processes that can be very intensive on CPU, disk I/O, and RAM, because it handles a lot of things at any given point in time. What typically happens, and what we've seen happen, is that users like extracts. Extracts are fast, so of course they like them. They publish those extracts to the server and set them on a schedule, all of a sudden you have tons of schedules running lots of extracts, and the Backgrounder gets quite intensive. What you can do to ensure this doesn't impact users loading content at the same time is isolate it. If you have a query-heavy environment that uses lots and lots of extracts, isolate the Backgrounder. Here's what that looks like: this is our typical default, one-node, out-of-the-box, nothing-special Tableau Server configuration. If you drop the Backgrounder from that first node, you can put it on its own node and increase the number of Backgrounders running there. It's actually interesting when we look at the default number of recommended processes here. If the Backgrounder shares a single node with everything else, the recommendation is n/4, but by itself it's n/2, n being the number of cores. What I've actually found is that you can go as high as n minus 2: if you have 16 cores, you can run as many as fourteen Backgrounder processes on that dedicated node and you'll not be impacted. It'll
still handle that throughput and still handle those tasks. So isolating the Backgrounder is a good, solid trick for ensuring that your workloads don't interfere with each other. Now, with the Backgrounder in mind, I mentioned that it runs Prep flows, so what does Prep Conductor do? Let's find out. With 2019.1 we introduced Prep Conductor as an option for Tableau Server, and it allows you to publish flows to Tableau Server. Here's an example of a flow published to Tableau Server: it shows you what the steps are doing, the tasks that have recently run, and where the outputs are going, and it's Prep Conductor that manages all of those bits and pieces. Prep Conductor is responsible for running the flow. It checks that connection credentials are in place, ensuring you can actually run the flow in the first place; it tracks the history; and, importantly, it sends you an alert if the flow fails for whatever reason. So if all of a sudden the credentials have changed or expired, it'll let you know the flow failed and why. But hang on a second: Prep Conductor is there, but where does the Backgrounder come into play? Let's go back to the isolated Backgrounder setup we were just looking at. The moment a flow needs to be run, the Backgrounder manages that process, and within a Tableau Server environment you can classify whether a Backgrounder process is allowed to run Prep flows or not; if you just want it to handle extracts, you can say it should do nothing but handle extracts. Now, a recommendation: this is still a growing area and we're still figuring out the impact, but my personal recommendation, from my own testing, starts from the fact that Tableau Prep is more intensive than creating an extract. That's quite natural: it has a lot of transformation steps,
it needs to do quite a lot of things in order to create its output, and sometimes there are multiple outputs as part of a flow too. So if you're using Tableau Prep, it's generally a good idea to also isolate the Backgrounders that run Prep flows, which means we can start doing something like this. In this fictional Tableau Server configuration, we've ensured that Prep flows are isolated, so they're not going to impact anything else on Tableau Server, and because we've also got lots of extracts, we've isolated the extract Backgrounders as well. If you've got lots and lots of work that's both extract-heavy and Prep-flow-heavy, it's probably sensible to separate them out like this so you can handle them individually. So, a few things to know about Tableau Prep Conductor. As I said, Prep flows are more complex than extracts, which shouldn't be surprising given all the transformation steps they perform, and because of that, just to reiterate, isolation is highly recommended if you're using Tableau Prep. You can also use TSM commands, like the ones here, to set whether a Backgrounder process handles flows or doesn't touch them at all. This is all documented, by the way, so don't worry about keeping track of it, but importantly, you can basically say it should handle flows, no flows, or both, depending on the volume of extracts and flows in your environment. We are almost there: we've got all the processes starting to light up, and the big one glaring right there in the middle is our lovely new Ask Data. You saw it demoed on stage a little while before, and now Rick is going to ask a few questions of his favorite data source. Ask Data is split into two components. The first, if you're looking at a list of processes, is something called Elasticsearch: when you go to any data source, Ask Data kicks in and starts indexing that data source to
figure out the metadata behind it and understand what you're actually going to start asking, and Elasticsearch handles that process. It stores the metadata: what types of fields are there, what's the most common value, what's the highest and the minimum result. It stores all of that good information ready to be queried. The next component, if you're looking at a list of processes on Tableau Server, is the Ask Data process itself. The Ask Data process is the bit that takes what the user types, in Rick's case spelling mistakes and all, and turns it into an understandable query to be sent to the data source. But we're not quite done once you've asked your question, for example "what's the close price per day, filtered to stock names that contain Tableau?" Once that's sent, it's turned into whatever query language sits behind that data source, and, you guessed it, we get our data engine involved again. It's the data engine that takes the query the Ask Data server creates, processes it, and returns the results, again using VizQL to generate the line chart. So Ask Data actually leverages all the bits of Tableau technology that are already there on Tableau Server. A few things you should note if you're starting to use Ask Data: every data source is set by default to something called triggered by user request, which means that once a data source is published to Tableau Server, or if it's already there, a user can start asking questions in Ask Data; it's ready to use. You can also set this to automatic, so that the moment you switch it on, it starts refreshing the metadata for all data sources. Or you can turn it off: if you have a huge data source that you in no way want people asking questions of, you can disable it. Fantastic news. Ask Data also has a set heap size for the Elasticsearch service, so, like I said, all that metadata, all that useful information that's going to be
leveraged when you start asking questions: you can increase the amount of memory it provisions if you find you have lots of data sources using Ask Data. And again, it uses the data engine to query those data sources, so scaling will depend on everything else anyway. It's not necessarily going to be more demanding than loading a workbook; if anything, potentially less, because of the kinds of things you're likely doing. Right, so we've gone through this user journey, and all of those lights at the top are now lit up. Fantastic news. At this point we can let Rick go: we're never gonna give him up, but we will let him down. And we can introduce ourselves to our final 1990s pop star, Pirate Madonna. Pirate Madonna is our server admin, a virgin who's about to touch Tableau for the very first time, and the two things she really needs to be aware of are the Cluster Controller and TSM, the two bits we're yet to touch. Now let's break all of this down. There are six key components that fit into the boxes at the bottom of the screen: the Cluster Controller, the Licensing Service, the Client File Service, the Coordination Service, the Administration Agent, and finally Tableau Services Manager. Each one has its own key part to play in the management and governance of Tableau Server. The first is the Cluster Controller. Whenever you're looking at your Tableau Server Status tab, it's the Cluster Controller reporting to you, saying "hey, this is healthy" or "hey, this has gone down." The Cluster Controller manages and monitors all of those services to make sure they're working. If, for example, as we discussed before, your repository goes down, it's the Cluster Controller that triggers the automatic failover. It gives the repository a moment to see if it's going to come back, and once it's determined that it's definitely gone down, our passive repository, our secondary
repository, kicks in. The Licensing Service is the next key thing to be aware of, and it lives only on the initial node. Those of you who have used Tableau for a long time will notice there's no concept of a primary node anymore; that's gone. But there's still an initial node, which is literally the node that's installed first, and that's where the Licensing Service lives. Every 72 hours, it runs a check to make sure this Tableau Server is licensed and that the license key we have is valid and ready to use; it communicates with the licensing server and reports the result of that check. The key thing to know is what happens if, let's say, that initial node goes down. First things first: if you can recover the initial node, things are fine. But if there's no way to recover it within roughly a 72-hour window, you'll likely have to move the Licensing Service onto a new node to ensure everything remains licensed. The Client File Service is the next one, or CFS for short. Those of you with super good vision (I'm not one of them) will notice a little Kerberos icon on that certificate, because the Client File Service manages things like certificates for the server. If you're using single sign-on or Kerberos, all of those Kerberos certificates are managed by the Client File Service. The important thing with the Client File Service is to install it everywhere there's a Coordination Service, which brings us on to our next process. The Coordination Service is the source of truth for quorum. If you're setting up a Tableau Server cluster that needs to be highly available, you should have three or five Coordination Service nodes at any given time, depending on how big your Tableau Server deployment is. Coordination Service ensembles determine which node acts as the leader at the point of any decision, so in the event that something goes down, they can still determine who
makes the correct decision. The Coordination Service ensures that quorum is met and your server stays highly available. Next up is the Administration Agent. The admin agent is the bit that handles all those TSM requests: things like changing the topology we were talking about before, or adding a new node to the cluster. It's the admin agent that ensures those configurations are replicated, checked, and applied across your server. It installs itself on every single node, so it's always there and you don't have to worry about it not being there. The very last piece in this little section of Tableau Server is Tableau Services Manager, or TSM for short. For those of you on Windows deployments, we introduced it in 2018.2, so now everybody has TSM; before that it was only available on Linux. TSM is the bit that lets you remotely connect to and manage your Tableau Server from wherever you are, as long as you can access the TSM component. Again, it automatically installs itself on each node, so it's always available. It's really unique; we call it admin at the beach. If you're on the beach and you really want to add a new node to your cluster, you absolutely can, without sacrificing that tan (which I don't get). Right, so we have covered every single process in a very short amount of time, and I've thrown a lot of information at you. If you're really interested in learning more, I've given you a few sessions that are available later today and tomorrow, but there's also this workbook available on Tableau Public, on our official Tableau Software Public page. It lets you choose which workflow you want to see through the Tableau Server interface. For example, this one is a user logging in with Active Directory, and you can flick through and see all the different processes that user hits as part of that journey. So, as I
said before, if we're logging in with AD, you can see it hits the Application Server, gets the username from the repository, speaks to AD, comes back, and then the user can log in. So you can go through those user journeys and see them come to life. Then, a quick recap of the most important things I think you'll take away from an optimization and configuration standpoint. This is the default Tableau Server installation for a single node. If you have more users, you have two options. You can scale up the number of processes: you get more throughput and better leverage the resources you have on that machine. If you're starting to use up all of that machine's capacity, you can scale out and simply handle more users in general. If you find you have lots of query-intensive workloads, you can use haas, right at the very end, and isolate the data engine. If you have lots of extracts, you can isolate the Backgrounder specifically for extracts, and if you have lots of Prep flows, you can also isolate the Prep aspect of the Backgrounder. Or, if you have lots of everything, you can isolate absolutely everything and scale everything up and out. What's your budget? Let's talk about that separately. A final few points: we've had our 1990s pop stars take us through the journey. The user journey covers everything from VizPortal to VizQL Server, all of the things we saw in the top half of the screen, and our server admin covered the six components that can be wrapped up as Tableau Server services. Hopefully you have all taken a peek inside the box and are much happier with the way Tableau Server is set up. Squeezed it in with a minute to go. Thank you very much for taking part today, thank you for being a great audience, and I'll be around for questions.
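The Rick and Bono cache example can be illustrated with a toy model. This is a hypothetical sketch, not Tableau's Cache Server: the class `ToyQueryCache` and its key scheme are assumptions made only to show who can share a cached result. Without row-level security one entry per view is shared by everyone; with it, the user becomes part of the key, so Rick never receives Bono's result and any pre-warming (the tabcmd idea from the talk) has to happen once per user.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToyQueryCache:
    """Hypothetical query cache used only to illustrate the talk's point.

    This is NOT Tableau's Cache Server; it only models which users may share a result.
    """

    entries: Dict[Tuple[str, ...], Any] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    @staticmethod
    def key(view: str, user: str, row_level_security: bool) -> Tuple[str, ...]:
        # Without row-level security everyone sees the same data: one entry per view.
        # With it, the user is part of the key, so results are never shared across users.
        return (view, user) if row_level_security else (view,)

    def load(
        self,
        view: str,
        user: str,
        row_level_security: bool,
        run_query: Callable[[str, str], Any],
    ) -> Any:
        k = self.key(view, user, row_level_security)
        if k in self.entries:
            self.hits += 1
            return self.entries[k]
        self.misses += 1
        result = run_query(view, user)  # cache miss: go to the data source
        self.entries[k] = result
        return result


if __name__ == "__main__":
    def query(view: str, user: str) -> str:
        return f"{view} rows visible to {user}"

    shared = ToyQueryCache()
    shared.load("top-tracks-2016", "Bono", False, query)
    shared.load("top-tracks-2016", "Rick", False, query)  # served from Bono's entry
    print("no RLS:", shared.hits, "hit(s),", shared.misses, "miss(es)")

    secured = ToyQueryCache()
    for user in ("Bono", "Rick"):  # pre-warm once per user
        secured.load("top-tracks-2016", user, True, query)
    secured.load("top-tracks-2016", "Rick", True, query)  # hits Rick's own entry
    print("RLS:", secured.hits, "hit(s),", secured.misses, "miss(es)")
```

In the shared case Rick would receive a result built for Bono, which is harmless only because, without row-level security, both users see identical data; that is exactly why the secured case cannot share entries.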
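The Backgrounder sizing rules of thumb quoted in the talk (n/4 when sharing a node, n/2 on a dedicated node, and the speaker's observed ceiling of n minus 2 on a dedicated node) can be written down as a small helper. This is a sketch of the speaker's numbers, not an official Tableau sizing formula; the function name `backgrounder_guidance` is hypothetical, and any real count should be checked against current documentation and your own load testing.

```python
from typing import Optional


def backgrounder_guidance(cores: int, dedicated_node: bool) -> dict:
    """Backgrounder process counts from the rules of thumb quoted in the talk.

    These are the speaker's numbers, not an official formula.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    # Talk: n/4 when the Backgrounder shares a node with everything else,
    # n/2 when it has the node to itself (n = number of cores). Never below 1.
    recommended = max(1, cores // 2 if dedicated_node else cores // 4)
    # Talk: on a dedicated node the speaker saw no impact going as high as n - 2.
    # No ceiling is given for shared nodes, so None is returned there.
    observed_ceiling: Optional[int] = max(1, cores - 2) if dedicated_node else None
    return {"recommended": recommended, "observed_ceiling": observed_ceiling}


if __name__ == "__main__":
    for cores in (8, 16, 32):
        print(cores, "cores | shared:", backgrounder_guidance(cores, False),
              "| dedicated:", backgrounder_guidance(cores, True))
```

For the 16-core example from the talk, this gives 8 recommended Backgrounders on a dedicated node with an observed ceiling of 14, and 4 when the node is shared.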
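The "three or five" guidance for the Coordination Service follows from majority quorum. Tableau's Coordination Service is built on Apache ZooKeeper, which can only make decisions while a strict majority of the ensemble is available. The sketch below is plain arithmetic under that assumption; the function names are hypothetical.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble with `ensemble` voting members."""
    if ensemble < 1:
        raise ValueError("ensemble must have at least one member")
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many members can be lost while a majority is still available."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} node(s): quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

A four-member ensemble needs three members for quorum and still survives only one failure, the same as three members, which is why odd sizes such as three or five are the usual choice.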
Info
Channel: Tableau Software
Views: 12,231
Rating: 4.9748425 out of 5
Keywords: Conceptual, Tableau Server
Id: s33OJeovr3Q
Length: 59min 35sec (3575 seconds)
Published: Thu Jun 20 2019