Pipeline Monitoring | How to Start Monitoring Data Health in Palantir Foundry

Video Statistics and Information

Captions
Reliable applications require reliable pipelines. You may have built a production pipeline, but how can you ensure that it continues to run smoothly and in line with your expectations without manually checking it every day, every hour, or worse? Data health checks can automatically alert you when something goes wrong with your dataset or your pipeline, so there's no more waking up to an unexpected email from an end user who's looking at stale data: you'll be notified immediately when and where something goes wrong.

When thinking about the health of a dataset or pipeline, many factors come into play. Did the most recent build succeed? When was the dataset last updated? If it's an Ontology-backing dataset, did the sync succeed, and did it do so in an expected time frame? And there are many more. We could brute-force this process by manually inspecting the build history, the contents of the dataset, the dates, and so on, but instead we can avoid that entirely by relying on checks that evaluate automatically.

So let's jump into Foundry and have a look at what this looks like on a dataset you may have seen in a previous video, called flights. Here we are looking at the flights dataset. In the left pane I can already see a section titled Health Checks, which gives me an overview of the health checks on this dataset, including answers to some of the questions I asked a moment ago. I can already see that something is wrong, indicated by the critical failure on the "time since last updated" check. For additional clarity, let's click "View details".

That brings us to the Health tab on the dataset, which gives a more detailed breakdown of the dataset's health. Now I can see not only the current health status but also indicators representing the historical health of this dataset, allowing for better traceability and root-causing. Normally, a dataset alerting critically after 133 days of staleness would lead me to drop absolutely everything and fix it, but in this case we're just looking at a training pipeline that hasn't been modified in a while, so I'll at least finish this video before doing something about it. If we look into the configuration of this check, I can see that the alert started firing at a moderate severity after seven days and escalated to critical after nine days. If this were in fact a production pipeline, we definitely would have done something about it by now.
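The video configures these thresholds through the Foundry UI rather than in code, but the escalation logic itself is simple. Here is a minimal, hypothetical Python sketch of a "time since last updated" check with a moderate threshold at seven days and a critical threshold at nine; the function name, severity labels, and thresholds are illustrative and are not a Foundry API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative thresholds mirroring the check shown in the video:
# MODERATE after 7 days without an update, CRITICAL after 9 days.
MODERATE_AFTER = timedelta(days=7)
CRITICAL_AFTER = timedelta(days=9)

def staleness_severity(last_updated: datetime, now: Optional[datetime] = None) -> str:
    """Classify 'time since last updated' into a health check severity."""
    now = now or datetime.utcnow()
    age = now - last_updated
    if age >= CRITICAL_AFTER:
        return "CRITICAL"
    if age >= MODERATE_AFTER:
        return "MODERATE"
    return "HEALTHY"

# A dataset last built 133 days ago, like the flights example, reports CRITICAL.
print(staleness_severity(datetime.utcnow() - timedelta(days=133)))
```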
So now that you know what a health check looks like, let's add our own on a clean version of this dataset. At the bottom we can see a health check helper, which shows the different categories of checks we can add. Let's begin by adding a job status check. A dataset's job, or build, can fail for many reasons, anywhere from compute, memory, and resource-related problems to poorly written code, though I hope that's not the case for you. For the simplest configuration we can just hit Save straight away; we could also add a description to help others understand this check in the future. Now, if the build fails, the check will fire.

Let's assume this is a pretty important pipeline, so we'll add thresholds to determine critical urgency, which lets us prioritize the different health checks when they fire. For example, if a build fails once, it could be the result of a transient network failure, but it could also mean that bad code has been merged. For this case we'll set the check to escalate to critical after three attempts, meaning that after the third consecutive job failure we will get a critical alert through our notification system.

Health checks can't do very much if no one is aware of them, so one final thing we have to be careful about is who gets notified when this check fails. There are two main ways of handling this. First, you can subscribe to a health check. Here we can see an "i" symbol that says "All failures", which indicates that we're already subscribed to this alert; if you create a health check yourself, you're automatically subscribed. For the purposes of demonstration, I'll unsubscribe and then re-subscribe. When you're not subscribed to a health check, you'll see the option to click Watch, and from there you can subscribe either to all failures or just the critical ones. Being subscribed means you get a Foundry notification and an email when your check fails, though you can adjust your notification preferences in your account settings. Additionally, if we go back to the configuration of this health check, you may have seen the option to automatically create an issue when the check fails. If we enable it, a failure will create an issue and assign it to a designated person or group. You'll hear more about issue management in a later video, so keep your eyes peeled.

We've now added a check on this dataset, and I hope that gives you a better understanding of what health checks look like and how they can keep your production pipelines running smoothly. If you want to learn more about health checks, check out the documentation on Foundry. Otherwise, stay tuned for more videos. Thanks for watching.
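For readers who want to see the "escalate to critical after three consecutive failures" behaviour described above as code, here is a minimal, self-contained Python sketch. The `JobStatusCheck` class, its field names, and the `print`-based subscriber are hypothetical stand-ins for illustration only; in Foundry this check and its notifications are configured through the health check UI, not through this code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class JobStatusCheck:
    """Illustrative stand-in for a job status check; not a Foundry API."""
    critical_after: int = 3                        # consecutive failures before CRITICAL
    subscribers: List[Callable[[str], None]] = field(default_factory=list)
    consecutive_failures: int = 0

    def record_build(self, succeeded: bool) -> str:
        """Record the latest build result and notify subscribers of the severity."""
        if succeeded:
            self.consecutive_failures = 0
            return "HEALTHY"
        self.consecutive_failures += 1
        severity = ("CRITICAL" if self.consecutive_failures >= self.critical_after
                    else "MODERATE")
        for notify in self.subscribers:            # e.g. an email or in-app notification
            notify(f"Job status check is {severity} after "
                   f"{self.consecutive_failures} consecutive failed build(s)")
        return severity

# The third consecutive failure escalates the alert to CRITICAL.
check = JobStatusCheck(subscribers=[print])
for build_succeeded in (False, False, False):
    check.record_build(build_succeeded)
```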
Info
Channel: Palantir Developers
Views: 7,551
Id: 8aBPbQgqU5U
Length: 5min 58sec (358 seconds)
Published: Thu Mar 10 2022