Today, we're going to install Prometheus,
Node Exporter, Pushgateway, and other monitoring components on Ubuntu. To visualize metrics, we will use Grafana. Also, I'll show you how to secure Prometheus
with username and password. Finally, we will install Alertmanager and
configure it to send notifications to a Slack channel. You can find all the commands that I run in
the video in the blog post. The link will be in the description. First of all, let's create a dedicated Linux
user, sometimes called a system account, for Prometheus. Having individual users for each service serves
two main purposes: It is a security measure to reduce the impact
in case of an incident with the service. It simplifies administration as it becomes
easier to track down what resources belong to which service. To create a system user or system account,
run the following command. --system - Creates a system account. --no-create-home - We don't need a home directory for Prometheus
or any other system accounts in our case. --shell /bin/false - Prevents logging in
as the Prometheus user. prometheus - Creates a Prometheus user and a group with
the exact same name. Let's check the latest version of Prometheus
from the download page. You can use the curl or wget command to download
Prometheus. Then, we need to extract all Prometheus files
from the archive. Usually, you would have a disk mounted to
the data directory. Also, you need a folder for Prometheus configuration
files. Now, let's change the directory to Prometheus
and move some files. First of all, let's move the prometheus binary
and promtool to /usr/local/bin/. promtool is used to check configuration files and Prometheus
rules. Optionally, we can move console libraries
to the Prometheus configuration directory. Console templates allow for the creation of
arbitrary consoles using the Go templating language. You don't need to worry about it if you're
just getting started. Finally, let's move the example of the main
Prometheus configuration file. To avoid permission issues, you need to set
the correct ownership for /etc/prometheus/ and the data directory. You can delete the archive and the Prometheus
folder when you are done. Verify that you can execute the Prometheus
binary by running the following command. To get more information and configuration
options, run prometheus --help. We're going to use some of these options in
the service definition. We're going to use systemd, which is a system
and service manager for Linux operating systems. For that, we need to create a systemd unit
configuration file. Let's go over a few of the most important
options related to systemd and Prometheus. Restart - Configures whether the service shall
be restarted when the service process exits, is killed, or a timeout is reached. RestartSec - Configures the time to sleep
before restarting a service. User and Group - The Linux user and group
that run the Prometheus process. --config.file - Path to the main Prometheus
configuration file. --storage.tsdb.path - Location to store Prometheus data. --web.listen-address - Configures Prometheus to listen on all
network interfaces. In some situations, you may have a proxy such
as nginx to redirect requests to Prometheus. In that case, you would configure Prometheus
to listen only on localhost. --web.enable-lifecycle - Allows managing
Prometheus over HTTP, for example, reloading the configuration without restarting the service. To start Prometheus automatically after
reboot, enable the service. Then just start Prometheus. To check the status of Prometheus, run the following
command. If you encounter any issues with Prometheus
or are unable to start it, the easiest way to find the problem is to
use the journalctl command and search for errors. Now we can try to access it via the browser. I'm going to be using the IP address of the
Ubuntu server. You need to append port 9090 to the IP. If you go to targets, you should see only
one target, Prometheus itself. With the example configuration, it scrapes itself every 15 seconds. Next, we're going to set up and configure
Node Exporter to collect Linux system metrics like CPU load and disk I/O. Node Exporter
will expose these as Prometheus-style metrics. Since the installation process is very similar,
I'm not going to cover it in as much depth as Prometheus. First, let's create a system user for Node
Exporter by running the following command. You can download Node Exporter from the same
page. Use the wget command to download the binary. Extract Node Exporter from the archive. Move the binary to /usr/local/bin. Then, clean up: delete the node_exporter archive
and folder. Verify that you can run the binary. Node Exporter has a lot of collectors that we
can enable. If you run node_exporter --help, you will get
all the options. We're going to enable the logind collector, just
for the demo. Next, create a similar systemd unit file. Replace the Prometheus user and group with node_exporter,
and update the ExecStart command. To start Node Exporter automatically after
reboot, enable the service. Then start Node Exporter. Check the status of Node Exporter with the
following command. If you have any issues, check the logs with journalctl. At this point, we have only a single target
in our Prometheus. There are many different service discovery
mechanisms built into Prometheus. For example, Prometheus can dynamically discover
targets in AWS, GCP, and other clouds based on the labels. In the following tutorials, I'll give you
a few examples of deploying Prometheus in cloud-specific environments. For this tutorial, let's keep it simple and
keep adding static targets. Also, I have a lesson on how to deploy and
manage Prometheus in a Kubernetes cluster. To create a static target, you need to add
a job_name with static_configs. By default, Node Exporter is exposed
on port 9100. Since we enabled lifecycle management via
API calls, we can reload the Prometheus config without restarting the service and causing
downtime. Before reloading, check that the config is
valid. Then, you can use a POST request to reload
the config. Now you should have a new target in Prometheus. To visualize metrics, we can use Grafana. There are many different data sources that
Grafana supports; one of them is Prometheus. First, let's make sure that all the dependencies
are installed. Next, add the GPG key. Add the repository for stable releases. After you add the repository, update and install
Grafana. To start Grafana automatically after reboot,
enable the service. Then start Grafana. To check the status of Grafana, run the following
command. Open the browser and log in to Grafana
using default credentials. The username is admin, and the password is
admin as well. When you log in for the first time, you get
the option to change the password. Let's use devops123 for the new password. To visualize metrics, you need to add a data
source first. Click Add data source and select Prometheus. For the URL, enter http://localhost:9090 and
click Save and test. You can see Data source is working. Usually, in production environments, you would
store all the configurations in Git. Let me show you another way to add a data
source as code. Let's remove the data source from the UI. Then, create a new datasources.yaml file. Optionally, you can make this data source
the default one. Restart Grafana to reload the config. Go back to Grafana and refresh the page. You should see the Prometheus data source. You can import existing Grafana dashboards
or create your own. Let's create a simple graph. Go back to Prometheus, and let's explore
what metrics we have. Start typing scrape_duration_seconds and click
Execute. This metric will show you the duration of
the scrape of each Prometheus target. At this point, we have node_exporter and prometheus
targets. We're going to use this metric to create a
simple graph in Grafana. Go to Grafana and click create Dashboard and
then add a new panel. Give it the title Scrape Duration and paste the scrape_duration_seconds
metric. You can also reduce the time interval to 1
hour. For the legend, we can use the job label and
for the unit, seconds. There are a lot of configuration parameters
that you can use. Let's keep it simple: click Apply and save
the dashboard as Prometheus. Since we already have Node Exporter, we can
import an open-source dashboard to visualize CPU, Memory, Network, and a bunch of other
metrics. You can search for node exporter on the Grafana
website. Copy the dashboard ID, 1860, to the clipboard. Now, in Grafana, you can click Import and
paste this ID. Then load the dashboard. Select the Prometheus data source and click Import. You have all sorts of metrics here that come
from Node Exporter. The next component that I want to install is Pushgateway. The Pushgateway is a service that allows you
to push metrics from jobs that cannot be scraped. For example, you can have Jenkins jobs or
some kind of cron jobs. You can't scrape them since they are running
for a limited time only. The installation process is very similar to
Prometheus and Node Exporter. Create a dedicated user first. Download the archive with Pushgateway. Extract all the files. Move the pushgateway binary to /usr/local/bin. Then, clean up. Check if Pushgateway can be executed. Also, you can get configuration options by
running pushgateway --help. Create a systemd service. Enable the service. And start Pushgateway. Check the status. Pushgateway is reachable on port 9091. Let's add Pushgateway as a target to Prometheus. Check the Prometheus configuration. If it's valid, reload the config. Make sure that the target is up and healthy. To send metrics to the Pushgateway, you just
need to send a POST request to the endpoint http://localhost:9091/metrics/job/backup, where backup is an arbitrary name that will
show up as a label. Use curl and pipe the string with echo to
Pushgateway. Let's imagine that the Jenkins job that we
named backup took almost 16 seconds to complete. You can find this metric in Prometheus. Refresh the page and start typing jenkins_job_duration_seconds. When you install Prometheus, it will be open
to anyone who knows the endpoint. Fairly recently, Prometheus introduced a way
to add basic authentication to each HTTP request. Previously, you had to install a proxy such
as nginx in front of Prometheus and configure basic auth there. Now you can use the built-in authentication
mechanism in Prometheus itself. Let's install a Python module to create
a hash of the password. Prometheus does not store your password in plain text;
it computes the hash of the submitted password and compares it with the stored hash for the given user. Now, create a simple script that will ask
for input and return the hash for the password. Run the script and enter devops123 for the
password. Copy this hash and create an additional Prometheus
configuration file. Now, we need to provide this config to
Prometheus. Let's update the systemd service definition. Every time you update a systemd unit file,
you need to reload the systemd configuration. You also need to restart Prometheus. And check the status in case of an error. Now, we can test basic authentication. Go to Prometheus and reload the page. Enter your username and password. If you go to the targets section, you will
see that the Prometheus target is down. Prometheus requires a username and password
to scrape itself as well. We also need to update the Grafana datasource
to provide a username and password. If you click test, you get an unauthorized
error. Let's update the data source config for Grafana
to include basic auth. Restart Grafana. Next, let's update the Prometheus target to
include the username and password. Check the Prometheus config and reload it. To reload, you now need to include the username and
password in the request. Test the Grafana data source. And verify that the Prometheus target is up. To send alerts, we're going to use Alertmanager. It takes care of deduplicating, grouping,
and routing alerts to the correct receiver integration such as email, PagerDuty, or in our case Slack. You can set up multiple Alertmanagers to achieve
high availability. For this demo, I will install a single one. First, let's create a system user for Alertmanager. Then, download Alertmanager from the same
downloads page. Extract Alertmanager binary. For Alertmanager, we need storage. It is mandatory (it defaults to "data/") and
is used to store Alertmanager's notification states and silences. Without this state (or if you wipe it), Alertmanager
would not know across restarts what silences were created or what notifications were already
sent. Now, let's move Alertmanager's binary to
/usr/local/bin and copy the sample config. Remove the downloaded archive and folder. Check if we can run Alertmanager. You can also get help and all supported configuration
options by running alertmanager --help. Next is the systemd service definition. Enable Alertmanager. Start Alertmanager. And check the status. Alertmanager will be exposed on port 9093. It's time to create a simple alert. In almost all Prometheus setups, you have
an alert that is always active. It is used to validate the monitoring system
itself. For example, it can be integrated with the
Dead Man's Snitch service. If something goes wrong with Prometheus
or Alertmanager, you will get an emergency notification that your monitoring system is
down. It's a very useful service, especially in
production environments. Let's create the alert, but without the integration
with Dead Man's Snitch. You also need to update the Prometheus config
to specify the location of Alertmanager and specify the path to the new rule. It's always a good idea to check Prometheus
config before restarting. Now we have a new alert. Alertmanager can be configured to send emails
and can be integrated with PagerDuty and many other services. For this demo, I will integrate Alertmanager
with Slack. We're going to create a Slack channel where all
the alerts will be sent. Let's create an alerts Slack channel. Next, create a new Slack app from scratch. Give it the name Prometheus and select a workspace. You can modify the app from the Basic Information
tab. Let's upload the Prometheus icon. Next, we need to enable incoming webhooks. Then add the webhook
to the workspace. Lastly, we need to copy the webhook URL
and use it in the Alertmanager config. Now, update the alertmanager.yml config to include
a new route to send alerts to Slack. Any alert with the label severity equal to warning
will be sent to Slack. Restart Alertmanager. Now we need to include batch-job-rules.yml
in the Prometheus configuration. Create an alert to test the Slack integration. Add the new rule to Prometheus. Check the config and reload Prometheus. Trigger the alert by sending a new metric
to Prometheus Pushgateway. In a minute or so, you should get a message
in Slack. If we send a new metric with a duration of
less than 30 seconds, Prometheus will resolve the alert. If you are interested in deploying Prometheus
to Kubernetes, I have another video. Thank you for watching, and I'll see you there.
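
As a companion to the Pushgateway step, here is a minimal Python sketch of the same push that the video does with echo and curl. It uses only the standard library. The function names build_push and push are hypothetical helpers, not part of any Prometheus client, and the sample value 15.98 in the usage note is an assumption, since the video only says the job took almost 16 seconds.

```python
import urllib.request

# Pushgateway address used in the walkthrough.
PUSHGATEWAY_URL = "http://localhost:9091"


def build_push(job, metric, value, base_url=PUSHGATEWAY_URL):
    """Return (url, body) for pushing a single sample to the Pushgateway.

    Hypothetical helper: the job name becomes part of the URL path and
    shows up as the job label on the pushed metric.
    """
    url = f"{base_url}/metrics/job/{job}"
    # Text exposition format: "name value", terminated by a newline.
    body = f"{metric} {value}\n".encode("utf-8")
    return url, body


def push(job, metric, value):
    """POST the sample to the Pushgateway and return the HTTP status code."""
    url, body = build_push(job, metric, value)
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status
```

With Pushgateway running on localhost:9091, calling push("backup", "jenkins_job_duration_seconds", 15.98) should make the metric appear in Prometheus after the next scrape. Keeping build_push separate makes the URL and payload easy to check without a running server.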