Hi everyone! Welcome to another video. This time I'll be talking about InfluxDB
on Linux and using it from Python. InfluxDB is a time-series database. This type of database is designed
for storing time-based data. By that, I mean log collection, tracking
information, even things like stock market data, but also a typical IoT use case like storing
information read from sensors and much more. All of these have one thing in common,
that is that time is a key factor and you typically do not change values
that you measured in the past. So in a time-series database, like InfluxDB, you read the data, you store it and
you read it back for analysis later. But you do not update it. You could store exactly the same
information in a relational database, but a TSDB, or time-series
database, is better at it. It's optimized for it: not only is it faster, it also provides very handy and user-friendly
features that save you a lot of time, like trending, aggregation,
continuous queries and much more. In this video my goal is to
get you started with InfluxDB, which is the most popular
time-series database on Linux. We'll start by installing it, creating a database and then I'll try to explain
the concepts to you using examples. After that, I will show you
how to read and write data using both the InfluxDB client and Python. Before we get started, I'd like to
mention that I also have a blog: jensd.be, where you can find all the information
that I will share in this video and more. So if you want to go through this yourself,
I really recommend having a look there. Also, if you're interested
in this and similar content, don't hesitate to subscribe to my channel. As a starting point, I'll begin with
a minimal Debian 10 installation. I tested the same commands on a fresh Ubuntu
installation as well, so the choice is yours. Installing InfluxDB is easy as it's part
of the Debian and Ubuntu repositories. We can simply install it with the package manager. At the same time, let's also install a CLI client
and the Python libraries to work with InfluxDB. Now that InfluxDB is installed
we can start it using systemd. If you want you can also enable
the service to start at boot time. Let's do a quick test now to see
if we are still on track here. We can launch the InfluxDB client which we
also installed with apt and run a simple query. That seems to be working but also shows
that we have some more work to do. As you saw, we didn't need any
authentication to access the database. So let's at least enable authentication. Before we do so, we need to make
sure that once it's enabled we still have access by creating an
admin user with all privileges. Now that's been taken care of, we can enable
authentication in /etc/influxdb/influxdb.conf. We can do so by setting "auth-enabled"
to true and saving the file. For these changes to take effect, we
need to restart the InfluxDB service. Let's see now if this got changed. The same test which we did
before is no longer working, but when we provide the username and
password, we see things function as intended. Now before we can continue, and I
will try to keep this to a minimum, we cannot avoid taking a quick
dive into some InfluxDB concepts. Let's start with some sample data as follows. As you can see, this data looks
very similar, if not identical, to what you would see with a
classic relational database. We can see a table here which
has several columns and rows. A whole database would contain
several of these tables. In InfluxDB, the columns power_in, power_out
and sensor are called either fields or tags. Each is a combination of a key, which you
find in the top row, for example power_in, and a value, the actual data in a
column, for example 133, 1567 and so on. From this table we do not see any
difference between a field and a tag. The difference is related
to indexing and performance. A tag is indexed, a field is not. So if you often need to filter or refer to
data based on a certain column, it's better to define it as a
tag; otherwise, define it as a field. For example, if we often need the
values for power_in per sensor it would make sense to define power_in
as a field key and sensor as a tag key. This would optimize performance as a
query would probably look like this: In this query we can see that I
refer to the table called power_info. In InfluxDB terminology this
is called a measurement. The first column of a measurement, time, which contains a timestamp for the rest of
the fields and tags, is present in each table. It is the basis of a time-series database. Rows in the table are called points. A point represents a single row in the table. A series, which is the last concept I will
go into, is a combination of a measurement, tag set and field key. If we look at the sample data we
could have the following series: As you can see, especially if
you already have some experience with a classic relational database,
this is not too difficult to grasp. It should be quite straightforward to apply these concepts once you start to
use your InfluxDB instance. Now that you're a bit familiar with the concepts
and terminology, let's put this into practice. First thing we need to do, is to
make sure that we create a database. That database will contain our measurements. We also need to make sure that
we can access it in a secure way. Let's create a database called "energy". Once created, we can list the existing
databases to see if it worked. At this point we can access this
database only with our admin user, but it's probably not a good idea to
use that account with all privileges. So let's create a dedicated user that
has access to only that database. First, with CREATE USER, we
create a regular non-admin user. Then we can set the read and write permissions
for this user on the newly created database. Writing data or points to InfluxDB
typically happens using the line protocol. Unlike with a regular RDBMS, there's no need
to create a SQL-like query for that. The line protocol syntax looks as follows: We start with the measurement, a
comma and then we have the tag set. This is the tag_key and its tag_value, followed by a space and a field set,
so the field_key and its field_value. If we match this with the sample measurement
which we discussed before, we get the following: power_info is the measurement, the tag set consists of the tag_key
sensor with the tag_value motor1. The field set contains the field_keys power_in
and power_out with their respective field values. The tag set is optional and so is the timestamp. In most cases you want InfluxDB to
handle the timestamp on its own, which is one of the nice advantages
of using a time-series database. Between tag set and field
set, there is a space. The same goes between field set and timestamp. So if you do not use a tag
set, the line looks as follows: We have our database created,
have a user that can access it and now we know how to format data to
write to it, so let's give this a try. With the InfluxDB client, I
will insert a single point. The one which I explained just before this. Now if we want to read back the data,
we can use a SQL-like query like this. You can see how this simplifies things a lot. We do not need to specify a timestamp. InfluxDB took care of that and, even more special, we did not have to predefine a
structure for the measurement. It got created on the spot, while we
executed the insert. To show the existing structure in a database,
which is really handy if you're exploring an unknown database or you forgot what things
look like, you can use the following... Although this worked very well, it's not common to use the InfluxDB
client to write data to the database. In practice it is used for importing
large blocks of data or to load JSON, CSV or regular text files
containing lines with data. Much more common is to write
data coming from a script, through the HTTP API or using a different trigger. When we installed InfluxDB at the
beginning of the video, we also installed the necessary Python
libraries to access InfluxDB. Here is a small script to write some data, the
same data as we wrote using the InfluxDB client. We first need to import the library, then we can connect to InfluxDB using
the hostname, port and credentials. Next we need to construct the
line to insert into InfluxDB,
that performs that insert. If we have a look after executing the
sample script, using the InfluxDB client... ...we can see that this point has
been added as well to our measurement. In case you want to read this back using
Python as well, we can use the following code: Here we execute the same query, then
list out the points in the measurement. If we test this we can see the exact same data. That should be enough to get you started. There are a lot more possibilities to query
and write to InfluxDB but hopefully this basic information will help
you to get results quickly. Thanks a lot for watching! As mentioned, you can find everything that I
covered in this video on my blog: jensd.be, for which you can find a link in the description. If you like this video, please give it a thumbs up and if you are interested in this and similar
content don't hesitate to subscribe to my channel. Thanks again and I hope to see you back here soon!
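
For reference, here is a minimal Python sketch of the line protocol format described earlier: measurement, optional tag set, field set and optional timestamp. It is not the script shown on screen. The helper names to_line and write_and_read_back are made up for illustration, the field values 133 and 1567 are only example numbers, and the user name and password are placeholders you would replace with your own. The second function assumes the influxdb Python library that was installed at the start and its InfluxDBClient, write_points, query and get_points calls.

```python
def _escape(text):
    """Backslash-escape commas, equals signs and spaces in keys and tag values."""
    text = str(text)
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value):
    """Strings are double-quoted; numbers are written as-is."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def to_line(measurement, fields, tags=None, timestamp=None):
    """Build one line: measurement[,tag_key=tag_value] field_key=field_value [timestamp].

    Hypothetical helper; the measurement name is assumed to contain
    no commas or spaces, so it is not escaped here.
    """
    if not fields:
        raise ValueError("a point needs at least one field")
    head = measurement
    # The tag set is optional; each tag is appended after a comma.
    for key, value in sorted((tags or {}).items()):
        head += f",{_escape(key)}={_escape(value)}"
    field_set = ",".join(
        f"{_escape(key)}={_format_field(value)}" for key, value in fields.items()
    )
    line = f"{head} {field_set}"
    # The timestamp is optional; without it InfluxDB assigns one itself.
    if timestamp is not None:
        line += f" {int(timestamp)}"
    return line


def write_and_read_back(line, host="localhost", port=8086,
                        username="energy_user", password="change-me",
                        database="energy"):
    """Write one line and read the measurement back (needs a running InfluxDB).

    The credentials are placeholders, not values from the video.
    """
    # Imported here so the formatter above works without the library installed.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host=host, port=port, username=username,
                            password=password, database=database)
    client.write_points([line], protocol="line")
    result = client.query("SELECT * FROM power_info")
    return list(result.get_points())


example = to_line("power_info", {"power_in": 133, "power_out": 1567},
                  tags={"sensor": "motor1"})
print(example)  # power_info,sensor=motor1 power_in=133,power_out=1567
```

Leaving out the tags argument gives the shorter form without a tag set. Calling write_and_read_back(example) would only work against a running InfluxDB instance with a matching user and the energy database.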