Get started with InfluxDB and Python on Linux

Captions

Hi everyone! Welcome to another video. This time I'll be talking about InfluxDB on Linux and using it from Python.

InfluxDB is a time-series database. This type of database is designed for storing time-stamped data. By that I mean log collection, tracking information, even things like stock market data, but also a typical IoT use case like storing information read from sensors, and much more. All of these have one thing in common: time is a key factor and you typically do not change values that you measured in the past. So in a time-series database like InfluxDB, you read the data, you store it and you read it back for analysis later. But you do not update it.

You could store exactly the same information in a relational database, but a TSDB, or time-series database, is better at it. It's optimized for it: not only is it faster, it also provides very handy and user-friendly features that save you a lot of time, like trending, aggregation, continuous queries and much more.

In this video my goal is to get you started with InfluxDB, which is the most popular time-series database on Linux. We'll start by installing it and creating a database, and then I'll try to explain the concepts to you using examples. After that, I will show you how to read and write data using both the InfluxDB client and Python.

Before we get started, I'd like to mention that I also have a blog, jensd.be, where you can find all the information that I will share in this video and more. So if you want to go through this yourself, I really recommend that you have a look there. Also, if you're interested in this and similar content, don't hesitate to subscribe to my channel.

As a starting point, I'll begin with a minimal Debian 10 installation. I tested the same commands on a fresh Ubuntu installation as well, so the choice is yours. Installing InfluxDB is easy as it's part of the Debian and Ubuntu repositories. We can simply install it with the package manager.
At the same time, let's also install a CLI client and the Python libraries to work with InfluxDB.

Now that InfluxDB is installed, we can start it using systemd. If you want, you can also enable the service to start at boot time. Let's do a quick test now to see if we are still on track. We can launch the InfluxDB client, which we also installed with apt, and run a simple query. That seems to be working, but it also shows that we have some more work to do.

As you saw, we didn't need any authentication to access the database, so let's at least enable authentication. Before we do so, we need to make sure that we still have access once it's enabled, by creating an admin user with all privileges. Now that that's been taken care of, we can enable authentication in /etc/influxdb/influxdb.conf by setting "auth-enabled" to true and saving the file. For these changes to take effect, we need to restart the InfluxDB service. Let's see now if this got changed. The same test which we did before no longer works, but when we provide the username and password, we see things function as intended.

Now, before we can continue, and I will try to keep this to a minimum, we cannot avoid taking a quick dive into some InfluxDB concepts. Let's start with some sample data as follows. As you can see, this data looks very similar, if not identical, to what you would see in a classic relational database. We can see a table here which has several columns and rows. A whole database would contain several of these tables.

In InfluxDB, the columns power_in, power_out and sensor are called either fields or tags. Each is a combination of a key, which you find in the top row, for example power_in, and a value, the actual data in the column, for example 133, 1567 and so on. From this table we cannot see any difference between a field and a tag. The difference is related to indexing and performance: a tag is indexed, a field is not.
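
To make the field and tag terminology a bit more tangible, here is a minimal Python sketch of how rows like the sample data could be represented as points. The dictionary shape (measurement, tags, fields) is the one the influxdb Python library accepts for JSON-style writes. The measurement name power_info, the second sensor motor2 and all values are illustrative assumptions, because the on-screen table is not part of the captions. The sketch only shows the shape of the data, not how InfluxDB indexes it.

```python
# Illustrative rows only: the values and the second sensor are assumptions.
points = [
    {"measurement": "power_info", "tags": {"sensor": "motor1"},
     "fields": {"power_in": 133, "power_out": 1567}},
    {"measurement": "power_info", "tags": {"sensor": "motor2"},
     "fields": {"power_in": 98, "power_out": 1210}},
]


def field_values(points, field_key, **tag_filter):
    """Return the values of one field for points whose tags match tag_filter."""
    return [
        p["fields"][field_key]
        for p in points
        if all(p["tags"].get(k) == v for k, v in tag_filter.items())
    ]


# Filter on the tag (sensor) and read the field (power_in).
print(field_values(points, "power_in", sensor="motor1"))
```

The filter is applied to the tag and the value is read from the field, which mirrors the typical split between the two.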
So if you often need to filter or refer to data based on a certain column, it's better to define it as a tag; otherwise, define it as a field. For example, if we often need the values for power_in per sensor, it would make sense to define power_in as a field key and sensor as a tag key. This would optimize performance as a query would probably look like this: In this query we can see that I refer to the table called power_info. In InfluxDB terminology this is called a measurement.

The first column of a measurement, time, contains a timestamp for the rest of the fields and tags and is present in each table. It is the basis of a time-series database. Rows in the table are called points: a point represents a single row in the table. A series, which is the last concept I will go into, is a combination of a measurement, a tag set and a field key. If we look at the sample data we could have the following series:

As you can see, especially if you already have some experience with a classic relational database, this is not too difficult to grasp. It should be quite straightforward to apply these concepts once you start to use your InfluxDB instance.

Now that you're a bit familiar with the concepts and terminology, let's put this into practice. The first thing we need to do is create a database. That database will contain our measurements. We also need to make sure that we can access it in a secure way. Let's create a database called "energy". Once it's created, we can list the existing databases to see if it worked.

At this point we can access this database only with our admin user, but it's probably not a good idea to use that account with all privileges. So let's create a dedicated user that has access to only that database. First, with CREATE USER, we create a regular non-admin user. Then we can set the read and write permissions for this user on the newly created database.

Writing data, or points, to InfluxDB typically happens using the line protocol.
Unlike with a regular RDBMS, you don't need to write a SQL-like query for that. The line protocol syntax looks as follows: We start with the measurement, a comma and then the tag set, which is the tag_key and its tag_value. This is followed by a space and the field set, so the field_key and its field_value. If we match this with the sample measurement which we discussed before, we get the following: power_info is the measurement and the tag set consists of the tag_key sensor with the tag_value motor1. The field set contains the field keys power_in and power_out with their respective field values.

The tag set is optional and so is the timestamp. In most cases you want InfluxDB to handle the timestamp on its own, which is one of the nice advantages of using a time-series database. Between the tag set and the field set there is a space. The same goes for the field set and the timestamp. So in case you do not use any tags, the line would look as follows:

We have our database created, we have a user that can access it and we now know how to format data to write to it, so let's give this a try. With the InfluxDB client, I will insert a single point, the one which I explained just before this. Now if we want to read back the data, we can use a SQL-like query like this. You can see how this simplifies things a lot. We did not need to specify a timestamp. InfluxDB took care of that and, even more special, we did not have to predefine a structure for the measurement. It got created on the spot while we executed the query. To show the existing structure in a database, which is really handy if you're exploring an unknown database or you forgot what things look like, you can use the following...

Although this worked very well, it's not common to use the InfluxDB client to write data to the database. In practice it is used for importing large blocks of data or to load JSON, CSV or regular text files containing lines with data.
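
The line protocol layout described above can be sketched as a small Python formatter. This is only an illustration under assumptions: the function names to_line and format_field are made up for this example, the field values are placeholders, and escaping of special characters (spaces, commas and equals signs in keys or tag values) is deliberately left out.

```python
def format_field(value):
    """Render one field value: strings are double-quoted, other values as-is."""
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    # Numbers without a suffix are stored by InfluxDB as floats.
    return str(value)


def to_line(measurement, fields, tags=None, timestamp=None):
    """Format one point as a line protocol string (no escaping, sketch only)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = measurement
    if tags:
        # The optional tag set follows the measurement after a comma.
        line += "," + ",".join(f"{k}={v}" for k, v in tags.items())
    # A single space separates the tag set from the field set.
    line += " " + ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    if timestamp is not None:
        # The optional timestamp follows after another space.
        line += f" {timestamp}"
    return line


print(to_line("power_info", {"power_in": 133, "power_out": 1567},
              tags={"sensor": "motor1"}))
print(to_line("power_info", {"power_in": 133, "power_out": 1567}))
```

The first call includes a tag set, the second leaves it out, which corresponds to the two variants discussed above.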
Much more common is to write data coming from a script, through the HTTP API or using a different trigger. When we installed InfluxDB at the beginning of the video, we also installed the necessary Python libraries to access InfluxDB. Here is a small script to write some data, the same data as we wrote using the InfluxDB client. We first need to import the library, then we can connect to InfluxDB using the hostname, port and credentials. Next we need to construct the line to insert into InfluxDB, followed by the line of code that performs that insert. If we have a look after executing the sample script, using the InfluxDB client, we can see that this point has been added to our measurement as well.

In case you want to read this back using Python as well, we can use the following code: Here we execute the same query, then list the points in the measurement. If we test this, we can see the exact same data.

That should be enough to get you started. There are a lot more possibilities to query and write to InfluxDB, but hopefully this basic information will help you to get results quickly.

Thanks a lot for watching! As mentioned, you can find everything that I covered in this video on my blog, jensd.be, for which you can find a link in the description. If you like this video, please give it a thumbs up, and if you are interested in this and similar content, don't hesitate to subscribe to my channel. Thanks again and I hope to see you back here soon!
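
The on-screen scripts are not part of the captions, so the following is only a sketch of what writing and reading from Python might look like. It assumes the influxdb Python package (python3-influxdb on Debian and Ubuntu). The user name energy_user, the password and the field values are hypothetical placeholders, and the calls used in the video may differ.

```python
def write_sample(client, line):
    """Write one line-protocol point; returns whatever the client returns."""
    return client.write_points([line], protocol="line")


def read_points(client, measurement="power_info"):
    """Run a SELECT on the measurement and return the points as a list of dicts."""
    result = client.query(f'SELECT * FROM "{measurement}"')
    return list(result.get_points())


def main():
    # Imported here so the helpers above can be used without the library installed.
    from influxdb import InfluxDBClient

    # Hypothetical credentials: replace with the dedicated user created earlier.
    client = InfluxDBClient(host="localhost", port=8086,
                            username="energy_user", password="change-me",
                            database="energy")
    # Placeholder values; no timestamp, so InfluxDB assigns one itself.
    write_sample(client, "power_info,sensor=motor1 power_in=133,power_out=1567")
    for point in read_points(client):
        print(point)
```

Calling main() on a machine with a running InfluxDB instance and the "energy" database would write one point and print everything stored in the measurement. The helpers take the client as a parameter so they can be tried with any object that offers write_points and query.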

Info

Channel: jensd_be

Views: 1,068

Rating: 5 out of 5

Id: CdorS9UgRk4

Length: 10min 11sec (611 seconds)

Published: Wed Apr 28 2021