Get started with InfluxDB and Python on Linux

Captions
Hi everyone! Welcome to another video. This time I'll be talking about InfluxDB on Linux and using it from Python.

InfluxDB is a time-series database. This type of database is designed for storing time-based data. By that, I mean log collection, tracking information, even things like stock market data, but also typical IoT use cases like storing information read from sensors, and much more. All of these have one thing in common: time is a key factor and you typically do not change values that you measured in the past. So in a time-series database like InfluxDB, you read the data, you store it and you read it back for analysis later. But you do not update it.

You could store exactly the same information in a relational database, but a TSDB, or time-series database, is better at it. It's optimized for it: not only is it faster, it also provides very handy and user-friendly features that save you a lot of time, like trending, aggregation, continuous queries and much more.

In this video my goal is to get you started with InfluxDB, which is the most popular time-series database on Linux. We'll start by installing it and creating a database, and then I'll try to explain the concepts using examples. After that, I will show you how to read and write data using both the InfluxDB client and Python.

Before we get started, I'd like to mention that I also have a blog, jensd.be, where you can find all the information that I will share in this video and more. So if you want to go through this yourself, I really recommend having a look there. Also, if you're interested in this and similar content, don't hesitate to subscribe to my channel.

As a starting point, I'll begin with a minimal Debian 10 installation. I tested the same commands on a fresh Ubuntu installation as well, so the choice is yours. Installing InfluxDB is easy as it's part of the Debian and Ubuntu repositories.
We can simply install it with the package manager. At the same time, let's also install a CLI client and the Python libraries to work with InfluxDB. Now that InfluxDB is installed, we can start it using systemd. If you want, you can also enable the service to start at boot time.

Let's do a quick test now to see if we are still on track. We can launch the InfluxDB client, which we also installed with apt, and run a simple query. That seems to be working, but it also shows that we have some more work to do. As you saw, we didn't need any authentication to access the database. So let's at least enable authentication. Before we do so, we need to make sure that we still have access once it's enabled, by creating an admin user with all privileges. Now that that's been taken care of, we can enable authentication in /etc/influxdb/influxdb.conf. We do so by setting "auth-enabled" to true and saving the file. For these changes to take effect, we need to restart the InfluxDB service. Let's see now if this got changed. The same test that we did before no longer works, but when we provide the username and password, we see that things function as intended.

Now before we can continue, and I will try to keep this to a minimum, we cannot avoid taking a quick dive into some InfluxDB concepts. Let's start with some sample data as follows. As you can see, this data looks very similar, if not identical, to what you would see in a classic relational database. We can see a table here which has several columns and rows. A whole database would contain multiple of these tables. In InfluxDB, the columns power_in, power_out and sensor are called either fields or tags. These are a combination of a key, which you find in the top row, for example power_in, and a value, the actual data in a column, for example 133, 1567 and so on. From this table we cannot see any difference between a field and a tag. The difference is related to indexing and performance.
A tag is indexed, a field is not. So if you often need to filter or refer to data based on a certain column, it's better to define it as a tag, and otherwise as a field. For example, if we often need the values for power_in per sensor, it would make sense to define power_in as a field key and sensor as a tag key. This would optimize performance as a query would probably look like this:

In this query we can see that I refer to the table called power_info. In InfluxDB terminology this is called a measurement. The first column of a measurement, time, which contains a timestamp for the rest of the fields and tags, is present in each table. It is the base of a time-series database. Rows in the table are called points. A point represents a single row in the table. A series, which is the last concept I will go into, is a combination of a measurement, tag and field key. If we look at the sample data we could have the following series:

As you can see, especially if you already have some experience with a classic relational database, this is not too difficult to grasp. It should be quite straightforward to apply these concepts once you start to use your InfluxDB instance.

Now that you're a bit familiar with the concepts and terminology, let's put this into practice. The first thing we need to do is create a database. That database will contain our measurements. We also need to make sure that we can access it in a secure way. Let's create a database called "energy". Once created, we can list the existing databases to see if it worked. At this point we can access this database only with our admin user, but it's probably not a good idea to use that account with all privileges. So let's create a dedicated user that has access to only that database. First, with CREATE USER, we create a regular non-admin user. Then we can set the read and write permissions for this user on the newly created database.
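The database and user setup described above can be sketched as a small Python helper that prints the InfluxQL statements to paste into the InfluxDB client. This is only an illustration, not the exact commands from the video: the user name `energy_user` and the password are hypothetical placeholders, and `GRANT ALL` is assumed here as the way to give the user both read and write access to the one database.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Wrap an InfluxQL identifier (database or user name) in double quotes."""
    # This sketch refuses names that would need escaping instead of escaping them.
    if not name or '"' in name or "\\" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f'"{name}"'


def setup_statements(database: str, user: str, password: str) -> List[str]:
    """Return InfluxQL statements that create a database and a dedicated user."""
    if "'" in password or "\\" in password:
        raise ValueError("this sketch does not escape quotes or backslashes")
    db = quote_identifier(database)
    usr = quote_identifier(user)
    return [
        f"CREATE DATABASE {db}",
        # A regular, non-admin user (no "WITH ALL PRIVILEGES" clause).
        f"CREATE USER {usr} WITH PASSWORD '{password}'",
        # Read and write access, limited to this one database.
        f"GRANT ALL ON {db} TO {usr}",
    ]


# "energy" is the database name from the video; user and password are placeholders.
for statement in setup_statements("energy", "energy_user", "change-me"):
    print(statement)
```

Running `SHOW DATABASES` in the client afterwards is one way to confirm that the database exists.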
Writing data, or points, to InfluxDB typically happens using the line protocol. Unlike with a regular RDBMS, you do not need to write a SQL-like query for that. The line protocol syntax looks as follows:

We start with the measurement, a comma and then the tag set, which is the tag_key and its tag_value. This is followed by a space and the field set, so the field_key and its field_value. If we match this with the sample measurement which we discussed before, we get the following:

power_info is the measurement, and the tag set consists of the tag_key sensor with the tag_value motor1. The field set contains the field keys power_in and power_out with their respective field values. The tag set is optional and so is the timestamp. In most cases you want InfluxDB to handle the timestamp on its own, which is one of the nice advantages of using a time-series database. Between the tag set and the field set there is a space. The same goes for the field set and the timestamp. So in case you do not use any tags, the line would look as follows:

We have our database created, we have a user that can access it and we now know how to format data to write to it, so let's give this a try. With the InfluxDB client, I will insert a single point: the one which I explained just before this. Now if we want to read back the data, we can use a SQL-like query like this. You can see how this simplifies things a lot. We did not need to specify a timestamp. InfluxDB took care of that and, even more special, we did not have to predefine a structure for the measurement. It got created on the spot when we inserted the point. To show the existing structure in a database, which is really handy if you're exploring an unknown database or you forgot what things look like, you can use the following...

Although this worked very well, it's not common to use the InfluxDB client to write data to the database.
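To make the line protocol layout described earlier concrete, here is a minimal Python sketch that assembles one line from a measurement, an optional tag set, a field set and an optional timestamp. The helper name `to_line` is hypothetical, the value used for power_out is an illustrative placeholder rather than data from the video, and the escaping is deliberately simplified.

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value):
    """Render a field value in line protocol notation."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        # Written without the "i" suffix, so InfluxDB stores it as a float.
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, fields, tags=None, timestamp=None):
    """Build: measurement[,tag_key=tag_value...] field_key=field_value[,...] [timestamp]"""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    # The tag set is optional; tags are emitted sorted by key.
    for key in sorted(tags or {}):
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    # A single space separates the tag set from the field set.
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field_value(value)
        for key, value in fields.items()
    )
    # The timestamp is optional; without it InfluxDB assigns one on write.
    if timestamp is not None:
        line += f" {int(timestamp)}"
    return line


print(to_line("power_info", {"power_in": 133, "power_out": 12}, {"sensor": "motor1"}))
print(to_line("power_info", {"power_in": 133, "power_out": 12}))
```

The first call prints a line with a tag set, the second the variant without tags. In the InfluxDB client, such a line would be written by prefixing it with `INSERT`.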
In practice it is used for importing large blocks of data or to load JSON, CSV or regular text files containing lines with data. Much more common is to write data coming from a script, through the HTTP API or using a different trigger.

When we installed InfluxDB at the beginning of the video, we also installed the necessary Python libraries to access InfluxDB. Here is a small script to write some data, the same data as we wrote using the InfluxDB client. We first need to import the library, then we can connect to InfluxDB using the hostname, port and credentials. Next we need to construct the line to insert into InfluxDB, followed by the line of code that performs that insert. If we have a look using the InfluxDB client after executing the sample script, we can see that this point has been added to our measurement as well. In case you want to read this back using Python as well, we can use the following code: Here we execute the same query, then list out the points in the measurement. If we test this, we can see the exact same data.

That should be enough to get you started. There are a lot more possibilities to query and write to InfluxDB, but hopefully this basic information will help you to get results quickly.

Thanks a lot for watching! As mentioned, you can find everything that I covered in this video on my blog, jensd.be, for which you can find a link in the description. If you like this video, please give it a thumbs up, and if you are interested in this and similar content, don't hesitate to subscribe to my channel. Thanks again and I hope to see you back here soon!
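The Python write and read steps described in the transcript can be sketched roughly as follows. This is not the script shown in the video; it is a minimal reconstruction that assumes the `influxdb` Python package (the client for InfluxDB 1.x), a server on localhost port 8086, and the hypothetical credentials `energy_user` / `change-me`. The helper names are illustrative as well.

```python
def connect(host="localhost", port=8086, username="energy_user",
            password="change-me", database="energy"):
    """Connect to InfluxDB using hostname, port and credentials (placeholders)."""
    # Imported lazily so the helpers below also work with any object that
    # offers write_points() and query(), for example a test double.
    from influxdb import InfluxDBClient
    return InfluxDBClient(host=host, port=port, username=username,
                          password=password, database=database)


def write_line(client, line):
    """Write a single point given as a line protocol string."""
    return client.write_points([line], protocol="line")


def read_points(client, measurement="power_info"):
    """Run a SELECT on the measurement and return its points as a list of dicts."""
    result = client.query(f'SELECT * FROM "{measurement}"')
    return list(result.get_points())
```

With a running server, usage would look like `client = connect()`, then `write_line(client, "power_info,sensor=motor1 power_in=133,power_out=12")`, then `print(read_points(client))`. The power_out value is again only an illustrative placeholder.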
Info
Channel: jensd_be
Views: 1,068
Rating: 5 out of 5
Id: CdorS9UgRk4
Length: 10min 11sec (611 seconds)
Published: Wed Apr 28 2021