PostgreSQL is the new NoSQL by Quentin Adam

Video Statistics and Information

Captions
Hi everyone, my name is Quentin Adam. I have this weird Twitter handle, which was available everywhere, which was great, and then someone stole it from me on Pokémon Go and I was sad, so if you find someone with this handle on Pokémon Go, please bring it back to me. Maybe you have seen me on YouTube, because I have made a lot of conference videos, and maybe you don't know my actual work, which is leading a company called Clever Cloud. Do you know Clever Cloud? We are a very simple service: as a developer, you just write code, git push, and we deploy it, monitor it, update system dependencies, collect the logs, relaunch what fails, do the auto-scaling, and manage security, networking, everything. So it's really easy to use, and it's available on our own European cloud, so we don't give your data away, and we also have something you can install on your own infrastructure, or on OVH, on Amazon, or anything else.

So, let's talk about data. Data is becoming bigger and bigger, and a few years ago everyone was talking about Big Data. Basically the whole world got hit by NoSQL and Big Data: a lot of new database engines popped up and started to be interesting, and many developers discovered that managing data means applying what we call the CAP theorem. It says that when you store data in a distributed system, you have to decide between consistency, network partition tolerance, and availability, and you can pick two but not three. This is the eternal debate of everyone in the distributed world. When you choose a database, you need to know if you need transactions, if you need data consistency, if you need a schema; maybe you want to add some constraints, because you know many people will touch the DB; maybe you need an event flow. Anything can be an interesting point to look at when you choose a database middleware. So when we discovered this whole new world, everyone rushed to the NoSQL world. It was so fun.
Who has played with MongoDB here? Who is still running MongoDB in production? Fewer. So we discovered that storing data in the NoSQL world was not so easy, because storing everything like garbage was not so easy to query, and even just putting the data in can be complicated. And we rediscovered the comforts of the old database world, like transactions, or "I am sure the data is written on disk", this kind of stuff which is really useful when you write data. But when you try to query or analyze data, it can become complicated. About ten years ago, we all bought into the MapReduce architecture. Who has written JavaScript MapReduce jobs in MongoDB? You know, the JavaScript engine; we all found it so cool. To be real, it was a little bit painful. MongoDB then pushed the new aggregation framework, and you can also use Elasticsearch, which has its own way to query. Basically, querying the data became a little bit more complicated than we thought, and we are all going back to SQL. If you look around, Cassandra launched CQL, which is basically SQL on top of Cassandra; Hadoop projects grew SQL layers, with frameworks popping up every day, like Hive, Phoenix, Impala, and many more. Even new kinds of DB try to have SQL patterns: if you go into the time-series database world, you will find that everyone tries to have an SQL-like syntax, because SQL is a really interesting and expressive way to say "I want this kind of data". And while we were all running away into the NoSQL world, someone was improving the old DBs we know, and that was really great.

So, do you know Postgres? Who doesn't know Postgres? OK, cool. Postgres is a very old project; it started in the last century as a project from Berkeley, and it was called Ingres, and then Postgres after many iterations.
It became sort of the flagship of the open-source database world, while MySQL got most of the spotlight, and it has become more and more trendy these days. Even if the DB-Engines ranking is not perfect, what we can see there is that there are more and more PostgreSQL users over the last few years, and it has become very popular, with a large number of contributors and various companies helping the development of Postgres and building things around it.

For the whole talk, I will use an example, which is a playlist management application, where you have multiple backends like Deezer, Spotify, YouTube, whatever, and a playlist history. So the entities I will use are a user, a playlist, and a track history, where I will put all the tracks and link them to a playlist. If you look at it, every track comes from Deezer, Spotify, YouTube, or anywhere else, so the data is not in a common shape: I would have to normalize it somehow, or not. What this means is that I will need schema-less storage. And apart from schema-less, I will need what we call arbitrary data depth: in JSON you can have a document with some fields, but a field's value can itself be an object, and inside it another object; it's called arbitrary data depth because a value can have many children. So the data is complex, and we know that all the data will already be in JSON, because that is what we get from the APIs. But we will not use a NoSQL database on this project, for real reasons. If I say to you, "I need a unique email for every user", that is called a constraint, and constraints in the NoSQL world can be very complicated to apply, while you get them out of the box in SQL. If you try to run a query like "which tracks were played from 8 to 9 on weekdays", which is complex, it will be very complicated to process in NoSQL, because complex data is very complicated to explore. And if you try to add something like "in the Paris area", it gets even harder.
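The "unique email" constraint mentioned above takes one line in Postgres; the table and column names here are invented for illustration, since the slides are not in the transcript:

```sql
-- The database enforces uniqueness, so no application code can break it:
CREATE TABLE users (
    id    uuid PRIMARY KEY,
    email text NOT NULL UNIQUE
);

-- A second INSERT with the same email fails with a unique_violation error,
-- which is exactly the guarantee that is hard to get in most NoSQL stores.
```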
For that, you would need geo-reverse lookups on the user's IP, or something like that; yes, complex data is very hard to query. Or "order by email of the user", which links the data to the people actually using it, which means relationships in the data store. This is where SQL can be very interesting, because we need the actual ability to link data to each other. And when you start to have ordering and pagination inside the UI, it can be very, very difficult to do in the NoSQL world, but SQL was made to display data to the user; in the beginning, SQL was made to be used by the end user, so you can actually see the data. And this is another piece of advice: when you choose a data store, sometimes you think about what the analytics on your data will be, but you don't think about the day-to-day usage of displaying it to the user. So it can be interesting to store the data in multiple places: for example, put all your data in Postgres and stream it also into Elasticsearch, so you aggregate your data and build your statistics and dashboards on Elasticsearch, and you keep your day-to-day usage on Postgres.

OK, so we use Postgres on this project, for really good reasons. First, we create the tracks table. We say the tracks table has a UUID as the primary key, and we have a playlist_id as a UUID, which is linked to the playlists table, and a user_id, which we also link to the users table. Just to say it quickly: we use UUIDs and not serial. Why? First, because if you use serial and expose it, you will actually leak some business information to the outside, in the URLs you give to the user, which is not good; many people actually use the last ID to count how many items there are in your system. And if someone tries to explore the database, all the tables will have colliding identifiers, because you are using the same kind of ID in all the tables, which is not good if you make a mistake.
To be honest, I don't think it's a good idea that the primary key is generated by the database. I prefer the primary key to be generated by the software and pushed to the database: when you create a user, you create a new random UUID and push it to the database with the row. Because when you let the database generate it, you get this kind of weird code where you insert and then select the last inserted ID, and it's not proper: your database should only store the data, not create data. Globally, this big problem of IDs is about putting names on things, which is a hard computer science problem, and we wrote an article a long time ago about that which summarizes everything, so you can grab it online and pass it to your whole team, because we want to kill the serial thing.

That being said, we can create the tracks table, and we continue. We will use a track status enum, and we have fields for the name, the interval, the played date, and the creation date. First, use enums. Enums are very good: for example, when you create a database of users, you could encode "male or female" with a boolean, which is absolutely not good; you use an enum. When you use an enum, it is semantic, and you can add more values later. So use enums. And every time you put a date in the database, use a timestamp with time zone, because otherwise you will have some real problems. And the last field we create is called track_data, and we declare it as JSONB; we say to Postgres, "we will put JSON in here", because Postgres is able to process JSON. In Postgres, we have two types for JSON: you have json, which is basically the text as stored, parsed every time you use it, and you have jsonb, which is already parsed into a binary form. So basically, use jsonb, almost never use json. If you look at it in SQL, you get actual JSON back: there, I insert a string, I cast it to json, and as you can see, it's actually real JSON. So we have the tracks table, and we had a user_id there, but never mind.
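The slides are not in the transcript, so here is a sketch of the schema as described; the enum values and some column names are assumptions:

```sql
-- Semantic status values instead of booleans or magic numbers:
CREATE TYPE track_status AS ENUM ('queued', 'playing', 'played', 'skipped');

CREATE TABLE tracks (
    id          uuid PRIMARY KEY,                    -- generated by the application, not serial
    playlist_id uuid NOT NULL REFERENCES playlists (id),
    user_id     uuid NOT NULL REFERENCES users (id),
    name        text NOT NULL,
    status      track_status NOT NULL,
    played_at   timestamptz,                         -- always timestamp WITH time zone
    created_at  timestamptz NOT NULL DEFAULT now(),
    track_data  jsonb NOT NULL                       -- arbitrary provider payload (Deezer, Spotify, …)
);
```

The application then supplies the key itself, e.g. `INSERT INTO tracks (id, …) VALUES ('a2f1…'::uuid, …)`, so no "insert then select the last ID" dance is needed.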
When we query JSON in Postgres, we have several operators: you take your payload, which is JSON, you add the arrow operator, you give the key you want, and you get it, every time you need it. You can, for example, test the existence of something: you take some JSON, you cast it to jsonb, and you use the question-mark operator to see if a key actually exists inside your data. And each time you do that, you can combine it with SQL functions: for example, I can take the payload, get the date, cast it to a timestamp with time zone inside Postgres, and run date_trunc on it to keep just the month. So what is the really interesting thing here? All the data you put inside the JSON payload can be used in all your SQL queries, and you can actually do data science on top of it. You have plenty of capabilities: functions that give the length of an array, map-like operations, any manipulation you want; you can use JSON path, which is the same idea as XPath in XML, to get the data you want, with filters; you have everything. So it's really interesting, and you can use lots of interesting PostgreSQL functions on top of your data. For example, take the temporal functions of SQL: you have plenty of functions you can use on data that is actually inside the JSON; just take it, cast it, and use it directly in your SQL. Which lets you use, for example, something like count: in NoSQL it can be really complicated to get a count, while here you just use SQL, because you can have WHERE clauses digging inside the JSON, comparing things, getting all the entities, and just counting them, which is really simple compared to writing a complex session of the aggregation framework, for example. And everything you do with JSON can be put in an index.
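The operators described above look roughly like this; the key names inside the payload are invented for illustration:

```sql
-- Arrow operators dig into the payload (-> returns jsonb, ->> returns text):
SELECT track_data -> 'artist' ->> 'name' FROM tracks;

-- The ? operator tests that a top-level key exists:
SELECT '{"title": "One"}'::jsonb ? 'title';   -- true

-- Cast a JSON field and feed it to a temporal function:
SELECT date_trunc('month', (track_data ->> 'date')::timestamptz) FROM tracks;

-- "Tracks played from 8 to 9 on weekdays", as a plain SQL count:
SELECT count(*)
FROM tracks
WHERE extract(hour  FROM played_at) = 8
  AND extract(isodow FROM played_at) <= 5;   -- Monday = 1 … Friday = 5
```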
For people who don't know what an index is: every time you store something in PostgreSQL that you know you will query often, you can create an index, which is a kind of precalculation over all the data you would otherwise have to compare, and everything you put inside the JSON can be indexed too. So it's really useful, because it can make your application really fast, and there are many tools that can suggest indexes to you just by reading the logs of the queries you actually run on Postgres. To do all of that, you will have to choose your database and the version of the database you use, and to be honest, the ORM you are using can be frustrating; it will not be your best friend here, because you are going outside its league: managing JSON data is not what it does. So sometimes it's not a good idea to use it, and when you go for the full power of PostgreSQL, you will have to write your own SQL, take the data, extract it, and manage it yourself. When you do that many times, you will discover that the JDBC driver, and especially the Postgres driver, is not complete: there are some types Postgres can return that it cannot parse, and that's kind of a problem. There is a workaround. The workaround is not very nice, but it works well: you just say "everything inside my SELECT will be converted to JSON", so you get all the data of your table as JSON, and once you have it as JSON, you just add Jackson, or any other JSON parser you already have, and manage your data with that. It kind of switches off the problem of the JDBC driver, and it lets you reuse code you already wrote for your API, so it's kind of good. And if you worry about overhead: the performance cost on the database side is really thin, because it streams; it's a very, very good function, so it's not a big problem.

OK, so you have more and more data, and your queries are starting to slow down inside Postgres. There is good news: since version 10, Postgres has parallel queries, so it splits a query between multiple CPUs and adds multiple workers to work on the query.
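A sketch of the two techniques just described, with invented index names; the last query is the "everything to JSON" driver workaround:

```sql
-- A GIN index covers containment and existence queries on the whole jsonb column:
CREATE INDEX tracks_track_data_idx ON tracks USING gin (track_data);

-- An expression index targets one extracted field you filter on often:
CREATE INDEX tracks_artist_idx ON tracks ((track_data ->> 'artist'));

-- Driver workaround: ship every row back as one JSON document, then parse it
-- client-side with Jackson or whatever JSON parser the API code already uses.
SELECT row_to_json(t) FROM (SELECT * FROM tracks) AS t;
```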
You just have to set some non-zero value for max_parallel_workers, and for max_parallel_workers_per_gather, which is the number of workers that can serve a single connection to the DB. And when you do that, you will be able to see whether your query is actually parallelized. Like everything in SQL, when you want to understand, you type EXPLAIN and your query, and you look at the query plan; as you can see, it says something like "I will use four workers for this" and splits the work in parallel. You have nothing else to do here; you just see that your query is split across parallel workers. You can also add multiple extensions to PostgreSQL, in particular one that gives you the ability to use CUDA and the GPU to do all the hash and scan operations. I have never used it; it's called PG-Strom. It seems very fun, I want to use it one day, but I have never had the use case. It seems to be a very interesting option if you have a lot of math to do; for example, if you're building a Bitcoin tracker, it could be a good fit.

If you start having more and more data and want to expand to many more nodes and get more availability, you need to go multi-node. The first possibility is the classic master/slave replication style: basically, you have to split your requests, sending the writes to the master and the reads to the slave, or the slaves; or you can use a tool like pgpool, which looks at all your requests and does the split by itself. But beyond that, you can use replicas for specific purposes. Let me explain: if you have a data scientist on your team who runs very big SQL queries against your database, very long, very heavy SQL queries, you don't want that on your master, because it would slow down production, maybe kill the master, and you don't want that. So you add a slave which gets all the data by streaming; you don't need the BI to be up to date, so you can tell this slave it doesn't need to be in sync all the time.
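The settings and the EXPLAIN step above look roughly like this; the values and the plan shown are just illustrative:

```sql
-- Allow parallelism (example values; tune to your CPU count):
SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 4;

-- Ask Postgres how it plans to execute the query:
EXPLAIN SELECT count(*) FROM tracks WHERE status = 'played';

-- A parallel plan contains a Gather node, something like:
--   Finalize Aggregate
--     ->  Gather  (Workers Planned: 4)
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on tracks
```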
The master will not wait for the write to be confirmed on that slave before saying the query is OK, because it's only a mirror; it's not mandatory for it to be there, and you will use this slave only for the BI. Going further, the people of PostgreSQL created something we call logical replication, which is the ability to say "I create a publication, which is this table and this table and this table", and on the other side, on the slave, "I create a subscription, which fetches all these tables". So you can replicate only part of your DB, and not the entire DB. And as you may know, PostgreSQL now has row-level access control, so you can control access to the data per row, and not only per table, and I think the publication/subscription model of replication will be able to use this pretty soon, which is very exciting, because you will be able to filter the data you stream to a replica: maybe you stream only the data for one partner, or something like that. You can also use what we call NOTIFY, which is the ability for Postgres to send you events: each time you notify something, you receive it, so it's kind of pub/sub inside PostgreSQL. To be honest, I have never used it and I don't think it's a good idea to rely on it, but it can be useful for some debugging purposes. Also, in the replication system, there is now the notion of quorum: instead of "remote apply", meaning "for the transaction to be valid, I need it applied on the remote", you can say "if the first two nodes, or any N of these nodes, have the transaction, I consider it valid". So this is a way to manage a quorum system, like in traditional NoSQL stuff. And then you have the foreign data wrappers. The foreign data wrapper is something very big in Postgres: you have foreign data wrappers for many things, with varying quality of code, but the point is, with a foreign data wrapper you can create a virtual table which actually takes its data from another system.
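A minimal sketch of the publication/subscription model and the quorum setting described above; all names, hosts, and counts are invented:

```sql
-- On the publisher, expose only the tables you want to stream:
CREATE PUBLICATION playlist_pub FOR TABLE tracks, playlists;

-- On the subscriber, pull those tables from the primary:
CREATE SUBSCRIPTION playlist_sub
    CONNECTION 'host=primary dbname=music user=replicator'
    PUBLICATION playlist_pub;

-- Quorum commit: a transaction counts as durable once ANY 2 of these
-- 3 standbys confirm it, NoSQL-quorum style:
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (replica1, replica2, replica3)';
```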
And that other system can be Postgres itself: you can query one Postgres from another Postgres and integrate parts of another database inside your own database. So it can be very interesting to join databases without merging them, and maybe for migrations and things like that it can be very useful. But there is more if you use table partitioning. To be honest, I have never used table partitioning in real life, so what I will show you is an example coming from a blog post I link underneath. Table partitioning is the ability to say: OK, I have a table, I partition it on one criterion, and I create sub-tables, which are the actual tables where the data lives. When you use this kind of table partitioning, what happens is: if I just put some data in there, temperature data for example, you will see that the data actually lands in two different tables. And if you go further, and I know people actually do this, if you use the foreign data wrapper for Postgres together with table partitioning, you can actually shard and split a table across multiple clusters; you can build a cluster of Postgres with the data not all in the same place at the same time, which is really exciting.

Just to remind you, with all this stuff, database stability is driven by predictability, so get metrics, learn, and improve. There is a very cool piece of software for this called PoWA, made by the people of Dalibo; I think it's a really good idea to use it. And if you want to try all this, Clever Cloud offers free plans of PostgreSQL: it's just one click, you say "I want an add-on", and you go on, and after that your Postgres is automatically updated, automatically monitored, automatically backed up; you are just using Postgres as a service and it just works. It's now available with PG 11.1, and the auto-update lets you just push a button and say "OK, I will update my database". So thank you for listening, and I think I have time for one or two questions. Thank you very much. [Applause] [Music]
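The partitioning example referenced above (from a blog post not included here) probably looked something like this declarative-partitioning sketch, available since PostgreSQL 10; table names and ranges are invented:

```sql
-- Parent table declares the partition key; it holds no data itself:
CREATE TABLE measurements (
    city_id     uuid NOT NULL,
    logged_at   timestamptz NOT NULL,
    temperature numeric
) PARTITION BY RANGE (logged_at);

-- Sub-tables hold the actual rows; INSERTs are routed automatically:
CREATE TABLE measurements_2018 PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE measurements_2019 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
```

The sharding trick mentioned in the talk follows from this: a partition can itself be a postgres_fdw foreign table living on another cluster, so rows spread across machines while queries still hit one logical table.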
Info
Channel: Devoxx
Views: 17,188
Rating: 4.2861953 out of 5
Keywords: Devoxx, Devoxx2018
Id: t8-BQjWJFKw
Length: 29min 4sec (1744 seconds)
Published: Tue Nov 13 2018