Joe Drumgoole - Introduction to Python and MongoDB

Captions
Good afternoon, everybody. As Deborah said, my name is Joe Drumgoole. I am director of developer advocacy at MongoDB, and what that means to most of you people in the room is that I can give you money for your meetup events. So if you follow me, Joe Drumgoole on Twitter, or @jdrumgoole on Twitter, I'm happy to help fund meetup events across Europe, especially around Python. MongoDB doesn't have to feature, but we like to talk a little bit about it, and that's part of the package. We do talks like this all over Europe, and I did one at EuroPython a couple of years ago.

This is just a basic introduction for people who are not familiar with MongoDB. I'm assuming that they are already familiar with Python. I've been using Python since 2006; I actually built a business in 2006, a SaaS backup service built around Python and Django. I dabble in it. I'm not an expert, but I can show you some tricks with MongoDB that should make your life a bit easier.

For those who aren't familiar with MongoDB, it's a document store. That means it stores JSON documents, and you should be familiar with JSON if you've done any kind of programming at all. One of the excellent things about programming with Python and MongoDB is that JSON documents look exactly like Python dictionaries, so there's a one-to-one equivalence between the types that you're going to use directly in Python and the objects that you can store straight into the MongoDB database. This means there are no wrapping classes and there's no extra code; you can just use the Python objects directly, and that turns out to be very straightforward. If you were using C# or Java you'd have to put a wrapping object around it, and that makes life just a little bit more noisy in terms of the code base. We get similar benefits from Node.js, but in my personal view Node.js is just too crazy for me. I can't work that stuff out at all. Callbacks, that's for somebody else who's a better programmer than me.
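The dictionary-to-JSON equivalence described above can be illustrated with the standard library alone. This is a minimal sketch, not something shown in the talk: the field names and values are made up for illustration, and no MongoDB server or driver is involved.

```python
import json

# A plain Python dictionary. The field names and values are illustrative
# assumptions, not data from the talk's demo.
user = {
    "username": "Joe Drumgoole",
    "python_since": 2006,
    "interests": ["Python", "MongoDB"],
    "employer": {"name": "MongoDB", "role": "developer advocacy"},
}

# Serialize the dictionary to JSON text, then parse the text back again.
as_json = json.dumps(user)
round_tripped = json.loads(as_json)

# Dictionaries, lists, strings and integers all survive the round trip,
# so the parsed document compares equal to the original dictionary.
print(as_json)
print(round_tripped == user)
```

Nested dictionaries and lists map to nested documents and arrays, which is why no wrapper class is needed on the Python side.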
When you store JSON in a database, it's obviously not pure textual JSON. That would be too expensive in terms of encoding and decoding, so you're not going to do that. And of course you've got to put this stuff onto a wire and send it over a network to the database in the first place; that's what the Python driver is designed to do. So we actually encode it. We encode type and length information, so we understand that things are strings or nested documents, that things have arrays nested inside them, that things are integers. We also understand geospatial coordinates, although I'm not going to demo geospatial queries today. The way it's stored is BSON, and BSON is our own standard. The BSON spec is an open standard and you can contribute to it. It's effectively a binary encoding of the JSON representation. So if I show you the JSON hello world, you can see there is a size at the start, then there's a type field, in this case a string, so it's a 2, and then there's the field name and the field value, hello and world, and then it's terminated by a null at the end. Obviously these can get more complicated as you get nested documents and arrays and so on. I just want you to understand that this is what's being sent across the wire, but that's the last time you need to worry about BSON in your life as a Python programmer. You're going to be working with dictionaries and arrays; they're the key types you're going to use.

Now, when you download MongoDB, which I hope all of you will do, you're just going to install a mongod on your local desktop. On Windows it installs as a service; on Linuxes and variations like OS X you can just run it as yourself. I'm actually running it on this desktop myself, although it's a Windows desktop, just so you can see the log files. But the production deployment of MongoDB is what we call a replica set. Obviously, with a single-node database like MongoDB, if the node dies the data goes away, and we don't have an independent log file on single-node databases the
way you do in a relational database. Instead we keep whole replicas of the data, so when you build a replica set you build three instances of mongod running on three separate nodes with three separate disks, and then you join them together into a replica set. I'm showing a replica set with three members here. You could have up to 50 members in a replica set; not many people do that, that's a lot of nodes to manage. Writes go to the primary. The primary is designed to take the write operations; you cannot write to a secondary, but you'll see how that's managed in a couple of seconds. Once the writes are made to the primary, they're then effectively copied to the secondaries via an internal log called the oplog, and the replica set manages all this for you. All you've got to do is set it up, and you'll see that MongoDB Atlas, our database in the cloud, will do all this for you with one click. You don't have to do this. If you want to set this up locally, there is a Python package called mtools that will allow you to set up a complete replica set, again with one line of code, on the local desktop you're using.

So it's all set up, all the writes are going to the primary, everything's fine. But what happens if you have a failure? Well, in normal operation the three nodes connect to each other using a heartbeat that tells each node the other nodes are alive, and replication streams from the primary to the secondaries. Remember, you read and write from the primary for consistent read and write activity. If you don't mind a little bit of eventual read consistency you can choose to read from a secondary, and of course this works out great if you're running a distributed cluster where you've got users in New York and London and Basel and you want to be able to eliminate that wide-area loop when you're reading data. In those situations it often doesn't matter that you don't have the most up-to-date records. So in normal operation this all works away, and again we'll set this
up and run it and manage it for you. But imagine the virtual machine that's running the primary dies for some reason, and we know nodes die, that's just the life of a node: somebody kills it, it runs out of disk space, it gets jammed because some process uses too much CPU. Eventually the heartbeat that the other two nodes send is going to not get a response. At that point the remaining nodes, and there must be a majority of the nodes remaining for this to happen, will have an election. It's like any election: you can't have an election if you don't have a majority of people participating. The election effectively asks which node has the most up-to-date data, and it uses, for those of you who are into this kind of thing, a consensus algorithm which is a small variation on the Raft consensus algorithm. So they eventually decide. It depends on the size of the cluster and how many nodes there are, but it takes a couple of hundred milliseconds, and eventually they will elect a new node and that node will spring to life as the new primary.

What happens to the clients while this is happening? Well, the Python driver, the client library that you're going to use, collaborates with the cluster to ensure that no writes are lost. So even if there's a write in flight to the old primary and it dies, the driver will retry that write automatically, recover, and make that write idempotently on the new primary, so there's no way to lose data. If, on the other hand, you lost the whole cluster, if your data center went on fire, which happens, I know not very often, but it does, what will eventually happen is your client will time out and get a server timeout. That happens after 30 seconds by default, and then you've got to do something else: my database is down, what do I do? But with multiple nodes and the ability of the primary to move to the nodes that are alive, that's going to happen much less often than in a single-node database.

Now, those of you who've been watching astutely will realize that if all the reads
and writes are going to a single node, there's a point at which that node is going to saturate, right? Recovery happens, we've gone through that, but will it scale? How does it jump from a single node to scaling to millions of nodes, scaling for millions of transactions, millions of users? Well, it can, and we do it with sharding. Effectively we run a partition of the data on multiple replica sets, and we use a separate set of daemons called mongos to route the reads and writes to those nodes. Going into sharding is a whole talk in itself. Trust me, it works. But you don't have to trust me, you can trust Fortnite. Fortnite has 25 million users, 10 million active users; they run on a sharded MongoDB cluster on Atlas, and trust me, that's a high workload.

So enough of the slideware, let's actually see it in action. I've got an IPython here. I actually wrote from datetime import datetime, because every time I did this demo I forgot to import datetime and later on it would cause something to crash. I want to actually connect to a database. I've got a server running here, it's kind of shrunk down here, so the database server is running there, getting connections or whatever. So I'm going to do client... oh, I need to import the PyMongo library first, so import pymongo, and then I just need to make a client object. The PyMongo library handles connection pooling, security, encryption if you're using it, and also encoding and decoding into BSON; that's why you don't need to worry about it. So we do pymongo.MongoClient, and if you look at the client object it's actually pointing at localhost 27017. By default all clients and all servers start on port 27017, so as long as you don't specify anything it's all going to work. If you want to change that you can; there's a --port argument. Now I still need to make a database, so I'm going to look at the client object and I'm just going to make a database off the client. This is the beauty of MongoDB: it's very simple to spin up new databases. So we're going to make a database called test and
then we're going to make a collection. Think table when you think collection, but instead of rows we're going to have JSON documents. So I'm going to make a collection, and we're going to call that test as well, because we just have no imagination in MongoDB. If I look at the collection you'll see it's got a client and it's got a dict, and it's got two things: a database and a collection.

Now, inserting into a collection is just as easy as making a dictionary. So I can do collection.insert_one and I just put in a dictionary. I'm going to make an explicit dictionary here, call it username, and the username is Joe Drumgoole, and close the curlies. You should never do a demo in Python without IPython, because it closes your curlies for you. It's going to return a result object which basically says that's been inserted, and we can actually look at that object. We can do collection.find_one, and we just do one because we know we've only got one element in this collection, and there it will pop up and you can see username Joe Drumgoole. But what's this strange _id? This is added by the Python client library, and it's created on the client; there is no round trip to the server. The ObjectId is your unique primary key for every object that you insert. So I can redo this insert now, let's just do it again, and if I do a find rather than a find_one, because find will return all the documents in the collection, I'm going to get a cursor. That's kind of annoying, because I'm in the Python shell. Normally other programmers have to use the shell, which is a Node.js environment, but because Python has this cool REPL, and because Shane Harvey and co, our developers, have done such a great job of building the library, we can look at this stuff. We get a cursor. I mean, I could do something really ugly like get the cursor, and the cursor is an iterable, so I could do x.next() and you'd get the object. That's a drag, who wants to do that? So I wrote a Python
package just to make this stuff easier, called mongodbshell. I want to get a particular object from this package, so I'm going to do from mongodbshell import MongoDB, and this is like a super object that saves you some typing. Let's get rid of those curlies. Now I'm going to make a new client object called MongoDB, and I can pass in the database I want and the collection I want, with the right quotes. If I look at the c object it's kind of similar, it's just a bit more legible: it just shows you the URL, the database name and the collection name. It doesn't put all the other cruft in there. And now I can do c.find and, bingo, I've done some inserts already, we get the objects displayed. So mongodbshell effectively wraps the cursor and displays it, and it does pagination and organizes this. It adds those line numbers; they can be turned off. We're going to play around with both of these in the future.

So that's fine in the shell, but let's have a quick look at what an actual program looks like. Here is the simplest Python program you can write for MongoDB. It literally just pings the server with an isMaster command. I'm going to run that and you'll see it'll just produce this JSON document, and it's basically system information about the server: ismaster true, local time, etcetera. So you just get a bunch of data about the server. That's the simplest program you can write.

But what if I wanted to make a lot of documents? I'm going to build this other program, and let me just put this in a slightly more legible format, into a distraction-free mode. This is going to make a pile of documents. It really just has the imports: we've got datetime, we've got random, we're going to make random strings. We're going to make an article, which just returns a document with a bunch of random fields in it, an _id, a title. Note that the _id field, which was generated automatically previously, can also be overwritten to
insert your own unique ID. Because _id is always indexed, it means you can save yourself an index if you already have a unique ID for the database. We're going to make a user, which does the same thing, and then we've got MongoClient. We're making a database called EP2019; I'm going to just change that because I don't want to overwrite the demo database I'm going to use. We're going to drop the users and articles collections, and then we're just going to insert them.

We do something here that is an important performance improvement. Note that instead of doing insert_one, we're building lists of users, appending them here, appending to the articles, and we're just inserting 500 at a time. Why do we do that? Because each insert requires a round trip to the server, and if you have to do one round trip for every document that's going to take a long time. With this model you can insert any number you want. The nice thing about insert_many is you can give it as long a list as you want and it will internally chunk it if it's exceeding the chunk size that it can use. The default internally is about a thousand documents. I'm setting it to 500 here so you can see feedback as you insert. Clearly, if you're going to run an insert with a million documents, you're going to stall your Python program for quite a while. There are async versions of this library, which again I'm not going to get into; if you look at Motor, it allows you to do all this stuff asynchronously.

So having written this, we're just going to spin this up, put it back into non-distraction mode, and then we're going to run that again, and we're going to change it to many docs, and it's going to chug away whacking those articles in. It'll work away doing that, so I'm just going to connect to a similar structure I've built already. Let's just create a new client, our new database, articles, which is db.articles, and we're going to create users, which is again off the database
and it's users. Now of course if we do articles.find we're going to get a cursor. So let's just do that... I'm going to Ctrl-C out of that. Let's go back to that: articles.find, we get our cursor back. So we're going to make our articles view using mongodbshell. We're going to make that a MongoDB, and it's going to be EP2019 and the collection is articles, and the same thing with users. Now we can do articles_view.find and we'll get piles of documents.

Now that's insert, that's query. What about update? How do we update an article? Well, there's an update_one call we can use. So we can do articles.update_one, and again we look for an article by _id, because I know the _id, and it's article 100. I'm going to do an update operation, and we're going to do a $set. A $set just basically sets a field, or if the field doesn't exist it adds it. We're going to add a comments field, and that's going to be an empty array. I'm going to close the curlies, and again we've got this problem that you can't see what we've inserted, but that's okay for now. We're going to have a quick look with articles_view.find_one and we're going to look at _id, and we can see... did I do the insert? Hmm, let's try that again. So articles is still on test, that explains it. articles equals db EP2019 articles. Okay, I've just got to set that database up again, and then we will get to articles, and then we can do articles.update_one, and we want to pick our _id, which is unique, article 100, close curly, and then we're going to do the $set operation as before. We're going to set the comments field to be an empty array, close the curlies, close the other curlies, closing bracket. Okay, let's just rename this article to articles_view and make articles the actual object on the db, and then we will rerun articles, that way we get a cursor back, and then we can do articles_view.find_one, and
_id is article 100. Thank you. And you'll see... the comments should be in a field there. I'm not going to try and do this anymore. What we would do next is append to that comments array. So we can say articles.update_one, and we open up curly again, _id, article 100, and now we're going to do a push, which actually appends to the array: $push. We open comments, and then we can just effectively specify another document here, so it would be like username is Joe, hello, close those curlies. And now if you look at articles_view, I want to make sure it's the right database, articles_view.find, you can look at the actual object we've created, which is article 100. Something's wrong there... _id, article 100... that's okay. I seem to have messed something up there, but effectively you'd get a push which adds a comment to the document, and that will give you an update operation. Simple deletes are of course just articles.delete_one, and you'd specify a document with an _id, article 100... oops, typo again... and it would just go and delete it.

So what have we shown today? You've seen how to create your database and the collection. You've seen how to read one and read many documents. You've found out how to insert a single document and insert many, and also how to update those documents, although the update demo didn't quite work out; that's just my fat-finger typing. I haven't shown you how to check performance and add indexes, because we kind of ran out of time, but I'm going to show you one more thing. You can build all of these clusters inside the cloud from MongoDB. You don't have to build these clusters manually; you can set this stuff up at cloud.mongodb.com. You can create your own clusters at any level and any scale. It's pay-as-you-go, you can turn the clusters off and save yourself money, and if you use
the code hack 100, you go to the top-level organization. When you create it, it starts you down low, so you have to go up to the top level to get to the billing tab. You go into the billing tab, you scroll down, you apply credit and put your hack 100 in there, and you get $100 of free credit to use MongoDB. That's MongoDB in a nutshell: free to use, and the easiest way to use MongoDB in the world is with Python and PyMongo. OK, thank you very much. [Applause]
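The batched insert, $set, $push and delete steps from the demo can be sketched as follows. This is a hedged reconstruction, not the speaker's code: the helper names, the `article_100` ID format and the `comment` key inside the pushed sub-document are assumptions. The helpers only rely on the `insert_many`, `update_one` and `delete_one` methods that PyMongo's `Collection` provides, and `RecordingCollection` is a hypothetical stand-in (not part of PyMongo) so the sketch runs without a server.

```python
# Sketch only. The helpers accept any object exposing PyMongo's Collection
# methods insert_many, update_one and delete_one. With a live server you
# would pass a real collection, for example:
#     pymongo.MongoClient()["EP2019"]["articles"]


def insert_in_batches(collection, docs, batch_size=500):
    """Insert documents batch_size at a time: one round trip per batch."""
    batch, inserted = [], 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted


def set_empty_comments(collection, article_id):
    """$set creates the 'comments' field if missing, or overwrites it."""
    return collection.update_one({"_id": article_id}, {"$set": {"comments": []}})


def push_comment(collection, article_id, username, text):
    """$push appends one sub-document to the 'comments' array."""
    comment = {"username": username, "comment": text}  # key names assumed
    return collection.update_one({"_id": article_id}, {"$push": {"comments": comment}})


def delete_article(collection, article_id):
    """Delete the single document whose _id matches."""
    return collection.delete_one({"_id": article_id})


class RecordingCollection:
    """Hypothetical stand-in that records calls instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def insert_many(self, documents):
        self.calls.append(("insert_many", len(documents)))

    def update_one(self, filter, update):
        self.calls.append(("update_one", filter, update))

    def delete_one(self, filter):
        self.calls.append(("delete_one", filter))


# Dry run: 1200 generated articles, then the update and delete steps.
dry_run = RecordingCollection()
total = insert_in_batches(dry_run, ({"_id": f"article_{i}"} for i in range(1200)))
set_empty_comments(dry_run, "article_100")
push_comment(dry_run, "article_100", "Joe", "hello")
delete_article(dry_run, "article_100")
for call in dry_run.calls:
    print(call)
```

Keeping the filter and update documents as plain dictionaries mirrors the point made earlier in the talk: the same Python objects are what the driver sends to the server.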
Info
Channel: EuroPython Conference
Views: 416
Rating: 5 out of 5
Id: VJ_d8jENWmo
Length: 29min 52sec (1792 seconds)
Published: Mon Sep 23 2019