ElixirConf 2017 - Thinking In Ecto - Darin Wilson

Captions
[Music] Welcome, good morning, thank you all for coming. There was a little bit of a room switcheroo, so: this talk is Thinking in Ecto. If you're looking for the macros talk, it's across the hall. It's where I would be if I weren't talking right now, so if you feel like that's where you need to be and you want to walk over there, I will not be offended. It's all right.

So, Ecto. Ecto is a database library for Elixir, and it ships by default with Phoenix, so there's a pretty good chance you'll run into it at some point. In this talk I'm going to talk about some of the high-level concepts that I think are helpful as you're starting to wrap your head around the way Ecto works. These are the big ideas underlying Ecto's design, at least from my perspective. Now, I should clarify that when I say big ideas, I'm not necessarily talking about original ideas, or groundbreaking ideas, ideas you've never heard before. Depending on your perspective, you may not even think these are good ideas. But these are, in my mind, the ideas that make Ecto Ecto, and if you keep them in mind as you're starting to explore the API, hopefully things will start to make a little more sense a little more quickly than not.

Now, I won't just live in the world of highfalutin high-level concepts the whole time. I'll be illustrating with some actual code. If you're brand new to Ecto and you've never seen a line of Ecto in your life, or even if you have, you may not grok every single line in the presentation. That's okay. Mostly what I'm hoping you'll take away are some higher-level concepts, and if every single line of code doesn't make sense, hopefully it will once you actually start working with it.

So let's start with big idea number one: the repository pattern. Ecto follows the repository pattern for data access. This might not be one that you've run into. With the prevalence of object-oriented languages for the last several years, or decades, the active record pattern has become more popular for database libraries. It shows
up quite a lot. The repository pattern is perhaps less well known, but if you've been working in active record for a long time and you're switching to Ecto, this is the first thing that will start to mess with your head.

The active record pattern sort of breaks down like this. By the way, when I say the active record pattern, I'm talking about the general-purpose pattern of data access that is called active record, not specifically the Active Record library that ships with Rails, which is based on the active record pattern. That makes sense? Okay, because I kind of forgot what I just said.

Here's how the active record pattern manifests. Let's say we're building an app that is going to be a music database. It'll track our music collections, so we'll have artists, albums, tracks and whatnot. We might fetch an artist with something like this: we have our artist class, or whatever it happens to be, we give it an ID, we get a record back, we change that record in memory, and then we save it back to the database. This makes a lot of sense for object-oriented languages because, of course, objects have behavior. It makes a little less sense for functional languages, because data and behavior are separate.

The repository pattern has a different take. The central idea is that there is some piece of code in your application, a module, a class, an object, whatever it happens to be, and it alone knows how to talk to the database. The rest of your application queues up operations and hands them off to the repository, and the repository sends them across the wire to the database. Simply put, if you want to talk to the database, you talk to the repository.

Here's how it plays out in Ecto, same scenario. We have our repository module, which by convention is usually called Repo. We call get, we have an Artist schema, which we'll talk about in a minute, and we give it an ID. Now if we want to change it, there are a couple of ways to do it, but one typical way is to use the changeset struct to create a change
set that models the change that we want to make. In this case we're changing the name, and then we update by handing this changeset off to the Repo for fulfillment.

If you look at these two side by side, there's active record on the left and the repository on the right. One thing you notice is that in active record, the communication with the database is a little bit hidden. Unless you really know the API, you wouldn't necessarily know that that call there is sending some data across the wire. Whereas with Ecto, basically any time you see the word Repo, you're talking to the database. So you can scan very quickly through the code and see exactly what's going on and when it's going on.

That brings us to big idea number two: explicitness. Just like the Elixir language itself, Ecto favors explicit behavior over implicit. As a general rule, Ecto does not make a lot of decisions on your behalf. That's not a strict rule, it's a general rule, but for the most part it wants you to spell things out for it.

By the way, that podium is wobbly but doesn't seem to be tipping over anytime soon. Every now and then I see eyebrows going up. I think you're all perfectly safe, but if you want to move to the back, that's okay.

The first place you run into this is schema definitions. In Ecto, schemas are maps between your database tables and structs in your Elixir code. Now, some libraries will figure this out for you: they will introspect the database, see what the fields are in each table, and create some data structures for you to work with without you having to do anything. Ecto makes you spell it out, and it looks like this. If we want to create a struct for our albums table, we create a module called Album, we call use Ecto.Schema to pull in the critical functionality, and then we go through each of the fields. We call schema, provide the table name, which in this case is albums, and create a field for the title, which is a string, and a field for
the release date, which is a date. Then there's a convenience function called timestamps, which gives us created-at and updated-at fields. Once we've done that, we have this struct available to us just as if we had called defstruct using regular old Elixir, but this one is a little bit special because it's got metadata that helps connect it to our albums table. Now, this might seem like unnecessary work. Why should you have to spell out all the fields in your schema when you know there are tools out there that will do it for you? Well, this gives us some flexibility, and we'll see how that works a little bit later.

The next place that explicitness can bite you a little bit is working with associations. Ecto supports this notion of associations, creating connections between different database tables and their associated structs. We just defined an Album struct, so it makes sense that we would have tracks as well. Here's the schema for tracks. There's a title, which is a string; the duration, that's the length of the track, which we'll store as an integer, the number of seconds; and then an index that indicates where in the album it is: track 1, track 2, track 3, or whatever. We can connect the tracks to the album with the belongs_to call. We're saying that a track record is always going to belong to some album, and here's the name of the module where that's defined. It assumes that in the database you have a foreign key called album_id, but if your foreign key is named something else, you can supply it. Then in our Album schema we can make the association the other way by saying this album has_many tracks, and again we supply the module name.

How you use these associations is where you need to be careful. Let's say we've fetched an album and now we want to look at the tracks. Intuitively you would just say, oh, I know, I'll just do album.tracks. But what you're actually going to get is this: the dreaded Ecto.Association.NotLoaded. Now this is a placeholder
value that tells us that the associated tracks for this album haven't been loaded yet. Loading associated records from another table involves another round trip to the database. We didn't tell Ecto to do that, so it's not going to do it for us. Some libraries will do this for you. If you made this call, they would say, oh look, they asked for the tracks, we don't have them, I will go get them, and then run to the database behind the scenes and bring the tracks to you. This is a feature called lazy loading. It's very handy, it's very nice, it saves you some typing. But there's a problem, and it goes a little something like this.

Let's say we grab a whole bunch of albums, everything in our database, and we want to iterate through them one by one, and then iterate through the tracks of each album. Let's assume for a moment that this worked. In Ecto this won't work, but we'll assume for a moment that it did. Each time we hit that particular line, we have to make another round trip to the database. We've got one album, go get the tracks; next time through we've got a new album, get the tracks; over and over again. That's not a problem if you've got 50 albums. If you've got 5,000 albums, it's a very big problem indeed, and it's got a name: the N+1 query problem. It's so named because N is the number of extra queries you run here, in this case the number of albums, plus one, which is the query for the album records that we did at the beginning.

This is a particularly nasty problem because it sneaks up on you. You won't run into it in development, when you've got just a few records; your app will handle it fine. It won't happen in the early days of production, when you've got a few customers, because again it'll handle it just fine. The day it's going to hit you is the day that your viral campaign succeeds and suddenly you've got untold numbers of new users adding untold numbers of new records, and suddenly, at 9:00 p.m.
on a Saturday night your app slows to a crawl and you don't know why, because you didn't change any code. But guess what: your database changed.

Ecto's solution to this problem is that it does not support lazy loading at all. When you want to load associated records, you've got to ask for them. There are two ways to do it. You can pipe your results to the Repo.preload function and say what the association is, or, if you're using the query syntax, which we'll talk about in a second, you can supply preload as an option and it'll do it for you. In either case, this will then work. So this is a little more typing, yes, but in exchange for a little more typing you could potentially save some grief down the road. All this explicitness we just talked about is there for your benefit. Eat your vegetables, kids.

Big idea number three: operations as data structures. This sounds a little abstract. Think back to the active record pattern that we looked at earlier. In that case we made changes to the database by manipulating the in-memory representation we had of that record and then shooting it back to the database. So these objects that represent database records are the primary data structures that we work with. Now, in Ecto we certainly do work with our schema modules and the structs that we've created, the artists, the albums, the tracks, what have you, but the real action happens in data structures that model operations on those records. One of those is Query, which we use to pull records from the database. Another one is Changeset, which we use to update records in the database. And then there's Multi, for when we need to do more than one thing at the same time.

Let's start with Query. Ecto uses macros to create a DSL for writing queries that feels a lot like SQL itself, and I mean that in a good way. Here's a SQL query that fetches all of the albums by Miles Davis: we select from albums, we join on artists, then look at the artist's name. Here's that same query in Ecto. As you can see, a lot of the keywords are the
same, and the structure is kind of the same, so if you're familiar with SQL, writing queries in Ecto will feel pretty familiar. Now, at this point the query is just a data structure, nothing more. It hasn't fetched anything. No databases have been harmed in the making of this query. Nothing happens until we hand it off to Repo, and then it gets fulfilled.

Because it's a data structure, we can manipulate it and change it and build on it before we send it off to the database. That's possible because when we call the from macro to kick off a query, we can provide the schema module, we can provide the table name as a raw string if we want, or we can provide another query, in which case everything that we got from the first query will go into the second query and we can refine it. So if we take the query we had before, fetching the albums of Miles Davis, we can then create a new query. We say from a in query, our previous one, and now we add some more parameters to it. We're going to filter further by looking for all of the albums that have tracks that are longer than 10 minutes. We're looking for the self-indulgent albums by Miles Davis.

From here you can see that it's pretty easy to make these queries a little more general-purpose by putting them into functions. Taking this first query, we could create a function that takes the name as a parameter and then generates a query for any artist. Similarly, we could take the second half of the query, parameterize the duration, and put that into a function. The benefit is that we have these nice functions that give us building blocks that allow us to build up queries like this, and they make them nice and readable. So breaking up queries into smaller pieces that get reused and put together is a nice feature of the query data structure.

The second big one is Changeset. Changeset is a struct and a module that handles the entire lifecycle of database updates, in three main
phases. First is filtering and casting: we take a bunch of input from somewhere, a form submitted by the user, an API call, whatever it happens to be, and put it into the changeset. Then we do validations to make sure everything is okay. And then we actually hand the changeset off to Repo for fulfillment, and we get a changeset back if there were any errors. The changeset we get back has all the changes we made plus any error information that went along with it. So the whole thing gets wrapped up in the changeset.

In code it sort of looks like this. We import Ecto.Changeset. We've got our params that came from somewhere. We're creating a new track here, so we've got a title, an index and a duration, and some other nonsense we were maybe using for JavaScript, who knows. We create a changeset by kicking it off with the struct we want to use. We call cast, and we supply the params that we got from the user and a list of the fields that we're interested in. So we're saying, okay, from these params we want title, index, duration; anything else, just throw it out, we don't care. At that point the foo value is gone. Now we can validate it, and the Changeset module provides a bunch of handy validation functions that do most of what you want, but you can always write a custom one yourself. And then we see if it works: we hand it off to Repo to insert and we get a tuple back. If it's ok, that means everything worked. If we get an error, then again we'll get the changeset back and it'll have the errors for us.

The interesting thing about this arrangement is that validations are part of the changeset; they're not part of the schema. When we set up the schema, we didn't say these are all the things that make a valid track record. We put that off until this point, where we're actually making changes. This is because validation requirements can change depending on how you're updating. Think about tracks for a second. In the real world, some tracks are parts
of albums, and some tracks are just singles that are released separately from an album. Based on that, we could create two different changeset functions. Well, we could create a changeset function that pattern matches on the params that we got from the user. If there was an album ID in there, we can say, all right, this one's part of an album: let's make sure that the album ID is valid, let's make sure the index was supplied and that it's greater than zero. But if the params don't have an album ID, we can say, okay, this one is just a single, we don't need to worry about the album information, just save the title. This allows for a much more granular approach to validation, and it can be really beneficial, especially if you have different contexts in which you're saving things.

The last data structure I want to talk about is Multi. This is kind of a new one, I think it came in with Ecto 2, and it's for when you need to do more than one operation at the same time. Here we've got a changeset that's going to make some edits to an album record. It's all ready to go, and we're about ready to hand it off to the database. But let's say that as part of this application we want to track the edits that are made to the database and credit the user that made them, sort of Wikipedia style. So in addition to updating the album, we want to update the user record somehow and bump up their contribution count. We want both of these operations to succeed, or to fail and the database to be rolled back. We wouldn't want the album changeset to succeed and the user not get credit for it.

The easiest way to do that is with a transaction, and this is sort of like a database transaction. We call update! on the album changeset, we call update! on the user changeset, and if either one of them fails we go back to where we were, as if nothing ever happened. In simple cases this works great, and if this is all you need to do, transaction is fine. But you may want to take slightly different action depending on
which of these operations fails. At the very least you might want to write something out to the console that says the album changeset failed or the user changeset failed. That's pretty simple, but even if you want to do just that, your code's going to look like this: you've got to run the first changeset, see if that worked, if it didn't, print an error, if it did, try the second changeset, see if that worked. And already this is no fun to read on a slide in a presentation.

Multi can help us out with this. Using those same two changesets, instead of going straight to a transaction, we'll create a Multi with the new function, and then we start piping it into the operations that we want to queue up. We say, all right, we want to run an update operation. You've got to give it a name; each part of the Multi needs to have a unique name, so we'll use album for this one, and we pass the album changeset. We do the same thing for the user changeset: we call this one user and we give it the user changeset, and we're ready to go. Again, at this point it's just a data structure. Nothing has happened in the database yet. Once we're ready to go, we hand that off to Repo.transaction. So transaction can take an anonymous function, like we saw before, or it can take a Multi, and we can capture the result.

Once we have the result, we can pattern match on it to find out what happened. If the first element of the tuple is ok, that means everything went through and all is right with the world. But if the first part of the tuple is error, we can then look at the second part of the tuple to see which one croaked. That's what those names are for. So in that first error clause we can see that the album changeset failed, so we'll print that out, or that the user changeset failed. Already this cleans things up considerably.

It gets better. You're not restricted to just database operations as part of a Multi; you can actually run any arbitrary piece of code in it. Let's say that we went a little further
with our user tracking, and once they reached a certain threshold we might want to award them a badge and send them an email saying, thanks for being an awesome user, we love you. This might get all wrapped up in a function, here, which we call check_user_progress. We can execute this function as part of the Multi using the run function. So now we have the same thing we did before, but now we're saying, okay, if those updates succeed, then execute this function, and we use the capture operator there to indicate it.

Another nice thing about Multis is that they're very easy to test. There's a to_list function that will spit out a data structure that represents all the operations that you have queued up. One of the nice things about being a data structure is that you can compose it from different pieces that get reused. We could take that whole user tracking and updating and sending an email part and move it off to a separate service module that gets reused throughout the application. As you start putting those pieces together, you want to make sure that your code is assembling the pieces correctly, so you can introspect a Multi and make sure everything is lined up the way you expect without actually having to send it to the database, and that can speed up your tests quite a bit. So, the important data structures: Query, Changeset, Multi. Get to know those; that's where you'll be spending a lot of time.

Big idea number four: flexible schemas. Now, earlier, at the very beginning, I mentioned that doing the hard work of mapping your schemas to your database tables by hand would open up some options for you, so we'll take a look at those now. Schemas in Ecto are very, very flexible, and you can set them up to do exactly what you need.

The first thing to know about schemas is that you don't actually need them if you don't want them. Here's a query that fetches the artist Miles Davis. You could rewrite this query like this: instead of using the schema module in the first part of the query, you can provide the table name
as a string. The only difference is that now you have to provide a select option to say which fields you want back. One of the benefits of working with schemas is that Ecto will always fetch whatever fields you've defined in the schema when it does a query and pulls some records back. If you're bypassing schemas, you have to be specific about it. And the results will be slightly different each time: in the first query you'll get an Artist struct back; in the second instance you'll get a list of maps, where each map represents a record that came back and the keys in the map are the columns that you asked for in the select.

Why would you want to do this? Well, some queries don't really benefit from having a schema. Think about reporting queries, where you're gathering up a lot of statistics. In this particular case, all we want are the names of all the artists in our database and the number of albums they've created that we have access to. That doesn't really sound like any schema that we've defined. We've got the artist name, sure, but we're not interested in any of the other artist properties, and the number of albums is just a count; that's not something that's being persisted. So we can write the whole query without the schema and get back exactly what we want, which is just the artist name and the number of albums. Schemas are supposed to make your life easier, so if you're writing a query and you find yourself tripping over which schema should go where, consider rewriting it without schemas. It might be easier, and you have that option.

The next thing to know about schemas is that we can bend them slightly when the need arises. Think for a moment about how we tend to work with our databases. In the early days we start to figure out what our data is, and because database tables can be kind of unforgiving, we start there and design our database tables. Now, we usually want some kind of representation of those records in our application code
so we usually end up creating data structures that pretty much mirror our database tables, particularly if it's done for us automatically. And then a lot of our tools tend to make it easiest to use those same data structures down in our view layer. For example, Phoenix forms work perfectly with changesets thanks to the phoenix_ecto package, which bridges the gap: you can hand a changeset off to a Phoenix form and you get lots of nice convenience functions. So just by inertia, it ends up being easiest to create forms that look more or less like our database tables. However, I don't think database schema definitions have ever won any design awards. The alternative is to do a bunch of coding by hand, but no one wants to do that. So what are the alternatives?

Well, Ecto gives us some. Think back to our Track schema. This duration field is a little bit problematic. We want to store it as an integer representing the number of seconds, but that's almost never the way we see it in the real world. When you see a track listing, it's usually expressed as minutes and seconds, as a string. Nobody wants to deal with 205; what we usually see is 3:25. If we wanted to present this field as an integer field to the user, they would have to do math, and I think we all know that's not going to end well. But at the same time, schemas and changesets work well with something like Phoenix forms, and we don't want to lose that.

So Ecto has an out. We're going to tweak this schema slightly: we're going to add another field called duration_string. This will hold the duration as a string, the way the user sees it: 3:25. We've also added the option virtual: true. This is our clue to Ecto, saying: I'm just using this for my own purposes, don't try to persist this anywhere, don't try to look for it in the albums table, you won't find it. Or the tracks table, I'm sorry. Once we've got that set up, we can present duration_string as a form element to the user and create a
changeset just like we normally would. We take the title, take the index, we've got the duration string, we do some basic validations, and now we're going to do a custom transformation: we're going to convert that duration string to an integer. That function might look something like this. We take our changeset, we pull out the duration as a string using get_field, and then we take that and convert it into an integer. So we take the 3:25 the user gave us and turn it into 205. How you actually do that I leave to your imagination. Then we call put_change to take our new value and stick it back into the changeset, but this time under the actual field we're going to save, duration. Back here, once we've done that, we can validate it to make sure it's all okay and then send it off to Repo, and we're good, and no one's the wiser. So we were able to adjust our schema slightly to make a better UI for the user without losing all the advantages of using schemas and changesets together.

We can go even further in this direction. What if I told you that schemas don't actually need database tables at all? If you read the documentation for Schema, at the top it says schemas can be used to map any data source, and somebody went to the trouble of italicizing 'any'. In that last example we were able to tweak the schema slightly to make it work better for a user interface, but some models are a little more problematic than others. Think about artists. Artists are kind of tricky to model because in the real world they sort of fall into two categories. On the one hand you've got bands: the Beatles, Depeche Mode, the Shaggs and the like. But you've also got solo artists, who sound more like people: Taylor Swift, Joan Jett, William Shatner, all your favorites. They have similar properties, but not quite exactly the same. In a perfect world, this is how we'd present it to the user. If it's a solo artist, we'd give them three name fields, so we can cover Sia as well as John Cougar Mellencamp
and a birth date and a death date. But if it's a band, we just need one name field, and bands don't really have a start date or an end date, it's more like years active, so we'd want to present it that way. Now, in the world of relational databases it doesn't really make sense to have two separate tables here, because fundamentally, at an abstract level, these are entities that produce albums, and that's what we're interested in. But real life is not quite so hard. So how can we bridge this gap?

First we'll make a schema for the artists table. This is the one that will actually map to the database. We've got three name fields, and we'll use start date and end date. Now, it would be pretty indecorous to prompt a user for the name of their fate, or for their artist's start date, or, even worse, for their end date. That's not okay. But in the cold, hard, abstract world of databases it's absolutely fine.

Now we're going to create a second schema. This will be our solo artist schema, and it will look more like how we want a solo artist to look: first name, middle name, last name, birth date and death date. Notice that in this case we didn't call schema, we called embedded_schema. Just like in the last example, where setting that field to virtual was our tip-off to Ecto that it's not really getting persisted anywhere, same thing here: we're saying that this schema is not going into a database anywhere, it's just a map that we're setting up for ourselves. Similarly, we make a band schema, and this one has simpler fields: just a name, a year started and a year ended.

So how do we work with these? Let's look at how we insert a band. We'd create a changeset function as we normally do, one that takes the parameters. We cast them, we validate them, we make sure the years are okay, that the year ended is not before the year started or something like that, just as if it were a first-class citizen in Ecto. But we add another function called to_artist, and this takes a Band struct and returns an actual
Artist struct that we'll save. We take the band name and put it in the name one field. For the start date, we create a date based on the year the band started and call it January 1st, and for the end date we take the year that they ended and December 31st. That's a reasonable compromise. To insert a new one of these, we generate the changeset with the params just like we normally do and check to see if the changeset is valid. If it is, we can call apply_changes on the changeset, and what this will do is spit back the underlying Band struct that's in the changeset, but with all the changes that the user supplied applied to that struct. So it's as if we had saved it to the database, but without actually saving it to the database. Then we convert it to an Artist record using our to_artist function, and finally we hand that off to Repo.

So the flexibility of schemas allows us to break up this juggernaut, this tendency for our database tables to become our UI. We can have all the benefits that come with working with schemas and changesets, and how easy they are to work with in certain form and view libraries, but we can shape our UIs the way that we want and still make it easy to save back to the database without a lot of extra coding. Thank you, Ecto.

Ecto gives us these options, and that brings us to the last big idea: think of Ecto as a set of tools and not so much a framework. All the different pieces of Ecto work together really nicely, but how you use them is kind of up to you. You can use schemas for all of your database records, or don't; skip them altogether. You can use changesets to do updates, or you can go kind of low-level and use the update_all function in Repo to manage those changes a little more manually. You can keep the bits that you want and throw out the rest. Not all projects are created the same, and not all databases are created the same, and the way you use Ecto on one project might be very different from the way you use it on another project. It's got the power and
flexibility behind it, so you can craft the approach that you want. There's no one right answer.

So, five big ideas I hope you'll take away with you: the repository pattern, explicitness, operations as data structures, flexible schemas, and, again, always think of it as a set of tools.

Before I wrap up I want to say a couple of thanks. First to my colleague Daniel Berkompas, whose screencasts on Elixir and Phoenix and whose blog posts really helped me get a good handle on this stuff, and to Eric Meadows-Jönsson, not just for creating Ecto in the first place but for helping answer a lot of my stupid questions. Speaking of Eric, he and I are collaborating on a book called Programming Ecto. It's coming soon from Pragmatic Bookshelf. It hasn't been officially announced yet, it was supposed to be sometime this week, so y'all are getting the inside scoop. But you can follow me, and I'm sure I won't be shutting up about it once it's released.

I'd like to thank you all for coming. I'm Darin Wilson, I'm the web team lead at Infinite Red. We specialize in web development in Elixir and mobile development with React Native, so if you're interested in any of those things, feel free to talk to us; there's a gaggle of us here at the conference. I'm just about out of time, but I'll be hanging around afterwards, so you can ask me questions then. Feel free to grab me at the conference at any time, and afterwards you can find me at the internets here. That URL there is for our Infinite Red community Slack, and we've got an Elixir channel in there, so that's a good place to find me as well. I hope this was helpful, and I hope you have fun playing with Ecto. Thanks again for coming, and enjoy the rest of the conference. Thank you.
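
The talk leaves the conversion from a duration string such as 3:25 to 205 seconds to the imagination. Here is a minimal sketch in plain Elixir, assuming the input is always minutes, a colon, and seconds. The module and function names (TrackDuration, parse, format) are hypothetical and do not come from Ecto or from the slides; a changeset function like the one described would presumably read the string with get_field, call something like parse, and store the integer with put_change.

```elixir
defmodule TrackDuration do
  @moduledoc """
  Hypothetical helper (not part of Ecto): converts between the
  minutes:seconds string a user types and the whole number of
  seconds stored in the duration column.
  """

  # Parses "3:25" into {:ok, 205}. Returns :error for anything that is
  # not a non-negative minutes value, a colon, and seconds from 0 to 59.
  def parse(string) when is_binary(string) do
    with [min_str, sec_str] <- String.split(string, ":"),
         {minutes, ""} <- Integer.parse(min_str),
         {seconds, ""} <- Integer.parse(sec_str),
         true <- minutes >= 0 and seconds in 0..59 do
      {:ok, minutes * 60 + seconds}
    else
      # Any failed step (wrong shape, non-numeric part, out of range) ends up here.
      _ -> :error
    end
  end

  # Formats a stored number of seconds back into "m:ss" for display.
  def format(total_seconds) when is_integer(total_seconds) and total_seconds >= 0 do
    minutes = div(total_seconds, 60)
    seconds = rem(total_seconds, 60)
    "#{minutes}:#{String.pad_leading(Integer.to_string(seconds), 2, "0")}"
  end
end

IO.inspect(TrackDuration.parse("3:25"))  # {:ok, 205}
IO.inspect(TrackDuration.parse("3:75"))  # :error
IO.inspect(TrackDuration.format(205))    # "3:25"
```

Returning a tagged tuple instead of raising fits the changeset flow described in the talk: on :error, the calling code could attach a validation error to the duration string field rather than crash.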
Info
Channel: ElixirConf
Views: 14,854
Rating: 4.9841585 out of 5
Id: YQxopjai0CU
Length: 35min 24sec (2124 seconds)
Published: Thu Sep 07 2017