ElixirConf 2019 - ETS Versus ElasticSearch for Queryable Caching - David Schainker

Captions
[Music]

Welcome to this talk on using Erlang Term Storage to replace Elasticsearch features. I want to start by asking a question: raise your hand if you've got a search function in your app, where users search for stuff. Okay, keep your hand up if your users really like the way it works. Okay, cool. And a show of hands: how many of you have somebody building search full-time on your team, where all they do is focus on search? No hands. Oh, we got one over here, yes. All right, so give me some shouts: what are people using to implement search functionality? Elasticsearch. Anything else? I hear database. What else? Algolia? Okay, I don't know what that is. Cool.

My name is David Schainker. I work at Adobe on Adobe Fonts. You can find me on social media at schainks pretty much everywhere. Last weekend I came into town early to try to enjoy the Rockies. I tried to climb a fourteener and failed because of an electrolyte imbalance, so I decided I would fill my talk with salt and bananas so that we can climb the mountain together. Adobe Fonts is pretty cool. We're hiring, come work for us; the Fonts team is awesome.

Let's talk about features of ETS and libraries that you can use to build search functionality in your applications. When you browse fonts.adobe.com, this is what we have today. All of this is powered by Elasticsearch. We do not query the database: our library is too big, so the database is really not an option for us for searching. On the left here we have filters that pull up subsets of our library based on typographic classification and typographic properties. Up top is a really standard search bar. We can autocomplete on names of fonts, and we can search for foundries, descriptions, and type designers. It's full-text search with autocompletion in one. I really like this UX because it's really simple, and because of the way we're using Elasticsearch we can do a full search really quickly. This page will
respond in milliseconds when you change your search terms. So these three features of Elasticsearch make it really useful for our team: filtering, autocompletion, and full-text search. If you're building a small to medium-sized application that does things like this, with a library or catalog that has lots of different terms and things you want to filter on, there's a good chance you'll need to implement these features somehow, and Elasticsearch is a totally good tool to reach for.

So let me briefly cover why we're using it, and then we'll talk more about ETS. Elasticsearch is written in Java and has Lucene inside of it. It's a really great search engine, and it also covers a lot of the other things we need for handling data. We can index data while the system is answering queries. We can serve queries on newly indexed data. We can switch indices atomically, so we can make an update to our library and have everyone start using it immediately with no problems. Elasticsearch is distributed, so we are resilient to node failures, node disconnections, and the weird problems that can happen in distributed systems, and we answer queries in soft real time. Internally, Elasticsearch has this notion of tasks, which are threads, and then there's a scheduler that handles distributing work across the cluster. Does this sound a little bit familiar to some of us?

Long term, this cluster is going to keep growing and we're going to keep using it. The indices will grow as the library grows, and we might get more usage. If we need to add nodes, they synchronize automatically, and when nodes disappear the cluster just keeps doing what it's doing without any trouble. When it comes to observability, knowing what's going on, Elasticsearch gives you a lot of great tools for that too, so we can see how the cluster is doing and keep an eye out for strange behavior. Then disaster management
is also really nice with Elasticsearch: they give you an API to snapshot and restore data, so if things are really bad you can get up and running pretty fast.

One thing that's unique about workloads for smaller applications is that your data set might not be too big, but it might be too big for your DB. Elasticsearch is really famous for handling tens of millions or billions of documents and terabytes of data; there are all these great blog entries about people using Elasticsearch with really amazing and huge requirements. For us, our entire library can fit in memory, and that lets us query it really fast. That's why we can do all of this soft real-time stuff. Ultimately we treat it as a queryable cache, and it's really good for this. So what more could we ask for?

Well, of course, as with all things there are costs, and I do want to cover some of those. The DSL for Elasticsearch is pretty straightforward, and the API is well documented and easy to learn. Once you get used to the API, you'll probably find a client library, and that might have a different DSL just because it's trying to be friendly to the language you're using. So now you might be learning two DSLs at the same time, but at least you're getting a good ability to see what's going on under the covers.

Integrating with Elasticsearch is a challenge. You have to work hard to make it do what you want, because if it's your first time using a search engine there are going to be a lot of new things to learn. It's not a hard learning curve, but there will be some trial and error, so you might spend time iterating on what works for your use case.

Running a really resilient search cluster can be expensive. For people who run it at really high scale, it costs money. You might not care; it depends. You can throw RAM and CPU at it and your cluster will be more performant. That's fine, it just costs money.

Version management in Elasticsearch has become
easier. Previous versions sometimes changed their APIs, and backwards compatibility would sometimes break, so we had this issue of keeping up with new versions. If we're too busy building products, we might have to skip a version of Elasticsearch. But if you've been using an older version like Elasticsearch 2 and you're bumping up to 6 or 7, there are some really big changes in there, and it might feel like you're rebuilding everything you did before from scratch. That costs time.

Prior to Elasticsearch 7, some of the defaults were kind of weird, and we are really glad they're addressing this. We found some really crazy defaults in some of the ways that you build a distributed cluster. I also got a little bit salty about the documentation on configuration. They're trying to push people to configure less, because they want you to rely on defaults, which I think is a good idea, honestly. But when I couldn't find documentation for the actual configuration parameters, I found somebody who had grepped the whole codebase and built a website that told you what the configuration options are and how they work, because apparently this wasn't part of the official documentation. So I get a little nervous: am I doing something that's not right with my config? I don't really know. I'm relying on a lot of things that are not the official documentation to do what I need to do. So yeah, a little bit salty.

Overall, Elasticsearch is really great. It works well, and the free version works well. Your requirements for your app may be different from somebody else's, and requirements are always changing, so this could affect your product development. For us it's worked out really well; we really do like Elasticsearch. Interestingly, we also don't have anybody dedicated to doing search full-time. We've been able to set up our cluster and let it do its
thing, and it's pretty happy. We've heard of other teams that have many people keeping their cluster running. Buried somewhere in the Elasticsearch documentation it says that Elasticsearch is a living, breathing animal that requires care and love. And I'm like, well, that's nice, but I just want my computer to do what the computer does, and I don't know if I want to care for and love it that much.

This is not an Elasticsearch conference, this is ElixirConf, so let's look at some of these search features through the lens of Erlang Term Storage. ETS is batteries included, giving you in-memory database functionality. Quick show of hands: who's using ETS right now in production? Awesome. How much time do you spend implementing new things in ETS? A lot of time? Not much time? No time at all? Nice, cool. Is there anybody here who has no clue what ETS is? A couple of hands, great. So let's cover that real quick.

I think ETS is a little bit bananas. The first time I read about it, I was really surprised that all of this is here. In the functional world, we pass state sequentially from process to process to compute outputs. Sometimes we need shared state somewhere that other processes can interact with concurrently. Likewise, what if we need to destructively update data? What if we don't want some piece of data garbage collected? What if we have something that we need to access from anywhere in our app? ETS is built into OTP to handle these situations; you just call out to Erlang from Elixir with :ets.

With ETS you store information in key-value tuples, and we'll take a look at that in a little bit. As I mentioned, your data is kept away from garbage collection, so none of it is destroyed unless the process that owns your ETS table terminates. But that's okay: maybe that process terminated and you weren't ready for it to terminate. There are behaviors you can implement so that you can pass ownership of that table to some other process, so you get some resiliency built
in there. Stored data is concurrently accessible. Reads in particular are very concurrent, while writes are serialized under the hood, which is fine. This is nice because updates are atomic: things aren't going to change in strange ways all of a sudden. I thought this was pretty cool. I was like, whoa, maybe we don't even need Redis anymore.

So it's flexible. Not everything about ETS can be covered in this talk, and in terms of really common use cases, these four are what I think apply to a lot of things. Since we're talking about queryable caching, I want to focus on cached data. We'll cover some bare-bones ways to query cached data, and then we'll look at some libraries that make that more user friendly.

To guide implementations with ETS, I like to ask a few questions about our requirements. In our case, where does the data come from? For us it's a transactional database. Our data set is different from most Elasticsearch implementations: it's not billions of documents, it's not terabytes of data, and it fits in memory on a pretty modest machine. Once we index the data, we're only reading it. The library doesn't change unless we want it to change, and the data doesn't change that often either. We might make library updates every once in a while, maybe three times a day, but it's not thousands of times per second or anything crazy like that. Not that that's crazy in a bad way; we're just a little more laid-back about our library updates. So how do we put structured data into ETS, which takes tuples? Is this a good idea? Let's bring our salt and find out.

Back to Adobe Fonts: we have filters, the search bar, autocompletion, and full-text search. Let's look at filtering first. Pretend you have a record like this coming out of your Ecto repository. This is a font struct. In our example it contains the name of a font, the typographic classification of that font, and a description. So we
will call :ets.new, give the table a name, and say what the objects in that table look like. In our case we're using a set with public access, so any process can read it. Sets in ETS give you one key per object, with no ordering and no duplicates; this is the default ETS table type. After that we put some data in with :ets.insert (ignore the horrible typo on the third line), and then when you want to look something up you can just call :ets.lookup and pull out, in our case, the font Futura, which is a sans-serif.

What about filtering? We put data in and we pulled it out. If we want to filter it, we can use match/2 and match_object/2. These are really useful functions: they give you exactly what you asked for, and that basically lets you filter. I can match on sans-serif fonts and I get back, in this case, Acumen and Futura. If I use match_object I get whole objects back. But what's this weird :"$1" business here? Well, the number in the dollar-sign variable is the position in the result, not the position in the matched tuple. The syntax feels a little bit weird, but you kind of get used to it.

For me, this then takes us to match specifications. A match specification is an Erlang term describing a small program that tries to match something. This basically lets you query data, and you can use select to pull results out. You turn a query into a match specification and then query in a way that's familiar. It looks a little weird, so let's break it down. In this case we make a function that takes the fields n, c, and d, which are just short for name, classification, and description, and we want to find all of the fonts with the classification sans-serif. We get three parts back when we make a match specification. The first part is the match head: this is the shape of our desired objects. Next are the conditions: this is how we're filtering, so you can see that we're saying what is
equal to the second term in the object, the one that has the value sans-serif as a string. Finally there is the match body, which is the format of the returned results. In this case we're only asking for the first entry of the tuple, which is just the name of the font. Then we call select and we get back results.

One gotcha I learned about this approach is that if you have a compiled function from your code and you try to call fun2ms on it in an IEx shell, that's not going to work. Likewise, if you have a dynamic function and you try to send it over to compiled code, that's also not going to work.

But wouldn't it be nice if we could do some of this filtering from Ecto directly? I found out there's a community library for this. A fellow community member in the UK, Evadne Wu, wrote a library called Etso, and it treats ETS tables like Ecto repositories. One of the nice things about Etso is that you get a supervision tree for free, with really nice default implementations of resiliency, so you don't have to worry about building your supervision tree in a fancy way just yet. It does it for you. And we can still use match specifications inside of Etso. Before, we had this match spec; now we can do something a little more familiar. We can call out to the Ecto repo with a where clause on the classification, select what we want, and then get all matching entries from the repo. This library is still new and it's looking for people to get involved. Go check it out on GitHub; I put the link at the top of this slide. I thought this was a really nifty way to deal with what is arguably a really good set of basic things you want to be doing with ETS, without having to write a lot of ETS code.

Okay, so that was filtering. Let's move on to autocomplete. For implementing autocomplete I thought I would have to come up with something really crazy. I started learning about Judy arrays and tries again, and I was like, oh, that computer science stuff I did in school that I forgot, what is it?
But ETS actually has internal functionality for this. Match specifications can match exact binary values, so you can return lists of results based on a string entered by a user. It turns out ETS has really efficient data structures for doing this. This example matches fonts containing "serif" in their classification. Futura is sans-serif and Adobe Garamond is a serif font, so in this case our match body gives us both. That's pretty cool.

Okay, so that went a little quicker than I expected. I thought I'd have more, but we're on to full-text search already. And guess what, I'm going to be honest here: full-text search is a real gorilla in the room. It's big. I mean, what does that Google company do? They do search. What about those search engines of the 90s? They did search. Elasticsearch? They're publicly traded, and they do search. It's a really big problem.

It turns out, digging around, that there are a bunch of full-text search libraries available in the community already. The first one I thought was really cool, elib1, was built by Joe Armstrong. He wanted to build something called an inverted index to allow you to do text search. An inverted index basically takes your documents and builds a table of which documents mention a given word, and then you can look up a list of matching documents based on a word.

Beyond that, you have some pretty heavy-hitting libraries. You could still learn to use Lucene and try to integrate it with your app. You could build something different. You could use RediSearch, a new tool from Redis where you can do full-text search inside of Redis. Riak also has a distributed search engine component built into it, so you could use Riak to get full-text search with indexing and a lot of the out-of-the-box functionality that you would get with Elasticsearch. But of course all of this takes time to learn and use, and it really depends on your app and what you
need as a team. A team of ten people might have more resources to dedicate to something like this than a team of three. So that's a little bit of salt. Are you ready for a lot more salt?

What about the real world? We've just covered using ETS for components of a search engine, and that's not the whole thing. There are all these other things that stop us from making an Elasticsearch killer with ETS. Notably, there are some performance limitations in ETS that other people talked about today. Did anyone go to Miriam Pena's talk? Yeah, cool. In that talk she mentioned that as your data structures get bigger and you put them in ETS, you can experience certain kinds of performance problems, and that's a real thing to think about. How big are these documents? Are you storing three fields, or a thousand fields? That will definitely affect the way you think about building this component of your app.

Most search engines include rankings: you search for something and you get back a ranked list. We didn't even cover that. Then there's multi-language support: searching for typography in Asian languages has different requirements than in Western languages. So think about your users and what UX they need. If a lot of fields are needed to represent information, we might need to store them somewhere else, or we might need to break out our indices. Adobe Fonts also has a pretty big font library, and we have a lot of business logic that we pipe through Elasticsearch, so it gives us the flexibility to adapt to business needs.

Remember all those operational concerns I talked about, like self-healing infrastructure and nodes disappearing and coming back? We haven't really touched on dealing with any of those with ETS. ETS will be highly available if you architect your system that way. Data synchronization is not available out of the box, and I haven't learned yet how to do that with ETS. So let's say I'm trying to build a horizontally scalable cache and I bring up a new node:
how do I get a copy of that data from the old node to the new node, so that they can both serve the same data for requests? Elasticsearch also has this concept of index aliasing. ETS can do this as well: you can create a new index underneath a current index and then promote data upwards. There are some references in my slides, which I want to put online, that tell you a little bit more about doing this stuff.

But wait, there's more. The most bananas thing I learned about for this talk was :persistent_term. It doesn't do any filtering, autocomplete, or text searching; it's just a really supercharged key-value store. You can access things in constant time, really fast. It's been highly optimized for reading terms at the expense of writing and updating terms. A good thing to note is that if a persistent term is updated or deleted, the Erlang VM is going to do a garbage collection pass, look at which processes are using that data, and then copy it to those processes. So you can use :persistent_term for things that are frequently accessed but very rarely, if ever, updated.

So that was a bunch of salt and bananas. It got us up the mountain of learning more about ETS, and that's it for my talk today. Thanks for coming.

[Applause]
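
The filtering walkthrough in the talk (creating a set, inserting tuples, looking up by key, then using match/2 and match_object/2) might look roughly like the following in Elixir. This is a minimal sketch, not the code from the slides: the sample rows and descriptions are made up for illustration, and the table is left unnamed so the snippet can be run more than once.

```elixir
# A public set: one object per key, no ordering, any process may access it.
table = :ets.new(:fonts, [:set, :public])

# Objects are tuples; by default the first element is the key.
# These rows are illustrative sample data, not Adobe's real catalog.
:ets.insert(table, [
  {"Acumen Pro", "sans-serif", "A versatile sans-serif family"},
  {"Futura", "sans-serif", "A geometric sans-serif"},
  {"Adobe Garamond", "serif", "An old-style serif"}
])

# Look up a single object by key.
[{"Futura", classification, _description}] = :ets.lookup(table, "Futura")

# match/2 returns only the bound variables, one list per matching object.
names =
  table
  |> :ets.match({:"$1", "sans-serif", :_})
  |> List.flatten()
  |> Enum.sort()

# match_object/2 returns the whole matching objects.
objects = :ets.match_object(table, {:_, "serif", :_})

# The number in :"$N" controls the order of the result, not the position
# in the tuple: here :"$1" is the third field, yet it comes back first.
reordered = :ets.match(table, {:"$2", "serif", :"$1"})
```

The last call shows the point about the dollar-sign variables: the description is bound to :"$1" and so appears before the name in each result list.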
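
The three parts of a match specification described in the talk (match head, conditions, match body) can also be written out by hand, which sidesteps the fun2ms gotcha. The sketch below assumes the same illustrative three-field tuples as before; it is one way to express "names of all sans-serif fonts", not necessarily the exact term shown on the slide.

```elixir
table = :ets.new(:fonts, [:set, :public])

# Illustrative sample data only.
:ets.insert(table, [
  {"Acumen Pro", "sans-serif", "A versatile sans-serif family"},
  {"Futura", "sans-serif", "A geometric sans-serif"},
  {"Adobe Garamond", "serif", "An old-style serif"}
])

# Roughly the hand-written form of:
#   fn {n, c, d} when c == "sans-serif" -> n end
match_spec = [
  {
    # Match head: the shape of the stored objects (name, classification, description).
    {:"$1", :"$2", :"$3"},
    # Conditions: keep objects whose second element equals "sans-serif".
    [{:==, :"$2", "sans-serif"}],
    # Match body: what to return for each match, here only the name.
    [:"$1"]
  }
]

names = table |> :ets.select(match_spec) |> Enum.sort()

# test_ms/2 runs a match spec against a single tuple without touching a
# table, which is handy for checking a spec while writing it.
checked = :ets.test_ms({"Futura", "sans-serif", "A geometric sans-serif"}, match_spec)
```

Because the specification is plain data, it can be built at runtime from user input and passed between the shell and compiled code without any parse transform.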
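
For autocomplete, the transcript does not preserve the exact match specification from the slide, so the following is a swapped-in technique rather than the speaker's code: a prefix range scan over an ordered_set keyed by lowercased font name, using comparison guards in a match specification. The module and function names are hypothetical.

```elixir
defmodule AutocompleteSketch do
  # Hypothetical helper module for illustration; not from the talk.

  # An ordered_set keeps keys sorted, so all keys sharing a prefix are adjacent.
  def new, do: :ets.new(:font_names, [:ordered_set, :public])

  def put(table, name) do
    # Key on the lowercased name so completion is case-insensitive,
    # and keep the display name as the value.
    :ets.insert(table, {String.downcase(name), name})
  end

  def complete(table, typed) do
    prefix = String.downcase(typed)
    # Valid UTF-8 never contains the byte 0xFF, so every key starting
    # with the prefix sorts below prefix <> <<0xFF>>.
    upper = prefix <> <<0xFF>>

    match_spec = [
      {
        {:"$1", :"$2"},
        # Multiple conditions are combined with a logical AND.
        [{:>=, :"$1", prefix}, {:<, :"$1", upper}],
        [:"$2"]
      }
    ]

    :ets.select(table, match_spec)
  end
end

table = AutocompleteSketch.new()

Enum.each(
  ["Futura", "Futura PT", "Acumen Pro", "Adobe Garamond"],
  &AutocompleteSketch.put(table, &1)
)

suggestions = AutocompleteSketch.complete(table, "Fu")
```

This only completes from the start of the name. Matching a term in the middle of a string needs a different structure, such as the inverted index discussed under full-text search.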
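
The inverted index idea mentioned for full-text search (a table from each word to the documents that mention it) can be sketched on top of an ETS bag. This is a toy illustration under simple assumptions: whitespace-and-punctuation tokenization, no stemming, no ranking, and hypothetical module and function names. It is not how elib1 or Elasticsearch implement indexing.

```elixir
defmodule InvertedIndexSketch do
  # Hypothetical module for illustration; not from the talk or from elib1.

  # A bag allows many objects per key, so one word can point at many documents.
  def new, do: :ets.new(:inverted_index, [:bag, :public])

  # Index a document by storing one {word, doc_id} object per distinct word.
  def index(table, doc_id, text) do
    for word <- tokenize(text), do: :ets.insert(table, {word, doc_id})
    :ok
  end

  # Return the ids of all documents that mention the word.
  def search(table, word) do
    table
    |> :ets.lookup(String.downcase(word))
    |> Enum.map(fn {_word, doc_id} -> doc_id end)
  end

  # Lowercase, split on anything that is not a letter or digit, drop repeats.
  defp tokenize(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> Enum.uniq()
  end
end

index = InvertedIndexSketch.new()
:ok = InvertedIndexSketch.index(index, "Futura", "A geometric sans-serif typeface")
:ok = InvertedIndexSketch.index(index, "Adobe Garamond", "An old-style serif typeface")

serif_docs = index |> InvertedIndexSketch.search("serif") |> Enum.sort()
```

Note that this tokenizer splits "sans-serif" into "sans" and "serif", so a search for "serif" returns both fonts. Decisions like that are part of why the talk calls full-text search a big problem.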
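
As a small illustration of :persistent_term as a read-optimized key-value store, the sketch below stores a rarely changing list once and reads it back. The key and the classification values are assumptions chosen for the example.

```elixir
# Namespacing the key with a tuple avoids clashing with other applications.
key = {:fonts_sketch, :classifications}

# Writes are expensive (updating or erasing can trigger a global GC pass),
# so this suits data that is set once, for example at application start.
:persistent_term.put(key, ["sans-serif", "serif", "slab-serif"])

# Reads are constant time and do not copy the term onto the caller's heap.
classifications = :persistent_term.get(key)

# get/2 takes a default for keys that were never set.
default = :persistent_term.get({:fonts_sketch, :missing}, :not_set)
```

Unlike ETS there is no querying here, only lookup by exact key, which matches the talk's description of it as a supercharged key-value store.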
Info
Channel: ElixirConf
Views: 3,059
Rating: 4.5862069 out of 5
Id: J38RpkA1580
Length: 25min 57sec (1557 seconds)
Published: Fri Aug 30 2019