How To Design A Good API and Why it Matters

Reddit Comments

I love watching stuff like this just as I get to the end of a project. It really helps keep my happiness in check!

👍︎︎ 10 👤︎︎ u/rftz 📅︎︎ Aug 29 2012 🗫︎ replies

This is from 2007, the OP just linked to a copy of the video with very few views... here is another version (the original?) with over 100k views.

👍︎︎ 33 👤︎︎ u/taleinat 📅︎︎ Aug 29 2012 🗫︎ replies

My biggest gripe with APIs is if they use cryptic docs and a single example. A database of examples covering all the general cases is all I need.

👍︎︎ 4 👤︎︎ u/salgat 📅︎︎ Aug 29 2012 🗫︎ replies

This is badass.

👍︎︎ 3 👤︎︎ u/shaunol 📅︎︎ Aug 29 2012 🗫︎ replies

I find that different languages/platforms have different standards for what a good API is, or for that matter, what good code is.

For example, in C, classically (which most C code I see is in the "classic" style) you are supposed to write 400 line functions that do 10 different things and the more complicated parameters your functions have the better.

In Java, you are supposed to create a class for every contingency, document every class, and expose all of the classes in a way that makes it impossible to know what sequence of objects to create and methods to invoke without consulting the example.

In Node.js, you can never just return from a function, so proper code usually involves at least three levels of nested callbacks, and an API isn't any good unless the module does one single thing and makes the task as simple as it could possibly be.

👍︎︎ 5 👤︎︎ u/runvnc 📅︎︎ Aug 29 2012 🗫︎ replies

I can't recommend this video, the reasons being:

it's only ~3/4ths of the talk then it cuts off (Google has 20% time but god forbid a talk go long by a few minutes, what?).
it should be called "how to design a good API in Java" as a lot of the talk is related to Java specifics. For instance near the beginning there's a slide showing a rough sketch of a new API... the slide is a wall of text of Java keywords "public static <T> Callable <T> ...". I know Java fairly well and it was hard to make out what the actual API was with line wrapping (one method is 4 lines on the slide). Talk gets bogged down in Java specifics.
guy gave out paper handouts (how retro) and handout is not included in the slides.

There's a ton of little details in this talk that are decent, at least for Java, but I felt it was really lacking in Zen. Designing a good API is more than just a checklist of 'did I make everything immutable that should be' or 'did I document every use case', but details like that are basically all this talk is.

👍︎︎ 2 👤︎︎ u/0xABADC0DA 📅︎︎ Aug 30 2012 🗫︎ replies

I know APIs are not a language fare, yet, over the years I've realised how much Python has helped me towards being better and better at building them.

Python is an elegant language (there are others I bet ;)) and this is a subtle feature that has taught me a lot about being more rigorous whilst understanding that programming is some kind of art.

👍︎︎ 2 👤︎︎ u/chub79 📅︎︎ Aug 29 2012 🗫︎ replies

Saw this and now I'm scared to continue working on the library that I was working on. There are so many things I might be doing wrong!

👍︎︎ 1 👤︎︎ u/poo_22 📅︎︎ Aug 30 2012 🗫︎ replies

Very good presentation and some urgently relevant topics!

👍︎︎ 1 👤︎︎ u/MrBranden 📅︎︎ Aug 30 2012 🗫︎ replies

Captions

I'd like to thank you and welcome you all to the latest in our series of talks on advanced topics in programming languages. The purpose of this series of talks is to expose all of the great domain knowledge in programming languages that we have at Google. What that means is that everybody in this audience, all of you geniuses who work for Google, myself discounted, should give a talk. So please come to me. My name is Jeremy Manson. Email me, IM me, whatever, and tell me what talk you're going to give, and I will set it up and give you a talk. You don't have to be an author like Josh, or some of the people we have coming up, in order to do so.

We're running late, and Josh didn't want me to build him up too much and give him a swelled head, so I'll just say that in a company full of geniuses, it's his star that shines probably the most brightly, and that I just want to introduce this man whose boot heel I am not worthy to lick: Joshua Bloch, ladies and gentlemen.

Normally at this point I thank my introducer for the introduction, but today I think I'll dispense with that formality. I should also say, by the way, that this is unfortunately a rather long talk. I try to have short talks with one or two key ideas, but this subject matter just doesn't lend itself to that. So please hold your questions to the end. In fact, we'll probably use up the hour without questions, but then I'll hang around for as long as you want and answer all the questions that you have.

So why is API design important? Well, APIs can be among a company's greatest assets. A good API is something that people invest heavily in. They do this in obvious ways and in less obvious ways. The obvious ways are that they buy products built around the API and they write to the API. The less obvious way is that they learn it. They spend hours and hours, actually they spend months, learning the APIs, and once they've done that, you know, they don't want to learn a new one, because they have
to unlearn everything they know and replace it with something else. Furthermore, the API is just wired throughout the infrastructure at a company. So a successful API can make a company; it can give a franchise that lasts, I guess it's about 25 years now.

Similarly, a bad API can be among a company's greatest liabilities, and there are a couple of reasons for this. First of all, a bad API can cause an unending stream of support phone calls, because people cannot make the thing do what it ought to do. And it can inhibit a company's ability to move forward, because once you have a bad API you cannot change it at will. You're pretty much stuck with it forever. You have one chance to get it right. So that's pretty scary, and with that in mind, you want to learn how to make APIs that will stand the test of time.

So now you know why API design is important, but why is it important to you? Not all of you may think of yourselves as API designers. Well, it turns out that all of you are API designers. Anyone who programs a computer is an API designer, and the reason is that good programming is inherently modular, and these intermodular boundaries are themselves APIs. Furthermore, good APIs tend to get reused. If you've written a module and it's good at doing something, one of these days one of your co-workers is gonna need to do the same thing, and gosh, you've already got this module that does it. But once he's using that API, you are no longer free to change it at will, because if you change it you'll break his program. And if he has ten friends and they all start using it, then you're really hosed. Finally, thinking in terms of API design tends to improve the quality of the programs that you write. It tends to keep you from just hacking things together. It tends to make you want to write nice units that are composable, that are reusable, and that are sensible.

Now, one other question we've got to get out of the way at the beginning is: why am I
talking about API at what is billed as a language design series? The glib answer, of course, is that Jeremy asked me to do it, and Jeremy used to be my friend, so of course I said, yeah, of course I'll do it. But in fact the real answer is that API design and language design are very, very similar. The only real difference is that API design is constrained by the syntax of the language for which you're writing the API, whereas when you're designing a language you have the flexibility to do anything you like with the syntax. But whether you're designing a language or an API, you are creating a tool for programmers to express their intent to the machine and to other programmers who read the program, maintain it, modify it, and so forth. Finally, these days you don't really think in terms of a language or a library alone. A language and a library together comprise a platform, and when you're designing a language you design the core libraries hand-in-hand with the language. So really, the skill set for designing good APIs and for designing languages is pretty much the same.

So what are the characteristics of a good API? First of all, it's easy to learn, and it's easy to use even without documentation. A good API should be easy to memorize; it should just plain make sense. The flip side of that is that not only should it be easy to use a good API, it should be hard, or impossible, to misuse one. Basically, a good API should simply force you to do the right thing. It should be easy to read and to maintain code written to that API. The API should be sufficiently powerful to do what it has to do. Note that I didn't say the API should be powerful. It is not the case that the more powerful an API is, the better it is. It should basically be just powerful enough to do what it needs to do. But it should be easy to evolve the API over time, because there will be new needs later on. So what you want to do is you want to write an API
that meets its requirements and that can evolve to meet future requirements. And finally, the API has to be appropriate to the audience. What is a good API for, let's say, Wall Street analysts is probably not a good API for physicists, because they have different terminology and they think differently. So your API has to be aimed at its audience.

So now we know what the characteristics are. How do we achieve them? That's what the rest of this talk is about. The talk is divided into five sections. The first one is on the process of API design. I'm not a big process weenie, but I've found over the years that there are certain things that all good API designs have in common in terms of the processes used to create them, so I'll try to go over that. Then the general principles of API design, then those principles as they apply to classes, as they apply to methods, and as they apply to exceptions. And finally, if I have time, I'll show you a couple of refactorings where we improve API designs.

So what is the process of API design like? Well, the first thing you've got to do is gather the requirements, but do it with a healthy dose of skepticism, because often when you ask people, "What does this API have to do for you?", what you'll get won't be a real set of requirements. It'll be a set of proposed solutions, and a better solution may exist. If someone tells you, let's say, "We need to precisely control the garbage collection intervals and the maximum time that each garbage collection can take," that's not really a requirement. The requirement is, "We need to be able to run a service smoothly while any garbage collection takes place." How you choose to achieve that is up to you. Your job is to extract the requirements from the stakeholders in your API, and often it's a little give-and-take process. Once you've got the requirements, they should take the form of use cases, and by use cases I simply mean the problems that your API should be able to solve
and these are extremely important, because they provide the benchmark against which you can measure any proposed solution. One thing you should keep in mind is that it can be easier and more rewarding to build a more general solution than what you've been asked to do. This doesn't mean you should just say, "Oh, I'm gonna build a framework" every time someone asks you to do something, but sometimes the more specific thing is more difficult to build as well as being less powerful. So always keep an open mind when you're looking at those initial requirements.

I guess I have another example here of what they say and what they mean. When they say, "We need new data structures and RPCs with the version 2 attributes": this actually happened to me at a company called Transarc. When we were upgrading, they said, make a whole new set of data structures and a whole new set of APIs. What they really meant was, "We need a new data format that will accommodate all further evolution in the internal data structures," because you don't want to have to make a whole new set of data structures and a whole new set of on-the-wire and on-disk interfaces every time you decide to add a few more attributes to your server. So in fact I made the system much more dynamic, and we never had to do that again.

You should start with a really short spec. One page is ideal. At this stage in an API design, agility definitely trumps completeness. The worst thing you can do is to send six smart guys off into a room, have them sit there with the door closed for six months, and come out with a 240-page specification document. Believe me, this has been done many, many times. It's an awful idea, because at that point, first of all, their ego is invested in what they've just done. They're gonna build it even if it's a piece of crap. Second of all, how do you know if it's any good? It's this big, long, hairy spec. It's no longer agile. If they made a fundamental
mistake, then you've got to change all 240 pages of it. But it may fail to satisfy some sort of key requirement that they didn't really understand before they started. So what you want to do in the beginning is have the entire API spec on one page, and this way you can bounce that spec off as many stakeholders as possible. Listen to what they have to say and take it seriously. If they say, "No, I'm sorry, this won't do for me, because I cannot write such-and-such a program," think about it. You may say, "Well, you shouldn't be writing that kind of program, it's a really bad idea," but more likely you may say, "Oh gee, I didn't think of that. This really is important. What if we structure it this way?" If the whole thing is only a page long, you can do major refactorings in ten minutes. If you keep the spec short, it's easy to modify, and you flesh that spec out only as you gain confidence that you're on the right track. This necessarily involves coding. In particular, it involves coding to the API that you are defining. It doesn't involve implementing the API; it involves pretending it's already been implemented.

So what does it look like when you do it right? Well, here's an example that I was writing at about the time that I was putting this talk together for OOPSLA some number of months ago. Someone wanted the ability to retry a computation in the face of failure, and I said, "Oh, well, we have this executor framework, and really all we want here is a retry policy that tells you how you might choose to retry the thing in the face of failure." So here's a little interface, and it's got a couple of methods. One tells you if a given failure is recoverable: you pass in the exception and it just gives you true or false, we should try to recover from this one or we shouldn't. The second one computes the next delay in terms of the initial start time and the number of previous retries. By passing all this data, the actual retry
policy can be stateless, so you can have singletons. You can have a retry policy called exponential backoff, but you don't have to store any data in each exponential backoff instance. It really is just a retry policy, and it's called exponential backoff, and that's kind of all there is to that one. This isn't really very complete. This is not a spec of a quality that I could use for Javadoc, and it's not a spec that someone could implement to, but it's a spec that's good enough for someone to look at it and say, "Yes, it does do what I need," or "It doesn't."

The rest of it's on another slide here. This is a set of static utilities that let you actually use retry policies. So what can you do? Well, the first thing you can do is pass in an executor service and a retry policy and get a retrying executor service. It implements the same interface, which is ExecutorService, so if you already know how to use an executor service, you know how to use the retrying executor service, and that's great. By the way, that's a good way to keep the conceptual weight of an API small: use interfaces that have already been defined, in this case ExecutorService. What else do we have? We have another kind of retrying executor service. What's the difference between the first two? I haven't looked at these for a while, so I apologize for that. Anyway, it doesn't matter. Then we have a couple of wrappers, one of which takes a Callable and returns a retrying Callable, and one of which takes a Runnable and returns a retrying Runnable. And then we have a couple of examples of the retry policies themselves. These are static factories. If you want an exponential backoff, you get one, and these are the parameters that describe your exponential backoff: what's the initial delay and in what unit (10 seconds, 100 milliseconds, whatever), and then the timeout, which I'm not sure what that means. By the way, this is actually interesting. This tells me this wasn't quite good
enough. What I wanted to show you is that something that's really simple, that fits on a slide or two, is enough to communicate your intent, and enough to figure out whether it's good enough and try it out. I think the answer here is it's almost good enough, but not quite. I guess the timeout probably is the overall timeout, like after how long of trying and retrying they finally give up, but I think it should have said that somewhere on this. But anyway, you get the idea. The idea is that this is a very small description of an API, but it's big enough to find out if it's good enough to do what needs to be done, and if it's not, it's easy to modify it.

You should write to the API early and often. Of course, you should start before you've implemented the API, because this saves you from having to throw away an implementation of a bad API. If you first specify it, then implement it, then try the implementation and decide that the API was garbage, well, you've wasted lots of time implementing it. And as I said, you should start before you've even specified it properly, because that saves you from having to throw away detailed specifications for broken APIs. You should continue writing to the API as you flesh it out, and this is important. Some people stop writing to the API about halfway through the process and just go on this death march to implementation. The problem with that is you get nasty surprises about a week before you ship, when you try writing to it again and you find that, oh my gosh, it actually doesn't solve this important case that we thought it solved. And some people worry that all of this coding to the API is a waste of time when they should be implementing it, but nothing could be further from the truth. Those initial pieces of code that you write to any API are among the most important pieces of code that you'll ever write to it. The code lives on in the examples that you
publish for how to use the API, and those examples tend to get emulated heavily. If you get them right, you've seeded the market with good uses of your API. If you get them wrong, conversely, you've ensured that there will be broken programs floating around the web for the next 10 years. I used to believe this with all my heart and soul, but now I actually can point to a proof of it. It turns out that at the last OOPSLA there was a paper published called "Design Fragments" by Fairbanks, Garlan, and Scherlis from CMU, and they actually traced mistakes in the original applets that were shipped out with the first release of Java into broken concurrent programs, thousands of them, that still exist on the web today. So the way I put this is: example programs should be exemplary. There's a reason they call them example programs. And the first programs that you write to an API as you are fleshing it out invariably become the example programs. So my rule of thumb is you should spend 10 times as much time on every line of example code as you do on production code. That may sound backwards to you, but I really believe it.

Writing to an SPI, that is, a service provider interface, is even more important than writing to any other kind of API. You all probably know what SPIs are, or maybe you don't. If you know what an SPI is, raise your hand. OK. So an SPI is a special kind of API which you use to provide a new means of doing something. It's not the API which the programmers write to; rather, it's the plug-in API that lets, say, RSA DSI Incorporated provide their encryption methods and Sun Microsystems provide their encryption methods, so the user of this encryption API can use a higher-level API which then dispatches to these encryption methods. The thing about SPIs is that you're supposed to be able to hide very, very different implementations underneath them, and if you write an SPI and you only have one provider, it turns
out that as a practical matter you will probably never be able to support another. Once you try to do the second, you'll find that there's something about that SPI that ties it forever to the only implementation you actually thought about. If, on the other hand, you do two implementations rather than one, then you'll probably be good enough; you'll probably be able to support subsequent ones with some difficulty. But if you do three, it'll probably work fine for any number. If it works for three, the fourth won't be all that different from each of the first three. I found this out myself, and then I saw it in a book. Will Tracz discovered this and called it the rule of threes in a book which was subtitled "Confessions of a Used Program Salesman," because it was a book about software reuse.

By the way, here we explain the subtle color coding that is used throughout this talk: whenever you see something in green, it means this is good, do it this way, and when you see something in red, it means this is bad, don't do it that way.

You should maintain realistic expectations throughout the process of API design. It turns out that most API designs are over-constrained. People want them to do more than they can possibly do, so you're going to have to make compromises. You cannot please everyone. If you try to please everyone, you come up with a pig: you come up with big, nasty APIs that no one will ever be able to use properly. So what you should do, and this may sound strange, is aim to displease everyone equally. The idea is that if one of your important stakeholders is very displeased and the others are really pleased, that's probably a problem, because your API isn't doing something it has to do. If everyone is less than a hundred percent happy, but they're all happy enough, then you've probably done the right thing. But do not misinterpret this as saying I favor design by committee, and that you should take everyone's ideas and mush them all together. You do need one strong design lead that
can ensure that the API that you're designing is cohesive and pretty and clearly the work of one single mind, or at least a single-minded body. And that's always a little bit of a trade-off: being able to satisfy the needs of many customers and yet produce something that is beautiful and cohesive.

Expect to make mistakes. API design is hard. Now, luckily, a few years of real-world use will always flush out the mistakes. Unfortunately, by that time it's too late to do anything about them, although you can write nice talks and tell people about the mistakes so they don't make them again, and that's kind of what I'm doing here today. So given that you're gonna make mistakes and you're gonna be stuck with the original API, write the API so that at least you can add to it and produce something that will help you get around the shortcomings of your original designs. A recent example of that in my life is in the collections API, which I did around 1997, 1998. There were some real flaws in the SortedSet and SortedMap implementations. In particular, they're a little bit asymmetric: it's much easier to go forward than backward, and a couple of other things. I knew about these flaws at the time, but I didn't know how to fix them. However, we were able to extend that API, so if you look at the most recent release of Java, Java 6, you'll see something called NavigableSet and NavigableMap, which extend SortedSet and SortedMap and provide additional methods that fix those difficulties. It's not a perfect solution, because there are some things that implement the old interfaces and they're not fixed yet, but at least all the standard collection implementations from Sun now are fixed.

So what are the general principles of good API design? First of all, an API should do one thing and do it well. I should say that almost all of what I'm going to say for the next five minutes may just sound like motherhood and apple pie, but there's more to
it than that. I'm gonna try to give you actionable advice. I'm gonna try to take the standard old saws, like "an API should do one thing and do it well," and see what it really means and tell you how to achieve it. In this case, the functionality should be easy to explain. If it's not easy to explain, then it's not doing one thing and doing it well; it's a mess. If you can't come up with a good name for it, then it's a mess. The names are the API talking back to you, so listen to them. When you try to name those methods and those classes, if you come up with a really complicated name like DynAnyFactoryOperations, or _BindingIteratorImplBase, which actually violates the naming conventions of the platform, or ENCODING_CDR_ENCAPS, clearly you've got problems. Any API that contains this sort of stuff is a mess. Or what about this: OMGVMCID. Yeah, I know OMG is "oh my god," but I can't figure out the rest of it. And by the way, lest you think I'm just making this stuff up, all of this comes from an actual Java platform API, and I won't tell you what it is except to say that it's CORBA.

Good API names are like Font. Yeah, I know what a font is: it's italic or bold or whatever. A Set: I know what a set is. A PrivateKey, a Lock, a ThreadFactory. These things instantly communicate what they are, and the methods, the classes, all of them should be like that. Looking at them, it should be clear what they are. And good names drive good designs. Once you have something that's called a set, you know what the operations are: you insert things into sets, you remove them, you test for containment. So good names drive good designs, and bad names are an indication of bad designs. So listen to those names speaking to you, and if you just can't get it to work out right, then you're probably not trying to build something reasonable. So always remain amenable to
splitting a module up if you're trying to cram too much into one module, or to putting multiple modules together: if you're trying to expose internal details that ought to be hidden, maybe you should just make a bigger module that hides some of those details.

An API should be as small as possible, but no smaller. This principle is usually attributed to Einstein, although I looked really, really hard and I don't believe he ever said it. I believe that he probably believed it, but he didn't say it. At any rate, the API should satisfy its requirements, and if you only remember one thing from the talk today, please remember this: when in doubt, leave it out. That applies to everything. It applies to functionality, to classes, to methods, to parameters within a method, anything. If you have any doubts about whether to include something, leave it out. You'll probably be able to add it later, but you most certainly will not be able to take it out once you've put it into an API. Once you put it in, people will be using it, and if you take it out, they will scream bloody murder. So if you ever have any doubts about whether to include something, leave it out. If that's all you take away with you from today's talk, then I think it's been worth your hour. Of course, that's just my judgment.

Anyway, when you're thinking about the size of an API, the conceptual weight is more important than the bulk. By bulk I mean the number of methods, the number of classes. What's really important is the number of concepts: when I am learning this API, how many different things do I have to learn about? There are a number of ways to decrease the conceptual weight of an API. The most important one is reusing interfaces. For example, if you look at the collections framework, there are many implementations of the Set interface. The original ones were HashSet and TreeSet, then we added LinkedHashSet, and more recently a whole slew of concurrent
set implementations. You don't have to learn any new APIs. You learn the Set API, and we can add new functionality, we can add richness, without making you learn any new API. So that's one of the great ways to increase the power-to-weight ratio. That's the important thing: you want to be able to do a lot without learning a lot.

The implementation shouldn't impact the API. Once again, this is motherhood, but what does it really mean? It means don't put any implementation details into the API. They confuse the users, and they inhibit the implementer's freedom to change the API later. For example, let us say you have some API that's about phone numbers, but it throws SQL exceptions, and now you want to re-implement it on top of some proprietary data store rather than a SQL data store, but your clients are already trying to catch the SQL exceptions. What do you do? Well, you can emulate those old SQL exceptions, but that's crazy. So that means you should make sure that the exceptions that you throw are at the same layer of abstraction as the rest of the API. That's just one example where implementation details leak into APIs. The important thing is you have to be aware of what actually is an implementation detail. You don't want to overspecify your methods. You don't want the specification of a method to involve something that is an implementation detail and that you would like to be able to change later on.

So here's an example where we did that: don't specify your hash functions. You might think that exactly what value is returned by the hashCode method is a proper part of the spec, but it's an implementation detail. The spec should simply say it returns an integer, and with high probability the integer will differ for two different objects, and furthermore it should be cheap to calculate the thing. But exactly what number is returned, you should have the
flexibility to change that from release to release, as you learn about flaws in your old hash functions and as the technology improves and hash functions improve. Of course, you cannot do that if you're writing a persistent data store. If those hash functions are going to be used to store data on disk, then they can't change, but that's a very special kind of hash function. Those ones must be specified, but the great majority of hash functions out there shouldn't be. We got this very wrong in initial releases of Java, and unfortunately that tradition has stuck, to the point where almost all of these hash functions are specified, but they really shouldn't be.

Finally, you shouldn't let the implementation details just leak into the API. The example I gave you before with an exception is a case where you didn't really think hard and say, "Oh yeah, we should maybe throw a SQL exception here." You probably just wrote it and realized, "Gee, I'm calling something that throws the SQL exception, so I have to propagate it out." That's a case where an implementation detail is just sort of leaking. Another example, a really nefarious example in Java, is if you simply say "implements Serializable." Once you've done that, your entire implementation has just leaked out as part of your API, because the serial form consists of all of the fields that comprise your object, even your private fields. So all of a sudden the private fields are part of a public API, and that's really, really bad. The way around that, by the way, is to design your serial forms carefully. Don't just say "implements Serializable."

You should minimize the accessibility of everything. That means you should make your classes, your members, your fields all as private as possible. One specific case is that public classes should have no public fields, with the exception of their constants, which aren't really fields. This maximizes what they call information hiding. Parnas is the guy who came up with
that term, and it minimizes the coupling between APIs. If things are hidden behind intermodule boundaries, they can be changed freely, and this allows modules to be understood, used, built, optimized, debugged, tested, and what have you, individually and in parallel. So you could have multiple teams dealing with multiple APIs concurrently. If, on the other hand, the APIs expose everything and each module is messing around with other modules, then there's very little that you can do to any module without affecting a whole slew of modules around it.

Names matter a lot. There are some people who think that names don't matter, and when you sit down and say, "Well, this isn't named right," they say, "Don't waste your time, let's just move on, it's good enough." No: names in an API that are going to be used by anyone else, and that includes yourself in a few months, matter an awful lot. The idea is that every API is kind of a little language, and people who are going to use your API have to learn that language and then speak in that language. That means names should be largely self-explanatory. You should avoid cryptic abbreviations, so the original UNIX names, I think, fail this one miserably. You should be consistent. It's very important that the same word means the same thing when used repeatedly in your API, and that you don't have multiple words meaning the same thing. So let us say you have a remove and a delete in the same API. That's almost always wrong. What's the difference between remove and delete? Well, I don't know. When I listen to those two things, they seem to mean the same thing. If they do mean the same thing, then call them both the same thing. If they don't, then make the names different enough to tell you how they differ. If they were called, let's say, delete and expunge, I would know that expunge was a more permanent kind of removal, or something like that. Not only should you strive for
consistency, but you should strive for symmetry. So if your API has, let's say, two verbs, add and remove, and two nouns, entry and key, I'd like to see addEntry, addKey, removeEntry, and removeKey. If one of them is missing, there should be a very good reason for it. I'm not saying that all APIs should be symmetric, but the great bulk of them should. If you get it right, the code will read like prose. That's the prize. So in this case it reads: if the car's speed is more than twice the speed limit, then the speaker should generate an alert that says "Watch out for cops!" That's pretty much English. It reads like prose, and that's an indication that the API is reasonably decent.

Documentation matters as well, and the aforementioned Parnas said it much better than I could, so I'm simply going to read what he had to say. He said: "Reuse is something that is far easier to say than to do. Doing it requires both good design and very good documentation. Even when we see good design, which is still infrequently, we won't see the components reused without good documentation." He said that in 1994, and I don't know about you, but when I read that I get religion, and the only thing I can do then is to document religiously. Document every single class, every interface, every method, every constructor, every parameter, and every exception in my public API. Of course, all of you do that, right? And when you go out on the web, whenever you look at Javadoc, it's always the case that every public or protected method has a Javadoc comment, right? Um, no. It's really terrible, because if you don't have a comment telling you what the specification is, what is the specification? Who knows. You have two choices. Either you guess, in which case your program probably doesn't work, or you read the code, in which case the implementation becomes the specification, and it's overspecified, and you no longer have the freedom to change that implementation at all. So document everything,
and what does this mean? For classes, just tell me what an instance of that class represents. For methods, tell me the contract between the method and its client, that is, what must be true before I call it, what will be true after it returns, and any side effects. Those are particularly important: if you have side effects and you don't document them, people will shoot themselves in the foot. I'll give you an example of this later. For parameters, people often forget: don't just say "the size of the block". It's the size of the block in bytes, or in megabytes, or whatever. You've got to tell me what the units are. The form: if it's a string especially, I've got to know, is it XML? What form is this string in? And finally, the ownership. If I'm passing an object into an API, do I still own the object? Am I free to modify it after I've passed it in, or have I transferred ownership from myself to the object to which I've passed that other object?

If the thing that you're defining is mutable, that is, can be modified, then you must document the state space very carefully. If you have a badly documented state space, then you have no hope of being able to use that API properly, because people won't know when it's legal to call what, and what will happen after the call is made. I may discuss this elsewhere, but an example of how to do this very badly is Date and Calendar, in particular the Calendar API in Java. The state space is almost undocumented, and it's caused numerous bugs over time. I'm happy to say, by the way, that just days ago Sun decided that we would be pursuing a new JSR to improve the date and calendar APIs, based in part on Joda-Time and who knows what all else, so we may finally be free of that mess.

You should consider the performance consequences of your API design decisions. This is funny, because it tends to contradict the advice you've all heard, that premature optimization is evil. In fact, I have
a whole essay about that in Effective Java. However, that doesn't mean you can just ignore performance. Jon Bentley was big on this fact. In particular, it turns out that bad API decisions can limit performance. Examples of things that can limit it are making a type mutable when it should be immutable, or vice versa; providing a constructor instead of a static factory; and using an implementation type in an API instead of an interface, which means that people always have to use that particular implementation even if a better one comes along later. But the converse of this is: never warp an API to gain performance. Every once in a while you have something that's temporarily broken. This thing is slow, and in order to avoid this slow thing, you break your API. The thing that used to be slow becomes fast, but your API is still broken. So design your APIs for the long haul. Luckily, good API design usually coincides with good performance.

Here's an example of an API design decision that led to bad performance. In the original AWT there was something called a Dimension. If you had a component and you asked for its size, you got back a Dimension object, which contained two coordinates. It was simply a couple of ints that were wrapped. The problem is that the Dimension object was mutable. Those ints weren't really wrapped; they were publicly visible, mutable fields. What that meant was that every time you called getSize, it had to allocate a new Dimension object, because otherwise I might get the Dimension object and give it to you, someone else might ask for it, and then you might modify it, modifying his copy as well, and that would be really bad. Now these two independent threads or computations would be tied together in nasty ways. Is that bad? Is it really expensive to allocate a little object containing two ints? No, it's dirt cheap. But unfortunately this thing gets called millions, literally millions, of times in a GUI app. So you
know, all of a sudden you're allocating megabytes of garbage, and that really does cost you. It's garbage collector pressure that you just don't need. It was fixed in 1.2 by adding methods that return each dimension individually as a primitive type, which is in fact immutable. But unfortunately, old code that used the 1.1 APIs is still slow and will always be slow.

The APIs that you write have to coexist peacefully with the platform. That means you have to do what's customary for that platform. You have to obey its naming conventions. You have to avoid anything that's just verboten on that platform, whatever it is. Whether you're in Java or in C++, everybody knows there are certain things you just shouldn't do or shouldn't use. So learn what those things are and then avoid them. There are generally books that tell you the traps and pitfalls for every platform. I could recommend one for Java, but anyway, the thing you've got to do is mimic the patterns in the core APIs of the platform, because everyone who uses a platform knows its core APIs. So if your API feels just like one of the core APIs, then everyone, because they already know how to use the core APIs, will already know how to use your API. It's as simple as that.

The real trap here is that you should never simply transliterate APIs. That is the worst way to design an API. What do I mean by that? Suppose you have a C++ API and you want a Java version of the same facility. What you should not do is take every C++ class and make a complementary Java class that contains all the same methods as the C++ class, because what was reasonable in C++ will almost certainly be unreasonable in Java, and vice versa. So you have to take a step back. You have to say, "What is this class doing, and how would I do this in Java?" That's the right way to do it. Transliterated APIs are almost always broken.

OK, on to class design. I have 15 minutes to finish the whole
rest of the talk, so that gives me five minutes each for the next three sections and zero minutes for the last section. So first of all, minimize mutability. Classes should be immutable unless there's a very good reason to make them mutable. The advantages are that the classes you get are simple, they are thread-safe, and instances of them are reusable; you never have to generate a new one. The only disadvantage is that you need a separate object for each value. So if you have a huge object, let's say a BigInteger that's a million digits long, and you just want to change the last bit, if it were mutable you could do that in place at virtually no cost. But because they're immutable, you have to copy a megabit of data and then throw away the old megabit, which is a little bit unpleasant.

But if you do have to make things mutable, and you often do, you should still make them as immutable as possible. You should give them a nice, small, well-defined state space. You should make clear when it's legal to call which method. Bad examples, as I mentioned before, are Date and Calendar. Calendar has these things like the roll method, and when you put a date into it, who knows what state is behind it, when it's legal to call what, what state the calendar is in after you've used it. Did you know that if one thread uses a date formatter and another thread tries to use it at the same time, both threads are hosed? It feels like an immutable object, except that it has state inside it. You can't read that state, but it's there. So these are things that are more mutable than they should be. A good example is TimerTask. It's not immutable, but it minimizes mutability. In particular, a TimerTask is inherently mutable because it represents an actual occurring computation, and that mutates. But with a TimerTask, what do you do? You create
it, you schedule it, it runs as many times as it has to run, and then it's dead. It's gone, end of story. There were people who asked us when we were designing the Timer API, "We want to reuse them. It's expensive to make a TimerTask." The answer is no. Let it die in peace. If you need another one, make another one. By eliminating that loop from the state space, you make the API much simpler and much less bug-prone.

You should subclass only where it makes sense to do so. This is the Liskov substitution principle, and it's actually very simple. You just have to ask yourself a question. When you have two classes in a public API and you're thinking of making one a subclass of another, like Foo a subclass of Bar, ask yourself: is every Foo a Bar? If you can answer yes with a straight face, then make it a subclass. If not, don't. A bad example in Java is Properties extends Hashtable. Is every Properties object a Hashtable? Heavens, no. A Properties object is a special thing that maps certain strings to certain other strings. So every Properties object has a hash table, perhaps; typically it's implemented atop a hash table. But you can't answer that "is a" question with a straight face. Or what about Stack extends Vector? Is every Stack a Vector? No. You push and pop on stacks, and that's pretty much all you do to them. You might also have a peek method and a size method. But a Vector allows random access, accessing by index. A stack doesn't. So it was really wrong to have Stack extend Vector, and the really bad thing about it was that they took away a great piece of real estate. We can't use the name Stack for a class that actually does implement a stack anymore, because the name has been taken. A good example is Set extends Collection. Is every Set a Collection? Yes, it really is. A set is just a special kind of collection; it's a collection that does not allow duplicates.

This is a fairly controversial one, but I believe that you should design and document a class for inheritance or
else prohibit it outright: make the class final, or have no publicly accessible constructor. The reason for that is that inheritance violates encapsulation in a subtle way. That is to say, subclassing violates encapsulation in a way that mere method invocation does not. This is sometimes called the fragile base class problem. Basically, if you have one class that's a subtype of a second class, and you override a method in the first class, it may modify the behavior of other methods in the subclass, because the original implementations of those methods dispatched to the method that you just overrode. And then if in a future implementation of the superclass they reimplement it so that this method is no longer implemented in terms of that one, but both of them are implemented in terms of some third method, then this new version of the superclass will break the subclass. The way to avoid that is to come clean on exactly how every method uses every other method, that is, document the self-use patterns of a class. If you've done that, then you've documented it and designed it for inheritance, so it's OK to make it non-final. Otherwise it should just be final. Now, if you look at the Java API, we got this wrong in many places. Most of the concrete classes in the J2SE libraries, in particular the concrete collection classes like HashMap and HashSet, are all non-final, but they don't exactly define their self-use patterns, so they're a bit fragile. AbstractSet, AbstractMap, and the other AbstractXxx classes are good. They really are designed and documented for inheritance.

OK, on to methods. If you remember only two things from the talk today, this is the second thing. By the way, what was the first thing I told you to remember? Excellent: when in doubt, leave it out. The second one is: don't make the client do anything the module could do. Those two things are the fundamental rules of API design. So the worst thing you can do is write an
API that just requires the client to call in and out and in and out, just doing repeated calls, passing junk from the first call to the second call. This causes boilerplate code, which, as you can see, is in red; it's really, really bad. Why is boilerplate code bad? Because it's an opportunity for bugs. You make boilerplate code by doing cut and paste and then modifying, but if you didn't do all the modification that you should, it may still compile but it won't do the right thing. It is ugly, it is annoying, and it's error-prone.

Here is a real live example from the W3C's DOM API. Suppose you have in hand an XML document and you want to print it to an output stream. That's a very reasonable thing to do. It should take one call: print takes an output stream and you're done. But it doesn't. Here's what you actually have to do. First, in addition to org.w3c.dom, that's fine, and java.io, you also have to import javax.xml.transform, javax.xml.transform.dom, and javax.xml.transform.stream. Why? I don't know, but I do know you have to do it. And then here's how the call should have looked. It should have been called writeDoc, and the document perhaps should have been the receiver, and you pass in an output stream, and it would throw an IOException. But here's what you actually have to do. First you get a Transformer. What is a Transformer? I don't know, but I know you need one. How do you get one? Well, you call the newInstance method on TransformerFactory, and that gives you a TransformerFactory, and once you've got the factory you ask for a new Transformer. So as you can see, they read Design Patterns. It's great, it's filled with patterns. So now you've got your Transformer out of your TransformerFactory, and then you have to set an output property on that Transformer. You have to set its doctype system to be the doctype from the doc. I don't know what any of this means, but I know you have to do all of it. And then you have to get the system ID, and you have to,
that's part of this output property you're setting. And then when you're all done, you can actually do the output. The way you do that is you call transform on the Transformer, and you don't just pass in the doc as the source, because that doesn't implement the right thing. You use the adapter pattern here to take your doc and turn it into a DOMSource. And then also you can't quite just pass in your output stream; you wrap it in a StreamResult. And now you get your output, right? Except for one tiny problem, which is that it can throw a TransformerException. When can it throw it? Well, never, actually, but the API says it can, and it's a checked exception, so we have to catch it if it gets thrown, and then we throw an AssertionError, because it can never happen. So you've got, whatever it is, six lines of just unreadable garbage code to do something very simple. If they had started with a use case, "People might want to print their XML documents," well, you get the idea.

Another general rule when you're designing class methods is: don't violate the principle of least astonishment. The user of an API should never be surprised by its behavior. It's worth extra implementation effort, it's even sometimes worth reduced performance, not to surprise the users of your API, because if you surprise them, what will happen? They'll simply do the wrong thing. They'll think it does something, it'll actually do something else, and their program will be broken. Here's a real example from the Thread API in Java. We have this method called interrupted. You've got a thread and you want to check if it's interrupted. You call Thread.interrupted, and what does it do? Well, it tests whether the current thread has been interrupted. Oh, and by the way, it clears the interrupted status of the current thread. That's just a little side effect: it clears the interrupted status of the current thread. Looking at the name, Thread.interrupted, there would be no way to guess that it does this, but it does, and many people
have spent hours chasing bugs because of it. What is the primary thing that this call does? It clears the interrupted status. It's not an unreasonable call, but it should have been named clearInterruptStatus, and by the way, it could have returned the old interrupt status as a favor to you. But they named it based on the second most important thing it did instead of the first most important thing it did, and in doing so they violated this principle of least astonishment.

You should fail fast. Whenever there's an error, you should tell the user of your API as soon as possible after the error has happened. Ideally you should tell him at compile time, because this way it happens in the lab, here where the program is being written, instead of out in the field where the program is being run. That means that I believe static typing is a very good thing. It moves errors from runtime to compile time. I understand that this is another highly controversial topic, but I do believe it. I've seen it happen. For example, when we increased the static typing in Java by adding generics, I found bugs in pre-existing code, because it forced me to be more specific about the types that were expected, and it told me where things were wrong. If you are only going to be able to find out about an error at runtime, you want to find out the first time you do something wrong. If you pass some garbage into something else, you should find out as soon as you pass the garbage in, not ten minutes later.

Here's an example of how not to do it, in the aforementioned Properties class that extends Hashtable. If you look at the spec, it says a Properties instance maps strings to strings. But if you look at this put call, it takes an Object key and an Object value: any object, maybe a string, maybe something else. But it tells you right away if you pass in something that's not a string, right? Oh, if only it were so. It doesn't in fact tell you
right away. It lets you put in any garbage you want, and then ten minutes later, when you call the save call to an output stream, which basically takes this Properties object and translates it into some garbage that isn't quite XML, then and only then does it blow up with a ClassCastException, because you put something wrong into it ten minutes before. But by that time debugging becomes almost impossible. You don't know where the call was that put the garbage into your Properties object.

You should provide programmatic access to all data that is also available in string form. This is really important. Whenever you have a method that returns something as a string, you should also have a method that returns the same stuff in programmatic form. If you don't do that, then clients will have to parse the string. Not only is that a pain in the butt, but it turns the string into a de facto part of the API. You can never add information to that string, because there's code out there that's parsing that string, and if you change the format of the string, you break that code. So what you should do is, along with the API that gives you the printable string, have other APIs that give you access to the actual information. This way you can add more information to the string later. In fact, the spec should say that you are not specifying the format of the string, and that anyone who writes code to parse the string is taking their life into their hands. A bad example: initially the only way to get the stack trace in Java was to call the printStackTrace API, and people actually did go parse those things. In 1.4 we finally added a getStackTrace API that gives you all the same information as StackTraceElements, each consisting of the file name, the line number, the class name, and so forth. But this was a case where the horse had already left the barn.

You should overload with care. Method overloading can be a good thing, but
it tends to be overused. You should avoid ambiguous overloadings, that is, multiple overloadings that can do different things when passed the same values. A bad example, which I am guilty of here myself, is that TreeSet has two constructors, one that takes a Collection and one that takes a SortedSet. The first one ignores the order of the thing that was passed in. The second one says, "Gee, I'm making a TreeSet out of another sorted set; I might as well order it in the same way." Well, here's the problem. If you have a SortedSet that's cast to a Collection, then you call the first constructor and you get one result, whereas if you don't cast, if you just pass it in, you get another result. So I really should not have done it that way. I should have done a dynamic test: if the thing was an instance of SortedSet, then I should have preserved the order. So the basic rule here is: just because you can doesn't mean you should. Often it's better to simply give something another name rather than overloading. Overloading can be a real trap.

You should use appropriate parameter and return types. That means you should favor interfaces over specific classes. You should use the most specific possible input parameter type. If you accept, let's say, a Collection, but you'll blow up unless somebody passes you a Set, that's broken. You've just taken something that could have been caught at compile time, and instead you're catching it at runtime. Here's another trap: don't use a string if a better type exists. In these days of XML and web services, people always start off with strings, strings coming over the web. Just because it started as a string doesn't mean it should stay a string. You should turn it into something more reasonable and leave it there. A really bad example of this that I saw in a program years ago was a program that passed around, for the whole duration of its execution, a string that was either "yes" or "no". We have a good data type for that. It's called boolean. You should never use
floating-point types. At Google you guys already know all this, but never use the floating-point types float or double for monetary values. They're not good enough to represent money. You cannot do exact computations using binary floating-point numbers, because one tenth is not representable as a binary fraction. So don't do it. If you have an amount of money, use BigDecimal, perhaps use long or BigInteger, what have you, but do not use float or double. And when you're faced with the choice of using float or double, you should almost always use double rather than float, because typically double will run just as fast, and you lose real and important precision by going down to float.

Let's see, I'm going to have to just run through the rest of this. This one is really important: use consistent parameter ordering across methods. Here's an example of what not to do, a real example from Unix. We have two methods to copy data. One is called strncpy; one is called bcopy. The first one takes a destination, a source, and a size. The second takes a source, a destination, and a size. So what happens if somebody assumes one ordering when they call the other method? They clobber their source data with whatever garbage was in their destination array. And how long does it take to find that bug? Probably a really, really long time. This is particularly important when the types of the two parameters are identical, because if you switch them around, you will not know at compile time. Good examples here are in java.util.Collections: the first parameter is always the collection being mutated or manipulated. Similarly in java.util.concurrent, when you have an amount of time, you always specify it as a delay followed by a time unit, never the other way around. And even if it were the other way around, the compiler would tell you, because these are strong types that are incompatible: a long and a TimeUnit are two different things. You should avoid long
parameter lists. Ideally you should limit them to three. It's really easy to remember three things. Especially, you should avoid long lists of identically typed parameters, because of the problem that I told you about before: if you get the order wrong, you're hosed. Here's an example of what not to do. This is from the Win32 API to create a window. If you look in the middle of these, whatever, 15 parameters, you see int x, int y, int nWidth, int nHeight. So here's a whole string of ints, and by the way, some of these other things are also just ints by another name. So without support from the IDE, it's pretty much impossible to use this API. Luckily, there are a number of great techniques for shortening parameter lists. One thing you can do is break up a method into multiple methods, or you can create helper classes to hold the parameters. A specific example of the helper class is the Builder pattern, where if you've got a constructor or a static factory that naturally would take ten parameters, most of which you don't have to specify, most of which have good defaults, you instead make a builder, plug in just the ones you actually care about, and then call a build method. That code will be much easier to write and to read.

You should avoid return values that demand exceptional processing. In particular, you should never return null instead of a zero-length array or collection. Here's an example of something we got wrong. In BufferedImageOp we have a method called getRenderingHints, and either it returns a RenderingHints collection or it returns null. What's the consequence of this? Almost all code that calls this thing is wrong, because it rarely returns null, so people forget to code for that special case. If they do get a null, what happens? NullPointerException. And it's completely unnecessary. If they just returned a zero-length collection, then they wouldn't have gotten into any trouble.

So I think I'm actually going to skip the rest of the talk
because I'm out of time. In fact, I've already used up five minutes more time than I have. But if you have any other questions, I'll be around for a while afterwards, and you can ask me anything you'd like. So thanks for coming.

Oh, one other thing, which is that I have a giveaway for you. When I gave this talk at OOPSLA, in the proceedings they gave me two pages to put in, what do you call that, an extended abstract. Instead of doing that, I tried to do API design by bumper sticker. I tried to take this entire talk and break it down into 50 little maxims, like "When in doubt, leave it out" or "All programmers are API designers", each with a sentence describing it in a bit more detail. So here, pass these out amongst yourselves as best you can, and I'll put it up on, you know, my [inaudible] in case there are more of you than there are pieces of paper. So thanks again for coming.
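
Several of the Java behaviors described in the talk can be checked directly. What follows is a minimal sketch, not code from the talk or its slides: the class name TalkExamples and its method names are invented for illustration, and only the java.* calls are real library APIs. It covers the side effect of Thread.interrupted, the late failure of Properties, the two TreeSet constructors, and floating point versus BigDecimal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.util.Collection;
import java.util.Comparator;
import java.util.Properties;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class; the names here are illustrative only.
class TalkExamples {

    // Thread.interrupted() reports the flag AND clears it (the surprising side effect).
    static boolean[] interruptedTwice() {
        Thread.currentThread().interrupt();     // set the flag on the current thread
        boolean first = Thread.interrupted();   // true, and the flag is now cleared
        boolean second = Thread.interrupted();  // false, because the first call cleared it
        return new boolean[] { first, second };
    }

    // Properties accepts a non-String value at put() time and only fails later, in store().
    static boolean storeFailsLate() {
        Properties props = new Properties();
        props.put("answer", 42);                // no complaint here
        try {
            props.store(new ByteArrayOutputStream(), null);
            return false;
        } catch (ClassCastException e) {
            return true;                        // the error surfaces far from its cause
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for an in-memory stream
        }
    }

    // The same object selects a different TreeSet constructor depending on its static type.
    static String[] treeSetOverloads() {
        SortedSet<String> reversed = new TreeSet<String>(Comparator.<String>reverseOrder());
        reversed.add("a");
        reversed.add("b");
        reversed.add("c");
        // TreeSet(SortedSet) keeps the reverse ordering of its argument.
        String viaSortedSet = new TreeSet<String>(reversed).first();
        // TreeSet(Collection) falls back to natural ordering.
        String viaCollection = new TreeSet<String>((Collection<String>) reversed).first();
        return new String[] { viaSortedSet, viaCollection };
    }

    // Binary floating point cannot represent one tenth exactly; BigDecimal can.
    static boolean doubleIsInexact() {
        return 0.1 + 0.2 != 0.3;
    }

    static boolean bigDecimalIsExact() {
        return new BigDecimal("0.10").add(new BigDecimal("0.20")).equals(new BigDecimal("0.30"));
    }

    public static void main(String[] args) {
        boolean[] flags = interruptedTwice();
        System.out.println("Thread.interrupted(): " + flags[0] + ", then " + flags[1]);
        System.out.println("Properties.store fails late: " + storeFailsLate());
        String[] firsts = treeSetOverloads();
        System.out.println("TreeSet first element: " + firsts[0] + " vs " + firsts[1]);
        System.out.println("0.1 + 0.2 != 0.3 as double: " + doubleIsInexact());
        System.out.println("BigDecimal sum is exact: " + bigDecimalIsExact());
    }
}
```

Each method returns what it observed rather than only printing it, so the surprising results (true then false, a ClassCastException at store time, "c" versus "a") are easy to assert on.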

Info

Channel: GoogleTalksArchive

Views: 119,783

Rating: 4.9788918 out of 5

Keywords: googlevideo

Id: heh4OeB9A-c

Length: 60min 15sec (3615 seconds)

Published: Wed Aug 22 2012