How To Design A Good API and Why it Matters

Reddit Comments

Does anybody know if he's completed that talk somewhere, recorded for online viewing pleasure?

👍︎︎ 3 👤︎︎ u/mirvnillith 📅︎︎ May 25 2009 🗫︎ replies

Good talk. I like it when the speaker a) knows what they are talking about and b) can express it pretty clearly.

IMPORTANT: Anyone got that list from the end he was talking about?

👍︎︎ 3 👤︎︎ u/f3nd3r 📅︎︎ May 25 2009 🗫︎ replies
Captions
>> First, I would like to thank you and welcome you all to the latest in our series of talks on advanced topics in programming languages. The purpose of this series of talks is to expose all of the great domain knowledge in programming languages that we have at Google. So that--what that means is that everybody in this audience, all of you geniuses who work for Google, and myself discounted, should give a talk. So please, come to me, my name is Jeremy Manson, email me, IM me, whatever, and tell me what talk you are going to give and I will set it up and give you a talk. You don't have to be an author like Josh or some of the people we have coming up in order to do so. Josh--and we're running late. Josh didn't want me to build him up too much and give him a swelled head. So, I want to introduce him. I'll just say that in a company full of geniuses, it's his star that shines probably the most brightly. And I just want to...

>> BLOCH: I'll get you for that.

>> I just want to introduce this man whose boot heel I am not worthy to lick, Joshua Bloch, ladies and gentlemen.

>> BLOCH: Normally at this point, I thank my introducer for the introduction but today I think I'll dispense with that formality. So, I should--I should also say by the way that this is unfortunately rather a long talk. I try to have short talks with one or two key ideas but this subject matter just doesn't lend itself to that. So, please hold your questions till the end and in fact, we'll probably use up the hour without questions but then I'll hang around for as long as you want and answer all the questions that you have.

So, why is API design important? Well, APIs can be among a company's greatest assets. A good API is something that people invest heavily in. They do this in obvious ways and in less obvious ways. The obvious ways are, maybe, they buy products built around the APIs. They write to the API. But the less obvious way is they learn it.
They spend hours and hours--actually, they spend months learning the APIs. And once they've done that, you know, they don't want to learn a new one because they have to unlearn everything they know and replace it with something else. And furthermore, the API is just wired throughout the infrastructure at a company. So, a successful API can make a company, can give a franchise that lasts. So--I guess it's about twenty-five years now. And similarly, a bad API can be among a company's greatest liabilities. And there are a couple of reasons for this. First of all, a bad API can cause an unending stream of, you know, support phone calls because people cannot make the thing do what it ought to do, and it can inhibit a company's ability to move forward because once you have a bad API you cannot change it at will. You're pretty much stuck with it forever. You have one chance to get it right. So, that's pretty scary and with that in mind, you want to learn how to make APIs that will stand the test of time.

So, now you know why API design is important, but why is it important to you? Not all of you may think of yourselves as API designers. Well, it turns out that all of you are API designers. Anyone who programs a computer is an API designer. And the reason is that good programming is inherently modular and these inter-modular boundaries are themselves APIs. Furthermore, good APIs tend to get reused. If you've written a module and it's good at doing something, you know, one of these days one of your co-workers is going to need to do the same thing. And gosh, you know, you've already got this module that does it, but once he's using that API, you are no longer free to change it at will because if you change it you'll break his program. And if he has ten friends and they all start using it then you're really hosed. Finally, thinking in terms of API design tends to improve the quality of the programs that you write. It tends to sort of keep you from just hacking things together.
It tends to make you want to write nice units, you know, that are--that are composable, that are reusable and that are sensible.

Now, one other question we've got to get out of the way at the beginning is, why am I talking about APIs at what is billed as a language design series? And the glib answer of course is that Jeremy asked me to do it. And Jeremy used to be a friend, so, of course, I said, "Yeah, of course I'll do it." But in fact the real answer is that API design and language design are very, very similar. The only real difference is that API design is constrained by the syntactic--sorry, the syntax of the language for which you're writing the API. Whereas, when you're designing a language you have the flexibility to do anything you like with the syntax. But in fact, whether you're designing a language or an API, you are creating a tool for programmers to express their intent to the machine and to other programmers, who read the program, maintain it, modify it and so forth. Finally, these days you don't really think in terms of a language or a library alone. A language and a library together comprise a platform, and when you're designing a language, you design the core libraries hand in hand with the language. So, really the skill set for designing good APIs and for designing languages is pretty much the same.

So, what are the characteristics of a good API? First of all, it's easy to learn and it's easy to use, even without documentation. So, a good API should be easy to memorize. It should just plain make sense. And the flip side of that is, not only should it be easy to use a good API but it should be hard to misuse a good API. It should be hard or impossible to use--misuse a good API. That is, basically, a good API should simply force you to do the right thing. It should be easy to read and to maintain code written to that API. The API should be sufficiently powerful to do what it has to do. Note that I didn't say the API should be powerful.
It is not the case that the more powerful an API is, the better it is. It should basically be just powerful enough to do what it needs to do. But it should be easy to evolve the API over time because there will be new needs later on. So, what you want to do is write an API that meets its requirements and that can evolve to meet future requirements. And finally, the API has to be appropriate to the audience. What is a good API for, let's say, a Wall Street analyst is probably not a good API for a physicist because they have different terminology. They think differently. So the API has to be aimed at its audience.

So now, we know what the characteristics are. How do we achieve them? And that's what the rest of this talk is about. The talk is divided into five sections. The first one is on the process of API design. I'm not a big process weenie but I've found over the years that there are certain things that all good API designs have in common in terms of the process used to create them. So, I'll try to go over that. Then, the general principles of API design, then those principles as they apply to classes, as they apply to methods, and as they apply to exceptions. And finally, if I have time I'll show a couple of refactorings where we improve API designs.

So, what is the process of API design like? Well, the first thing you've got to do is you've got to gather the requirements, but do it with a healthy dose of skepticism. Because often when you ask people, you know, "Well, what does this API have to do for you?" what you'll get won't be a real set of requirements, it'll be a set of proposed solutions, and a better solution may exist. So, you know, if someone tells you, let's say, "We need to precisely control the garbage collection intervals and the maximum time that each garbage collection can take." You know, that's not really a requirement.
I mean, the requirement is, you know, we need to be able to run a server smoothly while any garbage collection takes place. How you choose to achieve that is up to you. Your job is to extract the requirements from the stakeholders in your API and often it's a real give-and-take process. Once you've got the requirements, they should take the form of use cases, and by use cases, I simply mean the problems that your API should be able to solve. And these are extremely important because they provide the benchmark against which you can measure any proposed solution.

One thing you should keep in mind is that it can be easier and more rewarding to build a more general solution than what you've been asked to do. This doesn't mean you should just say, "Oh, I'm going to build a framework," every time someone asks you to do something. But sometimes the more specific thing is more difficult to build as well as being less powerful. So, always keep an open mind when you're--when you're looking at those initial requirements.

Oh, let's see, I guess I have another example here of what they say and what they mean. When they say, "We need new data structures and RPCs with the Version 2 attributes," this actually happened to me at a company called Transarc, you know, when we were kind of upgrading. They said, you know, "Make a whole new set of data structures and a whole new set of APIs." What they really meant was, "We need a new data format that will accommodate all further evolution in the internal data structures." You know, because you don't want to have to make a whole new set of data structures and a whole new set of on-the-wire and on-disk interfaces every time you decide to add a few attributes to your server. So in fact, I made the system much more dynamic and we never had to do that again.

So, you should start with a really short spec; one page is ideal. At this stage in an API design, agility definitely trumps completeness.
The worst thing you can do is to send six smart guys off into a room and have them sit there with the door closed for six months and come out with a 240-page specification document. And believe me, this has been done many, many times. It's an awful idea because at that point, first of all, their ego is invested in what they've just done. They're going to build it even if it's a piece of crap. And second of all, you know, how do you know if it's any good? It's like this big, long, hairy spec. It's no longer agile. If they made a fundamental mistake, then you've got to change all 240 pages of it. And it may fail to satisfy some sort of key requirement that they didn't really understand before they started.

So, what you want to do in the beginning is have the entire API spec on one page. In this way, you can bounce that spec off as many stakeholders as possible. Listen to what they have to say and take it seriously. If they say, "No, I'm sorry, this won't do for me because I cannot write such and such a program," think about it. You know, you may say, "Well, you shouldn't be writing that kind of program. It's a really bad idea." But more likely, you may say, "Oh gee, I didn't think of that. This really is important. What if we structure it this way?" The whole thing is only a page long. You can do major refactorings in ten minutes. If you keep the spec short, it's easy to modify. And you flesh that spec out only as you gain confidence that you're on the right track. And this necessarily involves coding. In particular, it involves coding to the API that you are defining. It doesn't involve implementing the API. It involves pretending it's already been implemented.

So, what does it look like when you do it right? Well, here's an example that I was writing at about the time that I was putting this talk together for OOPSLA some number of months ago. Someone wanted the ability to retry a computation in the face of failure.
And I said, "Oh well, you know, we have this executor framework, otherwise known as the executor framework." And really, all we want here is a retry policy that tells you how you might choose to retry the thing in the face of failure. So, you know, here's a little interface and it's got a couple of methods. One tells you if a given failure is recoverable. You pass in the exception and it just gives you true or false: we should try to recover from this one or we shouldn't. And the second one computes the next delay in terms of the initial start time and the number of previous retries, and by passing in all this data, the actual retry policy can be stateless. So, you can have singletons. You can have a retry policy called exponential backoff. But you're not going to have to store any data in each exponential backoff instance. It really is just a retry policy and it's called exponential backoff. And that's kind of all there is to that one. And this isn't really very complete. This is not, you know, a spec of a quality that I could use for Javadoc. It's not a spec that someone could use to implement to, but it's a spec that's good enough for someone to look at it and say, "Yes, it does do what I need," or "It doesn't."

The rest of it is on another slide here. This is a set of static utilities that lets you actually use retry policies. So, what can you do? Well, the first thing you can do is you can pass in an executor service and a retry policy and get a retrying executor service. It implements the same interface, which is executor service. So, if you already know how to use an executor service, you know how to use the retrying executor service, and that's great by the way. That's a good way to keep the sort of conceptual weight of an API small. Use interfaces that have already been designed--defined, in this case, executor service. And what else do we have? We have another kind of retrying executor service. What is--what is the difference between the first two?
I haven't looked at this for a while so, I apologize for that. Anyway, it doesn't--it doesn't matter. And then, we have a couple of wrappers, one of which takes a callable and returns a retrying callable, and one which takes a runnable and returns a retrying runnable. And then, we have a couple of examples of the retry policies themselves. These are static factories: if you want an exponential backoff, you get one. And these are the parameters that describe your exponential backoff. What's the initial delay? In what unit, you know, 10 seconds, a hundred milliseconds, whatever, and then the timeout, which I'm not sure what that means.

So, by the way, this is actually interesting. This tells me this wasn't quite good enough, you know. What I wanted to show you is that something that's really simple, that fits on a slide or two, is enough to communicate your intent and enough to sort of figure out whether it's good enough and, you know, try it out. And I think the answer here is it's almost good enough but not quite good enough. I guess the timeout, probably, is the overall timeout. Like, after how long of trying and retrying do you finally give up. But I think it should have said that somewhere on this. But anyway, you get the idea. The idea is that this is a very small description of an API. But it's big enough to find out if it's good enough to do what needs to be done. And if it's not, it's easy to modify it.

You should write to the API early and often. Of course, you should start before you've implemented the API because this saves you from having to throw away an implementation of a bad API. You know, if you--if you first specify it, then implement it, then try the implementation and decide that the API was garbage, well, you've wasted lots of time implementing it. And as I said, you should start before you've even specified it properly because that saves you from having to throw away detailed specifications for broken APIs.
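The one-page retry-policy spec described above might look roughly like the following in Java. This is only a sketch reconstructed from the spoken description, not the code on the slide: every name here (`RetryPolicy`, `isFailureRecoverable`, `nextDelayMillis`, `RetryPolicies.exponentialBackoff`) is hypothetical, the `nowMillis` parameter is an addition that keeps the sketch deterministic, treating only `IOException` as recoverable is an arbitrary choice, and the timeout is taken to be the overall give-up time, as the speaker guesses.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical retry policy: stateless, so a single instance can be shared. */
interface RetryPolicy {
    /** Returns true if it is worth retrying after the given failure. */
    boolean isFailureRecoverable(Exception failure);

    /**
     * Returns the delay in milliseconds before the next attempt, or -1 to give up.
     * All state is passed in, so the policy itself stores nothing per computation.
     */
    long nextDelayMillis(long startTimeMillis, long nowMillis, int previousRetries);
}

/** Hypothetical static factories for retry policies. */
final class RetryPolicies {
    private RetryPolicies() {}

    /**
     * Exponential backoff: the delay doubles on each retry.
     * Assumption: timeout is the overall time after which retrying stops,
     * measured from the start of the first attempt, in the same unit as initialDelay.
     */
    static RetryPolicy exponentialBackoff(long initialDelay, long timeout, TimeUnit unit) {
        final long initialMillis = unit.toMillis(initialDelay);
        final long timeoutMillis = unit.toMillis(timeout);
        return new RetryPolicy() {
            @Override
            public boolean isFailureRecoverable(Exception failure) {
                // Illustrative choice only: treat I/O failures as transient.
                return failure instanceof IOException;
            }

            @Override
            public long nextDelayMillis(long startTimeMillis, long nowMillis, int previousRetries) {
                // Cap the shift so the doubled delay cannot overflow for sane inputs.
                long delay = initialMillis << Math.min(previousRetries, 20);
                long deadline = startTimeMillis + timeoutMillis;
                return (nowMillis + delay <= deadline) ? delay : -1;
            }
        };
    }
}
```

Writing even this much client-facing code makes the open question from the talk concrete: the meaning of the timeout has to be written down somewhere, which is exactly the gap the speaker noticed on his own slide.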
You should continue writing to the API as you flesh it out, and this is important. Some people sort of stop writing to the API about halfway through the process and just go on this death march to implementation. The problem with that is you get nasty surprises about a week before you ship when you try writing to it again and you find that, you know, "Oh, my gosh. It actually doesn't solve this important case that we thought it solved." And some people worry. They worry that, you know, all this coding to the API is a waste of time when they should be implementing it. But nothing could be further from the truth. Those initial pieces of code that you write to any API are among the most important pieces of code that you'll ever write to it. The code lives on in the examples that you publish for how to use that API. And those examples tend to get emulated heavily. If you get them right, you've seeded the market with good uses of your API. If you get them wrong, conversely, you know, you've ensured that there will be broken programs floating around the web for the next ten years.

And I used to, you know, believe this with all my heart and soul. But now, I actually can point to a proof of it. It turns out at the last OOPSLA, there was a paper published called Design Fragments by Fairbanks, Garlan and Scherlis from CMU. And they actually traced mistakes in the original applets that were shipped out with, you know, the first release of Java into broken concurrent programs, thousands of them, that still exist on the web today. So, you know, the way I put this is, example programs should be exemplary. There's a reason they call them example programs. And the programs that--the first programs that you write to an API, as you are fleshing it out, invariably become the example programs. You know, so, you know, my rule of thumb is you should spend ten times as much time on every line of example code as you do on production code.
That may sound backwards to you but I really believe it.

Writing to an SPI, that is, a Service Provider Interface, is even more important than writing to any other kind of API. You all probably know what SPIs are, or maybe you don't. If you know what an SPI is, raise your hand. Okay. So, an SPI is a special kind of an API which you use to provide a new means of doing something. It's not the API which the programmers write to; rather, it's the plug-in API that, let's say, lets RSADSI Incorporated provide their encryption methods and Sun Microsystems provide their encryption methods. So, the user of this encryption API can use a higher level API, which then dispatches to these encryption methods. And the thing about SPIs is that you're supposed to be able to hide very, very different implementations underneath them. And if you, you know, write an SPI and you only have one provider, it turns out that, as a practical matter, you will probably never be able to support another. Once you try to do the second, you'll find that there's something about that SPI that ties it forever to the only implementation you actually thought about. If on the other hand you do two implementations rather than one, then you'll probably be good enough. You'll probably be able to support subsequent ones with some difficulty. But if you do three, it will probably work fine for any number. If it works for three, the fourth won't be all that different from each of the first three. I found this out myself and then I saw it in a book. Will Tracz discovered this and called it The Rule of Threes in a book, which was subtitled Confessions of a Used Program Salesman because it was a book about software reuse.

And by the way, here we explain the color coding that is used throughout this talk. Whenever you see something green, it means this is good, do it this way. And when you see something in red, it means this is bad, don't do it that way.
You should maintain realistic expectations throughout the process of API design. It turns out that most API designs are over-constrained. People want them to do more than they can possibly do. So, you are going to have to make compromises. You cannot please everyone. If you try to please everyone, you come up with a pig. You come up with big, nasty APIs that no one will ever be able to use properly. So, what you should do, and this may sound strange, is aim to displease everyone equally. The idea is that, you know, if one of your important stakeholders is very displeased and the others are really pleased, that's probably a problem because your API isn't doing something it has to do. If everyone is like less than 100% happy but they're all happy enough, then you've probably done the right thing. But do not misinterpret this as saying I favor design by committee and you should take everyone's ideas and mush them all together. You do need one sort of strong design lead that can ensure that the API that you're designing is cohesive and pretty and clearly the work of, you know, one single mind or at least a single-minded body. And that's always a little bit of a trade-off, being able to satisfy the needs of many customers and yet produce something that is, you know, beautiful and cohesive.

Expect to make mistakes. API design is hard. Now luckily, a few years of real-world use will always flush out the mistakes. Unfortunately, by that time it's like too late to do anything about them. Although, you can write nice talks and tell people about the mistakes so they don't make them again, and it's kind of what I'm doing here today. So, given that you're going to make mistakes and you're going to be stuck with the original API, write the API so that at least you can sort of add to it and produce something that will help you get around the shortcomings of your original designs. The recent example of that in my life is in the collections API, which I did around 1997, 1998.
There were some real flaws in the sorted set and sorted map interfaces. In particular, you know, they're a little bit asymmetric. It's much easier to go forward than backward, and there are a couple of other things. I knew about these flaws at that time but I didn't know how to fix them. However, we were able to extend that API. So, if you look at the most recent release, Java 6, you'll see something called NavigableSet and NavigableMap, which extend SortedSet and SortedMap and provide, you know, additional methods that basically fix those difficulties. You know, and it's not a perfect solution because there are some things that implement the old--the older interfaces and they're not fixed yet. But at least all the standard collection implementations from Sun now are fixed.

So, what are the general principles of good API design? First of all, an API should do one thing and do it well. And I should say that almost all of what I'm going to say for the next five minutes may just sound like motherhood and apple pie. But there's more to it than that. I'm going to try to give you actionable advice. I'm going to try to take sort of the standard old saws, like "an API should do one thing and do it well," and see what it really means and tell you how to achieve it. So, you know, in this case, the functionality should be easy to explain. If it's not easy to explain, then it's not doing one thing and doing it well. It's a mess. If you can't come up with a good name for it, then it's a mess. The names are the API talking back to you, so listen to them. When you try to name those methods and those classes, you know, if you come up with a really complicated name like DynAnyFactoryOperations or _BindingIteratorImplBase, which, you know, actually violates the naming conventions of the platen--platform, sorry, or ENCODING_CDR_ENCAPS, you know, clearly you've got problems. Any API that contains this sort of stuff is a mess. Oh, what about this, OMGVMCID?
I know OMG is, "Oh, my god," but I can't--I can't figure out the rest of it. And by the way, lest you think I'm just making this stuff up, all of these come from an actual Java platform API. And I won't tell what it is except to say that it's club. Good API names are like Font. Yeah, I know what a font is. Sure, you know, it's like, it's italic or bold or whatever. You know, a set, I know what a set is. A private key, a lock, a ThreadFactory, these things are, you know, they instantly communicate what they are. And the methods, you know, the classes, all of them should be like that. Looking at them, you know, it should be clear what they are. And good names drive good designs. You know, once you have something that's called a set, you know what the operations are. You insert things into sets, you remove, you test for containment. So, good names drive good designs and bad names are an indication of bad designs. So, listen to those names speaking to you. And if you just can't get it to work out right, then you're probably not trying to build something reasonable. So, always remain amenable to splitting a module up if you're trying to cram too much into a module, or to putting multiple modules together if you're trying to expose sort of internal details that ought to be hidden. Maybe you should just make a bigger module that hides some of those details.

An API should be as small as possible but no smaller. This principle is usually attributed to Einstein. Although I looked really, really hard and I don't believe he ever said it. I believe that it's, you know--he probably believed it but he didn't say it. At any rate, the API should satisfy its requirements, and if you only remember one thing from the talk today, please remember this: when in doubt, leave it out. That applies to everything. It applies to functionality, to classes, to methods, to parameters within a method, anything. If you have any doubts about whether to include something, leave it out.
You'll probably be able to add it later, but you most certainly will not be able to take it out once you put it into an API. Once you put it in, people will be using it; if you take it out, they will scream bloody murder. So, if you ever have any doubts about whether to include something, leave it out. And if that's all you take away with you from today's talk, then I think it's been worth your hour. Of course, that's just my judgment.

Anyway, the--when you're thinking about the size of an API, the conceptual weight is more important than the bulk. By bulk, I mean the number of methods, the number of classes, the number of parameters. What's really important is the number of concepts. When I'm learning this API, how many different things do I have to learn about? And there are a number of ways to decrease the conceptual weight of an API. The most important one is reusing interfaces. So, for example, if you look at the collections framework, there are many implementations of the Set interface, you know, whether it's TreeSet. The original ones were, you know, HashSet, TreeSet, and then we added LinkedHashSet and more recently a whole slew of concurrent set implementations. You don't have to learn any new APIs. You learn the Set API and we can add new functionality, you know, we can add richness without making you learn any new API. So, that's one of the great ways to increase the power-to-weight ratio. That's the important thing. You want to be able to do a lot without learning a lot.

The implementation shouldn't impact the API. Once again this is motherhood, but what does it really mean? It means don't put any implementation details into the API. They confuse the users and they inhibit their freedom to decide. They inhibit the implementer's freedom to change the implementation later.
So, you know, for example, let us say you have some API that's about phone numbers but it throws SQL exceptions, and now you want to re-implement it on top of some proprietary data store rather than a SQL data store. But your clients are already trying to catch the SQL exceptions. You know, what do you do? Well, you can emulate those old SQL exceptions but that's crazy. So that means that you should make sure that the exceptions that you throw are kind of at the same layer of abstraction as the rest of the API is. That's just one example where implementation details kind of leak into APIs.

The important thing is you have to be aware of what actually is an implementation detail. You don't want to over-specify your methods. You don't want the specification of a method to involve something that is an implementation detail and that you would like to be able to change later on. So, here's an example where we did that. Don't specify your hash functions. You might think that, you know, exactly what value is returned by the hashCode method is a proper part of a spec, but it isn't. It's an implementation detail. The spec should simply say it returns an integer and, you know, with high probability the integer will differ for two different objects, and furthermore it should be cheap to calculate the thing. But exactly what number is returned, you should have the flexibility to change that from release to release as you learn about flaws in your old hash functions and as the technology improves and hash functions improve. Of course, you cannot do that if you're writing a persistent data store. If those hash functions are going to be used to store data on disk then they can't change. But that's a very special kind of hash function. Those ones must be specified, but the great majority of hash functions out there shouldn't be.
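Both points in this passage, exceptions pitched at the API's own level of abstraction and a hash value that is deliberately left unspecified, can be sketched together in Java. The phone-number domain comes from the talk, but every type name below (`DirectoryException`, `Storage`, `PhoneDirectory`, `PhoneNumber`) is made up for illustration, and `Storage` is just a stand-in for whatever data store sits underneath.

```java
import java.sql.SQLException;
import java.util.Objects;

/** Exception at the directory's own level of abstraction (hypothetical name). */
class DirectoryException extends Exception {
    DirectoryException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Stand-in for the underlying store; today it happens to be SQL-backed. */
interface Storage {
    String lookup(String name) throws SQLException;
}

/** Public API: clients never see SQLException, so the store can be swapped later. */
final class PhoneDirectory {
    private final Storage storage;

    PhoneDirectory(Storage storage) {
        this.storage = Objects.requireNonNull(storage);
    }

    /**
     * Returns the phone number registered for the given name.
     *
     * @throws DirectoryException if the lookup cannot be completed
     */
    String numberFor(String name) throws DirectoryException {
        try {
            return storage.lookup(name);
        } catch (SQLException e) {
            // Translate the low-level failure; keep the cause for debugging.
            throw new DirectoryException("lookup failed for " + name, e);
        }
    }
}

/** Value class whose hash value is intentionally not part of its spec. */
final class PhoneNumber {
    private final int areaCode;
    private final int lineNumber;

    PhoneNumber(int areaCode, int lineNumber) {
        this.areaCode = areaCode;
        this.lineNumber = lineNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PhoneNumber)) return false;
        PhoneNumber other = (PhoneNumber) o;
        return areaCode == other.areaCode && lineNumber == other.lineNumber;
    }

    /**
     * Returns a hash code consistent with equals. The exact value is
     * unspecified and may change from release to release.
     */
    @Override
    public int hashCode() {
        return Objects.hash(areaCode, lineNumber);
    }
}
```

If the directory later moves to a non-SQL store, only `Storage` and the `catch` clause change; callers that catch `DirectoryException` keep working.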
And we got this very wrong in initial releases of Java and unfortunately that tradition has stuck to the point where almost all of these hash functions are specified. But they really shouldn't be.

Finally, you shouldn't let the implementation details kind of just leak into the API. The example I gave you before with an exception is an example where, you know, you didn't really think hard about it and say, "Oh yeah, you know, we should maybe throw a SQL exception here." You probably just wrote it and you realized, "Gee, I'm calling something that throws a SQL exception, so I have to propagate it out." That's a case where an implementation detail is just sort of leaking. Another example, a really notorious example in Java, is if you simply say, "implements Serializable." Once you've done that, your entire implementation has just sort of leaked out as part of your API because the serial form consists of all the fields that comprise your object, even your private fields, so all of a sudden the private fields are part of the public API and that's really, really bad. And the way around that, by the way, is to design your serial forms carefully. Don't just say "implements Serializable."

You should minimize the accessibility of everything. That means you should make your classes, your members, your fields, all as private as possible. One specific case is that public classes should have no public fields with the exception of their constants, which aren't really fields. This maximizes what they call information hiding. Parnas is the guy who came up with that term. And it minimizes--minimizes the coupling between APIs. You know, if things are kind of hidden behind inter-modular boundaries, they can be changed freely. And this allows modules to be understood, to be used, to be built, to be optimized, debugged, tested, and what have you, individually and in parallel. So you can have multiple teams, you know, dealing with multiple APIs concurrently.
If on the other hand the APIs sort of expose everything and each, you know, module is sort of messing around with other modules, then there is very little that you can do to any module without affecting a whole slew of modules around it.

Names matter a lot. There are some people who think that names don't matter and, you know, when you sit down and you say, "Well, this isn't named right," they say, "Don't waste your time. Let's just move on. It's good enough." Names in an API that are going to be used by anyone else, and that includes yourself in a few months, matter an awful lot. The idea is that every API is kind of a little language, and people who are going to use your API have to learn that language and then speak in that language, and that means names should be largely self-explanatory. You should avoid cryptic abbreviations. So the original Unix names, I think, failed this one miserably. You should be consistent. It's very important that the same word means the same thing when it's used repeatedly in your API, and that you don't have multiple words meaning the same thing. So let us say you have a remove and a delete in the same API. That's almost always wrong. You know, what's the difference between remove and delete? Well, I don't know. When I listen to those two things, they seem to mean the same thing. If they do mean the same thing, then call them both the same thing. If they don't, then make the names different enough to tell you how they differ. If they were called, let's say, delete and expunge, I would know that expunge was a more--a permanent kind of removal or something like that. Not only should you strive for consistency but you should strive for symmetry. So, if your API has, let's say, two verbs, add and remove, and two nouns, entry and key, I'd like to see, you know, add entry, add key, remove entry and remove key. If one of them is missing, there should be a very good reason for it.
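As a small illustration of the symmetry point, here is a sketch of a class with two verbs (add, remove) and two nouns (key, entry) where all four combinations exist and each word means the same thing everywhere. The class `TagIndex` and its semantics are invented for this example and do not come from the talk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical index mapping each key to a set of values. */
final class TagIndex {
    private final Map<String, Set<String>> valuesByKey = new HashMap<>();

    /** Adds a key with no entries; returns false if the key already exists. */
    boolean addKey(String key) {
        return valuesByKey.putIfAbsent(key, new HashSet<>()) == null;
    }

    /** Adds a (key, value) entry, creating the key if needed; returns false if already present. */
    boolean addEntry(String key, String value) {
        return valuesByKey.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    /** Removes a key and all of its entries; returns false if the key was absent. */
    boolean removeKey(String key) {
        return valuesByKey.remove(key) != null;
    }

    /** Removes a single (key, value) entry; returns false if it was absent. */
    boolean removeEntry(String key, String value) {
        Set<String> values = valuesByKey.get(key);
        return values != null && values.remove(value);
    }
}
```

Note that there is no `deleteKey` next to `removeEntry`: one verb per concept, and every verb applies to every noun.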
I'm not saying that all APIs should be symmetric but the great bulk of them should. And if you get it right, the code will read like prose. That's the prize. So, you know, in this case the code reads, "If the car's speed is more than twice the speed limit, then the speaker should generate an alert that says watch out for cops." That's pretty much English. It reads like prose and that's an indication that the API is reasonably decent. Documentation matters as well. And Parnas, the aforementioned Parnas, said it much better than I could, so I'm simply going to read what he had to say. He said, "Reuse is something that is far easier to say than to do. Doing it requires both good design and very good documentation. Even when we see good design, which is still infrequently, we won't see the components reused without good documentation." He said that in 1994. And I don't know about you but, you know, when I read that I get religion. And the only thing I can do then is to document religiously. Document every single class, every interface, every method, every constructor, every parameter and every exception in my public API. Of course all of you do that, right? And when you go out on the web, whenever you look at Javadoc, it's always the case that every public or protected method has a Javadoc comment, right? No. You know, it's really terrible because if you don't have a comment telling you what the specification is, what is the specification? Who knows? You have two choices. Either you guess, in which case your program probably doesn't work, or you read the code, in which case the implementation becomes the specification and it's overspecified and you no longer have the freedom to change that implementation at all. So document everything and there are--you know, what does this mean for classes? Just tell me what an instance of that class represents. For methods, tell me the contract between the method and its client. That is, what must be true before I call it. 
What will be true after it returns and any side effects. Those are particularly important. If you have side effects and you don't document them, people will shoot themselves in the foot. I'll give you an example of this later. For parameters, people often forget. Don't just say, "Hmm, the size of the block." It's the size of the block in bytes or in megabytes or whatever. You've got to tell me what the units are, the form, if it's a string, especially. I've got to know. Is it XML? You know, what form is this string in, and finally, the ownership. If I'm passing an object into an API, do I still own the object? Am I free to modify it after I passed it in or have I transferred ownership from myself to the object to which I passed that other object? If the thing that you're defining is mutable, that is, can be modified, then you must document the state space very carefully. If you have a badly documented state space, then you have no hope of being able to use that API properly. Because people won't know when it's legal to call what and what will happen after the call is made. I may discuss this elsewhere but an example of how to do this very badly are Date and Calendar, in particular the Calendar API in Java. The state space is almost undocumented and it caused numerous bugs over time. I'm happy to say by the way, that just days ago, Sun decided that we would be pursuing a JSR to improve the Date and Calendar APIs based in part on [INDISTINCT] time and who knows what else. But we may finally be, you know, free of that mess. You should consider performance consequences of API design decisions. And this is funny because this tends to, you know, contradict the advice you've all heard that, you know, premature optimization is evil. In fact, I have an old essay about that in Effective Java. However, that doesn't mean that you can just ignore performance. Jon Bentley was big on this fact. 
And in particular, it turns out that bad API decisions can limit performance. Examples of things that can limit it are making a type mutable when it should be immutable or vice versa, providing a constructor instead of a static factory, using an implementation type in an API instead of an interface, which means that people will always have to use that particular implementation even if a better one comes along later. But the converse of this is, never warp an API to gain performance. Every once in a while you have something that's sort of temporarily broken. You know, this thing is slow and in order to avoid this slow thing, you break your API. The thing that used to be slow becomes fast but your API is still broken, you know. So, design your APIs for the long haul. Luckily, good API design usually coincides with good performance. Here's an example of an API design decision that led to bad performance. In the original AWT, there was something called a Dimension. If you had a component and you asked for its size, you got back a Dimension object, which contained two coordinates. It's simply a couple of longs that were wrapped. The problem is that the Dimension object was mutable. Those longs weren't really wrapped. You know, they were publicly visible, mutable fields. And what that meant was every time you called getSize, you had to allocate a new Dimension object because otherwise I might, you know, get the Dimension object, give it to you, you might ask for it and then you might modify it, modifying his copy as well. And that would be really bad. Now these two, you know, independent threads or computations will be tied together in nasty ways. Is that bad? You know, is it really expensive to allocate a little object containing two longs? No, it's dirt cheap but unfortunately this thing gets called millions, literally millions of times in a GUI app. So, you know, all of a sudden, you're basically allocating megabytes of garbage and that really does cost you. 
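A minimal sketch of why the mutable return type forces an allocation, assuming hypothetical MutableSize and SizedWidget classes rather than the real AWT types: handing out a shared instance would let one caller's writes reach everyone else, so the accessor has to copy on every call, while accessors that return primitives need no allocation at all.

```java
// Hypothetical classes, invented for illustration; not the java.awt API.
final class MutableSize {
    public int width;   // publicly visible, mutable fields, as described in the talk
    public int height;

    MutableSize(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

class SizedWidget {
    private final int width = 100;
    private final int height = 50;

    // Defensive copy: a new object per call, because callers may mutate it.
    MutableSize getSize() {
        return new MutableSize(width, height);
    }

    // Allocation-free alternative: primitives cannot be mutated by the caller.
    int getWidth() {
        return width;
    }

    int getHeight() {
        return height;
    }
}
```

If MutableSize were immutable, getSize could return one cached instance and the copy would disappear.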
It's garbage collector pressure that you just don't need. And it was fixed in 1.2 by adding methods that return each dimension individually as a primitive type, which is in fact immutable. But, you know, unfortunately, old code that uses the 1.1 APIs is still slow and will always be slow. The APIs that you write have to coexist peacefully with the platform. So, that means that you have to do what's customary for that platform. You have to obey its naming conventions. You have to avoid anything that's just verboten in that platform. You know, whatever it is, if you're in Java, you know, if you are in C++, everybody knows that there are certain things that you just shouldn't do or shouldn't use. So, learn what those things are and then avoid them. And there are generally books that tell you the traps and pitfalls for every platform. I could recommend one for Java but, you know. Anyway, the thing you've got to do is mimic the patterns in the core APIs in the platform, because everyone who uses a platform knows its core APIs. So, if your API feels just like one of the core APIs, then everyone, because they already know how to use the core APIs, will already know how to use your API. It's as simple as that. And the real trap here is you should never simply transliterate APIs. That is the worst way to design an API. And what do I mean by that? Suppose you have a C++ API, and you want a Java version of the same facility. What you should not do is take every C++ class and make a complementary Java class that contains all the same methods as the C++ class because what was reasonable in C++ will almost certainly be unreasonable in Java and vice versa. So, you have to basically take a step back. You have to say, "What is this class doing and how would I do this in Java?" That's the right way to do it. Transliterated APIs are almost always broken. Okay. On to class design, I have 15 minutes to finish the whole--rest of the talk. 
So that gives me five minutes each for the next three sections and zero minutes for the last section. So, first of all, minimize mutability. Classes should be immutable unless there's a very good reason to make them mutable. And the advantages are that the classes you get are simple, they are thread safe, and instances are reusable. You never have to generate a new one. The only disadvantage is that you need a separate object for each value. So, if you have a huge object, let's say a BigInteger that's a million digits long, and you want to throw it away but you just want to change the last bit, if it were mutable, you could do that in place, you know, at virtually no cost. But because they're immutable, you have to basically copy a megabit of data and then throw away the old megabit, which is a little bit unpleasant. But if you do have to make things mutable, and you often do, you should still make them as immutable as possible. You should give them a nice small, well defined state space. You should make it clear when it's legal to call which method. So, bad examples, as I mentioned before, are Date and Calendar. Calendar has this, you know, like the roll method and when you, you know, you put a date into it, who knows what state is behind it? When it's legal to call what? What state the calendar is in after you've used it? And, you know, did you know that when you're using a date formatter, if one thread uses a date formatter and another thread tries to use it at the same time, both threads are hosed? You know, it kind of feels like an immutable object, except that it has state inside it. You can't read that state but it's there. You know, so these are things that are more mutable than they should be. A good example is TimerTask. It's not immutable, but it minimizes mutability. In particular, a TimerTask is inherently mutable because it represents an actual or current computation and that mutates. 
But, you know, a TimerTask, what do you do? You create it, you schedule it, it runs as many times as it has to run, and then it's dead. It's gone. End of story. Now, there were people who asked us, when we were designing the Timer API, "We want to reuse them, you know. It's like expensive to make a TimerTask." And the answer is, "No, let it die in peace." If you need another one, make another one, but by eliminating that sort of loop from the state space, you make the API much simpler and much less bug-prone. You should subclass only when it makes sense to do so. This is the Liskov Substitution Principle. And it's actually very simple. You just have to ask yourself a question. When you have two classes in a public API, and you're thinking of making one a subclass of another, like Foo a subclass of Bar, ask yourself, "Is every Foo a Bar?" If you can answer that with a straight face, then make it a subclass. If not, don't. So, a bad example is, in Java, Properties extends Hashtable. Is every Properties object a Hashtable? Heavens, no. A Properties object is a special thing that maps certain strings to certain other strings. So, every Properties object has a Hashtable, perhaps. Typically, it's implemented atop a Hashtable but you can't answer that question with a straight face. Or what about Stack extends Vector? Is every stack a vector? No. You push and pop on stacks and that's pretty much all you do with them. You might also have a peek method and a size method. But, you know, a vector allows random access, accessing by index, a stack doesn't. So, it was really wrong to have Stack extends Vector. And the really bad thing about it was they took away a great piece of real estate. We can't use the name Stack for a class that actually does implement a stack anymore because the name has been taken. Good is Set extends Collection. Is every set a collection? Yes, it really is. Set is just a special kind of a collection. 
It's a collection that does not allow duplicate elements. This is a fairly controversial one. But I believe that you should design and document a class for inheritance or else prohibit it outright, that is, make the class final or have no publicly accessible constructor. The reason for that is that inheritance violates encapsulation in a subtle way. And that is to say that subclassing violates encapsulation in a way that mere method invocation does not. This is sometimes called the fragile base class problem, where basically if you have one class that's implemented atop a second class, and you override a method in the first class, it may modify the behavior of other methods in the subclass because the original implementations of those methods dispatched to the method that you just overrode. And then if in a future implementation of the superclass, they re-implement all of these methods, so that this method no longer is implemented in terms of this one but both of them are implemented in terms of some third method, then, you know, this new version of the superclass will break the subclass. And the way to avoid that is to come clean on exactly how every method uses every other method. That is, document the self-use patterns of a class. If you've done that, then you've documented it and designed it for inheritance. So, it's okay to make it non-final. Otherwise, it could just be final. Now, if you look at the Java API, we got this wrong in many places. So, most of the concrete classes in the J2SE libraries, in particular, collection classes like HashMap and HashSet, they're all non-final, but they don't exactly define their self-use patterns. So, they're a bit fragile. AbstractSet and AbstractMap and the other AbstractXxx classes are good. They really are designed and documented for inheritance. Okay, onto methods. So, this is--if you remember only two things from the talk today, this is the second thing. 
By the way, what was the first thing I told you to remember? >> [INDISTINCT] >> BLOCH: Excellent. When in doubt, leave it out. The second one is, don't make the client do anything the module could do. Those two things are like the fundamental rules of API design. So, the worst thing you can do is write an API that just requires the client to call in and out, and in and out, just doing repeated calls, passing junk from the first call to the second call. This causes boilerplate code, which as you can see, it's red. It's really, really bad. Why is boilerplate code bad? Because it's an opportunity for bugs. You make boilerplate code by doing cut-and-paste and then modify. But if you don't do all the modification that you should, it may still compile but it won't do the right thing. It is ugly, it is annoying, and it's error prone. And here is a real live example from the W3C's DOM API. Suppose you have in hand an XML document and you want to print it to an output stream. That's a very reasonable thing to do. It should take one call, you know, print, it takes an output stream and you're done. But it doesn't. Here is what you actually have to do. First you have to import, in addition to w3c.dom, that's fine, and java.io, you also have to import xml.transform, xml.transform.dom and xml.transform.stream. Why? I don't know but I do know you have to do it. And then here's how the call should've looked. It should've been called writeDoc and, you know, the document perhaps should have been the receiver and you pass in an output stream and it would throw an IOException. But here's what you actually have to do. First, you get a transformer and what is a transformer? I don't know, but I know you need one. How do you get one? Well, you call the newInstance method on the TransformerFactory and then that gives you a transformer factory and once you get the factory you ask for a new transformer. So, as you can see, they read Design Patterns. It's great. It's like filled with patterns. 
So now, you've got your transformer out of your transformer factory and then you have to set an output property on that transformer. You have to set its doctype system to be the doctype from the doc. I don't know what any of this means but I know you have to do all of it. And then, you have to get the system ID and you have to--that's part of this output property that you're setting. And then when you're all done, you can actually do the output. The way you do that is you call transform on the transformer. And you don't just pass in the doc because that doesn't implement the right thing. You use the adapter pattern here to take your doc and you turn it into a DOMSource and then also you can't quite just pass in your output stream. You wrap it in a StreamResult and now you get your output, right? Except for one tiny problem, which is it can throw a TransformerException. When can it throw it? Well, never actually, but the API says it can and it's a checked exception, so we have to catch it if it gets thrown and then we throw an assertion error there because it can never happen. So, you know, you've got like, whatever it is, six lines of just unreadable garbage code to give you something very simple. If they had started with a use case, people might want to print their XML documents. You get the idea. Another general rule when you're designing a class--a method is, don't violate the Principle of Least Astonishment. The user of an API should never be surprised by its behavior. It's worth extra implementation effort. It's even sometimes worth reduced performance not to surprise the users of your API because if you surprise them, what will happen? They'll simply do the wrong thing. They'll think it does something, it'll actually do something else and their program will be broken. Here's a real example from the Thread API in Java. So, we have this method called interrupted. You've got a thread and you want to check if it's interrupted. You call Thread.interrupted. 
And what does it do? Well, it tests whether the current thread has been interrupted. Oh, and by the way, it clears the interrupted status of the current thread. That's just like a little side effect. It clears the interrupted status of the current thread. Looking at the name, you know, Thread.interrupted, there'd be no way to guess that it does this. But it does this and many people have, you know, spent hours chasing bugs because of it. You know what is the primary thing that this call does? It clears the interrupted status. It's not an unreasonable call but it should've been named clearInterruptStatus and by the way, it could've returned the old interrupt status as a favor to you. But they named it based on the second most important thing it did, instead of the first most important thing it did. And in doing so, they violated this Principle of Least Astonishment. You should fail fast. Whenever there's an error, you should tell the user of your API as soon as possible after the error has happened. Ideally, you should tell him at compile time, you know, because this way it happens in the lab or in the, you know, here where the program is being written, instead of out in the field where the program is being run. And that means that I believe that static typing is a very good thing. It moves errors from runtime to compile time. I understand that this is another highly controversial topic. But I do believe it. You know, I've seen it happen. You know, for example, when we increased the static typing in Java by adding generics, I found bugs in preexisting code, you know, because it forced me to be more specific about the types that were expected and it told me where things were wrong. If you are only going to be able to find out about an error at runtime, you want to find out the first time you do something wrong. Like, if you pass some garbage into something else, you should find out as soon as you pass the garbage in, not ten minutes later. 
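A minimal sketch of failing fast at runtime, assuming a hypothetical FailFastSettings class with a deliberately loose Object-typed put method (both are inventions for this illustration): bad input is rejected at the call that supplies it, so the stack trace points at the culprit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class, invented for illustration only.
class FailFastSettings {
    private final Map<String, String> values = new LinkedHashMap<>();

    // The Object parameters are an assumption that mimics a loosely typed
    // legacy signature; the check runs immediately, not when saving.
    void put(Object key, Object value) {
        if (!(key instanceof String) || !(value instanceof String)) {
            throw new IllegalArgumentException(
                    "key and value must both be strings: " + key + "=" + value);
        }
        values.put((String) key, (String) value);
    }

    // By the time we get here every entry is known to be a String pair.
    String save() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

Declaring the parameters as String would move the same check to compile time, which is the earlier and better place the talk argues for.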
Here's the example of how not to do it. In the aforementioned Properties class that extends Hashtable, if you look at the spec, it says a Properties instance maps strings to strings. But if you look at this put call, it takes an object key and an object value. Any object, maybe a string, maybe something else, but it tells you right away if you pass in something that's not a string, right? If only it were so. It doesn't, in fact, tell you right away. It lets you put in any garbage you want and then, ten minutes later when you call the save call to an output stream, which basically takes this Properties object and translates it into some garbage that isn't quite XML, then and only then does it blow up with a ClassCastException because you put something wrong into it ten minutes before. But by that time, debugging becomes almost impossible. You don't know where the call was that put the garbage into your Properties object. You should provide programmatic access to all data that is also available in string form. This is really important. Whenever you have a method that returns something as a string you should also have a method that returns the same stuff in programmatic form. If you don't do that, then clients will have to parse the string. Not only is that a pain in the butt but it turns the string into a de facto part of the API. You can never add information to that string because there's code out there that's parsing that string. And if you change the format of the string, you break that code. So, what you should do is along with the API that gives you the printable string, you should have other APIs that give you access to--excuse me, to the actual information and this way you can add more information to the string later. And in fact, the spec should say that you are not specifying the format of the string and that anyone who writes code to parse the string is taking their lives into their hands. 
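A minimal sketch of pairing a printable form with programmatic accessors, assuming a hypothetical Endpoint class invented for this illustration: every piece of data that appears in toString is also reachable through a typed accessor, and the string format is explicitly left unspecified.

```java
// Hypothetical class, invented for illustration only.
final class Endpoint {
    private final String host;
    private final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Programmatic access to the host name. */
    String host() {
        return host;
    }

    /** Programmatic access to the port number. */
    int port() {
        return port;
    }

    /**
     * Human-readable form for logs. The exact format is unspecified and may
     * change; callers should use host() and port() instead of parsing it.
     */
    @Override
    public String toString() {
        return host + ":" + port;
    }
}
```

Because nobody has a reason to parse the string, more detail can be added to it later without breaking clients.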
So, you know, a bad example is, initially the only way to get the stack trace in Java was to call this getStack--sorry, printStackTrace API and people actually did go parse those things. In 1.4, we finally added a getStackTrace API that gives you all the same information, a StackTraceElement consisting of the file name, a line number, class name and so forth. But this was a case of sort of the horse had already left the barn. You should overload with care. Method overloading can be a good thing but it tends to be overused. You should avoid ambiguous overloadings. That is, multiple overloadings that can do different things when passed the same values. And a bad example, which I am guilty of here myself, is TreeSet has two constructors; one that takes a Collection, one that takes a SortedSet. The first one ignores the order of the thing that was passed in. The second one says, "Gee, I'm making a tree set out of another sorted set, I might as well order it in the same way." Well, here's the problem. If you have a sorted set that's cast to a collection, then you're calling this constructor and you get one result. Whereas, if you don't cast it, if you just pass it in, you get another result, so I really should not have done it that way. I should've done a dynamic test. If the thing was an instance of SortedSet, then I should've preserved the order. So, you know, the basic rule here is just because you can doesn't mean you should. Often, it's better to simply give something another name rather than overloading. Overloading can be a real trap. Use appropriate parameter and return types. That means you should favor interfaces over specific classes. You should use the most specific possible input parameter type. If you accept, let's say, a collection but you'll blow up unless somebody passes you a set, that's broken. You've just taken something that could've been caught at compile time and instead you're catching it at runtime. 
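The TreeSet overloading trap described above can be reproduced with the real java.util classes. In this sketch only the TreeSetOverloadDemo class and its method names are invented; the same set object yields different orderings depending on the static type of the argument.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

// Demo class name and helper names are hypothetical; TreeSet is the real one.
class TreeSetOverloadDemo {
    // A sorted set holding 1, 2, 3 in descending order.
    static SortedSet<Integer> descending() {
        SortedSet<Integer> s = new TreeSet<>(Comparator.<Integer>reverseOrder());
        s.addAll(Arrays.asList(1, 2, 3));
        return s;
    }

    // Static type SortedSet: selects TreeSet(SortedSet), which keeps the comparator.
    static int firstViaSortedSet() {
        SortedSet<Integer> source = descending();
        return new TreeSet<>(source).first();
    }

    // Static type Collection: selects TreeSet(Collection), which uses natural ordering.
    static int firstViaCollection() {
        Collection<Integer> source = descending();
        return new TreeSet<>(source).first();
    }

    public static void main(String[] args) {
        System.out.println("via SortedSet:  " + firstViaSortedSet());
        System.out.println("via Collection: " + firstViaCollection());
    }
}
```

Overload resolution happens at compile time on the declared type, which is why the cast alone changes the result.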
Here's a really, sort of, another trap: don't use a string if a better type exists. In these days of, you know, XML and web services, people always start off with strings. Strings come in over the web. Just because it started as a string doesn't mean it should stay a string. You should turn it into something more reasonable and leave it there. A really bad example of this that I saw in a program years ago was a program that passed around a string for the whole duration of its execution that was either yes or no. We have a good data type for that, it's called boolean. You should never use floating point types--at Google, you guys already know all this--but never use the floating point types, float or double, for monetary values. They're not good enough to represent money. You cannot do exact computations base 10 using floating point numbers--binary numbers because of the fact that 1/10th is not representable as a binary fraction. So, don't do it. If you have, you know, an amount of money, use BigDecimal. Perhaps use long, BigInteger or what have you but do not use float or double. And when you are faced with the choice of using float or double, you should almost always use double rather than float because, you know, typically double will run just as fast and, you know, you lose real and important precision by going down to float. Let's see, I'm going to have to just run through the rest of this. This one is really important. Use consistent parameter ordering across methods. So, you know, here is an example of what not to do. A real example from Unix. We have two methods to copy data. One is called strncpy. One is called bcopy. The first one takes a destination, a source and a size. The second takes a source, a destination and a size. So, what happens, you know, if somebody assumes one ordering when they call the other method? They clobber their source data with whatever garbage was in their destination array. And how long does it take to find that bug? 
Probably a really, really long time. This is particularly important when the types of the two parameters are identical because if you switch them around, you will not know at compile time. A good example here is in java.util.Collections, the first value is always the collection being mutated or manipulated. Similarly, in java.util.concurrent, when you have an amount of time, you always specify it as a delay followed by a time unit, never the other way around. And even if it were the other way around, the compiler would tell you because these are strong types that are incompatible. A long and a TimeUnit are two different things. You should avoid long parameter lists. Ideally, you should limit them to three. It's really easy to remember three things. Especially, you should avoid long lists of identically typed parameters because of the problems that I've told you about before. If you get the order wrong, you're hosed. So here's an example of what not to do. This is from the Win32 API, to create a window. You know, if you look in the middle of these, whatever, 15 parameters, you see int x, int y, int nWidth, int nHeight. So, here's, you know, a whole string of ints and by the way some of these other things are also ints, just int by another name. So, you know, without support from an IDE it's pretty much impossible to use this API. Luckily, there are a number of great techniques for shortening parameter lists. One thing you can do is break up a method into multiple methods, or you can create helper classes to hold the parameters. A specific example of the helper class is the builder pattern, where if you've got a constructor or a static factory that naturally would take 10 parameters, most of which you don't have to specify, most of which have good defaults, instead, make a builder and then just plug in the ones you actually care about and then call a build method. And that code will be much easier to write and to read. 
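A minimal sketch of the builder pattern just described, assuming a hypothetical WindowSpec class with invented parameter names and defaults (none of this is the Win32 API): each int gets a named setter, so callers set only what they care about and cannot silently transpose arguments.

```java
// Hypothetical class, invented for illustration; defaults are assumptions.
final class WindowSpec {
    final String title;
    final int x;
    final int y;
    final int width;
    final int height;
    final boolean resizable;

    private WindowSpec(Builder b) {
        this.title = b.title;
        this.x = b.x;
        this.y = b.y;
        this.width = b.width;
        this.height = b.height;
        this.resizable = b.resizable;
    }

    // The one required parameter goes to the factory; the rest are optional.
    static Builder builder(String title) {
        return new Builder(title);
    }

    static final class Builder {
        private final String title;
        private int x = 0;
        private int y = 0;
        private int width = 640;
        private int height = 480;
        private boolean resizable = true;

        private Builder(String title) {
            this.title = title;
        }

        Builder x(int x) { this.x = x; return this; }
        Builder y(int y) { this.y = y; return this; }
        Builder width(int width) { this.width = width; return this; }
        Builder height(int height) { this.height = height; return this; }
        Builder resizable(boolean resizable) { this.resizable = resizable; return this; }

        WindowSpec build() {
            return new WindowSpec(this);
        }
    }
}
```

A call such as WindowSpec.builder("Editor").width(800).height(600).build() names every value it sets and leaves the position and resizable flag at their defaults.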
You should avoid return values that demand exceptional processing. In particular, you should never return a null instead of a zero-length array or collection. Here is an example of something we got wrong. In the BufferedImageOp class, we have a method called getRenderingHints. And either it returns a RenderingHints collection or it returns null. And what's the consequence of this? Almost all code that calls this thing is wrong because it rarely returns null. So, people forget to code for that special case. If they do get a null, what happens? NullPointerException, and it's completely unnecessary. If they just returned a zero-length collection, then they wouldn't have gotten into any trouble. So, I think I'm actually going to skip the rest of the talk because I'm out of time. In fact, I've already used up five minutes more time than I have. But if you have any other collection--try again. If you have any other questions, I'll be around for a while afterwards and you can ask me anything you'd like. So, thanks for coming. Oh, one other thing, which is I have a give-away for you, which is--when I gave this at OOPSLA, in the proceedings, they gave me two pages to put in a--what do you call that? Like an extended abstract. And instead of doing that, I tried to do API design by bumper sticker. I tried to basically take this entire talk and break it down into 50 little maxims, like, when in doubt, leave it out or, you know, all programmers are API designers, each with sort of a sentence describing it, you know, in a bit more detail. So here, pass this out amongst yourselves, as best you can. And I'll put it up on, you know--and tell the JJB in case there are more of you than there are pieces of paper. So thanks again for coming.
Info
Channel: Google TechTalks
Views: 410,487
Rating: 4.9541478 out of 5
Keywords: api, google, howto
Id: aAb7hSCtvGw
Length: 60min 19sec (3619 seconds)
Published: Mon Oct 08 2007