MountainWest RubyConf 2013 Code Smells: Your Refactoring Cheat Codes by John Pignata

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
everyone good morning can y'all can you hear me yeah cool so they lured you here the Matt's keynote and now you're stuck in a refactoring talk which is pretty devious I'm sorry about that my name is John pinata I'm an engineer in New York at a company called group mate and I want to start I want to start with a quick definition coat smells are heuristics for refactoring and the concept of code smells were first introduced by Kent back on Ward Cunningham's wiki in the late 90s and then later more formally published in the factoring book in a chapter Kent Beck co-authored when we're building software our design communicates to us through resistance we may say things like our code is difficult to understand or to test or to change or to reuse and this is the this is the expression of that resistance this this resistance is really valuable it's nudging us along to refactor code smells are hints from our software about how it wants us to reduce this resistance and in this way when we listen and respond to the hints that our software is giving us about how we want it how it wants to be structured our design emerges from our systems parts so the rest of this talk is going to be a lot of code this is the system that we're going to be refactoring it is a back-end API for a mobile application that's used by Google Android devices the push daemon is the thing we're going to refactor so the feature we're looking at is we deliver push notifications to users phones so like when someone favorites your message on Twitter you get a push notification that's what we're doing here the component that actually delivers the push notifications to Android phones is this set this middle bit the push daemon which we've extracted from our application and the push daemons a little unorthodox it listens on a UDP port so it's listening for UDP datagrams it takes those datagrams extracts a message and then delivers that message through HTTP to the Google API Google then handles delivery of the message to the user's device and just to give you an idea of this code as it was discovered here it is there's a lot of it and we're gonna go through it line by line so don't worry about reading it hopefully it's somewhat legible let's just talk about what this thing does so there's two externally visible behaviors in our push there's one command called ping which is a simple health jack if we was asked the the push team in the ping it responds back with pong and then there's a command send send is the central command of our system it's the thing that delivers messages to Google's API let's test this thing so in order to see how this thing works we're actually going to run the script and we're going to use we're gonna send it UDP messages and observe its behavior externally so if we run the script reading through the script we see that it binds on a UDP port in this case it's six eight eight nine we're gonna run the script it's gonna block and wait for incoming data grams we could use netcat now netcat is a small UNIX utility for sending messages across the network and it has a dash u flag which means UDP so we're gonna send messages to our local host port six eight eight nine we're gonna just type in ping and hit enter it's supposed to respond back with pong which it does so our first test is passing our first manual test next we're going to synthesize a send command send is a little more complicated so send takes parameters so instead of just a command send we also include two parameters the first is a registration ID this registration ID represents a physical Android phone in the field like so an Android device registers with Google's API and Google says okay here's your token here's how an application can address you the phone then passes that token to us and we use it to send it notifications the second parameter is just some text it's just some text we want to deliver into the phone since we're testing this as a black box they're really the only way we could be sure that this thing works is to send this message and use an actual registration ID of an actual Google Google phone if we do that and hit enter we then check our physical phone and we get the message so our system works as it is and our second test passes so we're going to be running this test a lot we're gonna be refactoring in many small steps and we want instant feedback we want to be able to run this thing in the background and for it to throw an alert at us if we if we break it so we're gonna write the outer most possible coverage we're gonna do the same thing we did with netcat and our eyes but we're gonna do it using r-spec so in this test we're cheating a little bit because our script body we don't have an object definition we don't even have any methods we just have a raw script that run it's because of that we're gonna use the Ruby load method to load the thing in a background thread and we're going to then send it messages with r-spec and assert on the behavior now we could fork a process instead of using a thread but we're going to use a thread here because we're in the same process as the thing that we're testing now which means we can mock and like mock the Google API for instance so here's the first test this is the same thing we did with netcat instead of using netcat we're going to use Ruby so we're gonna say socket send ping and then we're gonna assert that the response should be pong same thing we did with our eyes but now we're automating it here's the send test same thing we did with net cat we're gonna send in a prefab message so send token with some text but we can't automate physically checking a phone right and we don't really want to have to we want to be able to test this as a self-contained unit so what we're going to do is we're going to use web mock to mock the Google API and our responsibility is just hit the service boundary with the right parameters so we're gonna send in this command and then we're going to assert that the right parameters come out and now we could run the test and the test passes and now we could use auto test or something in the background to continue running this in the background as we refactor so now we can actually start refactoring we can go back to the original patient like I said there's no class definition here there's no method definition here but we see our first code smell has to do with the fact that we have one long method so it's long that it's not actually a method but our script body is sort of that the main method of our program and this method is really really long it's 30 lines of code that's doing all signs kinds of sort of things so let's go through it line by line at the top we're instantiating some collaborators so we're instantiating a queue in HTTP client and a UDP socket three things that seem to have nothing to do with each other next we're actually creating a worker thread pool so I assume we're using JRuby or Rubinius like moths just recommended and we are doing this because we want to do some work in the background so here we're using the queue to coordinate our work with the worker threads next in the threads we're actually creating a service request to the Google API and we're sending we're sending them the data that comes down to you finally in the middle of script we're finally getting to binding to our socket so here's our socket and there's a port six eight eight nine again just hard-coded in there in but beneath that we are pulling the socket for Datagram so we're seeing if we have incoming data grams if we do we're then dispatching a command based upon what's in there if it's ping we're gonna go one way if it's send we're gonna go the other way and then finally in send we're doing all kinds of gross string gymnastics to extract parameters so that we could create this JSON body that Google expects from us on the other side so to begin organizing this we're going to need to break it down into smaller pieces and our first move is going to be to replace this method with a method object we're just going to wrap this amorphous script body in a method so we can give it some shape so because we only have one method we're gonna create a method called push demon we have one object sorry that object we've one object in our system so we'll just use the same name as we saw on our diagram and the first the mechanics of replaced method with method object dictate that the first move is that we take the locals that are used in the method and we promote them the instance variables within the object we're creating next we're just going to take the remainder of the script body and we're going to wrap it in a method called start and we're gonna take the references we had to the locals and we're going to update it to instance variables so that we can use the instance variables we just created so we still have our long method at least now it has a name it's called start and now we could begin breaking that thing down at the smaller chunks so start is doing a lot there's three statement bodies in start we're spawning our workers and that will first one we're binding in the middle one and then below we're looping and processing incoming requests so let's extract these through statement bodies into private methods and let's leave references to these project methods in start so it's somewhat coherent so we're spawning workers which is just that same statement of body we're binding the socket and we're processing requests so here is push demon with the private API collapsed post extraction and we still have a lot more extraction to do but we have a choice we can continue extracting into smaller and smaller methods or we can look at push demon as an object and see if its responsibilities make sense I would assert that in this case push demon has a lot of reasons to change so here's a list of just random things that could possibly change about push teaming the items in this list have only a tangential relationship with each other and that's a code smell called divergent change it tells us that if we have an object and the object has many reasons to change that seem unrelated it's likely that the object has too many responsibilities so we're going to extract objects from our push team an object will start with the worker because the worker is pretty stuff contained and it seems like its own concept so our worker here is using the cue and using the client uses the cue to get coordinate work amongst its worker threads and it uses the client to post to the Google API so we need these two collaborators in our new object so we're gonna extract an object called worker workers going to have a cue and a client we then go back to the push daemon and now we can replace our cue in the client with a call to worker so now we're instantiating a new worker and that's going to encapsulate those two other collaborators next we have to look through our code to make sure we're not using those instance variables we just nuked and process requests were shoveling stuff directly into the queue here which we can't do anymore because we don't have access to the queue but we do need some way to submit work into the worker so what we're gonna do is we're gonna add a method to worker the shovel method to submit work into it and this will just allow us to give the worker a piece of work to do it in the background we go back in the process request and we flipped queue to worker okay so now queue is gone now we can actually extract the method we want to extract so spawn workers is still in push daemon we could move that method into worker now so we're gonna move it into worker we're gonna call it spawn because spawn worker would be redundant and we're gonna parameterize count but other than that we're gonna leave it unchanged tray now in push daemon we no longer call this local it goes away and instead we tell our worker to spawn and we give it some unit of work unit of concurrency in this case 10 next step is this UDP demon that UDP sockets scuse me so UDP socket and we're using it a lot in this in this program we're binding it to a port we're receiving data through it and we're sending data through it so we're pretty concerned with the internals of this socket and our object should be talking at a higher level of abstraction we want to raise a level of conversation between these two objects we'd want to be dealing with a low-level socket we want to be dealing with something higher so let's create a UDP server UDP server will expose a socket that it'll hide the socket rather the opposite sockets an implementation detail of the UDP server next we're going to look at these three verbs bind receive send and we'll just give UDP server those capabilities so we'll add three methods bind receive and send and it'll pretty much just wrap a socket for now so we can go back into push daemon and we can flip our socket to a server and now bind goes away we don't need this anymore we just tell the server to bind and process request we have a couple of references to socket those get updated to server so we have another object now but I would assert that we didn't achieve our goal our goal was to raise the level of discourse between these two objects we wanted to hide the fact that we were dealing with a socket and all we did was introduce another step in the chain we introduced a middleman essentially we know a lot about how UDP server does its job we know it uses a socket we could guess that but these details are leaking out of the UDP server this is an example of inappropriate intimacy we are inappropriately intimate with the internals of UDP server and part of the reason is in this sequence diagram what we're doing is we're telling it to bind to a port and then we're pulling it to receive data so we're constantly asking me TP server do you have new data yet blocking it comes back it has data UDP server can't really do much on its own without the push daemon kind of being its puppeteer so we could rearrange these objects though in such a way where they sort of tell each other to do things rather than us asking it and like forcing it to do things we can create an interface where push daemon subscribes its intent that it wants to be alerted when new requests come in so we're going to have a listen we're still going to have to tell it what port it's going to use and then we're going to tell UDP server tell us when new information is available and call a method on us so let's let's go through that so we have to update push daemon for our new arrangement the first thing we have to do is I mentioned we wanted to indicate our interest in UDP server to let it know that we want new requests so to do that we're going to pass in ourselves in the constructor and the convention here is going to be when you create a new UDP server you pass in the application that's interested in next we don't know exactly where this is gonna go but we know we don't want to do this anymore we don't want to pull the socket and get data from the socket so we're gonna replace that we're gonna just have a method called listen and we'll figure out what that does in a moment next these two things look the same now we're telling the server to bind and then we're telling it to listen bind is kind of its own problem we don't care what it has to do to listen we just want to provide it a port number so we're gonna collapse those into one method call next process requests the first thing it does is it asks the server do you have any data we wanted to couple it from that we don't want process requests to have to worry about that so we're gonna add a parameter of data and process request is going to assume that if someone calls it you're gonna have data for it to process and we're going to promote it into our public API so this will be the method that gets called by UDP server when there's a new request to process and process request is a really bulky name that I don't really like so I'm gonna change it I'm gonna steal a convention from Rack and call it call at first glance it seems like process requests the money is a better name it's a longer name maybe more descriptive but I expect future developers who read this code will be familiar with racks API so if they see call they're gonna instantly sort of intuitively understand the arrangement between these two objects okay next we're going to update UDP server for this new arrangement so we before we passed ourselves into the initializer of UDP server so we have to change the method signature of our constructor we'll do that we'll add the app next we have to update bind so we changed its name to listen and we added some behavior so it still binds the socket that's the detail of what it has to do to begin listening but we also added this loop so that loop that was in push daemon is now here and it's responsible for polling its socket itself and when it gets new data it'll call our app with that data so finally back to push daemon most of our dirty laundry in terms of our public our private API is gone but now we have this call method which is a little bit of a disaster the first thing that draws the eye is the fact that call is essentially a large case statement and case statement was one of the canonical code smells from a factory it was the switch statement in Java the thing that Beck and Fowler were concerned with was when you have a conditional like this it has a tendency to chlorate itself in your codebase if you have occasion to check for a type or a type code or something like that it's likely that you're going to do that in multiple places I think our use of case is a little different it's almost defensible like we're not weird we're testing we're looking at data coming in from over a network and we're making a determination based upon that data something's gonna have to make this determination something's gonna have to say is the command pingers the command send whether it's a case or another structure it seems a little immaterial but our bigger problem is in the branches of this case statement is a lot of behavior and a lot of like really nitty-gritty like string manipulation behavior and we can't unit test this stuff outside of the context of a push demon which feels completely wrong so let's try to shake that behavior loose we're gonna want to collapse this case statement by making its branches identical and replacing the behavior in these branches with objects that quack to the same interface so we're essentially going to replace this conditional with polymorphism it's gonna take a few moves to get there the first thing we need to do is here's the here's what the server service does when it gets a ping it sends back pong across the network using this data stuff we're going to extract that into an object because we have a worker concept in our system let's create a job concept let's extend the metaphor so we will create a jobs namespace and we'll add a class ping ping what needs as data and server to do its job so we'll just add those to the initializer and then when we call run it will send it'll still send data across the network and we go back to call we take this statement and we just get rid of it and replace it with our jobs class and call run so not a win at all now we have more code but hopefully this shrink soon let's look at send now so we're going to do the same thing with send we're gonna forklift this out of push daemon and into its own job class so send now the semantics of send are such that the return value of run is a JSON string so the semantics of send and ping are completely different in ping we rely on the side effect rely on the fact that when we call run it's going to make that system call and send data across the network to the client run is actually just doing some string manipulation and returning some JSON so we're kind of stuck now all right we have these two branches that are different and we can't eliminate this conditional unless these branches are the same to make these branches identical we have to zoom out and look at the larger system we made this extraction before this worker and we just sort of like happily plunked all these Google API details in there and that seems really limiting now in retrospect we have a concept of jobs and we have a worker it would be much more natural if the worker could just work arbitrary jobs rather than being tied to the context of use of posting JSON to Google so let's extract some of that dirty laundry I'm going to take these details these Google details and we're just going to move them into send for now so send is going to change a bit instead of returning JSON it's going to do the post across the network we're gonna memorize an HTTP client instance and send and ping now have the same semantics when you call run it makes an operate takes an operation that does some unit of work and we don't have to worry about the return value worker now is much much slimmer instead of being tied to Google this is now just a jobs runner so it runs in it's it monitors this queue and in each background thread it pops off the queue it expects jobs to come down that queue when a job comes down it just calls run on it so that's the interface this any job that speaks to this interface we could use the worker to execute in the background so now in push daemon we no longer have to worry about calling run that's workers responsibility and we no longer have to worry about the return value of send so that all disappears so now we have a case statement with pretty identical branches the same you know we have a different class we different object there but it's the same method being called with the same signature and we essentially have a factory so the case statement is just returning an instance depending on what command is paste pea sent to it and then once we get a job instance we just shovel that into the worker and the worker runs the job in the background so now that we have this factory and it's clear that that's what it is we've we've had all along we can move this into an explicit factory so we're just going to copy it into a method we'll add a method to the jobs met a class called factory and it's just going to do the same work so now we could finally think about do we want to just get rid of this case statement there's a compelling argument to get rid of it if we keep it around in future factorings it's possible that behavior is gonna slip into these branches and we don't want that we want the interface to be standard and we want all jobs to quack the same way so let's make a move and get rid of it we're gonna go backwards a little bit to go forwards so we're gonna replace our case with with a hash the hash is a map from a command to a job class in our factory we get the incoming data from this from the client we do some string manipulation for now we then ask the class do you have a class we ask for their jobs hash do you have a class for this command if you do we instantiate it and return it to the caller now in our push daemon instead of having this case that Slim's down and we just call our factory and the factory returns us a job we shovel it into the worker so push demon is slimmed down quite a bit but we've made a mess in these other extractions so let's go and try to clean them up jobs ping there's not much code here but the code that is here is a little suspect the first thing that we noticed is we have these weird references to data this associative array type structure and we don't know what it is but we assume it's data that you'd say that we're only passing together and this is a code smell called the data clump so a data clump are you know two or more attributes that are passed around together to do some represent some larger value that our pet that can't be passed around individually but do not exist in a higher level of abstraction so in this case we have to look a little closer at what data is because it's kind of confusing so this is what socket returns back to us it's returning us two things the first member is just the data that came through from the client but the second is of the structure called the sock adder and the sock adder includes information about the client and some information about the network and so a protocol family port and IP address of what we're interested in this is a pretty important concept in our system we're a service that deals with remote clients we should have a concept in our system home for this knowledge so let's create that we're gonna create a client a client this is going to be a very kind of class isn't going to do much yet it's going to take in the sock outer structure it's going to use adder info adder info is a class in the Ruby standard library that knows how to parse sock adders it's gonna take that and decorate it with some methods so we can access address import pretty easy so now we get to clean up a little bit and UDP server we're just taking the data from the socket and throwing it to the app like just saying here good luck we're gonna do something better now we're gonna deconstruct what comes out of the socket into its constituent parts the message in the sock adder we're gonna create a new client out of the sock adder and then we're gonna call the app with client and message this is a really good thing we had a really terrible name hang around for a long time data which is probably the worst name you could have in the system right what does data mean let's replace it so now we can finally get rid of it data becomes its parts client and message much more explicit so now we can go back to jobs ping we still have our data clump but at least now our data clump is coherent and looking at it it becomes pretty obvious that we can do this work without clients data client is the key actor in this line of code but we're asking a questions rather than letting it do its job this is an example of feature Envy we're trying to do someone else's job with someone else's data it's obvious that client should be the one doing the work extracted objects tend to attract behavior so now that we have something called client or system we want to give it behaviors so let's do that let's let client handle sending the message so to send the message it's going to have to collaborate with a server server is the only thing that knows how to deal with the network so we're gonna enter a add a server parameter here to the to the initializer and then we're gonna add a send method so the send method is going to take in a message pong or whatever it is and it's going to collaborate with server to send the message using the the data it knows address in port finally in UDP server because we're instantiating the client now we have to pass in the server which is ourself and in jobs now this collapses down to clients and pong which is much better now we can get we deal with this met this signature here we don't need a server anymore because client encapsulates the idea of a server so we could stop passing that down get rid of that okay so let's turn our attention to send so there's a lot of code smells in here if we start at the top we can kind of work our way down so same name different meaning we have two concepts here with the same name at the top we have an HTTP client memorized that we're using to talk to Google and then right below it in our initializer we're passing in the client that we just created the service client that represents a requester this could mean one of two things it could mean that we've imprecisely named the actors in the system or it could mean that this object knows too much I think in this case since we've moved around these details quite a bit you know we start out and push team and then we move to worker it seems like these need a permanent home so let's create a concrete object in our system that represents a push notification it's pretty important concept push notification has a registration ID and then it alert and this object's behavior is it knows how to deliver itself to the google api and that's it that's all it does so now jobs send collapses and instead of having to deal with the google api directly it uses push notification to do that so we're left with two fairly stinky lines of code here the first line of code is doing a regular expression match using capture groups and the second line of code is using the pseudo global polish like regular expression side effect variables we want to get rid of that the first line of code is an interesting one it's it's a night it's representative of primitive obsession this isn't to say that primitives are bad right primitives are most of what we're passing around but it's because we're using a simple type to represent a complex idea what's happening in here is a complex idea we're tokenizing the incoming message and trying to extract meaning from it and we're doing it at a lower depth in our system than we'd like if we zoom out what we have between a client and the server is a line protocol our line protocol looks like this we expect a command ping or send and then optionally we expect some parameters these parameters have their own tokenization rules so a parameter string is split on spaces unless a double quote is encountered in which case all characters enclosed in those double quotes becomes a token this should sound familiar this is exactly how the UNIX Bourne shell process parses parameters right when you've have in the command line if you put something in quotes that becomes an argument and that's what we've unwittingly built here and the Ruby standard library actually has an object that knows how to tokenize strings based upon those rules it's called shell words shell words is something we can use to split up the incoming message into tokens and then use those tokens more intelligently so what we're going to create a new object we're gonna extract yet another object called request request it's gonna take in a message and it's going to tokenize that message using shell words split it's then gonna decorate it with two men two methods we're gonna have a method called command which takes the first token and normalizes it into an upper case strength and then we're gonna take the remainder of the tokens and we're gonna return them as an array and if we don't have any tokens which can return an empty array so now things clean up quite a bit instead of passing down this raw message and relying on the rest of the system to kind of figure out what that message means we're gonna pass down a request so here in our factory instead of having to deal with a raw message and splitting it and figuring out what the command is well instead just gonna use the request object to figure out what the command is and ask the hash to return us a corresponding job class finally in send we no longer have the regular expression and we no longer have the weird pseudo global variables we just have parameters which is an array and we you know we're sort of binding ourselves to the position of those elements okay we have one more piece of unfinished business before we're out of time we've left mill checks scattered throughout the code in a couple of places so the first place is in this Factory so we're saying hash give me a job at correspond to this command but if we pass in a command like info or shutdown or something like that the hash is gonna return nil to us so we don't want to call new on nil because it will blow out so we check to see if classes truthy if class is truthy then we instantiate the class and we return it to the caller so our factory could return nil depending on user input consequently the push demon has to guard here so it has to check did I get a job back from the factory if I did I'm gonna shovel it into the worker if I didn't I'm not going to do anything nil is communicating an idea here it's communicating the idea that an unknown command has been requested and we can make this a concept in our system we're gonna use the null object pattern we're gonna introduce a null object to represent an unknown command so it's gonna look like this it's gonna clack exactly like a job but it's a no op it's not going to do anything you can call run on it it's just not gonna do anything but it speaks the same interface which means in our jobs class now instead of guarding here we could use a cool feature of hash so hash doesn't have to return nil as the default it just does that's its default behavior we could tell the hash when you get a key that you don't recognize return null job so now this hash is guaranteed to return something that clocks like a job so instead of checking if we got a class back we can just blindly instantiate it and return the instance so in the case of ping and send this method will return opinio instance or a send instance in the case of anything else it's going to return a null job instance now in push daemon we no longer have to check for nil here that can just go away and this is better but we've introduced a problem now we're in queueing null job here so no jobs are no op but we're stuffing it into memory and making a thread grab it and call run on it which is essentially no op it would be better if we didn't in queue it at all if we short-circuited it here and for us to do that we have a few options right we could check the type of the job that comes back we could say if you are a no job don't you don't submit yourself to the worker and this is just as bad as the nil check I mean it's pretty much the same thing it's a little more explicit I guess it would be better if we inverted the dispatch here so instead of the worker that invoking the operation of the shovel on the worker let's invoke the shovel on the job because the job knows if it's a valid job that needs to be in queued in the worker so let's add the opposite shovel to our job so we're gonna add the right shovel here right shovel will expect to pass a worker in in the ping job if we get this method call we're gonna just submit ourselves to the worker for later execution but in null job we're gonna respond to the same method but we're not gonna submit ourselves we're just going to no op and that means the push daemon in this case we can just flip this so job now goes into worker and job can make the determination if I want to include of time we're gonna stop our refactoring here I've put the code up on github for anyone interested for whatever reason there's plenty left to do in the code like there's there's tons of smells still left like for example we have hard-coded configuration details everywhere and that's gonna that's like the opposite of divergent change that's shotgun surgery that's possible that we're gonna have to update those details in many different places when we deploy to a new environment let's we also have weird boilerplate now like the job to create the job interface we have to add these methods and it would be nicer if we like extracted a superclass but we don't have time to do that and that said even though we have some more work left to do our push daemon is a much better shape than when we started we listened and responded to the code smells in the original script and we've made huge strides toward a more coherent design that will be easier to grow and maintain that's all I have thank you guys for your attention
Info
Channel: Confreaks
Views: 8,642
Rating: undefined out of 5
Keywords: mwrc2013, MountainWest RubyConf 2013
Id: q_qdWuCAkd8
Channel Id: undefined
Length: 32min 29sec (1949 seconds)
Published: Sat Apr 27 2013
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.