4.7 Kylie Bemis-Object-oriented Programming in R

Captions
Okay, so who here is either familiar with object-oriented programming or has done object-oriented programming in the past? Okay, so some of you. How many of you have done object-oriented programming in R specifically? A few, a lot less of you. Okay, cool. So in this presentation we're just going to talk a little bit about, first of all, what object-oriented programming is, why you should use it, and when you should use it, and then what the different ways to use object-oriented programming in R specifically are.

So first of all, what is object-oriented programming? Object-oriented programming is basically a paradigm of organizing your code around commonly reused data structures, or ways of organizing data, called classes, and functions related to those kinds of data, called methods. So what is a class? A class is just a blueprint for some way of organizing data. For example, if you work a lot with proteomics experiments, you might write a class that represents a proteomics experiment. An object is a particular instance of a class. So if you have a class for a proteomics experiment, then a very specific proteomics experiment, such as the twin case study data set that Sarang was discussing earlier, could be a particular instance, or object, of that proteomics experiment class.

Object-oriented programming can also use inheritance, which allows subclasses to specialize superclasses. We'll talk a little more about what that means, but as an example, say you have some kind of class that represents proteomics experiments, as I said, and you want to specialize it for, say, a DIA experiment versus an SRM experiment. You might have different subclasses of the proteomics experiment class that represent DIA experiments, SRM experiments, or other kinds of proteomics experiments. And finally, a method is basically just a function that's associated with a particular class, and it changes its behavior depending on what kind of object you give to it. In R, methods are implemented using something called generic functions, which is a little different from how other languages implement object-oriented programming and methods. So things are a little different in R if you're more familiar with object-oriented programming from, say, Java or C++. We'll talk more about the differences in how methods are implemented in R in a little bit.

For example, in R, plot is a generic function: when you call plot on some kind of object, it changes behavior depending on what kind of object you give it. Calling plot on numeric data will behave differently than calling plot on categorical data.

So, just a simple example of what this kind of class organization looks like. Consider that you want to write some kind of pet simulator game where you just have dogs and cats, and the user interacts with these dogs and cats, and they're the user's pets. You want to write the code for this pet simulator game, and first of all you need something that represents these pets. So you might write an animal class for representing these pets, and the animal class could have subclasses cat and dog, because those are specific kinds of animals. Different kinds of pets might also speak to you in different ways. If you have a parrot, it might actually say something to you in English, which can sometimes be kind of weird, but a cat or a dog might just meow or bark at you. So you can have a generic function called speak that does different things depending on what kind of object it is given. You would have a generic function called speak, which is something that all of your pets, or all of your
animals, do. Then you have an object named Mittens, which is an instance of the cat class, and if you ask it to speak, it will say "meow". Conversely, if you have an object named Duke, which is an instance of the dog class, and you ask it to speak, it would say "woof", because the dog class and the cat class each have their own methods for this generic function speak. So speak will change its behavior depending on whether you give it a cat or a dog. Does that make sense, everyone? Please stop me if you have any questions, or want me to slow down, or just want more elaboration on anything specific.

So how does object-oriented programming work in R? In R there are two major object-oriented programming systems. The first one is called S3. If you remember when Yan was talking about the origins of R, R actually comes from a language called S. The S3 object system was implemented in the third version of S, which is why it's called S3, and later on another object system was added to the S language in the fourth version, called S4, which is where the names of these object-oriented programming systems in R come from. So there's S3 and S4, and they can be used for different things depending on what exactly you need out of your object-oriented programming system in R.

S3 classes are typically very simple. They have a very simple class system: you don't have any formal class definitions. S3 also uses something called single dispatch, which basically means that if I call this generic function speak, it will only depend on whether its first argument is a cat or a dog or some other kind of animal. Say, for example, you wanted to implement some sort of conversation between your pets. If you have two cats speaking to each other, they might communicate differently than a cat and a dog speaking to each other. So the dog and the cat might go "woof" and "meow" to each other, and two cats
might just meow at each other. But with S3, which is single dispatch, you can only change the behavior based on the first argument given to the function, so you can't actually do something like that. I'm going to talk a little more about multiple dispatch later on, with S4.

Another thing about S3 is that it's single inheritance: each subclass can only inherit from a single other class, or a single chain of classes anyway. For example, in S3, say you wanted a CatDog pet. You couldn't have a CatDog that inherits from both cat and dog, because that doesn't make any sense, and S3 doesn't let you do that. So you can't have multiple inheritance in S3: something is either a cat or a dog, or it's neither, or some other thing.

Conversely, S4 is a more complex class system. You have formal class definitions, and you'll see what those look like in a little bit, but basically it's more complicated than S3, which also makes it more powerful in some ways. S4 has multiple dispatch. So to go back to the earlier example of a conversation between two pets: if you had speak with two cats as arguments, they could both meow to each other, or if you passed speak a cat and a dog, you could dispatch on both of those arguments and have the cat meow and the dog bark. So you can change the behavior of your methods depending on more than one of the arguments you pass, which can be really useful but also really confusing sometimes. That's one of the things about S4, and we can talk more about it later. So in S4 you have a more complex class system, formal class definitions, multiple dispatch, and also multiple inheritance. In S4, if you wanted to have a CatDog class, you could, because you can inherit from more than one class.

When do you use which? In general, you want to use S3 for simple data structures without complex
dependencies, and you want to use S4 for more complex data structures. Most of the classes that you use on a day-to-day basis when you're programming in R are going to be S3 classes, even if you don't know it already. S4 classes are a little more uncommon, and you don't see them as much in most of the packages you use day to day, unless you use a lot of Bioconductor packages. Bioconductor is a repository that has lots of software specific to bioinformatics. So if you do bioinformatics, you might use a lot more Bioconductor packages, many of which use S4 classes, because bioinformatics data tends to be more complex and to have these complex networks of dependencies. That's when S4 can become more useful than S3: when you need a formal class definition that reflects all of these complicated dependencies.

So now we're going to look specifically at S3 classes. S3 classes are created from base R types with additional attributes. We'll see in a moment what attributes are, but first let's focus on that first part: S3 classes are created from base R types. That means that S3 classes are based on raw vectors, integer vectors, numeric vectors, character vectors, and lists. These are your core base R data types, and that's what S3 classes are based on. So every S3 class will be, at its core, one of these types with some additional attributes tacked on. S3 classes are defined by a class attribute, which you can access and set with the class function.

So what are some common S3 classes you're already using in R? You probably already know these: there's factor and data.frame. Let's explore those a little bit right now. Here I'm just making a factor, and I can use class to get what class it is. We already know it's a factor, and as Yan
demonstrated earlier, we can use typeof to find out what the base R type underlying this data structure is. You already saw earlier that a factor is basically just built on top of integers, so the type of a factor is integer. And if we look at the attributes, we find out that factors have levels and class as attributes, and the class attribute is the one that actually defines what S3 class something is.

Likewise, we can look at data frames, which I'm sure all of you have used already and are very familiar with. The class of a data frame is just data.frame; we know that already. The type of a data frame is list, so at its core a data frame is basically just a list where each of the list elements is the same length. As for its attributes, a data frame has names, row.names, and that class attribute, data.frame.

So, for example, if we change the class of this data frame object to list, what will happen? Will that work? What happens if I just type its name now? Come on, what'll happen? It will just print out as a list, because we didn't really change anything about it besides its class attribute. Conversely, if I go back and just change the class attribute back to data.frame, will that work again? Will it behave like a data frame now, do you think? Well, that part's working; I guess I want to print down here. There we go. Yeah, because a data frame is at its core just a list, if we change it to a list and back to a data frame without changing anything else, we won't break anything. It'll just work.

S3 classes can inherit from other S3 classes simply by appending to this class attribute. For example, you've probably worked with factors before, but you may not have worked with ordered factors. Ordered factors are just categorical variables whose levels have an order. So I'll create an ordered factor here. It looks a lot like a factor, except
it has a specific ordering to its levels. If I look at the class of an ordered factor, it's just a character vector with "ordered" and then "factor". So that's how inheritance works with S3 classes: it's basically just a concatenation. The first thing in that class attribute is the class that it is, and everything after that is the classes that it inherits from. So that's a very simple introduction to what S3 classes look like, and we're going to be writing our own S3 class in just a little bit.

But first we're going to talk a little bit about S3 methods. As I mentioned, in R methods are implemented using something called generic functions. If you're familiar with object-oriented programming from a language like Java, you may be used to writing something like object.method to call a method on an object. In R it's a little different: instead of methods belonging to a class, methods belong to a generic function. So you call the generic function on some object. For example, we saw earlier how we can use plot or print, and plot and print behave differently depending on what kind of object you give them.

S3 generic functions in R are basically just functions whose body consists of UseMethod and then the name of the generic function. So we can see plot is defined just as UseMethod("plot"); likewise print is defined just as UseMethod("print"). So what do those actually do? In order to find out, we have to decide what kind of object we're looking at. Another thing, though, is that a lot of them, not all of them, will have a default method that you can get with print.default or plot.default. So typically most S3 generics will have some kind of default method
whose name is just the generic's name followed by .default. So you can see that S3 methods in R are basically just defined by this naming scheme, with the name of the generic followed by a dot and then the name of the class. That's really all there is to S3 methods: just this naming scheme, and R happens to have an implementation underneath that allows it to find the correct method based on the naming scheme.

That also means you can sometimes get confusing behavior. R doesn't usually care about periods in names, so usually it doesn't matter if I have a period in an object name. However, you can have confusing cases. To coerce something to a data frame, you use as.data.frame. But what would happen if you had a class called frame? Then it's unclear whether you have a method of some generic called as.data for a frame class, or a method of as for a data.frame class. Or actually, sorry, as.data.frame is the full name of the generic. So the generic itself is actually as.data.frame, and you can have different things on the end of that, for example as.data.frame.matrix and as.data.frame.data.frame and so forth, and it can get kind of confusing with these long names. So that's one drawback of S3 methods: they depend on this naming scheme, which is kind of easy to mess up if you're not careful about how you name things with periods. That's contrary to how things usually work in R, because usually R doesn't care whether you have periods in names.

Some common S3 generics that you're already using in R are print, which we already saw, plot is another one, and summary. You can use the methods function to find all of the methods for a given generic function. So, for example, for print you can see there are a lot of print
methods. So print just prints things out based on what kind of object you give to it, and all of these are the different print methods. Basically, whatever object you have in R, there's some method out there that knows how to print it.

That's a good question. Yeah, so those are ones that are not actually exported. The first one here is print.ts; yes, it's for time series, I believe. But if I type print.ts down here, I can't actually find it. That means there's a print.ts method somewhere, but the print.ts method itself isn't exported to the user. So there's some package out there that implements print.ts, and R knows that it exists, but you can't directly access print.ts. Yeah, so if I use three colons like this, I can gain access to objects that haven't been exported to the user. This is how you get at the hidden things that you're not supposed to have access to. I happened to guess correctly that print.ts is in the stats package, so if I type stats:::print.ts, then I can see what the print.ts method actually is. But because it's not exported by the stats package, usually you can't see it, which is why it has that asterisk there.

So now we're going to learn how to create a class with S3 methods. Any questions before we start writing our own S3 class? Okay. Earlier we learned how to write functions to normalize, summarize, and plot proteomics experiments. These are things that we'd like to do again whenever we get a new proteomics data set, but we don't necessarily want to have to rewrite all of that code every time. So it would be really convenient if we had some kind of class and methods that would take care of this for us. For example, why would we always want to write all of the code that we wrote to
plot or summarize a proteomics experiment when we could just write plot and summary? Likewise, normalize is another verb that you probably want to apply to a lot of different kinds of data sets. Normalizing data is generally a thing you want to do, so that's a good candidate for a generic function. That way we can write a normalize method, and all we ever have to do is say normalize and give it our data set, and that would normalize the data set. And if we have different kinds of data sets and we want normalize to behave differently depending on what kind of data set it is, then we can just write classes for those different kinds of data sets, always write normalize, and it'll automatically do the correct thing, if we write S3 methods for that.

So in this section we're going to create an S3 class called S3 proteomics experiment, we're going to write an S3 generic function called normalize, and then we're going to write methods for normalize, summary, and plot. Because S3 classes, like I said, don't have formal definitions anywhere, and aren't really defined by anything except having that class attribute, the first step is just to write a function that creates these S3 proteomics experiment objects. Based on our infinite wisdom about what proteomics experiments are like, we decide that a valid proteomics experiment should have, at a minimum, some proteins, some features, some intensities, some labels, and some runs. We can probably do better than that, but for the sake of example let's stick with that. And because it gives us a lot of things for free, we might as well inherit from data frames.

To do that, I've written this function, which just takes vectors for protein, feature, run, intensity, and label. I also wanted to give it the option of recording whether the data has been log transformed or not, and I've also included a dot-dot-dot to add any additional columns that you might have in your experiment. So here I'm just taking the list of dots and saving that in a variable. I want to check whether the user actually passed anything in the dot-dot-dot, and then, depending on whether they did, I either make a data frame with all of the additional columns or I just make a data frame with the required columns. Then I add an attribute to our object called is log transformed, and I just set that to whatever the user passes for whether it was log transformed or not.

So why do I have this is log trans argument after the dot-dot-dot? Can anyone answer that? We talked earlier about what happens when you put an argument after the dot-dot-dot. What does that mean? What else? Yeah, so you don't expect that the default will change very often. What else does it mean? Yes. Yep, so it means that whenever you call the function, you have to explicitly name that argument. And in this case it's also doing another practical thing. Can anyone guess what that might be? As a hint: everything in the dot-dot-dot I'm assuming to be extra columns in the data frame that we want for this S3 proteomics experiment. If I had this is log trans argument before the dot-dot-dot, and a user wanted to pass in additional columns and didn't name them, what would happen? After the dot-dot-dot, as we mentioned, you have to name the argument explicitly, and everything that you pass to the function that doesn't have a name will be assumed either to be one of these first five arguments, protein, feature, run, intensity, or label, or to belong to the dot-dot-dot. So if is log trans were up here, for example, and I called this S3 proteomics experiment function and gave it six columns, so a protein, a feature, a run, an intensity, a label, and some additional column, say subject, without naming it, what would happen?
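As a rough sketch of the constructor described so far: the spellings S3ProteomicsExperiment and is_log_trans are assumptions (the captions only give them as spoken words), the example data is made up, and the body is a reconstruction from the description rather than the code shown in the talk.

```r
# Hypothetical reconstruction of the constructor; all names are assumed.
S3ProteomicsExperiment <- function(protein, feature, run, intensity, label,
                                   ..., is_log_trans = FALSE) {
  dots <- list(...)  # anything extra is treated as an additional column
  if (length(dots) > 0) {
    object <- data.frame(protein = protein, feature = feature, run = run,
                         intensity = intensity, label = label, ...,
                         stringsAsFactors = FALSE)
  } else {
    object <- data.frame(protein = protein, feature = feature, run = run,
                         intensity = intensity, label = label,
                         stringsAsFactors = FALSE)
  }
  # Record whether the intensities were already log transformed.
  attr(object, "is_log_trans") <- is_log_trans
  # First entry is the new class; "data.frame" is what it inherits from.
  class(object) <- c("S3ProteomicsExperiment", "data.frame")
  object
}

# Made-up example data with one extra, explicitly named column.
twin <- S3ProteomicsExperiment(
  protein   = c("P1", "P1"),
  feature   = c("f1", "f2"),
  run       = c("r1", "r1"),
  intensity = c(10, 20),
  label     = c("H", "L"),
  subject   = c("s1", "s1")
)
class(twin)
attr(twin, "is_log_trans")
```

Because is_log_trans comes after the dots in this sketch, it can only be set by name, for example S3ProteomicsExperiment(..., is_log_trans = TRUE).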
So if I just gave it a sixth column and didn't name it specifically, it would assume that is log trans is equal to that sixth column, which is something I wanted to add to the data frame. By having is log trans after the dot-dot-dot, the dot-dot-dot only captures things that we want to be additional columns in the data frame. So we can do this safely, knowing that if the user wants to change that argument, they'll have to name it explicitly, and if they don't name their arguments, then we can safely assume that each one is going to be a column in the data frame.

No, in this case it wouldn't, because there's nothing in the body of our function that explicitly assumes it's a logical. All we do is assign it to an attribute of the object. It would be a good idea, however, to check whether it's a logical of length one, and throw some kind of error if it's not. So a better function would be this: now we're checking whether is log trans is a logical, saying that it also has to be length one, and throwing an error if that's not true.

Oh, that's just saying "or": if it's not a logical, or if the length is not equal to one. In R there's a difference between having two of them and having one of them. A single pipe or a single ampersand is the vectorized version of the logical operator. So if you have a vector on both sides, the single version compares them element by element: the first elements, the second elements, the third elements, and so forth. The double version, by contrast, assumes that everything is length one, so it's only going to look at the first element. Sometimes that difference is important, because an if/else statement
expects its condition to have length one. In older versions of R it just threw a warning if you gave an if/else statement something longer than a length-one logical; going forward, I think it's going to be an error. Another reason the double version is useful is that everything is evaluated left to right. So suppose you have something on the right-hand side that depends on the left-hand side being true or false. If this is, say, an and, and the left-hand side were false, then the right-hand part would never get evaluated. So with the double and, if these were expressions rather than just explicitly TRUE or FALSE, and I had something in the second one that I did not want evaluated unless the first one were true, this would be safe to do. In this case, if the first one is false and you're using an and, the second one will never be evaluated, because R already knows enough to say that the result is false. Yeah, something like that. So basically, if you need to assume that the things on the right-hand side will only be evaluated in certain situations, depending on what's on the left, it's usually safe to do that with the double versions, because they evaluate their operands one by one.

So after assigning that attribute here, we then assign the class attribute using the class function. You'll notice that even though class itself is an attribute, and I set this other attribute for whether it's log transformed using this attribute function, I still change the class using the class function. There are certain attributes that are kind of protected, and you should always use their specific accessors to change them. Class is one of those; other ones are dim and, I think, dimnames probably, but you
can see what they are by looking at the documentation for attributes, because there are certain attributes that have special meanings in R. Names also, yeah. Yeah: class, comment, and the others listed here are treated specially and have restrictions on which values can be set. So you should typically only change those with their specific functions, because then R will do the error checking for you and tell you if it's an invalid value for that particular attribute. R already has certain expectations about what will be in a class attribute, what will be in a comment attribute, what will be in a dim attribute, and so forth. So for those it's a good idea to use the class or the dim or the dimnames accessor and setter.

So after assigning the class, which I just set to S3 proteomics experiment and then data.frame so that it inherits from data frame, I just return the object. So I'll run that now. Once we get our data set, I'm just going to rearrange it real quick so that it's in a format we can use. I think someone showed this as an example earlier, but basically I just rearranged it so that instead of separate intensity columns for heavy and light, it has one intensity column and a column called label that says whether that particular row is the heavy or the light channel. So now I create our S3 proteomics experiment object and call it twin dia s3, and RStudio is being weird again. I can just call head on it like I would on a data frame, and because we inherited from data frame, we get all of the behavior that we would expect from a data frame.

So now we're going to write a generic function for normalize. Normalize is not already an S3 generic function, so first we're going to learn how to create one, and it's really simple. As we saw earlier, an S3
generic function is basically just a function whose body is UseMethod and then the name of the generic function. So here we're just going to create a function called normalize, and its body is just going to be UseMethod("normalize"). So I create that, and now I'm going to create a normalize method for S3 proteomics experiment objects. To do that I just follow the naming scheme: normalize, dot, S3 proteomics experiment. Here I'm basically just doing the median normalization that was presented earlier, and I'm also checking whether the data set has been log transformed or not, and if it hasn't been, then I'm log transforming it.

As an exercise, it would be a useful thing to, for example, add an attribute that keeps track of whether the experiment was log transformed using log2 or log10. So you would add an attribute that keeps track of, say, the base of the log, or just whether it was log2 or log10, or what kind of function was used to transform it. But here we're just assuming that if it was log transformed, it was log2.

Then I have this argument by, which is going to be the label for the standard that we're going to use. I'm checking which rows correspond to that label, then getting the medians, getting the median of the medians, and doing the median normalization that was presented earlier.

We'll also write methods for summary and plot. Summary already exists as an S3 generic function, which we can see just by typing summary: it's already there, UseMethod("summary"). So we can just write summary, dot, S3 proteomics experiment to create our S3 method for it, and this is just doing a sum over the feature intensities to get a summary that way. And now we can write a simple plot method. Now, in reality you'll
probably want to write several different plot methods, depending on what you're interested in, but here we're just going to make a box plot of the intensities. So now we can just call normalize on our data set and say we want to normalize it by the heavy channel. We can get the summary... let's see, what is this error? Fine. So somehow the intensity has become a character. Where would we have done that? Oh, who knows what happened. So Yan played a practical joke on us, apparently. The problem here is that the plus operator is still redefined to paste two strings together, so all of our intensities are now strings. Okay, let's restart R. Ah, we probably could have just removed it, but yeah. Before you restart again, delete everything. Okay, simply go one down. Okay. Let's see, where were we? We still need to define our normalize again, and our summary and our plot. Now we can normalize it, and there, now that we have the correct addition, it works. And we can also just run plot on it, and that will plot box plots of the intensities for each run.

So that's just a really simple, quick example of how to write S3 classes in R. I'm going to move on to talking about S4 classes now. Does anyone have any questions about S3 classes before we move on? Yes? Oh, so you mean if I wanted to plot something else? Well, let's find out. Let's see, we'll just plot a really simple thing, and that plots like it's supposed to. So the answer is no: because what we defined was a plot method specifically for our S3 proteomics experiment class, it doesn't interfere with any of the other plot methods. Plot for everything else is totally unaffected. Well, in this case the whole point is that we can just say plot on it, and it'll only use our plot method if we give it an S3 proteomics experiment object.
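The generic-plus-methods pattern walked through above might look roughly like this. It is a sketch, not the code from the talk: the class spelling S3ProteomicsExperiment, the attribute name is_log_trans, the tiny data set, and the exact median-normalization arithmetic are all assumptions made for illustration.

```r
# Minimal stand-in object (hypothetical data and column names).
ex <- data.frame(
  run       = rep(c("r1", "r2"), each = 4),
  label     = rep(c("H", "L"), times = 4),
  intensity = c(4, 8, 16, 32, 8, 16, 32, 64),
  stringsAsFactors = FALSE
)
attr(ex, "is_log_trans") <- FALSE
class(ex) <- c("S3ProteomicsExperiment", "data.frame")

# An S3 generic is just a function whose body calls UseMethod().
normalize <- function(object, ...) UseMethod("normalize")

# Methods follow the <generic>.<class> naming scheme.
normalize.S3ProteomicsExperiment <- function(object, by, ...) {
  # Log transform first if needed (assumed to be log2, as in the talk).
  if (!isTRUE(attr(object, "is_log_trans"))) {
    object$intensity <- log2(object$intensity)
    attr(object, "is_log_trans") <- TRUE
  }
  # Rows belonging to the reference label, e.g. the heavy channel.
  is_std <- object$label == by
  run_medians <- tapply(object$intensity[is_std], object$run[is_std], median)
  # Shift each run so its reference median equals the median of medians.
  shift <- median(run_medians) - run_medians
  object$intensity <- object$intensity +
    as.numeric(shift[as.character(object$run)])
  object
}

# summary() and plot() are already generics, so only methods are needed.
summary.S3ProteomicsExperiment <- function(object, ...) {
  tapply(object$intensity, list(object$run, object$label), sum)
}

plot.S3ProteomicsExperiment <- function(x, ...) {
  boxplot(split(x$intensity, x$run), ...)
}

ex_norm <- normalize(ex, by = "H")  # dispatches on the class of `ex`
summary(ex_norm)
```

Calling plot(ex_norm) would use the box-plot method, while plot(1:10) still goes to the usual method for numeric data, since dispatch depends only on the class of the object passed in.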
If we wanted to do some other kind of plotting, we would just give it a different kind of object. We assume that if we're giving it an S3 proteomics experiment object, we want the specific kind of plotting that we defined for that specific class.

So, just for a quick example of how we could do the same thing that Yan demonstrated earlier, but without breaking everything, let's define an S3 method for addition. As I mentioned, you can have a default method: just put the name of the method, then .default. I'm going to make that the actual addition function, R's normal addition. And now I can define one for character vectors. So now I can check: 1 + 1 still equals 2, and "a" + "b" pastes together "a" and "b". So this is an example of how you can use S3 methods to do the kind of thing that Yan was doing earlier, to concatenate strings using the addition sign, for example, without breaking everything else in R.

Yes? So, it's subtle. Not every function is going to be a method. Sometimes you just have functions that have the same name as each other, and in that case you will get a message that so-and-so was masked by this new name. If it was an S3 or S4 method, and it was just implementing a method for that generic function for a specific class, then you wouldn't get that message. But if there's already an existing regular function with that name, then it'll mask that. So, for example, if I wrote a package and I had this code in it, then you would still get a message that the plus operator has been masked, because I'm redefining the primitive base plus to be this S3 generic function plus.
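A minimal sketch of the addition example described above, assuming the generic is defined at the top level of a session; the exact code typed in the session is not shown in the captions, so the handling of unary plus in particular is an addition made here.

```r
# Replace `+` with an S3 generic in the current environment.
# (This masks the primitive base `+` for code run in this environment.)
`+` <- function(e1, e2) UseMethod("+")

# Default method: fall back to R's normal addition (unary plus included).
`+.default` <- function(e1, e2) {
  if (missing(e2)) .Primitive("+")(e1) else .Primitive("+")(e1, e2)
}

# Character method: concatenate instead of raising an error.
`+.character` <- function(e1, e2) paste0(e1, e2)

1 + 1                # 2
"a" + "b"            # "ab"
`+.character`(1, 2)  # "12": calling the method by its full name coerces the numbers
```

Dispatch here depends only on the class of the left operand, so `"a" + 1` concatenates while `1 + "a"` still fails in the default method.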
Yes, there are a lot of S3 methods in dplyr, but in some cases, I believe, let's see. So that loaded without... so in this case I'm not getting any... okay, yeah. So here, for example, arrange in plyr is just a regular function. You can see that because it has all of this code instead of a UseMethod call. In the case of S4 methods it would also look different, and we'll see what that looks like in a moment. But arrange is just a regular function, and if both plyr and dplyr have regular functions called arrange, then you're going to hide one of them.

Well, in the case of tibbles, tibbles are for the most part just regular data frames structurally. All of the differences are implemented in the constructor. For example, the data frame constructor always adds row names, whereas the tibble constructor does not add row names unless you specifically ask for them. But a tibble inherits from data frame, so everything that works on one will for the most part work on the other.

Yes? Yep, so for example here I can just use these double colons to get the specific arrange from either plyr or dplyr. I think for the most part it's assumed you're not going to be using both of them together, but for things that are just regular functions, they're going to mask each other like that. The same goes even if you're redefining what the generic function is. You'll notice, for example, that the generic for plot has this particular signature: x, y, dot-dot-dot. If I redefined it to have, say, object instead of x and y, if I just renamed that argument, that would also change things. So whenever you write an S3 or an S4 method, it's going to inherit those arguments, those parameters. You can add things to them, but it has to
match that for the most part.

Yes? Yes. So does anyone want to guess how to do that? Okay, so to call the character version on numbers, for example, I could just do this. Yan probably talked about how all of these infix functions in R, infix operators like plus, are actually just regular functions, and you can call them like regular functions. So here the name of my method for this plus is just +.character, and so I can call it using +.character surrounded by these backticks, and then the arguments. And if I do that, of course, because it's R, it can coerce them to characters, so it ends up working. Yes? Yeah, you just have to use the full name of the method: plot dot whatever, print dot whatever. Yep. So what did you call it? Yeah, so there you go, you can just get the normal summary or whatever else you want just by calling it with the full name.

Okay, so now we're going to move on to S4 classes. S4 classes are more rare in base R, but they're more common in Bioconductor packages, and like I said, that's mostly because with Bioconductor they're dealing a lot with bioinformatics data sets, and so there's this more complex system of dependencies. You have complicated experiments, so you want more complicated, more powerful objects to be able to account for all of those complications and complexities. S4 classes have formal class definitions and a more rigorous structure, so if you're just doing something simple on the fly, then they can be a lot more tedious to use, but if you happen to need that additional power, then they can be very useful. S4 classes are defined by the setClass function. With S4 classes you actually have to define what a class of that particular kind looks like, and you do that with the setClass function, rather than deriving from the base R types like S3
classes do. S3 classes are based on lists or numeric vectors or things like that. S4 classes instead have slots. You can define what kinds of types are in these slots, and you can also define a validity method, which will basically check whether an object is valid or not, and you can define what that means specifically for your class.

We can look at class definitions with the getClass function. Even though data.frame itself is an S3 class, R also provides an S4 version of it in case you want to do S4 stuff with data frames. Since we're all familiar with data frames, we're just going to look at the S4 version of data.frame. You can see it has these different slots: .Data, which is a list; names, which is a character; row.names, which is this special class called data.frameRowLabels; and .S3Class, which is a character. So you can kind of see some rubble from the S3 system in this S4 version of data.frame, but it'll work for our purposes. You can also see what classes data.frame extends: it extends the list class and oldClass.

So now we're going to define an S4 class called S4ProteomicsExperiment, and again we're going to inherit from data.frame. We do that using this setClass function. The first thing we give it is just the name of the class. With S4 classes you define what the superclasses are, or what classes it inherits from, using contains =, so we're just going to say contains = "data.frame". If we wanted to inherit from multiple things, we could make that a character vector with a bunch of different classes we wanted to inherit from, but in this case we're just going to inherit from data.frame. Then we have different slots: a slot for whether this is log-transformed or not, and a slot for a title, because we feel like
having an experiment title would be fun. So we're going to have a slot for an experiment title, and that's going to be a character. And here we can specifically say that the slot for whether it's log-transformed is a logical, without having to define that somewhere else; we can make that explicit here. I'm going to define a validity function here, so that's just defined with this validity argument. Here I'm going to list what columns are required, I'm going to check if any of those are missing, and if any of those columns are missing I'm going to give a message saying it's missing those columns. Otherwise I'm just going to return TRUE, because it's a valid S4ProteomicsExperiment.

You'll notice over here that the slots in S4 classes are accessed using the @ symbol, which is really different from S3 classes and most other objects in R. Typically you don't actually want your users accessing those slots using that @ symbol. Even though R doesn't make those explicitly private, it's good form to provide some kind of accessors rather than accessing things via @ in anything besides your own functions that implement the class. So typically you would use names(object), and names(object) <- value to assign those things.

Another thing is that S4 objects can just be created with this new function, which you can see here, but it's still typically a good idea to provide some sort of constructor so that your users aren't having to call this new function, which can be kind of clunky. So here I'm doing basically the same thing as before, but creating an S4 object with this new. Thank you, you are actually right. Yeah, that's just for our own convenience. Yes, so the actual name doesn't matter. So here we're using a constructor to create an S4 object, and RStudio is being weird again, but we can look at it.
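The class definition, validity check, and constructor described above could look roughly like the following. The slot name `isLogTransformed`, the list of required columns, and the constructor's arguments are assumptions for illustration; only the class name, `contains = "data.frame"`, and the two kinds of slots come from the talk.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Columns every experiment must have (an assumed list, for illustration).
required_columns <- c("run", "intensity")

setClass("S4ProteomicsExperiment",
  contains = "data.frame",
  slots = c(isLogTransformed = "logical", title = "character"),
  validity = function(object) {
    missing_cols <- setdiff(required_columns, object@names)
    if (length(missing_cols) > 0) {
      # Returning a string marks the object as invalid with this message.
      paste("missing required columns:", paste(missing_cols, collapse = ", "))
    } else {
      TRUE
    }
  }
)

# Constructor so users do not have to call new() themselves.
S4ProteomicsExperiment <- function(df, title = "", isLogTransformed = FALSE) {
  new("S4ProteomicsExperiment", df,
      title = title, isLogTransformed = isLogTransformed)
}

exp4 <- S4ProteomicsExperiment(
  data.frame(run = c("r1", "r2"), intensity = c(10, 20)),
  title = "demo"
)
is(exp4, "data.frame")  # TRUE
exp4@title              # "demo"; prefer accessor functions in user-facing code

# A data frame without the required columns is rejected by the validity check.
bad <- tryCatch(S4ProteomicsExperiment(data.frame(x = 1)), error = function(e) e)
inherits(bad, "error")  # TRUE
```

The validity function runs automatically when `new()` is given arguments, so invalid objects fail at construction time rather than later.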
Again, we get the same data frame behavior that we would expect, because we inherit from data.frame.

S4 generic functions are defined using setGeneric, and you can also use getGeneric or isGeneric to find out whether something is already an S4 generic function. So here I'm just going to see if these things are already S4 generic functions or not. The answer is no, yes, and yes. Actually, I think that might be new in this version of R, or it's because I had already set the generics before; in any case, it doesn't change anything. When a function already exists and you want to turn it into an S4 generic function, you can just call setGeneric with the name of the function, and that'll convert it into an S4 generic function using the existing version of it as the default method. So now if I do getGeneric on these, these things show that they are S4 generic functions with standardGeneric, which is basically the S4 version of UseMethod. So these are now S4 generic functions in addition to already being S3 generic functions. We can find that out with showMethods, which will just list the existing methods. For normalize, the only one is this default one for object = "ANY", which basically means object can be anything, so that's the default method. We can find out what that particular method is using selectMethod, and if we do that, we find out that it's actually our S3 generic function. So that S3 generic function still exists; it's just the default S4 method now.

So now you can define methods for those using setMethod. With setMethod, the first thing is just the name of the method, and then the signature, which is basically the classes that will appear in the argument list. You can see here, for example, y; that means for plot you can have multiple dispatch on both x and y.
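To make the setGeneric and setMethod steps concrete, here is a small sketch. It uses a hypothetical slot-based class, `ToyExperiment`, rather than the data.frame-based class from the session, and it defines `normalize` as a fresh S4 generic instead of converting an existing S3 generic; both choices are simplifications made here.

```r
library(methods)

# Hypothetical stand-in class: a slot-based class kept small for this sketch
# (the class in the session inherits from data.frame instead).
setClass("ToyExperiment", slots = c(intensity = "numeric", title = "character"))

# A brand-new S4 generic; standardGeneric() plays the role of UseMethod().
setGeneric("normalize", function(object, ...) standardGeneric("normalize"))

# summary() already exists, so setGeneric() reuses it as the default method.
setGeneric("summary")

# The signature names the class of the first argument; any remaining
# arguments in the generic's signature are treated as "ANY".
setMethod("normalize", "ToyExperiment", function(object, ...) {
  object@intensity <- object@intensity - median(object@intensity)
  object
})

setMethod("summary", "ToyExperiment", function(object, ...) {
  cat(object@title, ": ", length(object@intensity), " intensities\n", sep = "")
  invisible(object)
})

toy <- new("ToyExperiment", intensity = c(1, 2, 6), title = "demo")
isGeneric("normalize")                      # TRUE
existsMethod("normalize", "ToyExperiment")  # TRUE
normalize(toy)@intensity                    # -1 0 4
summary(toy)                                # prints "demo: 3 intensities"
summary(c(1, 2, 3))                         # other classes still use the default
```

Calling `showMethods("normalize")` after this lists the method for `ToyExperiment`, which is a quick way to check what has been registered.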
So the method that will be selected when you call plot can depend on both the class of x and the class of y, and that's what you would put next here in this setMethod. In the case of normalize, we're only defining things on the first argument, so all we need is to give the class of the first argument, which is S4ProteomicsExperiment. You'll notice when we do the same thing for plot, we also just give the first argument, and that's because if you don't give any additional arguments, it will assume that the rest are this special "ANY" class. So I can just define these S4 methods with setMethod, giving the name of the method, the class, and then the functions, and those are just the same functions as before. And now we can just call normalize, summary, and plot on our S4 class as well, and that all works just like before.

So that's basically a quick introduction to S3 classes and S4 classes. I actually went over by half an hour, so I apologize for that, but I'd be happy to answer any questions, since I'm here till 6:00 anyway.

Yes? It depends what kind of object you give to it. If I call plot and the argument that you give to it is an S3 class, it'll go find the correct S3 method for that S3 class; if it's an S4 class, it will find the correct S4 method for that S4 class. Yes, it'll for the most part just work. Yes? Oh, it's not. Okay. So now we can do anything we want. What do you want to do? Yes, so you can do that just by using the methods function with class = and then the name of the class. So I can just do twins... I don't know what happened there. Oh, it's because class() for that turns out to be just a simple vector of the experiment class and data.frame, but yes, we
can just do that. And for this one, it ends up having more, just because it'll automatically show more things that were inherited, or in this case some things that all of our objects have, like this initialize method. So it ends up showing more here. But to answer your question, you can just use methods with class = and then the name of the class.

Yes? So that will be covered tomorrow; I won't spoil it yet. There is a package that does that, I guess, and we'll talk about that tomorrow.

Yes? Yep. Yeah, so S4 classes are definitely a little bit more complicated than S3 classes and methods, but there can be times when that extra rigor and formality can be really useful. Like I said, there are a lot of S4 classes on Bioconductor. When you have a complex experiment and you want to make sure all of the rows are correct, all of the samples have associated descriptions, and things like that, then being able to check whether an S4 object is valid or not, being able to model all of those additional constraints and complexities, can be really useful.

Yes? Anyone? It's a little bit of both. Most languages will have a certain guiding principle, a guiding design, that will guide how they are developed and designed. S before it, and now R, is largely a functional language, so functions are first-class citizens: you can manipulate functions just like you can any other object. R also has this immutable state that Yan was talking about, so it assumes that your objects aren't going to change. Ideally that would give you a certain level of functional purity. That's not actually true in R, because you have side effects with things like print, but it's designed primarily as a
functional programming language, whereas Java is more designed around object-oriented programming, where everything has to be an object and everything has a class. That's true to a certain extent in R, because everything has some kind of class behind it, but it's not true to nearly the same extent as in Java. And you can kind of tell, just by how they work, that the object-oriented systems in R, S3 and S4, were added later, after the original language had already developed quite a bit. You can tell that by the way the S3 class system works, because it's based on this naming scheme that kind of conflicts with how lots of other things in the language are named. Why do you have as.data.frame, or even just data.frame, with a period in the middle of it, when S3 methods are defined by a period dividing the name of the generic function and the class? So you can see in places like that how object-oriented programming came as an afterthought to R. There are certainly things that I like about the way R implements object-oriented programming, but it is not as central to the goal of the language as it is for something like Java.

There are definitely some other differences in the approach to object-oriented programming with a functional language like R. The main one, as I mentioned earlier, is that in Java you would have object.method to call a method, whereas in R you have the method and then the object. So all of the methods are functions first; they belong to the generic function rather than to the specific object. Even when it comes to object-oriented programming in R, it's primarily based around functions, these generic functions that can then be applied to different kinds of classes. But there are also other types of object-oriented systems that I didn't talk about. There are a couple
different packages on CRAN that implement object-oriented programming in R in ways that are more similar to Java and C++ and so forth. There's also another one that comes with R, called reference classes, which I didn't talk about because they're not used as much. Reference classes are more similar to the kind of object-oriented programming that you would get with Java. With reference classes you do end up doing things like object$method() to call a method. So if object here were a reference class object, you would call methods this way, which is a lot more similar to something like Java or C++.

One of the other main ways that S3 and S4 are different from some of the other object systems in R is that S3 and S4 keep with this functional programming idea of immutable state. S3 and S4 objects are basically intended to be immutable, so if you go and change something in an S3 object or an S4 object, it will copy the object, just like Yan was showing earlier with other kinds of objects in R. You have that same kind of behavior with S3 and S4 classes, and that can be nice in certain situations. Say you get passed some S3 or S4 object in a function: you can go modify it, knowing that you won't have any side effects for whoever is using that object elsewhere, because you'll get your own copy of that object that you can do whatever you want to. That's something this idea of immutability gets you: you can do whatever you want with an object inside a function, and it won't affect that object anywhere else. Then there are things like reference classes, which behave a lot more like Java. There's also another package on CRAN called R6, which is another object-oriented system that's very similar to reference classes and calls methods in the same way. These implement
object-oriented programming in a way a lot more similar to Java, and they have mutable objects. So if you pass the object to some function, that function can then modify the object, and you'll see those changes in your own object, wherever else it happens to be. That can be both good and bad. It can be useful in terms of memory, so you don't have all these extra copies of data running around, which can create a huge problem in R: if you end up creating a bunch of extra copies of objects without knowing it, that can create a lot of memory overhead that you aren't necessarily expecting. Whereas with mutable objects you can update things in place and not expect that to suddenly blow up your memory. However, that can also create some problems in terms of user expectations, because R programmers are used to things behaving in a certain way, and when you do things differently, like with reference classes and R6, that kind of breaks those expectations. If you pass some object to a function, you generally expect that the function isn't going to be able to change your own copy of that object, but with mutable objects like reference classes or R6, that can happen.

Basically all of these things are implemented using environments, mostly, and I believe Yan talked about environments on the first day. So e is some environment. I'm going to store 1 in this environment. I'm going to create a function that takes some kind of environment and modifies something called x in it. So here, in my copy of e, x is 1. If I call the function on that, I can see that in my version of e, x has been changed to 2. So that's the kind of behavior that you get with environments.
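The environment demonstration described above can be sketched as follows; the function names are hypothetical, and the list example is added here only to contrast with the copy-on-modify behavior described for S3 and S4 objects.

```r
# An environment is modified in place: the function sees the same object.
e <- new.env()
e$x <- 1

set_x_to_two <- function(env) {
  env$x <- 2  # changes the caller's environment; no copy is made
  invisible(NULL)
}

set_x_to_two(e)
e$x  # 2

# A list (like S3 and S4 objects) is copied on modification instead.
l <- list(x = 1)
set_list_x_to_two <- function(lst) {
  lst$x <- 2  # modifies a local copy only
  lst
}
changed <- set_list_x_to_two(l)
l$x        # still 1
changed$x  # 2
```

The caller has to capture the return value to see the change in the list case, whereas the environment is updated as a side effect.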
It also happens with these other versions of object-oriented programming, like reference classes and R6. That doesn't happen with S3 and S4. It can be really useful, but it also kind of breaks user expectations about how people are normally used to programming in R. So if you are interested in that kind of object-oriented programming in R, you can look up reference classes, or the package called R6, and I think there are some other ones, like R.oo; people are always coming up with new packages for R. So that was a long answer to that question. I hope it was helpful.

Yes? Generally, if you have some kind of collection of functions in a package on your local machine, the best way to do that is just to distribute your package, which you can do very easily by hosting it on GitHub, or by submitting it to CRAN or Bioconductor or another repository like that. If you submit it to CRAN or Bioconductor, that also makes it easier for other people to download, because those will take care of all the dependencies and make sure everything is installed correctly whenever users download it. We'll talk more about building your own package tomorrow as well.

All right, so that's it for today. I guess we'll end ten minutes early. I will hang around for a little while longer, since I'm supposed to, if you have any more questions. Otherwise I'll see you tomorrow morning. I will be here at 8 a.m. if you have any additional questions tomorrow morning, and we'll start at 9 a.m. tomorrow with J.J.'s keynote. Thank you. [Applause]
Info
Channel: May Institute: Computation and Statistics for MS
Id: UWR0hPVXiRc
Length: 79min 6sec (4746 seconds)
Published: Fri Jun 30 2017