R or Python: Which Should You Learn in 2020?

Video Statistics and Information

Captions
What's up everybody, welcome back to my YouTube channel, Richard On Data. My name is Richard, and this is the channel where we talk about all things data. We're weighing in today on the biggest rivalry of data science. This is the Ford versus GM, the Coke versus Pepsi, the Marvel vs. DC. I'm talking, of course, about R vs. Python. This debate has been going on for several years and I don't see it calming down anytime in the near future.

I'm mainly looking at those of you out there who are maybe in school right now with a statistics or mathematics focus; maybe some of you who are in the computer science realm and know some programming languages like Java, C, C++, etc., but not necessarily Python; and some of you in the data management space who know SQL and want to move over to the data science world. You're really thinking: do I want to learn R, or do I want to learn Python?

My thirty-second answer to that question, and it's kind of a lawyer answer which a lot of you might not like, is: it depends. It depends on you as an individual, how you're wired, and what exactly in the data science world you're actually interested in. But what I would say is you really can't go wrong with either of them. What you want to do is learn one and get good at it. You don't want to get analysis paralysis, where you're weighing the options between the two and then you never learn either. No, if you get good at one of them, you're going to be in an amazing position.

So I'm going to do a detailed comparison between these two. Specifically, we're going to look at which one of the two is easier to pick up and what the capabilities of each of them are, and then, as we move into 2020, we're going to do some popularity comparisons: what does it look like people in the community are using, and what does it look like data science jobs are looking for in 2020? Before I do all that, smash the like button, hit the subscribe button, and tap the notification bell so YouTube notifies you whenever
I upload a new video.

For those of you who are totally unfamiliar with one of these languages or the other, it is a little helpful to know a bit of the history and the backstory of them. Starting with R: R was made available to the public in the year 2000, and it was made open-source just a little bit after that. It was created by statisticians, and the idea behind it was to make data analysis easier. At first the objective was to make it easier for statisticians, and then for the population at large. So it's important to keep in mind that R was created looking at programming and data analysis in general through the perspective of statisticians.

As for Python, Python was released in 1991, and it was developed by this guy, Guido van Rossum. The idea behind it was to communicate complex concepts with fewer lines of code, with the idea being that this would enhance developer and programmer productivity, and a massive emphasis behind it was on readability. Python quickly became open-source software, and once it did, people began to develop packages like NumPy and pandas. These packages are really the foundation for data analysis and data science in Python as it stands today. I'm telling you all this to illustrate a pretty important distinction: R was created by statisticians, for other statisticians, to do statistical analysis. Python was created as a general-purpose programming language.

Now that you have some background on the two languages, you probably want to know which one is easier to learn. This one depends a lot on your background, but the overwhelming majority of people do seem to think Python is easier to pick up than R. However, that actually wasn't the case for me. My background was in statistics, and I was familiar with the mathematical concepts of things like statistical tests and models long before I was familiar with any programming languages. So I started playing around with R, and I saw how easy it was to spin up the lm function, or maybe to run statistical
tests without even having to worry about packages, and it was super easy for me to pick up, expand my knowledge base, and get reasonably good at. Having said that, that's because I'm wired as a statistician, so the ability to use R to quickly prototype different models really appealed to me. Now, if that's not you, if you don't have a statistics background or you're wired as a programmer and a coder yourself, Python will honestly probably be easier for you to pick up than R is.

Next, let's go over the capabilities of these two languages. We have to start off by saying that for the purposes of data analysis and data science, R and Python are completely competent languages. For the overwhelming majority of things that you could want to do, both languages are going to be able to support your needs. But having said that, there are certain things which one language is generally thought to be a little better at than the other.

In my opinion, and I think this is a pretty commonly held conception, R blows Python out of the water as far as data visualization is concerned. R has the very popular ggplot2 package, and this is just an incredibly powerful, very flexible language. It makes use of the grammar of graphics framework, which is generally quite intuitive, and for most people it's pretty easy to pick up. Now, obviously Python has some visualization packages; Seaborn is one of the most popular ones. These packages aren't bad, they will work, they will get the job done. But R has packages like ggplot2, ggvis, and htmlwidgets. They've been around for a long time, and they're absolute beasts at creating beautiful data visualizations. To me, when you compare the package stack from one to the other, Python really can't compete with R's visualization capabilities.

We also have to point out that R was built for statistical modeling and statistical tests, so creating those in R is super quick and easy to do. In Python it's obviously doable; you have the statsmodels package, which
is pretty nice for it, but you really can't compete with how quick and easy it is to do these things using R.

From a design and usability standpoint, almost all R users use the IDE, that is, integrated development environment, called RStudio. With Python users there's nowhere near that level of uniformity. There are some who use PyCharm, some use Spyder, some use Jupyter; there are tons of different things people use out there. Honestly, I do think that can create a bit of a learning curve for Python beginners, and to this day I don't like any of those Python IDEs as much as I like RStudio. RStudio is really clean, it's really easy to use, it's really good at keeping all your stuff organized, and it just makes the whole programming experience really fun.

Additionally, R has one of my all-time favorite packages, and that's Shiny. For those of you who are unfamiliar with Shiny, it's a tool for creating interactive visualizations and applications. Now, Python does have some equivalents, like Dash and Bokeh, but right now Shiny just has incredible flexibility and the power to handle some amazingly complex requirements, and I just don't think any of the Python modules are quite on that level yet.

Likewise, R has what's called R Markdown, which is a framework for writing code and quickly knitting that code, and whatever output and comments you have, into a report. You can do essentially the same thing using the Jupyter Notebook framework for Python, but to me, the ability to create a report for virtually any audience without having to leave your workflow and your coding environment is just unparalleled in R.

Now let's move over to Python. As I mentioned, Python is a general-purpose programming language, and that's going to be a beneficial factor in quite a number of ways. First and foremost, Python is going to be faster than R most of the time. Now, you could be proactive with R and write code in a good style, in ways
that are going to perform at a reasonable speed, but for more complex pieces of code, or for things like loops, which take a longer amount of time, Python is almost always going to outperform R. For that reason and for others, namely the fact that it was designed to do this, Python is going to be better at deploying things into a production environment. That's not to say you can't do that, or that it's impossible with R, but if that's an objective for you or needs to be on your radar, Python is going to be your tool.

Python is also generally considered better at machine learning and deep learning. Python has the very popular scikit-learn package for all of your machine learning needs, and this thing is just incredibly powerful. It's fast, it's really easy to use, and you can create models using a very small amount of code. Now, R has some machine learning packages, like caret, which are also pretty powerful, but they haven't really caught on in quite the way that scikit-learn has. Similarly, Python has a lot of deep learning libraries, and Python was the focus as deep learning became more mainstream, just because of its superior computational power. R has the keras package now, and it's making some headway in the deep learning space, but honestly I do think it's really playing catch-up to Python.

Python is also going to blow R out of the water with respect to things like natural language processing and web scraping. This really boils down to Python's amazing computational power and the fact that it can handle big datasets, or just big tasks in general, very efficiently. It's these kinds of things, which are more general-purpose tasks, that Python was designed to do, and it's a total beast at them.

So as you can see, this is not a simple, clear-cut story. A lot of the things I just described are somewhat subjective, and they're based on what the overall community tends to think on average, so your experience could vary. And honestly, when we discuss and compare the capabilities between the
two languages, it might not be as important a discussion as you might think at first, because both R and Python have packages where you can incorporate code from the other language. So provided you have the know-how, you're increasingly able to leverage the best of both worlds. And when you put it that way, it's a lot easier to learn one function or one package from another language than to learn the entire language.

Next, let's look at the popularity of these two languages, with an emphasis on what things are looking like as we head into 2020. We're going to look at the TIOBE index here. For those of you who are unfamiliar with it, it basically measures how popular programming languages are by looking at how frequently they're getting queried on various search engines. Here we have the top 20 languages. You'll see Python is at number 3 and R is at number 18, which is actually down from number 12, where it was at this time last year. Then, if you look down the ratings column, you see Python comprises 9.7 percent of search queries for these programming languages, but R only makes up 0.8 percent.

Now, a few things to keep in mind. The TIOBE index is not exactly a scientific methodology, and this also isn't a totally apples-to-apples comparison, because Python is a general-purpose programming language while R is generally statistics and data science specific. That makes it easy to imagine why it's much higher up on the list. But having said that, it's definitely alarming that R's percentage of these queries has gone down in the last year, considering data science in general has not exactly gotten less popular over the same time period.

In fact, if we look at R and Python's popularity over time, specifically in the last two years, things get pretty interesting. Here's the TIOBE index over time for R. On the y-axis we have the rating, or just the percentage share of search results, out of all languages measured by the TIOBE index, that R happens to get. Now,
based on this index, you see that it has doubled in its rating overall since 2014, but it also hit a peak around 2018 and has gone down quite a bit since then. Now we look over here at Python. Python has also skyrocketed in its popularity since 2014, so again the same time period, but its pattern of skyrocketing has continued. In fact, it's got even more extreme since 2018.

There are also a few metrics that point to the idea that the catalyst that fueled Python's enormous growth over this time period has been data science. If you look at keywords like "pandas" or "data science and Python", just things like that, the rate at which people have been querying these things using search engines has increased massively over the same time period.

Honestly, I don't have a really good, definitive answer for why Python has been experiencing this huge boom but R hasn't quite so much. My best guess is that in the last few years, big companies have been investing really heavily in data science, and that by and large their language of choice has been Python more so than R. It could also be the case that data engineers use Python more so than R, and if we want to have data scientists and data engineers as integrated and connected as possible, they're going to be using the same platform, so Python more so than R. It could also be that for a greater share of the data science community, it's important to have solutions which can later be deployed into production, and that's going to give a big edge to Python, whereas if a bigger share of the data science world was doing research, you'd see more of an edge towards R. So I don't know, these are just some theories I have. Leave a comment down below; I'd be really curious what you guys think.

I mentioned earlier that Python is generally viewed as better, and is more popular than R, with respect to machine learning, so we're going to look at another popularity metric. This is a poll from KDnuggets. It's a survey from a sample of about 1,800
programmers, responding by listing what software they were using for analytics, for data science, and for machine learning. As you can see, Python is the most popular tool. In 2017, 59 percent of these programmers reported using Python, and that increased to 65.8 percent in 2019. Meanwhile, in the same two years while Python was going up, R actually went down. It went from 56.6 percent in 2017 to just 46.6 percent last year. Now again, that's a totally non-scientific poll, but I still think it's evidence that some real trend has been going on over the last couple of years.

Next, let's look at data science specific jobs. This is more work from the awesome people over at the KDnuggets blog, and it covers February 24, 2017 through May 27, 2019. These are job postings from Indeed, and these guys tabulated how many job descriptions each of these languages appeared in. Python is in first place, and it's got somewhere around double what R has. SQL is in a close second; that's no surprise to me at all. They then go on to say R increased massively from 2017 to 2019, but not nearly as much as Python jobs increased.

So, circling back to our question of the day: which one should you learn, R or Python? Again, from a global perspective, there is absolutely not one right answer. That answer is tailored to you, and to a certain extent probably to your industry and to your region. You have to ask yourself: are you more of a statistics person, more of a programmer, or neither? The best possible thing that you can do is really buckle down and learn one of these languages, apply it, get good at it, develop expertise, and then later come back and learn enough of the other one to be dangerous. There is a real risk that if you try to learn too many things at once, you might end up failing at everything. But if over the long run you do buckle down, you get good at one, you master it, you go back and learn the other one and master that, mark my words,
employers are going to want to hire you and they're going to want to pay you.

As my final recommendation: before learning either of these two, I would learn SQL. You saw that it appeared in almost as many data science job descriptions as Python did, and there are really tons of downstream benefits to having a good fundamental understanding of databases and multi-relational data sets.

Based on these analyses, there is certainly some evidence that R is losing popularity while Python is just going to the moon, and this could continue unless R comes out with some new capabilities or some new perks. So make of this information what you will, but the most important thing is to learn one of these and get good at it, because I guarantee you R is not going anywhere in the next year, nor the next few years after that. There are tons of companies that still use it, and once you get good at one of these, it should hasten your ability to get good at the other one.

What do you guys think? Are you going to be using more R or more Python in 2020? Let me know in the comments. Thanks, all, for watching this video. Until next time, Richard On Data.
Info
Channel: RichardOnData
Views: 31,840
Rating: 4.9519615 out of 5
Keywords: r or python, python vs r, r vs python, programming languages to learn in 2020, r or python which is easier, r vs python for data science, r vs python speed, r vs python 2020, should i learn python or r, r or python for data science, python or r for data science, r vs python for data visualization, r programming, r vs python data science, r vs python differences, r vs python comparison, r vs python machine learning, data science r, learn r and python, pewdiepie, minecraft
Id: ZGeLAqGkObw
Length: 19min 1sec (1141 seconds)
Published: Thu Jan 16 2020