R vs Python for Data Science, Data Analytics, Machine Learning, Building Apps, Moving to Production

Captions
Hey, Jonathan here. In this video I'm going to be talking about R vs. Python for data analytics and data science. I thought it was important to make this video because there's been a lot of great information out there, but I think it's maybe a little incomplete and a little imbalanced about which tools should actually be used. If you're new to this channel and you're keen to learn the latest tips, tricks and tools for working more effectively with data, please hit the subscribe button for weekly videos.

Now, R and Python are both fantastic tools. In some circumstances R is a much better tool, and in some circumstances Python is a much better tool, so the question is: which is the right tool for you? I think this is important because, at the end of the day, the world is becoming data-driven, and, beyond just R and Python, we need to make data analytics and data science accessible to more people, not just PhDs and not just software engineers.

I wanted to start off by saying that the rise of R and Python has been absolutely fantastic. They are both open-source tools, which means they're free, and they have been massively extended through tens of thousands of libraries that do a ton of work for you automatically, which is absolutely incredible. Compare this to what you used to need: tools like MATLAB, SAS and SPSS, which were great but massively expensive. Besides the initial cost of the tool itself, much of the power of these tools comes from additional libraries, and each of those carried an additional license cost. That made it really difficult to do anything at any pace if you needed to get cost approvals or that sort of thing. Whereas with R and Python, you type in a short command, and
within seconds you've got your library installed and you're ready to go. So that's a key benefit of both being open source.

Now let's go back and talk a bit about fundamentals. Python is a general-purpose programming language. It's been around for longer than R, and it's used for a lot more things, so in that respect Python is a lot more popular in the wider area beyond just data science. You can use it for everything from running a server and maintaining scripts, to programming your Raspberry Pi to give yourself smart lights, to managing your cloud computing environments, as well as, now, data work with the pandas library. So Python is a very powerful tool, and it's got about 70,000 libraries or so that do all of these different things.

R, on the other hand, is a data-specific programming language, which means it's actually very nice for working with data. It's only got about 10,000 libraries, but instead of having them spread across all these different functionalities and areas, those 10,000 libraries are very much focused on the data analyst and data scientist process and workflow. So that's something to keep in mind as well: if you are specifically working with data, there are some very nice libraries that Python maybe doesn't really have right now.

Let me give you an example of that. One of the things people say about Python is that you can take your analysis and turn it into a web application using things like Flask and Django. These are really popular, well-established frameworks for Python that allow you to build web applications. The thing to keep in mind is that in order to use these, you kind of need to also be a web developer: you need to know Flask and Django, and you also need to know some HTML, JavaScript, asynchronous programming concepts, some reactive programming,
all this kind of stuff. Now, if you have these skills, fantastic, that's great, a nice thing to add to your CV. But even if you do have these skills, it takes a fair bit of work to actually build these things out; it's almost another job in itself. Whereas with R, a lot of people don't realize that you can build web applications as well. You can build reactive web applications very quickly, actually much faster than with Python, using things like Shiny and flexdashboard. The cool thing about this is that, as a data analyst or data scientist, your focus is the analysis: you want to be able to do your research and then quickly present it, without needing to be a web developer or being dependent on one to get your work out. This is where R is really powerful, because you can basically write R Markdown documents with a little bit of your R code embedded and build reactive web applications: no HTML, no JavaScript, no learning Flask or anything like that. And it is unbelievably fast. It's not as flexible, and you're not going to get a job as a web developer using R Shiny or flexdashboard, but you're going to get your work out a hell of a lot quicker. So if you're into things like rapid prototyping, R can be a really good option.

Another thing people often say is that Python is basically for machine learning. The thing is, R had machine learning libraries well before Python, but because they were written by lots of different people, the ecosystem was quite fragmented. So when Python came out with scikit-learn, it was very powerful
because it created a single, cohesive way of accessing lots of machine learning models. Since then, Python has taken the front seat in terms of the different machine learning models that have been developed, and access to the different APIs and everything like that in the various computing environments. That's an important thing to take into consideration, especially since machine learning is such a big thing right now. But the R community is not sitting still on this either; they are rapidly developing and porting these libraries over as well. You still have access to things like Keras and TensorFlow, and you've got access to automated machine learning tools like H2O, so there are still a lot of options in R as well. It is typically just slightly behind Python in this respect, so that's something to keep in mind.

I just mentioned cloud computing as well. Right now, the big thing in cloud computing is serverless programming, which I'll probably talk about in another video sometime. A lot of that uses Python to build these functions, which is powerful, and a lot of the API access is also through Python. So Python is slightly better in this respect, especially on AWS, which is the key player in the cloud computing space; there, it's pretty much all Python. But you've got other players as well. For instance, Microsoft Azure is pretty much the second-place player, and it's still a massive competitor in the cloud computing space. They do a ton of really good things. A lot of people don't realize that Microsoft Azure has
actually got many more pre-built machine learning models than AWS does, so that's something to keep in mind. Microsoft also has much better support for R. They support both R and Python, knowing that these are the two big languages for data analytics and data science. You've got things like Microsoft ML Server, which will take your R or Python code, compile it and expose it via a REST API that you can then run your stuff off. Microsoft also has a no-code machine learning builder where you can drag and drop processes onto a workflow to build out machine learning models, and that also integrates with R or Python. So there's a bit better R support there as well, and again you've got some options.

That leads me on to my next point, which is about putting stuff into production. A lot of the time people say that if you want to put something into production, you should just use Python, because Python is the thing you would use to put things into production. But that's not necessarily the case. Python is a language that has been used a lot more by software developers and software engineers, so they're much more familiar with the whole process and workflow of getting stuff into production. R, on the other hand, has typically been a language used by researchers, mathematicians and institutions who are not necessarily software developers, and because of that they're not really familiar with production workflows. So people think that R is just a bunch of scripts and not really suitable, but actually you can containerize R functions into Docker containers and expose your functions via R
REST APIs and all that kind of stuff. So there are actually a lot of options for using either language in production; it's more about how you use the language than the language itself.

The next point I wanted to cover is documentation and help, and basically how easy each language is to learn. As mentioned, Python is typically already used by software developers, so if you come from a software development background, Python will potentially be easier to pick up. For working with data, though, I'm going to say that R is easier than Python, partly because it was built natively as a language specifically for data analytics. There are also the design considerations for both languages: Python was designed to be an easy-to-read, easy-to-understand language, whereas R was designed to be very forgiving, which means you can write code in lots of different ways and have it still work. When you're getting started, that makes a big difference; it helps a lot to get up and running. On the flip side, sometimes people complain that R code is just so messy because, instead of having one way to do something, you maybe have a dozen. If you're learning R now, I suggest learning the tidyverse as a really good standard for using R, but even if you don't strictly adhere to it, your code will probably still work, and that is kind of nice.

In terms of help files and documentation, at least for the data side of things, I've found the R documentation and help files to be much better. Part of this is that the standards for R documentation call for things like vignettes, basically code samples and examples that can run, and this really
helps out a lot. A lot of the time with Python libraries, the documentation tells you what all the different functions are, but it has no examples to help you along, which means you may have to go to Google, Stack Overflow or wherever to try to figure those things out. Whereas with R code, you hit F1 right there on the function and you've got all the documentation and code samples; you basically copy, paste and start using them. They have lots of nice little examples, and this really helps a lot to get up and running really quickly, so that's an important thing to take into consideration as well.

The other thing I want to talk about is BI solutions. As I mentioned before, Python has much better integration with things like cloud computing and better support for a lot of different APIs, which is also important. But R has got better support for BI tools. A lot of good BI tools, like Tableau and Power BI, which a lot of businesses are already using to do their data analysis, have built-in R integration, whereas I haven't seen Python integrated as well. So that's something to keep in mind: R is integrated better in different kinds of ways.

So which is right for you? I think probably the most important thing is what the rest of your team is using. If your team is using R, or they're using Python, you should really just go and learn that. Other than that, Python is generally going to be better for software developers and software development teams, whereas R is going to be better for researchers and analysts. I'd say if you're more business-facing and not sitting in a software development department, you should really consider R as a tool, just
because you can build everything from end to end very quickly without needing to learn a lot of other skills. Again, with web development, you can build a web application within minutes in R without HTML, JavaScript or Flask. You won't get as much control, but you will be able to churn things out really quickly.

One other thing I forgot to mention is that languages like SQL are very useful and very important for accessing data from databases. SQL is a good language to learn, and it's not too difficult, but if you learn R you don't necessarily need to learn SQL, because R has database connectors that let you write your regular R code and have it run against databases. That's really nice, because again you've got one language that does pretty much everything in your workflow. Python overall does more things, but it's used together with HTML, JavaScript, SQL and all these different things. Those are great skills to learn and know, but if your focus is data analysis or data science, you may not want to learn them, or even if you do, you may not want to spend your time on them or be dependent on other people with those skills to get your work out.

All right, if you found this helpful, please give it a thumbs up. Part of my mission is to help more people from different backgrounds and different starting points get involved in data, so if you want access to some free training, you can head over to my website www.DataStrategyWithJonathan.com
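In the video, Jonathan describes Shiny and flexdashboard as a way to build reactive web applications without writing HTML or JavaScript. As a rough illustration of what "reactive" means, the Python sketch below recomputes an output automatically whenever an input it depends on changes. The `ReactiveInput` and `ReactiveOutput` classes and the sample data are hypothetical and invented for this example; this is not how Shiny is implemented, only the idea it automates for you.

```python
from typing import Any, Callable, List


class ReactiveInput:
    """A value (e.g. a slider) that notifies its dependents whenever it changes."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._dependents: List["ReactiveOutput"] = []

    def get(self) -> Any:
        return self._value

    def subscribe(self, output: "ReactiveOutput") -> None:
        self._dependents.append(output)

    def set(self, value: Any) -> None:
        self._value = value
        # Re-run every output that depends on this input.
        for output in self._dependents:
            output.refresh()


class ReactiveOutput:
    """An output (e.g. a chart or table) recomputed whenever one of its inputs changes."""

    def __init__(self, compute: Callable[[], Any], inputs: List[ReactiveInput]) -> None:
        self._compute = compute
        self.value: Any = None
        for reactive_input in inputs:
            reactive_input.subscribe(self)
        self.refresh()  # compute the initial value once

    def refresh(self) -> None:
        self.value = self._compute()


# Hypothetical dashboard: a slider picks how many values to average.
data = [4, 8, 15, 16, 23, 42]
n_values = ReactiveInput(3)
mean_output = ReactiveOutput(lambda: sum(data[: n_values.get()]) / n_values.get(), [n_values])

print(mean_output.value)  # 9.0, the mean of the first 3 values
n_values.set(6)           # "moving the slider" triggers a recompute
print(mean_output.value)  # 18.0, the mean of all 6 values
```

In Shiny this dependency tracking happens for you; in a hand-built Python web app, wiring it up is part of the extra web-development work the video mentions.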
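The point about scikit-learn is that it gave Python one cohesive way of working with many machine learning models. The standard-library sketch below imitates that idea: two toy models share the same `fit`/`predict` methods, so swapping one model for another is a one-line change. `MeanRegressor`, `LinearRegressor` and the `Estimator` protocol are hypothetical stand-ins modeled loosely on scikit-learn's convention, not scikit-learn's actual classes; for simplicity they take a single feature as a flat list, whereas scikit-learn expects a 2D feature matrix.

```python
from statistics import mean
from typing import List, Protocol, Sequence


class Estimator(Protocol):
    """The shared interface: every model can be fitted and then asked for predictions."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "Estimator": ...

    def predict(self, X: Sequence[float]) -> List[float]: ...


class MeanRegressor:
    """Baseline model: always predicts the mean of the training targets."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = mean(y)
        return self

    def predict(self, X: Sequence[float]) -> List[float]:
        return [self.mean_ for _ in X]


class LinearRegressor:
    """Single-feature ordinary least squares: y = slope * x + intercept."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "LinearRegressor":
        x_bar, y_bar = mean(X), mean(y)
        sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(X, y))
        sxx = sum((x - x_bar) ** 2 for x in X)
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, X: Sequence[float]) -> List[float]:
        return [self.slope_ * x + self.intercept_ for x in X]


X_train, y_train = [1, 2, 3, 4], [3, 5, 7, 9]

# Every model exposes the same two methods, so swapping models is a one-line change.
models: List[Estimator] = [MeanRegressor(), LinearRegressor()]
for model in models:
    print(type(model).__name__, model.fit(X_train, y_train).predict([5]))
```

The design choice being illustrated is the shared interface rather than the models themselves: code that evaluates or compares models only needs to know about `fit` and `predict`.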
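On serverless programming, the video notes that much of it, especially on AWS, is done by writing small Python functions. The sketch below is a minimal function in the shape AWS Lambda expects for Python (`handler(event, context)`), returning the response format used by API Gateway's Lambda proxy integration. The `values` request field and the mean calculation are hypothetical placeholders for real analysis code, and deploying it would also need AWS configuration that is not shown here.

```python
import json


def lambda_handler(event, context):
    """Hypothetical serverless endpoint: returns the mean of the numbers it receives."""
    # With a proxy integration, the HTTP request body arrives as a JSON string.
    payload = json.loads(event.get("body") or "{}")
    values = payload.get("values", [])
    if not values:
        return {"statusCode": 400, "body": json.dumps({"error": "no values supplied"})}
    # Placeholder "analysis": the mean of the submitted numbers.
    result = {"mean": sum(values) / len(values)}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }


# Local usage: call the handler directly with a fake event (the context is unused here).
response = lambda_handler({"body": json.dumps({"values": [2, 4, 9]})}, None)
print(response["statusCode"], response["body"])
```

Because the handler is an ordinary function, it can be exercised locally like this before it is deployed anywhere.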
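The production argument in the video rests on exposing analysis functions over a REST API, whether through Microsoft ML Server or a Docker container. To make "expose a function as a REST API" concrete, here is a hedged Python sketch that uses only the standard library's WSGI support. The `predict` function is a hypothetical placeholder (a Celsius-to-Fahrenheit conversion standing in for a model), exposed as `POST /predict` with JSON in and JSON out; the `call_locally` helper is also invented here so the app can be tried without starting a server.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def predict(temperature_c: float) -> float:
    """Hypothetical placeholder "model": converts Celsius to Fahrenheit."""
    return temperature_c * 9 / 5 + 32


def app(environ, start_response):
    """A WSGI application exposing predict() as POST /predict (JSON in, JSON out)."""
    if environ["PATH_INFO"] != "/predict" or environ["REQUEST_METHOD"] != "POST":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    if "temperature_c" not in payload:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "temperature_c is required"}']
    body = json.dumps({"prediction": predict(payload["temperature_c"])}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_locally(method, path, payload):
    """Exercise the app without a network server, as a quick local check."""
    data = json.dumps(payload).encode()
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "wsgi.input": io.BytesIO(data), "CONTENT_LENGTH": str(len(data))}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    status = []
    body = b"".join(app(environ, lambda s, headers: status.append(s)))
    return status[0], json.loads(body)


print(call_locally("POST", "/predict", {"temperature_c": 100}))
```

To serve it over HTTP locally, the same `app` could be passed to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and started with `serve_forever()`. The `wsgiref` server is a reference implementation meant for development, so a real deployment would more likely put a production WSGI server, possibly inside a Docker container, in front of it.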
Info
Channel: Jonathan Ng
Views: 26,158
Rating: 4.9219217 out of 5
Keywords: data analysis, data visualization, deep learning, data scientist, data science training, data science with python, neural network, r vs python 2018, data science, r vs python data science, r vs python, python vs r, how to become datascientist, machine learning python, r vs python for data analysis, r vs python for finance, r vs python vs sas, r vs python machine learning, data science python, data science career, r vs python data science 2018, data scientist python
Id: ETvvwTuiIps
Length: 19min 5sec (1145 seconds)
Published: Tue Oct 23 2018