Best Laptop for Data Science

Video Statistics and Information

Captions
In this video I'm going to be going over what laptop you should get for getting into data science. I'm going to be going over four different laptop types and the pros and cons of each of them. I'll be going over generally what you would expect to be able to do on a laptop, the kind of limitations of one, and how to make your hardware go further regardless of what kind of hardware you actually have available. If you're new to this channel and you're keen to learn the latest tips, tricks and tools for working more effectively with data, please hit the subscribe button for weekly videos.

Okay, so first up: what can you generally expect to do on a laptop? For myself personally, I'm quite often working with data sets of between five and ten million records. Now for some of you that may seem really, really big, and for others of you that may seem really, really small. In any case, I find that amount of data is probably enough to be able to take a sample of a bigger data set and still have enough capacity to work with, while still being reasonably fast. Once you start getting into bigger data and bigger workloads, if you're trying to work with all of the data at the same time, it can slow things down a bit. So generally what you want to be doing is testing your models on smaller data sets, making sure that everything works, and then popping it off onto some sort of server, cluster computing, some sort of computer with more resources, to be able to run those things faster rather than necessarily trying to run that on a laptop.

But anyway, what do you need in order to process about five to ten million records of data? I personally have an ultrabook. This thing here has basically got an i7 with 16 gigabytes of RAM and a 512 GB SSD hard drive. Now I'll talk about why those three things are kind of important. First of all, the CPU is an i7, which is a 64-bit CPU, and personally I think that having a 64-bit CPU is really important because
when you are working in data science you're crunching a lot of large numbers. Large integers and doubles are all stored within 64 bits. Now if you go to a less powerful, cheaper machine, like some sort of netbook which is using some mobile processor running on a 32-bit system, basically what happens is that all these large 64-bit numbers have to get chopped in two to be processed by a 32-bit processor, which means it's taking at least twice as long and twice as many processes in order to crunch this data. So you have a significant slowdown once you're moving to 32-bit processors.

Now the other thing about the i7, because you also have i3s and i5s, which are all 64-bit chips: the i7s also have more cache. Basically the cache is the fast memory within the CPU, so that's how much data it can hold close to the processor, much, much faster than RAM, which we will talk about in a sec. And the other thing is that it has more cores. Now the thing to keep in mind here is that data science applications like R and Python are not multi-threaded by default; they run on a single core. So even if you get a faster CPU, a lot of times you're not taking advantage of that unless you're also using libraries within those systems that take advantage of multiple cores. So for example, in R and Python the default data frames, so either the default R data frame or pandas for Python, are actually relatively slow data frames, quite a bit slower than some of the libraries like R's data.table. This is a library which is actually available for both R and Python, the underlying implementation is built in C, and it will use all the cores that you have available. Because of this it runs many, many times faster than the built-in data frames, so that's something to definitely keep in mind there.

Now moving on, the amount of RAM is going to determine how much data you can actually process on that computer.
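
As a rough illustration of how RAM limits the number of records, here is a back-of-the-envelope sketch. It assumes every value is a 64-bit (8-byte) number, as described above, and a hypothetical table of 20 columns; the function names and the `copies` parameter are made up for this example. Real data frames add overhead, and text columns usually take more space, so treat the results as a lower bound rather than a measurement.

```python
BYTES_PER_VALUE = 8  # one 64-bit integer or double


def estimate_frame_bytes(n_rows: int, n_cols: int) -> int:
    """Approximate memory for a purely numeric table (no overhead counted)."""
    return n_rows * n_cols * BYTES_PER_VALUE


def max_rows_for_budget(ram_bytes: int, n_cols: int, copies: int = 1) -> int:
    """Rows that fit in ram_bytes if `copies` full copies are held at once."""
    return ram_bytes // (n_cols * BYTES_PER_VALUE * copies)


if __name__ == "__main__":
    # Hypothetical workload: 10 million records with 20 numeric columns.
    size = estimate_frame_bytes(10_000_000, 20)
    print(f"10M rows x 20 cols is about {size / 1e9:.1f} GB")

    # Assumed budget: all 16 GiB of RAM available to the data (optimistic).
    ram = 16 * 1024**3
    print("rows that fit with 1 copy:", max_rows_for_budget(ram, 20))
    print("rows that fit with 10 copies:", max_rows_for_budget(ram, 20, copies=10))
```

Under these assumptions, holding ten copies of the same table cuts the number of rows that fit by a factor of ten.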
So as mentioned, I use 16 gigabytes of RAM, which is a pretty standard amount. You can get more, but 16 gigabytes is kind of the standard amount that you get on a high-end laptop now. So if you had maybe 32 gigs of RAM, maybe you could process 10 to 20 million records or whatever it is, but like I said before, in a lot of situations you can still probably take a sample of your data to work on your laptop rather than trying to do too much on your laptop anyway. So something to keep in mind there.

Now also with memory, I find that a lot of times people end up using a lot more memory than they need to, basically because they make lots and lots of different copies of the data frames. So maybe you're only getting 1 million records of data instead of 10 million records because you have 10 different copies of it, because every single stage of your application saves another copy of your data. So another thing to keep in mind that will enable you to do more with less is just managing how you actually use the memory, because you can use it very inefficiently as well. Things like graphics, visualizations, all that kind of stuff is again going to take up a lot of that memory. So things to keep in mind there with RAM.

Now in terms of hard drive, I would definitely be going for something which contains an SSD. SSD is short for solid-state drive; these are memory-based hard drives instead of disk-based hard drives. Now these are many, many times faster, I think around about 40 times faster or so, when it comes to reading and writing the data off disk, and so will make a big difference to how fast everything else runs as well.

Okay, now as mentioned, I personally use an ultrabook. The thing is, with an ultrabook the one thing that you might think is missing is something like a GPU. So another type of laptop that some data scientists use is sometimes
something like a gaming laptop, because gaming laptops are generally the only laptops you can get which also contain GPUs. GPUs are graphics processing units. Now if you are going for something with a GPU, then you definitely want to get something with an Nvidia graphics card as opposed to some other make, because the algorithms that do support GPUs are all optimized for Nvidia graphics cards. Now do you actually even need a graphics card or not? Will it actually make your particular kind of algorithms any faster? In a lot of instances, no. Just as R and Python are single-core by default, a lot of the algorithms that are available don't have any option for GPU acceleration. GPU acceleration is most notably used for deep learning types of applications. Now I personally don't really use any deep learning applications, because deep learning is generally better for things like computer vision, natural language processing and these types of things. For the type of data that I work with, I'm working with basically a lot of structured data, big data frames, and so for the type of work that I do I generally don't benefit from actually having a GPU. Now the other thing is, if you get a computer with a GPU, you are paying a lot more for it. The computer is going to be a lot bigger and chunkier, usually, say, twice as thick as something like an ultrabook, and they're generally gaming laptops, so maybe not something you necessarily want to bring along to a client meeting or something like that. But just something to keep in mind.

Now the other type of laptop that a lot of data scientists use is sometimes Macs. So Mac or PC, which one should you go for? As you've seen, I use a PC for the type of work I do. There are a lot of great data scientists who use Macs as well, but the reason why a PC can be particularly useful for you is basically if you are doing data science for business. If you're in enterprise, then a lot of the enterprise applications are,
pretty much all the enterprise applications, especially the legacy ones, are all PC-based. So your financial applications, your trading applications, trading platforms, those kinds of things, they're all PC-based applications. Now more and more, a lot of the applications being built are web-based, so they'll be cross-platform, but pretty much all the legacy applications and still a lot of the applications being built today are PC-based applications. So it's something to keep in mind there.

Most notable of all these kinds of applications is Microsoft Excel. Now you might be thinking, well, Mac also has Microsoft Excel, but it's actually a very, very different and very cut-down version of Excel compared to the version on PC. When you are working with data and data science, you are going to be starting to use larger data sets, and when you do get into things like R and Python, you realize the importance of having a reproducible process. Now with the Windows version of Microsoft Excel you have tools such as Power Query and Power Pivot, which don't exist in any of the other versions of Excel, including Mac or Office Online and those kinds of things. Why are those important? Power Query enables you to create a reproducible process, and Power Pivot enables you to have an in-memory columnstore data frame in the back end of Excel, which means you can process data much faster and you can basically share much larger amounts of data with your clients, which is not normally possible with the normal versions of Excel. So something to keep in mind there.

Now finally, another thing that I've seen people use sometimes is a netbook. As mentioned previously, netbooks use 32-bit CPUs and they're generally massively underpowered, so the machine itself is not suitable for data science. So how are certain data scientists getting around this? What they're doing is they are taking these netbooks and they are actually using a remote computer such as AWS, Google Cloud, Microsoft Azure or something like that,
or even their own sort of personal cluster setup, and they're installing servers on these computers and running their processes remotely, using a netbook just as a terminal. I've done some of this kind of stuff myself as well. I've set up my own R servers on AWS, and that's kind of cool because effectively you can get an RStudio IDE that you can use anywhere that you can get a web browser, including your mobile phone. So you can go around with your mobile phone and do your R programming on it, which is really pretty cool.

But there are a few things that you should keep in mind here. Is this actually a good idea or not? If you are actually working in teams and you're collaborating, or you have a lot of data stored remotely, then setting up a remote server can be quite a good idea. The reason for this is that everybody on your team can be using the exact same version of R with the exact same libraries installed, so you're all working in the same environment. This makes a really big difference when you're working in a team; it makes things a lot easier in some respects, and I'll go over this a little bit more in a sec.

Now also, if you have a lot of data, say you have, I don't know, terabytes of data, and you don't want to download that, you don't want to take that off of the server, then having a remote server means that you can leave the processing next to where the data is, instead of pulling the data down to where your computer is to be able to process it. This can make a big difference, because downloading data can take a very long time if you need to do it quite frequently. Again, if you're working on a laptop, really you only want to be downloading a smaller sample of that data and be doing the large processing off on some kind of remote computer anyway.

But generally speaking, I don't like this setup so much if you are more experimenting. If you want to explore new libraries and applications, then it's
much harder to get all of these things set up on a remote machine. Experimentation, which a lot of times is what you want to do while you're learning, you really want to be doing on a local machine, because it's so much faster and easier to do that. You're not really going to be working with the bigger data sets anyway, but you do want to be installing libraries, installing different applications, testing out different versions, and that's just much, much harder to do on a server. So just something to keep in mind there.

Anyway, I hope this helps you decide for yourself what kind of laptop you should get as a data scientist, whether you're getting into data science or even working on data science in a team. I hope this helps, and I'll see you next time.
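
The recurring advice in the captions is to work on a sample of a bigger data set on the laptop and leave the full run to a larger machine. The speaker does not say how to draw that sample; one possible approach, shown here purely as an assumption, is reservoir sampling, which picks a fixed number of rows from a stream without loading the whole thing into memory. The function name `reservoir_sample` and its parameters are illustrative only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from an iterable.

    Only k rows are held in memory at any time, so the source can be
    far larger than the available RAM (for example, lines of a big file).
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # Stand-in for a large data source: one million row ids.
    picked = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(len(picked), "rows sampled")
```

In practice the iterable could be a file handle or a CSV reader, so the laptop only ever holds the sampled rows.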
Info
Channel: Jonathan Ng
Views: 71,260
Rating: 4.7346792 out of 5
Keywords: Best Laptop for Data Science, best laptop for data analysis, mac vs pc for data science, best laptop, best laptops, machine learning, data scientist, razer blade, deep learning tutorial, machine learning tutorial, machine learning basics, machine learning tutorial for beginners, best laptop 2019, macbook pro, macbook air, xps 13, data analyst, big data
Id: BZA0C5AdW8I
Length: 15min 44sec (944 seconds)
Published: Mon Oct 21 2019