NumPy vs Pandas

Video Statistics and Information

Captions
Math-focused Python libraries like NumPy and Pandas: these are libraries that can help spot trends over time, gain insights into data, and maybe one day even solve the mystery of just why seven ate nine. So today, we're going to take a closer look at NumPy and Pandas.
If you've ever seen a simple ray of sunlight pass through a glass prism, you've seen how that combination lets us see all the colors of the visible spectrum hidden inside. Well, when a data scientist comes across some interesting new data and wants a deeper look, they've got a number of tools they reach for. Now, this would be a great time for some background music, but I'm... I'm being told that that's not in the budget.
Now Python, P-Y-T-H-O-N, is probably the language most associated with data science, but it's not really Python itself providing these deep, perspective-shifting capabilities. It's usually some sort of Python library which specializes in numerical and data processing. And two of the biggest ones out there are, oh yes, NumPy and Pandas. So which one is the right one for us? Is there a clear winner in this mathematical matchup? Well, for starters, we're not in for too intense of a brawl here, since Pandas is actually built on top of NumPy. So even if we're fully Team Pandas, we're still using NumPy.
Now, NumPy was released as an open source project back in 2005 with the goal of bringing scientific computing to Python. It was based on two earlier packages, Numeric and Numarray. Its strength really lies in its ability to work with multi-dimensional array objects. From there, users can sort, search, filter, and apply linear algebra and Fourier transforms: the tools a data scientist needs to handle large amounts of data much faster than they could with Python's built-in functions.
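The array operations mentioned above (sort, search, filter, linear algebra, Fourier transforms) can be sketched roughly as follows. This is a minimal illustration rather than anything shown in the video; the sample values and variable names are made up for the example.

```python
import numpy as np

# Hypothetical sample data: a 2-D array with 3 rows and 4 columns.
readings = np.array([
    [4.0, 1.0, 3.0, 2.0],
    [8.0, 6.0, 7.0, 5.0],
    [9.0, 12.0, 10.0, 11.0],
])

# Sort: order the values within each row.
sorted_rows = np.sort(readings, axis=1)

# Search: position of the largest value in the flattened array.
largest_index = int(np.argmax(readings))

# Filter: a boolean mask keeps only the values above 5.
above_five = readings[readings > 5.0]

# Linear algebra: solve the system a @ x = b.
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(a, b)

# Fourier transform of a short, made-up signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

print(sorted_rows)
print(largest_index, above_five)
print(x, spectrum)
```

Each operation here works on the whole array at once instead of looping over elements in Python, which is where most of the speedup over built-in lists tends to come from.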
Specifically, NumPy leverages something called BLAS, an acronym for Basic Linear Algebra Subprograms, and LAPACK, which is also an acronym, for Linear Algebra PACKage. It uses those to supercharge its linear algebra capabilities. So all good, why not just stop there? Why not stay comfortably with NumPy? Well, as its name suggests, NumPy is all about numbers. Where it really excels is numerical analysis, linear algebra and simulations. But when it comes to data analysis or manipulation, working with a wide range of data sources, that's where Pandas really starts to differentiate itself.
Now, Pandas got its start in 2008, when developer Wes McKinney was looking for a powerful and flexible tool for performing quantitative analysis on financial data, and it was made open source the following year. Pandas is named after the multidimensional PANel DAta it works with. Pandas makes the process of working with data more straightforward for data scientists by providing methods for loading, reshaping, pivoting, merging and joining data, or even working with missing data. It excels at working with tabular data, whereas NumPy is more firmly rooted in strictly numerical data. Where NumPy excels at things like simulation, Pandas steps up its game in things like data analysis.
So why not start right with Pandas? After all, most of NumPy's methods are surfaced through Pandas, so one might see it as a superset. Well, Pandas does build on top of NumPy, but that also means it brings some overhead with it, both in terms of performance and learning curve. Pandas' capabilities come at a cost of complexity. However, Pandas also implements a number of functions optimized with C and Cython, which can be faster than the NumPy equivalent once we get into very large datasets. The general consensus on the best approach seems to be: start with NumPy and look for the features you're most likely to need.
If that search leads you to Pandas, then there's your answer. So if you came here looking for a knock-down, drag-out fight between Pandas and NumPy, I hope you're not too disappointed. The landscape of mathematical and scientific tools available to us keeps us busy and well equipped. So when you're thinking Pandas, NumPy or anything else, it's really any color you like. If you have any questions, please drop us a line below. And if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
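As a rough illustration of the Pandas features the transcript lists (loading, handling missing data, merging, pivoting), here is a minimal sketch. It is not taken from the video; the CSV contents, column names, and manager names are hypothetical and exist only for the example.

```python
import io

import pandas as pd

# Hypothetical CSV text; in practice this might be a file path or a URL.
sales_csv = io.StringIO(
    "region,quarter,revenue\n"
    "north,Q1,100\n"
    "north,Q2,\n"
    "south,Q1,80\n"
    "south,Q2,120\n"
)

# Loading: the empty revenue field is read as a missing value (NaN).
sales = pd.read_csv(sales_csv)

# Missing data: replace the missing revenue with 0 (one possible policy).
sales["revenue"] = sales["revenue"].fillna(0)

# Merging: attach a hypothetical lookup table keyed on region.
managers = pd.DataFrame(
    {"region": ["north", "south"], "manager": ["Ada", "Grace"]}
)
merged = sales.merge(managers, on="region", how="left")

# Pivoting: one row per region, one column per quarter.
pivoted = merged.pivot(index="region", columns="quarter", values="revenue")

print(merged)
print(pivoted)
```

The labeled rows and columns are what make this tabular workflow convenient; doing the same with plain NumPy arrays would mean tracking the region and quarter positions by hand.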
Info
Channel: IBM Technology
Views: 133,191
Keywords: IBM, IBM Cloud
Id: KHoEbRH46Zk
Length: 5min 55sec (355 seconds)
Published: Wed Apr 12 2023