How I'd Learn PYTHON For DATA ANALYSIS | If I Had To Start Over Again

Video Statistics and Information

Captions
Say you get a bunch of separate Excel files from different stores with orders data on a weekly basis, and you need to collate this data into a single table for further analysis. You could just use Excel, create a new workbook, and copy and paste your data onto the same worksheet. Or you could just run this one line of Python code that will do the exact same. Cool, right?

Hey, my name is Mo Chen, and I work as a data and analytics analyst within the financial services industry. Out of the technical skills I've acquired throughout the years, Python was by far the most challenging one for me to learn, which is why in today's video I'd like to show you how you can learn Python for data analysis efficiently, or at least much more efficiently than I did.

I'd say Python is the most powerful tool in my data analyst skills arsenal, as I can clean and transform data with it, create data visualizations, or write scripts to automate certain tasks or processes. Learning to code in Python was not easy at all. I've had my fair share of failures along the way before I could eventually write neat, clean, and efficient code. In this video, I'd like to share with you the biggest mistakes I've made and the lessons I learned from these mistakes, which helped me succeed in learning Python for data analysis. I really hope that you can get value out of my learning story, relate to a part or even various parts of it, and accelerate your own journey.

So first of all, let me quickly tell you about how I completely failed at learning Python. Right when I started my career five years ago, I was working as a risk graduate within the banking industry, after I had just finished my master's degree in finance and economics. For those of you who don't know much about graduate schemes in the UK, they're jobs designed specifically for university graduates: you usually sign a two- to three-year contract and get to do a different rotation every six months. I was in my second placement, working in a risk modeling team, and this was when I first
encountered Python. Just for reference, after graduating I had zero technical skills: no Excel, no Tableau, no SQL, and of course zero Python. I've picked up all of these skills after I started working.

So in this placement I was surrounded by very, very smart people. They all had master's degrees and PhDs in quantitative subjects like quantitative finance, statistics, and econometrics, and were also super technical. Let's just say that sometimes I didn't even understand their questions, not to mention answer them. I felt pretty disconnected and out of the loop in terms of skills and knowledge, and tried to learn all the Python I could within two to three months so that I could use the rest of the time to apply my skills and make an impact.

Safe to say, I was too eager. I didn't spend enough time learning the foundations, blew through the topics quickly, and by the time I got to the more advanced concepts like classes or writing scripts, I was pretty lost. Learning Python was a challenge that I severely underestimated. Mastering everything in Python is extremely difficult, if at all possible, but I didn't know this back then. I wanted to run a marathon when I wasn't even able to make the 5K mark, as I tried to create and run automated credit risk models when I could barely understand a simple class within the code. Being in this team humbled me for life and completely changed the way I look at different levels of technical skills.

So now that you know how I failed at learning Python, let me tell you how I actually succeeded in the end. I built on my mistakes and created a structured roadmap focused on Python for data analysis, and I cannot stress the emphasis on data analysis enough here. Learning everything in Python will take you ages, so narrow down your focus by learning the basics very well before moving on to mastering essential libraries like NumPy, pandas, Matplotlib, and Seaborn. By learning the basics, I mean: build a strong core knowledge of what data types, lists, dictionaries, and mutable or immutable objects are.
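To make the mutable-versus-immutable distinction mentioned here concrete, a minimal sketch follows. The variable names and values are made up for illustration and do not come from the video.

```python
# Lists are mutable: two names bound to the same list see each other's changes.
orders = [120, 85, 240]
alias = orders            # no copy is made; both names refer to one list
alias.append(60)          # the change is visible through `orders` as well

# Tuples are immutable: item assignment raises TypeError.
store = ("London", 2023)
try:
    store[0] = "Leeds"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Dictionaries are mutable too; copy() returns an independent (shallow) copy.
prices = {"apple": 0.5}
prices_copy = prices.copy()
prices_copy["pear"] = 0.8  # the original dictionary is left untouched
```

The aliasing behavior in the first part is a common source of surprises, which is one reason these basics are worth learning before moving on to the libraries.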
Practice looping. Be able to write functions, lambda functions, and use other basic built-in functions. Have a basic understanding of what object-oriented programming is by learning about instances and classes.

I made the mistake of copying and pasting a bunch of code, thinking only about the end result and getting it done, which was not great from a learning perspective. Try to type out the code yourself: even though you can easily copy and paste, or just ask some AI tools to write some code for you, I feel the code sticks with you much better if you actually type it out. Trust me, being able to actually code from scratch will help you so much when it comes to altering some code that you just copied and pasted, or when you need to understand someone else's code and pick up the work from them.

After building a strong foundation with the basics, you can move on and master the essential libraries. Let's cover NumPy first. It's used for numerical computations in Python. Its popularity mainly comes from the fact that it supports large multi-dimensional arrays and matrices, and a bunch of math functions that you can use to operate on these arrays efficiently. It also has a broadcasting feature that helps you perform operations between arrays of different shapes and sizes. Say, for example, you have a larger and a smaller array: NumPy automatically stretches the smaller array to match the shape of the larger one, as long as the shapes are compatible. NumPy also integrates well with other libraries such as pandas, Matplotlib, SciPy, or scikit-learn if you're into machine learning. As it's a foundational library in the Python computational ecosystem, it gives you a seamless workflow for data analysis.

Moving on to the pandas library, which is an open-source data manipulation and data analysis library for Python. It's designed to make working with structured data, such as tabular data or time series data, more convenient and efficient. It has two primary data structures: Series, which is a one-dimensional array that can hold any data type, and
DataFrame, which is a two-dimensional data structure where each column can hold a different data type, similar to tables in Excel spreadsheets. Pandas simplifies the process of reading and writing data from and to various file formats like CSV, Excel, Parquet, or pickle. Moreover, you can easily manipulate and transform your data using the data cleaning and preparation functions, as well as handle missing values and categorical variables. Pandas also has strong indexing capabilities, allowing you to select, slice, and filter data based on your chosen criteria. You also have many easy ways to access specific rows, columns, or even subsets of your data using labels, Boolean expressions, or positional indexing.

Pandas comes in really handy when working with time series data, as it has extensive support for it. You can use its functionality to handle time-based indexing, resampling, time shifting, and even rolling window calculations, which is very useful when analyzing and manipulating financial and stock market data. Pandas also integrates well with other libraries such as NumPy, Matplotlib, or scikit-learn if you're into machine learning, giving you a seamless data analysis workflow by combining the pandas data structures with the computational and visualization capabilities of other libraries.

Speaking of data visualization, let's move on to the Matplotlib and Seaborn libraries. Matplotlib gives you a wide range of tools and functions for creating a variety of visualizations such as line plots, bar plots, or histograms, and Seaborn is a library that's actually built on top of Matplotlib. Use Matplotlib to create high-quality plots with customizable settings for fonts, colors, line styles, or markers. You can modify the axes, labels, titles, or legends, and add elements to your plot as well. The customizability is insane: you can pretty much control every aspect of the visual. You can also create multiple plots within a single figure using subplots, and you can arrange the subplots in a grid or any other
custom layout you prefer. Subplots are great when you want to present multiple visualizations in a single image.

Now, Matplotlib is great, but if you want to go the extra mile and make your visuals even more eye-catching, use Seaborn, as it enhances the visual aesthetics of plots compared to the default styles of Matplotlib. It has a set of predefined themes and color palettes that look much more visually appealing and professional. Seaborn complements Matplotlib very well, as it simplifies the creation of complex statistical visualizations. You can easily create box plots, violin plots, or regression plots. You can also just as easily visualize categorical data by using scatter plots, count plots, or bar plots. You can then use these visuals to compare groups, display proportions, or highlight relationships within categorical variables. One of my favorite things about Seaborn is the beautiful heat maps you can create, which you can then use to highlight patterns and correlations in large data sets. Both Matplotlib and Seaborn work well with other statistical and computational libraries such as pandas or NumPy, giving you, again, a seamless workflow for data analysis.

And that's it, that's the end of the video. If you enjoyed this one, make sure to check out some of my other videos right here. Thank you so, so much for watching, and I'll see you in the next one.
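As a rough recap of the workflow described in the captions, the sketch below collates per-store order tables into one DataFrame, filters it with a Boolean expression, and applies NumPy broadcasting to the result. The store names, column names, and numbers are invented for illustration, and the tables are built in memory so the example runs without any files. This is not the code shown on screen in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly orders from two stores (stand-ins for separate Excel files).
store_a = pd.DataFrame({"store": "A", "units": [3, 5], "unit_price": [2.0, 4.0]})
store_b = pd.DataFrame({"store": "B", "units": [1, 7], "unit_price": [10.0, 1.5]})

# Collate the separate tables into a single one with a fresh 0..n-1 index.
orders = pd.concat([store_a, store_b], ignore_index=True)

# Boolean indexing: keep only the orders of at least 3 units.
large_orders = orders[orders["units"] >= 3]

# Broadcasting: a (4, 1) column of revenues times a (2,) row of multipliers
# produces a (4, 2) table without writing an explicit loop.
revenue = (orders["units"] * orders["unit_price"]).to_numpy()
rates = np.array([1.0, 1.2])  # made-up net and gross multipliers
revenue_table = revenue[:, None] * rates
```

With files on disk, the collation step is often written as a single expression along the lines of `pd.concat([pd.read_excel(f) for f in files])`, where `files` is a hypothetical list of paths to the weekly workbooks.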
Info
Channel: Data With Mo
Views: 62,179
Id: mut8eTdoRxU
Length: 11min 34sec (694 seconds)
Published: Sat Jun 24 2023