Get Historical Price Data Using Quandl & Sharadar

Captions
Hello world! Today we're going to learn about the Quandl Python API. If you're not familiar with what Quandl is, I like to think of Quandl as a data exchange. But before we move any further: Quandl is now named Nasdaq Data Link, so I might say Quandl, I might say Nasdaq Data Link. They're the same thing. I'm probably going to stick with Quandl because that's what I'm used to, but just know that they're the same thing.

I am a paying customer of Quandl. I use Quandl's Sharadar fundamentals database; I like how the Sharadar team handles financial restatements and things like that, but I'm digressing. You can go on to Quandl, go to their search, and browse their vast data set library, and then pay a subscription fee for each data set you want to use in your analysis. But that's not the only customer dynamic. You can also be a vendor to Quandl: if you have alternative data or your own data set, you can sell that to Quandl and they will pay you for it. So that's essentially what it is: a bunch of data providers providing data to Quandl, which is then a platform that we can use in our analysis.

Now, Quandl's massive. I think they have 650,000 users, and I think 12 of the largest 15 hedge funds are using their data. Double-check that on the website, but the point is it's a massive service. The nice thing about it being so massive is that there's tons of different data, a lot of the data is vetted so it's usually correct, and you only need one API. It doesn't matter what the data set is; you use the one API to get the data no matter what format the data set was uploaded as. So it's pretty nice.

So what are we going to do in this video? Well, I'm not going to walk you through getting an API key; you just go to the website and log in or sign up. The same goes for some of the basic stuff. I'll cover that basic stuff but, you know,
again, their documentation is pretty good, so I don't feel like I can really add to it. Where I'm going to go different is that I'm going to show you how you can use a paid database and, like we did with Polygon in the last video, use the Quandl API to create your own methods that get the data in an easier way, or in a better format for your analysis.

When you're using tons of different data sources, you want to standardize and make everything conventional. Here's the Polygon video if you haven't watched it; it goes into detail. Basically, we use all the same method names, the same data format, et cetera, when combining all of these different vendors together, so it's easier to remember. I don't have to think, well, is this a data frame or is that a data frame? What was this method called? It's all the same, and that's what I'm doing here. So hopefully that makes sense to you. If you're not a paid subscriber, you might want to be; it's a pretty good service. But if you're not, you're still going to learn how to use the API in this video.

So hopefully you're excited. Let's create some code, because I think I've been talking long enough. We'll open up Jupyter Notebook, and the first thing we're going to want to do is install quandl. You should have your own virtual environment active. If you don't have a virtual environment yet or don't know how to create one, I'll put a link in the description below; you definitely don't want to be installing everything into your global environment. Installing quandl is super easy. You could install it from GitHub, but in this case we're just going to use pip: we'll do bang pip space install quandl (!pip install quandl), and that will install quandl into the active environment. Simple enough.

The next thing we'll want to do is get the imports. We're going to use a few imports in this
program, so we're going to want date: from datetime import date. We're going to use date to set the start and end dates for the price bars we get from Quandl. We're going to want to import quandl, because we need that to connect to Quandl. We'll import numpy as np because we're going to want to handle NaNs, or not-a-numbers. We'll import pandas as pd to be able to check for null values and use data frames. And from local_settings import quandl as settings.

If you follow my videos, you'll know what this is: local_settings is just a file that we don't upload to our GitHub repository, because we don't ever want to upload sensitive information such as API keys to GitHub for prying eyes to maliciously steal all of our API calls and do other wacky things. If you're wondering what the format of this file looks like, it's quite simple; we've already done it twice now. In this case you have quandl, and then a dictionary where api_key is the key and the actual API key is the value. It's just like the Polygon one, only you substitute the polygon text with quandl and change your Polygon API key to the Quandl API key.

Okay, perfect. With the imports out of the way, hit enter just to make sure it can locate them all. Let's create our REST client class: class MyRESTClient. It'll be super easy. We'll def the dunder init with self and auth_key. If you're not familiar with type hints, that's okay; all they do is tell you what type of variable you're passing in and passing out, so they give you a hint of the type. auth_key, which is our API key, is a string, and if we don't set it, it gets its default value from the api_key entry in settings. So we actually don't have to pass this in; it'll essentially know what our
API key is, but if we do want to pass a different API key in for whatever reason, we can. Now we simply take the quandl package that we imported: we'll type quandl.ApiConfig.api_key and set that equal to auth_key, which is our API key here. Then we'll set self._session equal to quandl. That's it. Whenever I upload this Jupyter notebook, I'll add some more information, because you can configure things like the number of retries and the max wait between retries, a retry strategy. Polygon didn't provide any of that stuff; Quandl does. I'll provide some documentation, but I feel like going through all of that here might not be the best use of your time.

Okay, so I'll hit enter, and let's test to see if we can get this to work. We'll create a client from MyRESTClient, and we don't have to pass the API key because it's the default. Enter. So we did get a client, and now we should be able to access all of the quandl methods. We'll do client._session, period, hit tab, and we can see all of the stuff that the quandl package provides. Awesome.

Now we want to create our get_tickers method. Let's clear up some of this and create a new title: create get_tickers method. All right, perfect. Let's think about what we want to do here. We want to grab all of the tickers from the Sharadar tickers table. There are going to be a lot of them, so we'll need to set paginate equal to True; it'll only give us one page if we don't. Paginate allows it to continue paging and adding to the tickers data frame until all of the pages are complete or you hit the API limit, but in this case we won't; I think that's one million rows. So let's start there. We're going to need to grab our class, because this is a class method. Type def get_tickers, and that'll output a data
frame. Okay, and now let's grab our tickers. We'll do tickers equals self._session (again, we created that session up here) dot get_table. We're going to get the Sharadar tickers table, SHARADAR/TICKERS, with paginate equal to True. Now let's filter for only the equity and fund tickers: tickers equals tickers where the tickers table equals SEP or the tickers table equals SFP. There's no space after that. So that'll filter to rows where the table is the equity pricing table or the fund pricing table.

We'll also want to do some more data cleanup. Let's fix the NaN values: I'll do tickers replace np.nan with None, and we'll do that in place. There we go. Now, this is one of my largest gripes about this data set: there's a column called isdelisted, and it's not Boolean; it's actually a character, Y or N. So let's fix that. We'll make it active. I'll make a note here, convert isdelisted to active, and I'll make this cleaner whenever I upload it to GitHub. We'll do tickers active, because that's the field we want, equals tickers isdelisted, because that's the field we need to check, and we'll apply an anonymous function: lambda x, then bool of x equal-equal N.

Let's walk through that. If a ticker is not delisted, it's active. For each row, x is the isdelisted value. If it's N, the ticker is active: N equal-equal N returns True, which is active. If it is delisted, this will be a Y (I'm pretty sure it's a Y): Y equal-equal N is False, which means isdelisted is yes, and that returns False for active. That's how we fix that field.

Let's also do some basic renaming; I'll note rename fields. We'll do tickers equals tickers.rename, columns, permaticker, and call this the quandl_id, even though technically it's the Sharadar ID. But it is what it is; this is just for learning purposes. Okay. Now let's make sure that
our quandl_id is an integer type. We'll set the type of quandl_id to int64: tickers quandl_id is equal to tickers quandl_id astype int64. Okay, so we fixed that. Now let's only get the columns that we want, so I'll note return only columns of interest. We'll do cols equals the ones we want. Perfect.

There's one last thing that we need to do, and it's almost a de facto standard at this point: prevent duplicates, because they do happen even with paid sources. tickers equals tickers.drop_duplicates with subset equals ticker. So if for some reason there is a second row that has the same ticker, it will drop the second one. Then return tickers, and keep our fingers crossed I didn't make any mistakes. What did I do here? Invalid syntax. Of course I made a mistake: line 17, active equals isdelisted apply lambda x. Okay.

Let's give our client a test. I'll do client equals MyRESTClient. Enter. There we go. Now let's see: we'll have everything that we had before with our session, but now we should have get_tickers. We'll capture that into a data frame, hit enter, and see if we get all of the tickers in the format that we want. It looks like we did.

Now let's get the bars; we're more than 50 percent done. I'll say create get_bars method. Let's think about what we want to do. We want to be able to pass a ticker to this method and have it get all of the bars for us. Makes sense. We also want to be able to pass the start and end dates, and we probably should pass the market too, just to make sure we don't accidentally pull in the wrong ticker. The ticker set is unique, but we'll keep that as is for now; when you're designing your system, you can design it however you want. So we'll grab all of our code so far and paste it, and now we'll start getting to work on the get_bars method.

Okay, class method, so pass self. market is a string, and I'm going to say it's equal to stock as a default. Then ticker, and
that should be a string. Should it default to None? That could make it optional. Well, no, you have to have a ticker string. So from_ would be a date defaulting to None, to is a date defaulting to None, and that'll output a pd.DataFrame.

All right, let's first handle the start and end dates. We'll say from_ is equal to None if pd.isnull of from_, else from_. Basically we want to make sure it's not a NaN. I'm not going to get into the technicals here, but this is just handling NaNs with isnull, because a NaN can mess us up. So now we know the value is either real data or None. Next, let's set to to today if it wasn't set: to equals to if to, which means to keeps whatever it was set to, else date.today. For from_ we'll do something similar: from_ equals from_ if from_, so if from_ exists we keep from_, else we make it a date starting in 2000.

We also have the two tables that we're interested in; we got the tickers from the SEP and SFP tables. So let's create a list for that: tables equals SHARADAR/SEP and SHARADAR/SFP. Perfect.

So we've got our to set up correctly and our from_ set up correctly. Now we want to loop through the tables: for table in tables, df equals self._session.get_table, and we pass in the table name, the ticker, which is what we were given, and the date filter, with gte, greater than or equal to, set to from_ and lte, less than or equal to, set to to. Then paginate equals True, which means that if for some reason the first page didn't get everything, it'll page through all of it. Then I'll do if not df.empty. So we have our data frame now, and if it's not empty, we'll change the date into something more readable. We'll do pd.to_datetime on df date. All right,
so we're just setting that to a datetime. Then df equals df.sort_values by date, and we select the columns of interest: date, open, high, low, close, and volume. Then we return the df, and at the end we'll also return None. We don't actually have to do that, but I like to make sure things are explicit. I'll hit enter and see if we got it. Okay, now let's see: line 47, syntax error.

Now let's test it out. Create a new client; it'll be MyRESTClient. You can see both of the methods now. We'll say df equals client.get_bars, ticker is Apple (AAPL), and why don't we try to get all of the data. I'll type df here. This may take a minute, so I might... oh, that was actually pretty quick, and it does look like the adjusted prices are working. Perfect.

And that's it. I can hear you say it already: Leo, that was one of the easiest videos we've done so far. And you're right. The nice thing about the Quandl API is that so many people have used it that it's pretty fleshed out at this point. There's not much we had to do; we just created our own methods to make our lives easier. So I hope you found this video valuable. If you did, please subscribe and hit the thumbs-up button. Also, if you're interested, this video right here is one Google thinks you'll like. I hope you have a wonderful day, and I'll see you in the next one. Thanks.
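The settings file and client class dictated in the captions can be sketched roughly as follows. This is a reconstruction from the spoken description, not the author's notebook. The settings dictionary is inlined as a stand-in for local_settings.py, the key value is a placeholder, and the optional session argument and the lazy import are additions of this sketch so the class can be exercised without the quandl package or a real key.

```python
# Stand-in for local_settings.py, which stays out of version control.
# In the video it arrives via `from local_settings import quandl as settings`.
settings = {"api_key": "YOUR_QUANDL_API_KEY"}  # placeholder, not a real key


class MyRESTClient:
    def __init__(self, auth_key: str = settings["api_key"], session=None):
        """Keep a handle to the quandl module (or a stand-in) as the session."""
        if session is None:
            # Imported lazily (an assumption of this sketch) so the class can
            # be tried without quandl installed; `pip install quandl` adds it.
            import quandl

            # Module-level configuration: every later call uses this key.
            quandl.ApiConfig.api_key = auth_key
            session = quandl
        self._session = session
```

With a real key in place, MyRESTClient() with no arguments picks up the default key, and client._session.get_table(...) is then the same call as quandl.get_table(...).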
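The ticker cleanup steps described in the captions can be sketched as a standalone function. It is written as a plain function over a DataFrame rather than as the get_tickers method so that it runs without an API key; in the video the input would come from self._session.get_table('SHARADAR/TICKERS', paginate=True). The list of columns to keep is a guess, since the captions never name them.

```python
import numpy as np
import pandas as pd

# Hypothetical selection: the captions only say "the ones we want".
COLUMNS_OF_INTEREST = ["ticker", "name", "quandl_id", "active"]


def clean_tickers(tickers: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleanup steps from the captions to a raw tickers table."""
    # Keep only rows from the equity (SEP) and fund (SFP) price tables.
    is_priced = (tickers["table"] == "SEP") | (tickers["table"] == "SFP")
    tickers = tickers[is_priced].copy()

    # Swap NaN for None so missing values are plain Python nulls.
    tickers = tickers.replace({np.nan: None})

    # isdelisted holds the text 'Y' or 'N'; active is its Boolean inverse.
    tickers["active"] = tickers["isdelisted"].apply(lambda x: bool(x == "N"))

    # permaticker is Sharadar's permanent ID; the video calls it quandl_id.
    tickers = tickers.rename(columns={"permaticker": "quandl_id"})
    tickers["quandl_id"] = tickers["quandl_id"].astype("int64")

    tickers = tickers[COLUMNS_OF_INTEREST]

    # Guard against repeated symbols; the first row for each ticker wins.
    return tickers.drop_duplicates(subset="ticker")
```

Keeping the cleanup separate from the download also makes it easy to check against a small hand-built table before spending API calls on the full one.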
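The get_bars logic walked through in the captions might look something like the sketch below. Several things are assumptions: it is a plain function that takes the session as an argument (any object with a quandl-style get_table works), the market parameter is left out because the captions never show it being used, the fallback start date is taken to be 1 January 2000, and the dates are sent as ISO strings rather than date objects.

```python
from datetime import date
from typing import Optional

import pandas as pd

# The equity and fund price tables named in the captions.
TABLES = ["SHARADAR/SEP", "SHARADAR/SFP"]


def get_bars(session, ticker: str, from_: Optional[date] = None,
             to: Optional[date] = None) -> Optional[pd.DataFrame]:
    """Return daily bars for one ticker, or None if no table has rows."""
    # Treat NaN/NaT the same as "not provided".
    from_ = None if pd.isnull(from_) else from_
    to = None if pd.isnull(to) else to

    # Fall back to today and to the start of 2000 (assumed default).
    to = to if to else date.today()
    from_ = from_ if from_ else date(2000, 1, 1)

    for table in TABLES:
        df = session.get_table(
            table,
            ticker=ticker,
            date={"gte": from_.isoformat(), "lte": to.isoformat()},
            paginate=True,  # keep paging past the first page of results
        )
        if not df.empty:
            df["date"] = pd.to_datetime(df["date"])
            df = df.sort_values(by="date")
            return df[["date", "open", "high", "low", "close", "volume"]]

    # Explicit, as in the video: nothing was found in either table.
    return None
```

Passing the quandl module itself as session, after setting quandl.ApiConfig.api_key, gives the behaviour described in the video; passing a stub lets the date handling be checked offline.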
Info
Channel: Analyzing Alpha
Views: 3,066
Keywords: trading, investing, data science
Id: -MSGRTLc7vc
Length: 21min 22sec (1282 seconds)
Published: Fri Oct 29 2021