Predicting Crypto Prices in Python

Video Statistics and Information

Captions
[Music] What is going on, guys, welcome back. In today's video we're going to learn how to predict cryptocurrency prices in Python using deep learning, so let us get right into it. Now, before we get into the actual coding, I want to mention that this video was sponsored by Tabnine, and before you skip that part because it's just a basic sponsorship, let me once again tell you that I would not recommend or advertise something unless I think it's useful and interesting. I've been using Tabnine for the past couple of weeks myself and I really enjoy it, because it's an auto-completion model, or you could say an auto-completion engine, which is powered by deep learning and the GPT-2 language model, which means that it gets better over time. It gives you very good and long auto-completion suggestions, and it adapts to your programming style over time. You can go to their website and you're going to see that they support almost all modern programming languages, or actually all modern programming languages and almost all programming languages that are relevant: you can see they support JavaScript, Python, TypeScript, PHP, Java, C++, Go, Rust and many more. You can also see that Tabnine can be integrated into most IDEs, or at least into all IDEs that I know and think are relevant, for example VS Code, IntelliJ, PyCharm, GoLand, the Atom editor, Sublime Text, Android Studio and Emacs, and even though you cannot see it here, it's also supported in Vim, because I have a lot of Vim users among my followers. So it integrates easily with all these languages and all these IDEs, and it's also totally free: as you can see on the pricing page, the basic code completions are totally free, so you don't need to pay a single cent to use Tabnine and the intelligent auto-completions. And also, because that is a concern for a lot of people, especially if you're using the
cloud model, privacy: Tabnine takes privacy very seriously, so you can go to tabnine.com, click on Code Privacy, and you can see that they don't use your data for anything else but better auto-completions. Even if you don't want that, you can just turn off the cloud model and use the local AI model instead of sharing your data with a cloud. Even though they're not doing anything stupid with your data, some people just don't want it, so in case you're one of those people you can just turn it off and use the local model. So Tabnine is a very interesting product, I highly recommend you download it, and you can find a link in the description down below. Click it, take a look, and tell me in the comment section what you think about Tabnine. All right, so I need to add one more disclaimer here because this is a financial topic: this is not investment advice. This video is not financial advice in any way, it is programming advice. I'm showing you how to use Python and neural networks to work with sequential data, so I'm showing you how you can get data of the past few weeks, months and years and then try to predict the next day. This is not a guaranteed model for making profits with cryptocurrencies; in fact, you may lose money using this model. So the focus in this video is on the programming and not on the financial trading. You can try to use it, but I don't recommend it, because that's just a machine learning model that is going to predict one day based on 60 days or based on 90 days, and the predictions are not going to be super accurate. You may make money with it, but you may also lose a lot of money with it. So this video is not investing advice; I'm not responsible for any gains or losses that you make using that model. I'm just showing you how to implement such a thing in Python. All right, so having said all that, we can start by installing some libraries. We're going to open up the command line or the favorite terminal of your
choice and we're going to say pip or pip3 install numpy first. So we're going to need numpy, we're going to need pandas, we're going to need matplotlib, so the basic data science stack in Python, and we're also going to need pandas-datareader, so pandas dash datareader like that. This is important because this library is going to get the financial data from the Yahoo Finance API for us. One thing that I also thought we were going to use is mplfinance, even though I'm not sure if we're going to use that. I actually don't think that I'm using it in this project, so let me just see... no, I don't think that we need mplfinance, but if we need it you just say pip install mplfinance like that. But I don't think that we need it for this video. We're also going to need some machine learning libraries, so we're going to say pip install tensorflow and pip install scikit-learn, which is the library that we import as sklearn. So those are the libraries that we're going to need, and we can now go ahead into the code. Once you have installed them you should be able to import them, so we say import numpy as np (as you can see, a good auto-completion here from Tabnine), import matplotlib.pyplot as plt, then import pandas as pd and import pandas_datareader as web. So those are the basic libraries that we're going to need here. We're also going to use datetime, so import datetime as dt, and then we're going to say from sklearn.preprocessing import MinMaxScaler. This is what we're going to use to scale the financial data in between zero and one. Whenever you're working with neural networks, it's best to have data that is either scaled from zero to one or from negative one to one, so the neural network can work with it better. Once we have that, we're also going to say from tensorflow.keras.layers import the Dense layer, the Dropout layer and the LSTM layer, which
stands for long short-term memory. This is a recurrent layer, which is very important for sequential data. And we're also going to say from tensorflow.keras.models import the Sequential model. So do we have a space here? Yeah, there you go. So this is what we're going to need: Dense, Dropout, LSTM, Sequential and a scaler. Those are the libraries, we have now imported them, and the next thing you want to do is get the financial data, and this is actually quite easy. We need to know what cryptocurrency we're actually interested in, so we say crypto_currency, and for example, if we're interested in Ethereum we say ETH, if you're interested in Bitcoin you say BTC, if you're interested in Ripple you say XRP. You can just google the ticker symbols. We're going to use BTC here, for example. What's important is that we also need an actual currency (actual currency is maybe not the right term), a base currency that is not a cryptocurrency, like dollars or euros, to compare the Bitcoin to, because we want to know at the end of the day how many dollars one Bitcoin is worth or how many dollars one Ethereum is worth. So we're going to call this against_currency, and this is going to be US dollars in this video. After that, we're going to specify a time frame for the training, so we want to know what we are going to use as training data here. We're going to say the start is going to be dt.datetime with the first of January 2016, for example, and the end date is going to be dt.datetime.now(), so today. This is the time frame that we're going to look at for the training data. In order to get the actual data, we're going to say data = web.DataReader, and we're going to pass an f-string here, which is going to be the cryptocurrency against the against currency, because that is how you get the price of the cryptocurrency in US
dollars, or in this against currency, from the Yahoo Finance API. The next thing is we need to specify that we want to use the Yahoo Finance API, and we need to specify the time frame, so start and end. The next step is to go ahead and prepare the data for the neural network. We're going to say prepare data in a comment here, and the first thing we want to do is scale down the data so that it is squeezed in between zero and one, so that the neural network can work better with it. But before we do that, I want to show you what the data frame looks like by just calling data.head(), so that you can see the first couple of entries here. You're going to notice that this is a full data frame with different columns: you have Open, High, Low, Close, Volume and Adj Close, and what we're actually interested in is just the closing value of a particular day, because that is what we're going to use as a basis and that is also what we're going to predict. So I'm going to delete that here and we're going to create a scaler first. We're going to say scaler = MinMaxScaler with a feature_range of (0, 1). These are just the boundaries: we're going to squeeze all the values in between zero and one. In order to get the scaled data set we're going to say scaled_data is scaler.fit_transform, and we're going to fit-transform data, not all of it but just the Close column, the values of it, and then we need to reshape this, reshape(-1, 1), because that is the format that the scaler needs. Then we're going to choose a number for the prediction days. The prediction days are going to be the number of days that we're going to base our prediction on. The idea of this whole script is that we look at the past x days, for example 60 or 90, and then we predict one day in the future, so we look at 60 days and then we predict the 61st day, for example. For this we need to decide how many days we want to
base our prediction on. Of course, if you just base it on the last 10 days it's going to be based on more recent data, but if you base it on 60 days it's going to be a little bit more conservative. So we're going to say prediction_days is going to be 60 in this case. Then we need to prepare the training data. We need to have x data and we need to have y data. The y data is going to be the label, or the result, you could say. We're doing supervised learning here: we have 60 days and we have an actual value, so we want to show the neural network 60 days and it has to predict the 61st day. So we're going to say x_train and y_train are going to be empty lists, and then we're going to fill those lists up with actual values. We're going to say for x in range starting at prediction_days, so we're going to start at 60, going up until the length of the whole scaled data set. And we're going to say x_train.append, and here we're going to append chunks, or packages, of 60 days. So we're going to say scaled_data, and we're going to start at x minus prediction_days, which is why we start at prediction_days, because this is then zero in the first iteration and goes on afterwards, so one, two, three and so on, and we're going to go up until x. So we're going to take the span of 60 days, starting at x minus prediction_days up until x, and index 0, as I mentioned here, and this is what we're going to add to x_train. To y_train we're going to add something else: we're going to say y_train.append, and to y_train we're just going to append the day after that, so just x. So we're going to use the time period from x minus prediction_days up until x to predict the actual x, because when we slice like this, x is not included, and this is the day after that period. This is how we fill up the lists, and one last thing that we need to do is we need
to say x_train and y_train are going to be np.array of x_train and np.array of y_train, so this is just turning them into numpy arrays. After that, we need to reshape x_train, so np.reshape of x_train, and we're going to keep the shape and just add one additional dimension, so we're going to say x_train.shape[0] (x_train, not scaled_data), x_train.shape[1] and 1. So this is how we do that. All right, now I saw that the camera was quite bright because the weather outside is changing. I fixed that; I hope this was not too annoying. The next thing we're going to do is build a neural network, so we're going to create the neural network, which is going to be the model that we're going to use for the prediction. For this, one thing that you might want to try is to downgrade your numpy version if you have some issues. If you don't have any issues once the code is written, you don't have to do anything, but I had some issues because I had numpy 1.20 installed. Maybe when you're watching this video in the future it's no longer a problem, but for me it was, so I had to downgrade numpy to version 1.19.5. How do you do that? You just say pip install numpy==1.19.5. This is only something that you want to do if you have problems with numpy; if you don't have any problems and the script works fine, you don't need to do anything, but this is one fix if you have the same problem that I had. So we're going to say model = Sequential(). We're going to just do a basic sequential model here, and we're going to add LSTM layers and Dropout layers. LSTM layers, so long short-term memory layers, are going to be the recurrent layers that we're going to use to memorize stuff. Since we're dealing with sequential data, which has day one, day two, day three and so on, those layers are
very powerful because they are specialized for that sort of data. Long short-term memory layers memorize the important information and feed data back into the neural network. I'm not going to explain the details here because we're more focused on the coding; maybe I'm going to make a theoretical episode on that as well, but this is the basic idea of an LSTM layer. The Dropout layer is just for preventing overfitting, so in order to not overfit the network we're going to add a Dropout layer, or several Dropout layers, in between. So we're going to say model.add with the first long short-term memory layer, and it's going to have 50 units and it's going to return sequences, so return_sequences=True, and then we're going to say the input_shape of that is going to be the shape of x_train, so x_train.shape[1] and 1 like that. Then we're going to say model.add(Dropout(0.2)), just to prevent overfitting, and then we're going to say model.add with an LSTM again with 50 units. Of course you can try to tweak those values; maybe you're going to get better results, or worse results. We're going to return sequences here as well. Then we're again going to say model.add(Dropout(0.2)), and then we're going to add an LSTM layer without return_sequences=True, and then one more Dropout layer. And finally, this is important, because in the end we don't want to have a bunch of different values, we just want to have one number that is going to indicate the price, and for this we need to add a final layer, which is going to be model.dense, and it's going to have just one unit, so units=1, because that is going to be the price prediction in and of itself. Then we need to go ahead and compile the model. We're going to say model.compile, and we're going to use the Adam optimizer, so optimizer equals adam and loss equals mean_squared_error, as you can see here, and then we're going to train
the model, so we're going to say model.fit on x_train and y_train, with 25 epochs and a batch size of 32. So this is how you build and train the model. Okay, so next up we're going to test the model. We're going to add a comment here, testing the model, and for this we need to specify a time frame for the testing data. We need to load some testing data, and then we're going to compare the prediction results on the test data with the actual results. For this we're going to say test_start is going to be dt.datetime, and here we say something like the first of January 2020, for example. This is going to be the start date for the test data, and test_end is going to be dt.datetime.now(), which is today. Then we get the test data, and we're going to do this the same way we did it up here, so we're going to copy that, and I'm just going to rename this to test_data, and of course we need to change the time frame, so at the end here we're going to say test_start and test_end, even though end and test_end are actually the same. This is how we get the test data. Then we want to have the actual prices, so we're going to say actual_prices is going to be the Close column of test_data, dot values. We're going to get the closing values here as the actual prices. Then we're going to combine the test data set with the data set from before, so we're going to say total_dataset is pd.concat, and we're going to pick data Close, with a capital C, and test_data Close like that, and the axis that we're going to use is 0, even though I think that's the default, but I'm not sure. So this is the total data set, and now we're going to create the model inputs that we're then going to use for the prediction. We're going to say model_inputs equals total_dataset, and we're going to take the length of the total data set here minus the length of
the test data set, so we're going to go back before the test data set, and we're also going to go back by the prediction days, so that we can start from the 60 days before that, and we're going to go up until the end and take those values. So to explain this in a better way, this is just the total length of the data set minus the length of the test data minus 60 days for the predictions; we take the values from there onwards as the input, since we want to predict the values after that. Then we're going to say model_inputs = model_inputs.reshape; we need to reshape them again with (-1, 1), and of course we need to scale them down again, because our model is trained on values between 0 and 1 and those values are not between 0 and 1, so we need to say model_inputs = scaler.fit_transform(model_inputs) like that. This is how we create the model inputs for the testing, and now we need to make some predictions using the trained model. So we're going to say x_test is an empty list, and for x in range, we're actually doing the same as above when we created the training data: we're going to start again at prediction_days, going up until the length of model_inputs. We're going to say x_test.append with model_inputs from x minus prediction_days, so minus 60, starting at zero essentially because we start at prediction_days, up until x, and index 0 here. This is how we create the test data. Then of course we turn this into a numpy array, so we say x_test = np.array(x_test), and we need to reshape it again, adding a third dimension, so x_test = np.reshape of x_test, and the shape is x_test.shape[0], x_test.shape[1] and 1 like that. Then we need to get the predictions. We now have the actual prices and we want to know the prediction prices, so here we're going to say prediction_prices is going to be just the model predicting, so model.predict on the test data, and of course we then want to have the actual prices, so we don't
want to have the scaled values, we want to inverse-scale them. So we need to say the actual prediction prices are not what we get from the model; the actual prediction prices are scaler.inverse_transform of the prediction prices. Then we get the actual values. Now, last but not least, we need to visualize that. We have the prediction prices and we have the actual true prices, so now we're going to plot both of them in matplotlib by saying plt.plot. We're going to plot the actual prices in black, so color equals black, and the label is going to be actual prices. Then we're going to plot the prediction prices in, let's say, green, and the label is going to be predicted prices. Of course we can add some stuff like titles, so plt.title, and we can say cryptocurrency price prediction like that, and plt.xlabel is going to be the time. We're not going to add actual dates here, so we're just going to call this time, and we're going to call the y-axis the price. Of course we're also going to have a legend, since we have the labels, and the location of the legend is going to be upper left. Then we say plt.show() in order to visualize all that. Now I think that the training is going to take a while. I'm going to start the model to see if we have any mistakes here, and if it starts training I'm just going to skip that part and come back to you once it's done. But maybe we're going to get some exceptions, so let's see if it starts training. What do we have here? Just a warning... Sequential has no attribute dense. Oh yeah, because we said model.dense. This is of course a problem; we need to say model.add with the Dense layer. But except for that, since we got up to this point, this should be fine, so it should start training any moment, once it's loaded and done with the warnings. Yeah, you can see it starts training, so I'm going to skip that part here and I'm going to come back to you once the training is actually done. All
right, so the training is done and the curve looks quite good, but before you get too excited about this, because this is almost the same curve, you need to keep in mind that we're only predicting one day based on 60 days. So it's not like we looked at the first 60 days and then the model predicted this whole curve. We looked at 60 days and the model predicted this single point; we looked at the 60 days after that and it predicted that point, and because of that the curve is kind of accurate. But it's not like we just start at 60 days and predict all that and it's almost the same, because that would be like magic. It's not like that. What you would have to do to build a model like that is feed its own predictions into it as the past 60 days. So you would have to look at 60 days, predict the 61st day, and then in order to predict the 62nd day you would have to look at 59 actual days and the one day that you predicted, and so on. After 60 iterations the model would base its predictions only on the predictions that it made in the past, so there's no actual data involved anymore, which means that slight errors in the predictions in the beginning will cause huge errors afterwards in the future predictions. So you would not get a curve like this. Actually, you can see that the curve is kind of lagging, because you can see Bitcoin fell here and the model predicted it would fall here, after it already fell, so this is not really good for predictions. Sometimes, however, you can see something that happens before that, so I think here you can see the model is rising before the price actually rose, so it can predict sometimes, but most of the time it's just going to lag behind. Now, how can we use this to actually predict days in the future, so not just looking at past performance but looking at the actual prediction for tomorrow? In order to do this, we're going to add a comment here, predict next day, and what we're going to do here is say the real data
that we have up until now is model_inputs from the length of model_inputs plus 1 (this is what we need to predict the next day) minus prediction_days, up until the length of model_inputs plus 1, and index 0. So this is the data that we have up until now and that we're going to use. Of course we need to again say real_data = np.array(real_data), and we need to again do the same thing with the dimensions, so let me just copy this line here and replace x_test with real_data. There you go. So this is how we can do that, and then we're going to do the prediction. We're going to say prediction = model.predict on the real data, and prediction is then going to be scaler.inverse_transform of prediction, and we can then print the prediction in US dollars. Now I'm not going to run the model again because it trains quite long on my laptop, since my laptop is not really good, but this is how you can do it: you just predict and then you print the prediction. Now, one last thing that we can do here, which is kind of experimental, is target a day in the future that is not just the next day. So we're not targeting the 61st day but, for example, the 90th day, which means that we're looking at 60 days and we're still just predicting one single value. We're not going to predict a sequence of movements; we're going to predict one price in the future, but it's not going to be the 61st day, it's going to be the 90th day, for example. We can do that by just saying, for example, future_day is going to be 30, which means that we're going to predict not the next day but the 30th day after those 60 days. What we need to do in order to do that is just subtract it here in the loop, so we're going to say up until the length of scaled_data minus future_day, and down here we're not going to append x but x plus future_day. This is just a slight
change that we can use here in order to predict not the next day but 30 days after that. So once the model is trained, you can see that this is the result with the future day prediction, so with 30 days into the future, and it actually looks better. Now I'm not 100% sure if I read this right, so I don't want to claim that this is in any way predicting the Bitcoin price, but if I read this right, this would mean that the model at this point in time thinks, for example, that in 30 days this is going to be the price, and then you can go 30 days to the right and see if this price is right or not. If this is the case, this is actually quite a good prediction, because it says that in 30 days the price will be here, and in around 30 days the price is actually here. So this is not a bad prediction. However, this is not always going to be that good, and actually in this case it probably just thinks that it's going to go straight up because it started going up here. So it did not predict, for example, that it's going to fall down in between; it just saw that it's rising and it probably thinks that it's going to continue rising. But you can play around with the values: you can try to enter 100 in the future_day variable, you can change Bitcoin to Ethereum or to Ripple and so on, and you can play around and see if the predictions make sense or not. So that's it for today's video. I hope you enjoyed it, I hope you learned something; if so, let me know by hitting the like button and leaving a comment in the comment section down below. Also don't forget to check out Tabnine at the link in the description down below, and of course hit the subscribe button and the notification bell to not miss a single future video, for free. Other than that, thank you very much for watching, see you in the next video, and bye. [Music]
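The windowing step described in the captions (60 past days as input, one later day as the label, with an optional future_day offset) can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the helper name make_windows and the synthetic price series are assumptions made up for the example, and a hand-written min-max formula stands in for MinMaxScaler so that the sketch only needs numpy.

```python
import numpy as np


def make_windows(scaled, prediction_days=60, future_day=0):
    """Build (x, y) pairs from a 1-D array of scaled closing prices.

    Each x is a window of `prediction_days` consecutive values. Each y is the
    value `future_day` positions after the first value that follows the window
    (future_day=0 means "the day right after the window").
    make_windows is a hypothetical helper, not a function from the video.
    """
    x_list, y_list = [], []
    for x in range(prediction_days, len(scaled) - future_day):
        x_list.append(scaled[x - prediction_days:x])  # index x itself is excluded
        y_list.append(scaled[x + future_day])
    x_arr = np.array(x_list)
    y_arr = np.array(y_list)
    # LSTM layers expect input shaped (samples, timesteps, features).
    x_arr = np.reshape(x_arr, (x_arr.shape[0], x_arr.shape[1], 1))
    return x_arr, y_arr


# Synthetic stand-in for the Close column (made-up values, not real prices).
prices = np.arange(100, 200, dtype=float)  # 100 "days"
# Manual min-max scaling to [0, 1], standing in for MinMaxScaler.
scaled = (prices - prices.min()) / (prices.max() - prices.min())

x_train, y_train = make_windows(scaled, prediction_days=60)
x_future, y_future = make_windows(scaled, prediction_days=60, future_day=30)

print(x_train.shape, y_train.shape)
print(x_future.shape, y_future.shape)
```

With future_day set to 30 the loop stops 30 positions earlier, so fewer training samples are available; that is the price of targeting a day further out with the same amount of data.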
Info
Channel: NeuralNine
Views: 29,257
Rating: 4.930902 out of 5
Keywords: python, finance, investing, stocks, chart, charts, stock visualization, data science, machine learning, prediction, stock analysis, stock prediction, predicting stocks, prices, price, stock, python stock prediction, neural networks, RNN, recurrent neural networks, LSTM, crypto, crypto currency, cryptocurrency, bitcoin, ethereum, dogecoin, ripple, bitcoin predict, crypto prediction
Id: GFSiL6zEZF0
Length: 31min 30sec (1890 seconds)
Published: Thu Apr 15 2021