Code Monkey home page Code Monkey logo

algorithmic_trading's Introduction

Ironhack Logo

Algorithmic Trading using SARIMAX for Bitcoin

Nicolás M. Japas

Berlin, Cohort 2021-01

Content

Project Description

The goal of the project was to reach an MVP for an automated trading platform using bots for buying and selling stocks and cryptocurrencies in real-time. I focused on Bitcoin but the general approach is valid for other instruments after introducing some adaptations. The project will be upscaled in the future. A SARIMAX model was used, and several 'exogenous' variables explored. I built two models, one for 1-hour charts and the other for 1-minute in real-time.

Hypotheses / Questions

  • How to make use of algorithms to predict the price of a stock or cryptocurrency?
  • Relevant to anyone interested in trading or investing.
  • Which exogenous variables are relevant for bitcoin? Which parameters are well suited for the ARIMA model applied? Does the confidence in the predictions decay over time?

Dataset

  • The data was obtained from several sources. I used two APIs, one for bitcoin (Binance) and another for stocks (Alpha Vantage). I saved everything on .csv files but the code for fetching data is included in the respective modules ('cryptos' and 'stocks'). The code is modularized inside this files. This way, it will be easy to upscale the project to support other financial instruments.
  • I also scraped data for Google Trends, Twitter, and others.

Cleaning

To get the data was the main difficulty, as I had to adapt to several limitations. For example alpha vantage API has a limit on the amount of GETs available if the free account is used. The documentation of the Binance API is quite poor, although a wrapper was useful for that. To scrape the data about Twitter, Google and Marketcap of bitcoin I had to look carefully at the source code of the website (bitinfocharts dot com), as it was fairly hidden inside javascript code. After all that, a lot of work dealing with datetime indexes and frequencies (which introduce NaNs in the code when a certain time period is not in the original data).

Analysis

  • Exploratory plotting, check for correlations, scatter, hist and combined plots.
  • pairplot_
  • j5_
  • j4_
  • Feature selection came in this case from BI, and knowing how bitcoin's movements are related with them. The plots and correlations were a confirmation. And the the model provided significant Pearson correlation coefficients in the diagnostics, further suggesting that the selected data was at least to some degree relevant.

Model Training and Evaluation

  • The data was scaled and split. I also used logarithmic transform.
  • To get the parameters for the ARIMA model was tricky, it wasn't enough to use the popular AUTO-ARIMA module which provides the apparent better values for p, d and q. It uses the AIC results (I believe), but it took me some trial and error to get better results.
  • For the real-time data, a simpler model was used, without exogenous variables. Only the endogenous.

Conclusion

  • I was able to generate predictions, both for the test values and for the unknown future (out-of-sample forecast).
  • A web-socket was implemented to fetch data in real-time directly from Binance, one of the major exchanges in the world. It streams data straight to the program, which can be analyzed in real-time (see the module binance_web_socket).
  • The features selected appear to be relevant. The model should be back-tested (not enough time) but appear to be a useful first approximation to an extremely complex problem.
  • plotly

Future Work

This is going to be a commercial project in the future, deployed on a website where users will be able to create an account, log-in and set-up their bots to trade for them without the need of knowing anything about coding. The real-time fetching of data will be generalized, and more and more stocks and cryptocurrencies will be available.

The connection with Binance API will enable the placement of real trades, automating the process for the user.

All the data which is now loaded from the disk has to be updated (after buffering) in real-time. For price, volume, etc. but also for Twitter, Google Trends, Marketcap, and other valuable data.

Workflow

Get the data, build the model, plot the results. Iterate. Try more features, compare results. Iterate.

The model provides diagnostics such as:

diagnostics

I also calculated Mean absolute percentage error, Root mean square error, Pearson product-moment correlation coefficients, and others.

But the best indicator would be a back-test, which wasn't able to complete due to time-constraints.

Organization

No Trello nor Kanban. Just a lot of work, dedication and passion.

Everything is more or less modularized so there are several notebooks, with 'main' as the one the be run. There are also versions of the notebooks in .py, because they are loaded from main as libraries.

There are several folders with data. The folder 'data' has BTCUSDT (Bitcoin), AMZN (Amazon) and GOOGL (Google) financial data. In 'project_files' are two files with the list of available stocks and cryptocurrencies (for future upscaling). In 'scrape_bitinfocharts' are data files scraped, about Twitter, Google Trends, etc. This structure will be improved later.

There is also a 'presentation' folder with the power point used for my presentation.

Links

https://github.com/nicojapas/algorithmic_trading

algorithmic_trading's People

Contributors

nicojapas avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.