
Reincrypt Project

Introduction & Aim

In this project we aim to develop a Python package that can:

  • Download raw cryptocurrency data between given dates using the Yahoo Finance API (via the yfinance package), in the format shown under Data Preparation below.
  • For each date, calculate various technical indicators from the raw data.
  • Normalize the technical indicator values into the interval [0, 255].
  • Create a .RIMG file for each date using the normalized technical indicator values. A .RIMG file is a special file format that contains a grayscale image and the date.
  • Train a DQN to map the state (grayscale image) to the optimal action (Long, Short, Neutral).
  • Validate the model's decisions using Top/Bottom-K and Market Neutral portfolios.

Usage

Setting Up the Environment

First, clone the repository with your preferred method. For example, to clone using GitHub CLI:

gh repo clone tunaalaygut/reincrypt && cd reincrypt

Then, create a virtual environment, activate it, and install the requirements:

virtualenv env
source env/bin/activate
pip install -r requirements.txt

Now you are ready to move on.

Data Preparation

This section covers the necessary steps to prepare your data for training.

Downloading raw data

To download raw cryptocurrency data, the ticker_downloader.py module can be used. Before you do that, you need to create a list of the currencies you want to download and write them to a .txt file. For example:

# currencies.txt
KCS-USD
WGR-USD
MYB-USD
...

Then, run the downloader:

cd src/data_prep
python ticker_downloader.py currencies.txt

Running the commands above will produce output similar to the following:

[*********************100%***********************]  1 of 1 completed
KCS-USD downloaded and written to file.
[*********************100%***********************]  1 of 1 completed
WGR-USD downloaded and written to file.
[*********************100%***********************]  1 of 1 completed
MYB-USD downloaded and written to file.
...

Downloaded raw data should have the following format for each cryptocurrency:

Date        Open      High      Low        Close     Adj Close  Volume
2017-11-09  0.000224  0.000297  0.0002119  0.000282  0.0002     8605
2017-11-10  0.000284  0.000292  0.0001340  0.000264  0.0002     4201
2017-11-11  0.000263  0.000233  0.0001232  0.000194  0.0001     1266
...         ...       ...       ...        ...       ...        ...
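
Under the hood, the downloader relies on yfinance (as noted in the introduction). A minimal equivalent of a single download, as a sketch rather than the module's actual code, with arbitrarily chosen dates:

import yfinance as yf

# Download daily OHLCV data for one ticker between two dates and persist it
data = yf.download("KCS-USD", start="2017-11-09", end="2018-11-09")
data.to_csv("KCS-USD.csv")  # columns: Open, High, Low, Close, Adj Close, Volume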


Ideally, you would have two files, training_currencies.txt and verification_currencies.txt, to download training and verification currency data separately. Alternatively, you can split the data later.

Creating .RIMG files

Once the raw data is downloaded, you need to create .RIMG files. The image_creator.py module does exactly that.

It works by calculating 32 pre-defined and clustered technical indicators over 32 different time intervals and normalizing the calculated values to [0, 255], thereby creating a 32x32 grayscale image for each day for each currency.

It also adds the date and a scalar, where the scalar is the percentage difference of the currency's closing value from the previous day's.
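
For intuition, the [0, 255] normalization can be pictured as min-max scaling of the 32x32 indicator matrix. A minimal sketch (the function name and the per-image scaling are assumptions, not image_creator.py's actual code):

import numpy as np

def normalize_to_grayscale(indicators: np.ndarray) -> np.ndarray:
    """Min-max scale an indicator matrix into [0, 255] grayscale values."""
    lo, hi = indicators.min(), indicators.max()
    if hi == lo:  # constant matrix: avoid division by zero
        return np.zeros_like(indicators, dtype=np.uint8)
    return ((indicators - lo) / (hi - lo) * 255).round().astype(np.uint8)

# 32 indicators x 32 time intervals -> one grayscale image per day per currency
image = normalize_to_grayscale(np.random.rand(32, 32))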

A minimized .rimg file looks like the following:

000 000 001 005 ...
005 008 000 000 ...
012 009 000 000 ...
020 010 011 005 ...
...
$
-4.694593226213477
$
2018-06-20
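
For illustration, a minimal parser for this layout could look like the sketch below. read_rimg is a hypothetical helper, not part of the package (the issue tracker mentions a planned str_to_array utility for the same purpose):

import numpy as np

def read_rimg(path: str):
    """Parse a .rimg file: pixel grid, then $-delimited scalar and date."""
    with open(path) as f:
        grid_part, scalar_part, date_part = f.read().split("$")
    image = np.array(
        [[int(v) for v in row.split()] for row in grid_part.strip().splitlines()],
        dtype=np.uint8,
    )
    scalar = float(scalar_part.strip())  # e.g. -4.694593226213477
    date = date_part.strip()             # e.g. 2018-06-20
    return image, scalar, date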

Visualizing .RIMG images

To visualize the created .rimg file(s), one can use the utility/rimg_viewer.py module.

To use it:

python rimg_viewer.py <FILENAME>.rimg 

This will produce an image similar to the following:

[rendered .rimg grayscale image]
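
If you prefer to render an image yourself, something along these lines should work with the hypothetical read_rimg helper sketched above (matplotlib is assumed to be installed):

import matplotlib.pyplot as plt

image, scalar, date = read_rimg("example.rimg")  # hypothetical filename
plt.imshow(image, cmap="gray", vmin=0, vmax=255)
plt.title(f"{date}  (daily change: {scalar:.2f}%)")
plt.show()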

Training & Validation

Train a model

Once the data is ready, a training configuration needs to be defined in JSON format with the following fields:

  • num_actions: Number of actions the DQN will map states into
  • max_iterations: Number of iterations the training will take
  • learning_rate: Learning rate for the gradient step
  • epsilon_init: Initial epsilon value
  • epsilon_min: Minimum epsilon value
  • memory_size: Size of the experience replay memory
  • B: Number of iterations before updating the online network weights
  • C: Number of iterations before updating the target network weights (the epsilon schedule and the B/C update cadence are illustrated in a sketch at the end of this subsection)
  • gamma: Discount factor
  • batch_size: Batch size
  • penalty: Penalty to apply for position change
  • patch_size: Transformer patch size
  • projection_dim: Transformer projection dimension
  • enable_resizing: Whether to enable image resizing
  • resized_image_size: Resized image size, used if enable_resizing is true
  • num_heads: Number of heads in ViT
  • mlp_head_units: Definition of MLP units in ViT
  • transformer_units: Definition of transformer units in ViT
  • transformer_layers: Number of transformer layers in ViT
  • description: Description of the configuration

A sample training configuration is as follows:

{
  "num_actions": 3,
  "max_iterations": 50000,
  "learning_rate": 0.0001,
  "epsilon_init": 1.0,
  "epsilon_min": 0.1,
  "memory_size": 1000,
  "B": 10,
  "C": 1000,
  "gamma": 0.99,
  "batch_size": 32,
  "penalty": 0.05,
  "patch_size": 6,
  "projection_dim": 64,
  "enable_resizing": false,
  "resized_image_size": 36,
  "num_heads": 4,
  "mlp_head_units": [
    2048,
    1024
  ],
  "transformer_units": [
    128,
    64
  ],
  "transformer_layers": 8,
  "description": "Using k-means clustered technical indicators."
}

With the configuration created, simply run

python main.py -i <PATH_TO_TRAINING_DATA_DIR> -c <CONFIG_NAME>.json
  • -i or --input-path argument is used to indicate the input data directory.
  • -c or --config argument is used to specify which config file to use.
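
To make the epsilon and B/C fields concrete, here is a minimal, hypothetical sketch of the loop they control, not reincrypt's actual training code. env, online_net, and target_net are stand-ins with assumed methods (reset, step, best_action, gradient_step, copy_weights_from):

import random
from collections import deque

def train(config, env, online_net, target_net):
    """Skeleton of an epsilon-greedy DQN loop driven by the config fields."""
    memory = deque(maxlen=config["memory_size"])  # experience replay buffer
    epsilon = config["epsilon_init"]
    # Linearly decay epsilon from epsilon_init to epsilon_min over all iterations
    decay = (config["epsilon_init"] - config["epsilon_min"]) / config["max_iterations"]
    state = env.reset()
    for step in range(1, config["max_iterations"] + 1):
        if random.random() < epsilon:  # explore
            action = random.randrange(config["num_actions"])
        else:                          # exploit
            action = online_net.best_action(state)
        next_state, reward = env.step(action)  # reward includes the position-change penalty
        memory.append((state, action, reward, next_state))
        if step % config["B"] == 0 and len(memory) >= config["batch_size"]:
            batch = random.sample(list(memory), config["batch_size"])
            online_net.gradient_step(batch, target_net, config["gamma"], config["learning_rate"])
        if step % config["C"] == 0:  # periodically sync the target network
            target_net.copy_weights_from(online_net)
        epsilon = max(config["epsilon_min"], epsilon - decay)
        state = next_state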

Verify the model

The output of the trained model will be placed under the directory $WORKSPACE/reincrypt/src/dqn/output/<CONFIG_NAME> and will have a structure similar to this:

<CONFIG_NAME>
├── <CONFIG_NAME>_model/
├── training_<TIMESTAMP>.json
└── training_<TIMESTAMP>.png

Here <CONFIG_NAME>_model/ is the Keras model output, the .json file contains the training logs, and the .png file is the loss plot of the training. The Keras model output can be used to load the model and run inference, as shown below.
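
For example, loading the saved model for inference might look like this. tf.keras.models.load_model is the standard Keras API; the path, input shape, and action ordering are assumptions:

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("output/<CONFIG_NAME>/<CONFIG_NAME>_model")
state = np.zeros((1, 32, 32, 1), dtype="float32")  # one 32x32 grayscale image
q_values = model.predict(state)                    # one Q-value per action
action = int(np.argmax(q_values))                  # e.g. 0/1/2 -> Long/Short/Neutral (order assumed)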

To create portfolios and validate the model on the verification data set:

python main.py -v -i <PATH_TO_VERIFICATION_DATA_DIR> -c <CONFIG_NAME> -m <PATH_TO_CONFIG_NAME_model_DIR>
  • -i or --input-path argument is used to indicate the input data directory.
  • -v or --verification flag is used to indicate the program will be run in verification mode.
  • -c or --config argument is used to specify which config file to use.
  • -m or --model argument is used to specify the keras model output created in training phase. Required if -v flag is set.

After verification, a verification log in JSON format and a cumulative asset chart (.png) will be created under the model's output directory. The output directory structure will be as follows:

<CONFIG_NAME>
├── <CONFIG_NAME>_model/
├── training_<TIMESTAMP>.json
├── training_<TIMESTAMP>.png
├── verification_<VERIFICATION_TIMESTAMP>.json
└── verification_cumulative_assets_<VERIFICATION_TIMESTAMP>.png

An example verification_<VERIFICATION_TIMESTAMP>.json file looks like this:

{
  "portfolio_method": "Market Neutral",  // Portfolio used
  "verification_start": "2023-12-26 06:26:33.082213",  // Verification start timestamp
  "verification_end": "2023-12-26 06:27:18.631933",  // Verification end timestamp
  "verification_duration (m)": 0.0,  // How long did the verification take?
  "verification_tickers": [  // Currencies used in the verification
    "DGC-USD",
    "TRC-USD",
    "GLC-USD",
    "FTC-USD",
    "NMC-USD",
    "LTC-USD",
    "PPC-USD",
    "BTC-USD"
  ],
  "num_tickers": 8,  // # of currencies
  "config": {  
    // Model's configuration, as defined in the training step
    "experiment_name": "experiment_11",
    "height": 32,
    "width": 32,
    "num_days": 366
  },
  "results": {
    "position_change": 225.28317512,  // How many times model changed position
    "cumulative_asset": 6.14175614,  // Cumulative asset at the end
    "sharpe_ratio": 2.359828034700596,  // Calculated sharpe ratio
    "num_days": 366,  // # of verification days 
    "date_begin": "2021-10-30",  // Verification begin date
    "date_end": "2022-10-30",  // Verification end date
    "daily_results": [  // Daily movements made by the model
      {
        "day_index": 0,
        "cumulative_asset": 1,
        "avg_daily_return": -6.53714439
      },
      {
        "day_index": 1,
        "cumulative_asset": 0.93462856,
        "avg_daily_return": 2.23261434
      },
      {...},
    ]
  }
}
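
The sharpe_ratio field can be recomputed from the avg_daily_return series. A common annualized formulation, assumed here since the exact convention is not documented, is the mean over the standard deviation scaled by the square root of the periods per year:

import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=365):
    """Annualized Sharpe ratio with a zero risk-free rate (assumed convention)."""
    r = np.asarray(daily_returns, dtype=float)
    return r.mean() / r.std() * np.sqrt(periods_per_year)

# avg_daily_return values taken from the example log above
print(sharpe_ratio([-6.53714439, 2.23261434]))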

And a cumulative asset chart:

[cumulative asset chart image]

By default, the verification process uses the Market Neutral portfolio. In order to use the Top/Bottom-K portfolio, you need to pass the -k (--topbottomk) argument and specify the K value (a percentage) you want to use.

For example, the following command runs a verification using a Top/Bottom-K portfolio with a K value of 0.3:

python main.py -v \
  -i <PATH_TO_VERIFICATION_DATA_DIR> \
  -c <CONFIG_NAME> \
  -m <PATH_TO_CONFIG_NAME_model_DIR> \
  -k 0.3
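
Conceptually, a Top/Bottom-K portfolio goes long the currencies with the highest predicted returns and short those with the lowest. A minimal sketch, assuming one model score per currency per day (the function and field names are illustrative, not reincrypt's internals):

def top_bottom_k(scores: dict, k: float):
    """Split currencies into long (top k fraction) and short (bottom k fraction)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = max(1, round(len(ranked) * k))
    return ranked[:n], ranked[-n:]  # (long positions, short positions)

daily_scores = {"BTC-USD": 0.8, "LTC-USD": -0.2, "PPC-USD": 0.1, "NMC-USD": -0.6}
longs, shorts = top_bottom_k(daily_scores, k=0.3)  # k=0.3 -> top/bottom 30%
print(longs, shorts)  # ['BTC-USD'] ['NMC-USD']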


Open Issues

Customize training code

Currently, we are using the training code as it is. Customize it to fit our needs.

train
model
experience_replay

While doing so, make sure you grasp the concepts properly.

Investigate: Training warnings

  • WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. model.compile_metrics will be empty until you train or evaluate the model.
  • WARNING:absl:Found untraced functions such as dense_20_layer_call_fn, dense_20_layer_call_and_return_conditional_losses, embedding_1_layer_call_fn, embedding_1_layer_call_and_return_conditional_losses, query_layer_call_fn while saving (showing 5 of 100). These functions will not be directly callable after loading.

Check the causes of the warnings above

Review transformer code

Review the transformer code and make sure nothing is overlooked.

Cross-reference multiple implementations on the Keras website.

Figure out: How to best utilize the GPU

There are still some lines of code that we do not fully understand:

gpu_config = tf.compat.v1.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all upfront
gpu_config.gpu_options.allow_growth = True
# Cap this process's GPU memory usage at 50% of the device total
gpu_config.gpu_options.per_process_gpu_memory_fraction = 0.5

Create util package

Create a util package that is used project-wide

Also, implement a str_to_array function to be used while reading .rimg files.

Figure out whether sequence is critical

Read the original paper (Global Stock Market Prediction Based on Stock Chart Images Using Deep Q-Network) and decide whether the experience sequence is bound to their timing.
