tatsath / fin-ml

This is the GitHub repository for "Machine Learning and Data Science Blueprints for Finance". Please star.

Jupyter Notebook 99.97% Python 0.03%

python machine-learning finance fintech algorithmic-trading

fin-ml's Introduction

Machine Learning and Data Science Blueprints for Finance - Jupyter Notebooks

This GitHub repository contains the code for the case studies in the O'Reilly book Machine Learning and Data Science Blueprints for Finance.

Simply open the Jupyter notebooks you are interested in by cloning this repository and running Jupyter locally. This option lets you play around with the code. In this case, follow the installation instructions below.

Want to play with these notebooks online without having to install anything?

Use any of the following services.

WARNING: Please be aware that these services provide temporary environments: anything you do will be deleted after a while, so make sure you download any data you care about.

Recommended: Open it in Binder:
- Note: Binder is a hosting service, and the directories of the book will open exactly as they do on your local machine, with no installation required. Links between files within a folder will work seamlessly. Most of the time, Binder starts up quickly and works great, but when the GitHub repository of this book is updated, Binder creates a new environment from scratch, and this can take quite some time. Also, some of the case studies, especially those that require more cached data, might be slow.
Open this repository in Colaboratory:
- Note: Google Colab supports GPUs and can be quite fast. However, links to data files located in the folders of the git repository may not work. Upload the data files separately when running the Jupyter notebooks on Google Colab, or replace the local file path with the GitHub path. For example, for the data of case study 1 of chapter 7, dataset = read_csv('Dow_adjcloses.csv') in the code can be replaced with dataset = read_csv('https://raw.githubusercontent.com/tatsath/fin-ml/master/Chapter%207%20-%20Unsup.%20Learning%20-%20Dimensionality%20Reduction/CaseStudy1%20-%20Portfolio%20Management%20-%20Eigen%20Portfolio/Dow_adjcloses.csv') for it to work on Google Colab.
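
The path replacement described above can be sketched as a small helper that percent-encodes the folder names. This is only an illustration: the function name github_raw_url and the constant RAW_BASE are made up for this example, and the base URL assumes the master branch used in the link above.

```python
from urllib.parse import quote

# Base of the raw-file URLs for the master branch (taken from the example above).
RAW_BASE = "https://raw.githubusercontent.com/tatsath/fin-ml/master"


def github_raw_url(*path_parts: str) -> str:
    """Join repository path parts into a raw.githubusercontent.com URL.

    Spaces in folder names are percent-encoded (" " becomes "%20"),
    matching the hand-written URL in the note above.
    """
    encoded = "/".join(quote(part) for part in path_parts)
    return f"{RAW_BASE}/{encoded}"


url = github_raw_url(
    "Chapter 7 - Unsup. Learning - Dimensionality Reduction",
    "CaseStudy1 - Portfolio Management - Eigen Portfolio",
    "Dow_adjcloses.csv",
)
print(url)

# In a notebook, the URL can then be passed to pandas, for example:
# from pandas import read_csv
# dataset = read_csv(url)
```

The same helper should work for data files in other chapters, provided the folder and file names are copied exactly as they appear in the repository.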

Just want to quickly look at some notebooks, without executing any code?

Browse this repository using jupyter.org's notebook viewer:

Want to install this project on your own machine?

Start by installing Anaconda (or Miniconda), git, and if you have a TensorFlow-compatible GPU, install the GPU driver.

Next, clone this project by opening a terminal and typing the following commands (do not type the first $ signs on each line, they just indicate that these are terminal commands):

$ cd $HOME  # or any other development directory you prefer
$ git clone https://github.com/tatsath/fin-ml.git
$ cd fin-ml

If you do not want to install git, you can instead download master.zip, unzip it, rename the resulting directory to fin-ml and move it to your development directory.

If you are familiar with Python and you know how to install Python libraries, go ahead and install the libraries listed in requirements.txt and jump to the Starting Jupyter section. If you need detailed instructions, please read on. We encourage you to stick to the package versions listed in the requirements.txt file.

Python & Required Libraries

You need Python, of course. Python 3 is already preinstalled on many systems nowadays. You can check which version you have by typing the following command (you may need to replace python3 with python):

$ python3 --version  # for Python 3

Any Python 3 version should be fine, preferably 3.5 or above. If you don't have Python 3, we recommend installing it. To do so, you have several options: on Windows or MacOSX, you can just download it from python.org. On MacOSX, you can alternatively use MacPorts or Homebrew. If you are using Python 3.6 on MacOSX, you need to run the following command to install the certifi package of certificates because Python 3.6 on MacOSX has no certificates to validate SSL connections (see this StackOverflow question):

$ /Applications/Python\ 3.6/Install\ Certificates.command

On Linux, unless you know what you are doing, you should use your system's packaging system. For example, on Debian or Ubuntu, type:

$ sudo apt-get update
$ sudo apt-get install python3 python3-pip

Installing Anaconda

After installing Python, we recommend installing Anaconda. This is a package that includes both Python and many scientific libraries. You should prefer the Python 3 version.

Using pip

Installing Anaconda should install most of the libraries commonly used in the case studies. Given that the Anaconda package may change and some libraries might be out of date, it is a good idea to learn how to install Python packages using pip.

Installing pip

These are the commands you need to type in a terminal if you want to use pip to install. Note: in all the following commands, if you chose to use Python 2 rather than Python 3, you must replace pip3 with pip, and python3 with python.

First you need to make sure you have the latest version of pip installed. If you are on the latest version of Python, pip should already be installed. You can check using the following command.

$ pip -V

If you do not have pip installed, you can run the following command on Linux:

$ sudo apt-get install python3-pip

Or download get-pip.py and install it on Windows using

$ python3 get-pip.py

If you have pip already installed, it might be a good idea to upgrade it.

$ python3 -m pip install --user --upgrade pip

The --user option will install the latest version of pip only for the current user. If you prefer to install it system wide (i.e. for all users), you must have administrator rights (e.g. use sudo python3 instead of python3 on Linux), and you should remove the --user option. The same is true of the command below that uses the --user option.

Creating an environment (optional)

Next, you can optionally create an isolated environment. This is recommended as it makes it possible to have a different environment for each project (e.g. one for this project), with potentially very different libraries, and different versions:

$ python3 -m pip install --user --upgrade virtualenv
$ python3 -m virtualenv -p `which python3` env

This creates a new directory called env in the current directory, containing an isolated Python environment based on Python 3. If you installed multiple versions of Python 3 on your system, you can replace `which python3` with the path to the Python executable you prefer to use.

Now you must activate this environment. You will need to run this command every time you want to use this environment.

$ source ./env/bin/activate

On Windows, the command is slightly different:

$ .\env\Scripts\activate
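
To confirm that the environment is really active, one option is to ask the interpreter itself. The following is a minimal sketch, assuming a venv or a recent virtualenv (version 20 or later), both of which set sys.base_prefix; the function name in_virtual_env is made up for this example.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory
    # while sys.base_prefix still points at the base Python installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_env())
```

If the second line prints False after activation, the python3 command on your PATH is probably not the one inside ./env.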

Installing Python packages

Next, use pip to install the required python packages. If you are not using virtualenv, you should add the --user option (alternatively you could install the libraries system-wide, but this will probably require administrator rights, e.g. using sudo pip3 instead of pip3 on Linux).

The following command installs a specific version of a Python package:

$ pip3 install <PACKAGE>==<VERSION>

To install a list of packages from a file, use the following command:

$ python3 -m pip install --upgrade -r requirements.txt

Great! You're all set, you just need to start Jupyter now.
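
Before starting Jupyter, it can be worth checking that the installed versions match the pinned ones. The sketch below is an assumption-laden illustration: it only understands lines of the form name==version, the helper names (parse_pins, installed_version, report) are made up, and the pins in the example call are hypothetical rather than taken from this repository's requirements.txt. It needs Python 3.8 or later for importlib.metadata.

```python
from importlib import metadata  # standard library in Python 3.8+


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report(requirements_path="requirements.txt"):
    """Print every pinned package whose installed version differs from the pin."""
    with open(requirements_path) as handle:
        pins = parse_pins(handle)
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            print(f"{name}: pinned {wanted}, installed {found}")


# Example with hypothetical pins (not the repository's actual file):
print(parse_pins(["pandas==1.0.3  # hypothetical", "numpy==1.18.2", ""]))
```

Calling report() from the fin-ml directory lists only the packages that differ from their pins, so no output means everything matches.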

Installing Package models

For the chapter on Natural Language Processing, we will be using the spaCy Python package. Installing spaCy does not install the language models it uses. To get them, we need to download the model ourselves from a terminal using the following command.

$ python -m spacy download en_core_web_lg
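
spaCy models are installed as ordinary Python packages, so one way to check whether the download succeeded is to look for the package without importing it. This is a minimal sketch; the function name model_installed is made up for this example.

```python
from importlib.util import find_spec


def model_installed(package_name="en_core_web_lg"):
    """Return True if the model package can be found in this environment."""
    return find_spec(package_name) is not None


if model_installed():
    print("en_core_web_lg is available")
else:
    print("en_core_web_lg is missing; run: python -m spacy download en_core_web_lg")
```

Run the check with the same interpreter that Jupyter uses, otherwise it may report on a different environment.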

Starting Jupyter

Okay! You can now start Jupyter, simply type:

$ jupyter notebook

This should open up your browser, and you should see Jupyter's tree view, with the contents of the current directory. If your browser does not open automatically, visit 127.0.0.1:8888. Click on index.ipynb to get started!

Installing Libraries in Jupyter using pip

If you install a library and are not able to import it in a Jupyter notebook, you might have installed it into the system Python environment rather than the one the notebook uses. You can install packages from within a Jupyter notebook by putting the ! symbol at the start of the command. The following libraries are the ones required in addition to the latest Anaconda package as of now.

!pip install spacy
!pip install pandas-datareader
!pip install keras
!pip install dash
!pip install dash_daq
!pip install quandl
!pip install cvxopt
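
If a package still cannot be imported after installing it this way, the pip on your PATH may belong to a different Python than the notebook kernel. A common workaround, sketched below, is to run pip through the interpreter that is executing the notebook. The helper names pip_install_command and pip_install are made up for this example.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Install a package into the environment of the running kernel."""
    subprocess.check_call(pip_install_command(package))


print(pip_install_command("cvxopt"))
# To actually install, call: pip_install("cvxopt")
```

Recent IPython versions also offer the %pip install magic, which is meant to target the kernel's environment in the same way.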

Want to look at the individual case studies or jupyter notebooks?

Notebooks by Application in Finance

Notebooks by Machine Learning Types

1. Supervised Learning- Regression and Time series Models

Stock Price Prediction
Derivative Pricing
Investor Risk Tolerance and Robo-advisors
Yield Curve Prediction

2. Supervised Learning- Classification Models

Fraud Detection
Loan Default Probability
Bitcoin Trading Strategy

3. Unsupervised Learning- Dimensionality Reduction Models

Portfolio Management - Eigen Portfolio
Yield Curve Construction and Interest Rate Modeling
Bitcoin Trading - Enhancing Speed and accuracy

4. Unsupervised Learning- Clustering

Clustering for Pairs Trading
Portfolio Management - Clustering Investors
Hierarchical Risk Parity

5. Reinforcement Learning

Reinforcement Learning based Trading Strategy
Derivatives Hedging
Portfolio Allocation

6. Natural Language Processing

NLP and Sentiments Analysis based Trading Strategy
Digital Assistant-chat-bots
Documents Summarization

Master Template for different machine learning type

Supervised learning - Regression and Time series
Supervised learning - Classification
Unsupervised learning - Dimensionality Reduction
Unsupervised learning - Clustering
Natural Language Processing


fin-ml's Issues

Chapter 2 : dataset

Hi,

Where is the dataset for Chapter 2? Thank you in advance.

Bitcoin trading strategy with RF, where is the actual signal output for the most recent time stamp?

Say I want to use this on the data up until today. Rather than using only the signal from the short EMA being higher than the long EMA, how do I combine the signals from all of the features to come up with the binary decision to buy or sell? All I am seeing is the binary decision in the ['signal'] column, which is based only on the EMAs. How do I go about retrieving the signal that combines all of the features? I see where things get backtested and where we see the final results, but how do I dig into the most recent time stamp to tell whether it's a buy or sell signal for today's price based on all of the features rather than just the EMAs? I'm confused about how to extract the signal to actually use it in production. I would greatly appreciate your response, thank you.

Chapter 9 - Trading strategy - kernel Died

During the training of the model I can't even finish the first episode, because after some time I get the message "Kernel died". I'm using a MacBook with an M1 chip. I tried lowering the training size and the memory size, but the result is always the same.

Chapter 9, Case study 1 - no buy signal on the test set

Hello,

I have been trying to run Case 1 of Chapter 9 (Reinforcement Learning) both in Jupyter and in Python on my local PC. Every time I get a blank result from running the trained model on the test set: there are no buy signals and hence no P&L generated. I tried different options, including using the train set instead of the test set, but the result is still the same. The model training goes OK, running through a number of episodes, but when it comes to using the model that has been created, there seems to be an issue, as no buy signals are generated. Could you please look into this to see whether the Jupyter code as posted on GitHub works fine and whether there are any typos or errors in it?

Thank you and kind regards,
Alex M.

Chapter 6 -

Hi -

I am getting the error "['id'] not in index" when the code below is run in Chapter 6 - LoanDefaultProbability:
dataset[['id','emp_title','title','zip_code']].describe()

Can you help? I made no changes to the code.

Chapter 9, Case study 1 - typo / logical

Hello, I found a typo or logical error in chapter 9 - RL model page 290.

Figure 9-4 describes:
In state S1 (stagnant market) there are 2 options:

hold a0
sell a2

But the paragraph explains another way:
[...]In state s1 it has only two possible actions:

hold a0
buy a1

Which is correct?
Thank you.

Chapter 7: do portfolios that have negative weights for stocks, like weights[1], mean to sell those?

Hi, I just wanted to clarify: for example, portfolio 2 / weights[2] has negative weights. Does that mean those stocks are meant to be sold with such a weight in the portfolio?

example : weights
WMT 154.600
HD 98.409
WBA 83.098
INTC 73.599
MRK 70.360
PG 69.292
VZ 67.323
KO 63.387
CSCO 59.287
PFE 57.201
MSFT 54.698
JNJ 50.306
MCD 50.279
IBM 39.230
NKE 32.219
AAPL 26.293
DIS 26.185
AXP -3.605
TRV -16.914
UNH -32.764
JPM -35.070
GS -53.526
MMM -64.635
BA -78.023
UTX -83.237
CAT -133.755
XOM -225.916
CVX -248.319

or do the negative numbers just mean do nothing with them?

error in section 6.2 of FraudDetection.ipynb

The code, as written, does not create equally distributed classes.

Binder does not open and Yahoo Finance no longer supported. Which library to use?

Hello,

I have been trying to get started with the case studies in the book using both the Binder notebooks and Jupyter notebooks provided through supplemental GitHub material. When I try to create a Binder instance, I receive the following error:

` error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
/tmp/pip-install-mbfkwcqr/pandas_6b0bec17a8b44e1eab271987003b87bd/setup.py:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
/srv/conda/envs/notebook/lib/python3.10/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************
  
  !!
    dist.fetch_build_eggs(dist.setup_requires)
  error in pandas setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected end or semicolon (after version specifier)
      pytz >= 2011k
           ~~~~~~~^
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Removing intermediate container bc318755c302
The command '/bin/sh -c ${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache-dir -r "requirements.txt"' returned a non-zero code: 1`

I tried poking around the documentation but did not see any good way to implement the recommended 'PEP 517 installer'.

Then I tried using the Jupyter notebooks for the "Stock Price Prediction" case. Unfortunately, Yahoo finance is no longer supported, yfinance is a fragile web scraper substitute, and quandl did not work either.

What is the recommended path forward here? Unfortunately, I do not have a developer background and come more from the finance side. I am happy to explore the technical architecture and requirements, but feel like I am a bit outside my domain of expertise.

5.3.1 ln [67] ValueError

Trying to follow along with code from
https://github.com/tatsath/fin-ml/blob/master/Chapter%205%20-%20Sup.%20Learning%20-%20Regression%20and%20Time%20Series%20models/Case%20Study%201%20-%20Stock%20Price%20Prediction/StockPricePrediction.ipynb

I am receiving the following error:
"ValueError: Setting a random_state has no effect since shuffle is False. You should leave random_state to its default (None), or set shuffle=True."

Chapter 8 - HRP

Trying to follow along with the code, I have an error while trying to get the portfolio weights :

AttributeError: 'Series' object has no attribute 'append'

The error seems to be in the function getQuasiDiag(link) where it's trying to append a Series object

Chapter 9, case study 3: function nn_pred_to_weights is not reasonable

In DQN, the Q-network is an approximation of the optimal action-value function, so the action with the maximum value is the action the agent should choose. If the input to the Q-network is only the state, the output should be the value of each action, so the output dimension should be the same as the action dimension. The implicit policy is the argmax of the action values. However, in this code, the action is a complex mapping from all action values to asset weights (in the function nn_pred_to_weights), which I think is not reasonable.

Derivatives Hedging does not work with TensorFlow 2.0

The Derivatives Hedging notebook in Chapter 9 does not work with the latest version of TensorFlow (Keras).
