
mlfrm's Introduction

Machine Learning for Financial Risk Management with Python

This repository provides Python code and Jupyter Notebooks accompanying the Machine Learning for Financial Risk Management with Python book published by O'Reilly.

Buy the book on Amazon.



mlfrm's Issues

Chapter 8 - Cost calculations

cost_log = conf_mat_log[0][1] * cost_fp + conf_mat_boost[1][0] * cost_fn.mean() + conf_mat_log[1][1] * cost_tp
cost_dt = conf_mat_dt[0][1] * cost_fp + conf_mat_boost[1][0] * cost_fn.mean() + conf_mat_dt[1][1] * cost_tp
cost_rf = conf_mat_rf[0][1] * cost_fp + conf_mat_boost[1][0] * cost_fn.mean() + conf_mat_rf[1][1] * cost_tp

Should all three cost variables be using conf_mat_boost[1][0], or should each use its own confusion matrix:

cost_log = conf_mat_log[0][1] * cost_fp + conf_mat_log[1][0] * cost_fn.mean() + conf_mat_log[1][1] * cost_tp
cost_dt = conf_mat_dt[0][1] * cost_fp + conf_mat_dt[1][0] * cost_fn.mean() + conf_mat_dt[1][1] * cost_tp
cost_rf = conf_mat_rf[0][1] * cost_fp + conf_mat_rf[1][0] * cost_fn.mean() + conf_mat_rf[1][1] * cost_tp
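
If the per-model matrices are intended, computing the costs in a loop avoids this kind of copy-paste slip. A minimal sketch, assuming conf_mat_log, conf_mat_dt, conf_mat_rf, cost_fp, cost_fn and cost_tp are defined as in the chapter:

# Hypothetical restructuring: same cost formula, one confusion matrix per model
conf_mats = {'log': conf_mat_log, 'dt': conf_mat_dt, 'rf': conf_mat_rf}
costs = {name: cm[0][1] * cost_fp + cm[1][0] * cost_fn.mean() + cm[1][1] * cost_tp
         for name, cm in conf_mats.items()}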

I had to use a higher-memory machine (4 vCPU and 32 GB vs 4 vCPU and 16 GB) for the cost-sensitive models to complete; otherwise a memory allocation error was raised or the kernel crashed.

Amended 'from keras import regularizers' to import from TensorFlow instead: 'from tensorflow.keras import regularizers'.

Autoencoder training takes a while to complete 100 epochs.

Chapter 1

  1. Had to install plotly:

!pip install plotly

  2. Missing a closing bracket:

rand = np.random.rand(n_assets

should be:

rand = np.random.rand(n_assets)

  3. portfolio = np.array([port_return(np.random.randn(n_assets, i))
                           for i in range(1, 101)])

This appears to vary the number of simulations from 1 to 100 across the 100 experiments; should it be:

portfolio = np.array([port_return(np.random.randn(n_assets, n_simulation)) for i in range(1, 101)])
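
For reference, a self-contained sketch of the corrected simulation. The port_return below is a hypothetical stand-in, not the book's definition, included only so the snippet runs:

import numpy as np

n_assets = 5
n_simulation = 100

def port_return(returns):
    # Hypothetical stand-in: random weights normalized to sum to 1,
    # applied to the simulated asset returns (note the closing parenthesis).
    rand = np.random.rand(n_assets)
    weights = rand / rand.sum()
    return np.dot(weights, returns).mean()

# Number of simulations held fixed at n_simulation for every experiment
portfolio = np.array([port_return(np.random.randn(n_assets, n_simulation))
                      for i in range(1, 101)])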

Chapter 3 - Date range mismatch between prediction and test dataset

The prediction appears to be generated against the last n_steps of the training data (22/11 - 11/12) and then plotted against the date range of the test set (12/12 - 31/12)

These are the same length because the prediction is generated over n_steps = 13, which is the same as the last 5% of the original 251 records selected for the test set.
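
A minimal sketch of one way to fix the alignment, assuming (names are illustrative, not necessarily the book's) that train and test are the split series and y_pred holds the n_steps predictions generated from the end of the training data:

import matplotlib.pyplot as plt

# Plot the prediction against the dates it was actually generated for
# (the last n_steps of the training data), not the test set's date range.
pred_dates = train.index[-n_steps:]
plt.plot(test.index, test.values, label='test')
plt.plot(pred_dates, y_pred, label='prediction')
plt.legend()
plt.show()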

Chapter 7 - Ticker selection and window calculations

  1. The rolling_five calculation seems to loop over the first rows of the INTC values for every TICKER, regardless of the value of j. Should the subset of liq_data matching the TICKER value be selected, as per the following (see also the groupby/rolling sketch at the end of this issue):

rolling_five.append(liq_data[liq_data.TICKER == j][i:i+5].agg({'BIDLO': 'min',
                                                               'ASKHI': 'max',
                                                               'VOL': 'sum',
                                                               'SHROUT': 'mean',
                                                               'PRC': 'mean'}))

This same bug seems to be present in subsequent sections as well.

  2. The last 4 values of the rolling window have 4, 3, 2 and 1 rows to calculate over, such that the last row is simply the values of BIDLO, ASKHI, VOL, SHROUT and PRC. If the modification above is included, this occurs at the end of each TICKER's set of rows rather than just once at the end of the file.

  3. For the liq_ratio calculation the numerator is a sum of five 'price x volume' terms, whereas the denominator is a single difference of means, so the ratio comes out roughly five times larger than perhaps it should. Alternatively, means could be used in the numerator as well (denominator unchanged):

liq_ratio.append((liq_vol_all[liq_data.TICKER == j]['PRC'][i+1:i+6].mean() *
                  liq_vol_all[liq_data.TICKER == j]['VOL'][i+1:i+6].mean()) /
  4. When calculating the turnover ratio, if the [liq_data.TICKER == j] modification suggested above is included, the covariance calculation fails at row 233, as there are insufficient rows left for i:i+6 to be compared against i:i+5 (there are only 238 INTC rows in total).

The only fix I can suggest is to skip calculating the roll value for the last 5 rows of each TICKER, but this doesn't seem very acceptable as it leaves missing values in the dataset:

for j in liq_vol_all.TICKER.unique():
    for i in range(len(liq_vol_all[liq_vol_all.TICKER == j]) - 5):

  5. In the Lhh calculation we have:

(liq_vol_all[liq_data.TICKER == j]['VOL'][i:i+5].sum() /

To keep the last few rows on the same order of magnitude as the previous values, given that fewer rows are available towards the end of the dataset, an option might be:

((liq_vol_all[liq_data.TICKER == j]['VOL'][i:i+5].mean()*5) /

I stopped here with this chapter, based on the comments above.
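
More generally, a pandas groupby/rolling formulation might sidestep the per-ticker index arithmetic altogether. A minimal sketch, assuming liq_data carries a TICKER column plus the five price/volume columns used above; note that these rolling windows are trailing rather than leading, so the result would need shifting to line up with the book's indexing:

rolling_five = (liq_data.groupby('TICKER')
                        .rolling(5)   # pass min_periods=1 to keep the partial windows at each ticker's end
                        .agg({'BIDLO': 'min', 'ASKHI': 'max', 'VOL': 'sum',
                              'SHROUT': 'mean', 'PRC': 'mean'})
                        .reset_index())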

Chapter 5 - Kernel Density Parameters and CSV Datapath

  1. kde = KernelDensity(bandwidth, kernel='gaussian') gives:

TypeError: __init__() takes 1 positional argument but 2 positional arguments (and 1 keyword-only argument) were given

Suggest:

kde = KernelDensity(bandwidth=bandwidth, kernel='gaussian')

  2. Suggest modifying the CSV reader to include the datasets file path:

bid_ask = pd.read_csv('datasets/bid_ask.csv')
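
For reference, a minimal runnable check of the keyword-argument form, using toy data rather than the chapter's bid-ask series:

import numpy as np
from sklearn.neighbors import KernelDensity

X = np.random.randn(100, 1)                             # toy data, for the call only
kde = KernelDensity(bandwidth=0.05, kernel='gaussian')  # bandwidth passed by keyword
kde.fit(X)
log_dens = kde.score_samples(X)                         # log-density estimates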

Chapter 9 - Stock Price Crash Measures

This section of code is not needed, as crash_dataw is not referenced further:

std = crash_data.groupby('TICKER')['RET'].resample('W').std().reset_index()
crash_dataw['std'] = pd.DataFrame(std['RET'])

The count values are not used:

merge_all = merge_grouped.groupby('TICKER').resample('1Y').agg({'down': ['sum', 'count'],
                                                                'up': ['sum', 'count']}).reset_index()
merge_all.head()
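
If the counts are indeed unused, the aggregation could be simplified; a sketch, assuming merge_grouped is indexed by date as in the chapter:

merge_all = (merge_grouped.groupby('TICKER')
                          .resample('1Y')
                          .agg({'down': 'sum', 'up': 'sum'})
                          .reset_index())
merge_all.head()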

Dataset missing

I am trying to rework the code and find that the bs_v.3 and Fraudtrain data are missing.
Does anyone know where to find the missing data? Thanks in advance.

Chapter 10 - Cannot import CTGANSynthesizer from ctgan

As the error message below shows, CTGANSynthesizer cannot be imported from the ctgan library.

This code was run in Google Colab, and the same error occurred in my Anaconda environment (Python 3.8).


from ctgan import CTGANSynthesizer

ctgan = CTGANSynthesizer(epochs=10)
ctgan.fit(california_housing_df)
synt_sample = ctgan.sample(len(california_housing_df))


ImportError                               Traceback (most recent call last)
<cell line: 1>()
----> 1 from ctgan import CTGANSynthesizer
      2
      3 ctgan = CTGANSynthesizer(epochs=10)
      4 ctgan.fit(california_housing_df)
      5 synt_sample = ctgan.sample(len(california_housing_df))

ImportError: cannot import name 'CTGANSynthesizer' from 'ctgan' (/usr/local/lib/python3.10/dist-packages/ctgan/__init__.py)



Chapter 3

  1. Had to install tensorflow and yfinance

  2. Erroneous trailing > in:

model.add(Flatten())>

This should be model.add(Flatten()).
