
mlfrm's Introduction

Machine Learning for Financial Risk Management with Python

This repository provides Python code and Jupyter Notebooks accompanying the Machine Learning for Financial Risk Management with Python book published by O'Reilly.

Buy the book on Amazon.



mlfrm's Issues

Chapter 8 - Cost calculations

cost_log = conf_mat_log[0][1] * cost_fp + conf_mat_boost[1][0] * cost_fn.mean() + conf_mat_log[1][1] * cost_tp
cost_dt = conf_mat_dt[0][1] * cost_fp + conf_mat_boost[1][0] * cost_fn.mean() + conf_mat_dt[1][1] * cost_tp
cost_rf = conf_mat_rf[0][1] * cost_fp + conf_mat_boost[1][0] * cost_fn.mean() + conf_mat_rf[1][1] * cost_tp

Should all three cost variables be using conf_mat_boost[1][0], or should each use its own confusion matrix:

cost_log = conf_mat_log[0][1] * cost_fp + conf_mat_log[1][0] * cost_fn.mean() + conf_mat_log[1][1] * cost_tp
cost_dt = conf_mat_dt[0][1] * cost_fp + conf_mat_dt[1][0] * cost_fn.mean() + conf_mat_dt[1][1] * cost_tp
cost_rf = conf_mat_rf[0][1] * cost_fp + conf_mat_rf[1][0] * cost_fn.mean() + conf_mat_rf[1][1] * cost_tp
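
If the per-model matrices are intended, computing the costs in a loop avoids this kind of copy-paste slip. A minimal sketch, assuming conf_mat_log, conf_mat_dt, conf_mat_rf, cost_fp, cost_fn and cost_tp are defined as in the chapter:

# Hypothetical restructuring: same cost formula, one confusion matrix per model
conf_mats = {'log': conf_mat_log, 'dt': conf_mat_dt, 'rf': conf_mat_rf}
costs = {name: cm[0][1] * cost_fp + cm[1][0] * cost_fn.mean() + cm[1][1] * cost_tp
         for name, cm in conf_mats.items()}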

I had to use a higher-memory machine (4 vCPU and 32 GB vs 4 vCPU and 16 GB) for the cost-sensitive models to complete; otherwise a memory allocation error was raised or the kernel crashed.

Amended 'from keras import regularizers' to import from TensorFlow instead: 'from tensorflow.keras import regularizers'.

Autoencoder training takes a while to complete 100 epochs.

Chapter 1

  1. Had to install plotly:

!pip install plotly

  2. Missing a closing bracket:

rand = np.random.rand(n_assets

should be:

rand = np.random.rand(n_assets)

  3. portfolio = np.array([port_return(np.random.randn(n_assets, i))
                           for i in range(1, 101)])

This appears to vary the number of simulations from 1 to 100 across the 100 experiments; should it be:

portfolio = np.array([port_return(np.random.randn(n_assets, n_simulation)) for i in range(1, 101)])
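
For reference, a self-contained sketch of the corrected simulation. The port_return below is a hypothetical stand-in, not the book's definition, included only so the snippet runs:

import numpy as np

n_assets = 5
n_simulation = 100

def port_return(returns):
    # Hypothetical stand-in: random weights normalized to sum to 1,
    # applied to the simulated asset returns (note the closing parenthesis).
    rand = np.random.rand(n_assets)
    weights = rand / rand.sum()
    return np.dot(weights, returns).mean()

# Number of simulations held fixed at n_simulation for every experiment
portfolio = np.array([port_return(np.random.randn(n_assets, n_simulation))
                      for i in range(1, 101)])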

Chapter 3 - Date range mismatch between prediction and test dataset

The prediction appears to be generated against the last n_steps of the training data (22/11 - 11/12) and then plotted against the date range of the test set (12/12 - 31/12)

These are the same length because the prediction is generated over n_steps = 13, which is the same as the last 5% of the original 251 records selected for the test set.
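
A minimal sketch of one way to fix the alignment, assuming (names are illustrative, not necessarily the book's) that train and test are the split series and y_pred holds the n_steps predictions generated from the end of the training data:

import matplotlib.pyplot as plt

# Plot the prediction against the dates it was actually generated for
# (the last n_steps of the training data), not the test set's date range.
pred_dates = train.index[-n_steps:]
plt.plot(test.index, test.values, label='test')
plt.plot(pred_dates, y_pred, label='prediction')
plt.legend()
plt.show()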

Chapter 7 - Ticker selection and window calculations

  1. The rolling_five calculation seems to loop over the first rows of the INTC values for every TICKER, regardless of the value of j. Should the subset of liq_data matching the TICKER value be selected, as per the following (see also the groupby/rolling sketch at the end of this issue):

rolling_five.append(liq_data[liq_data.TICKER == j][i:i+5].agg({'BIDLO': 'min',
                                                               'ASKHI': 'max',
                                                               'VOL': 'sum',
                                                               'SHROUT': 'mean',
                                                               'PRC': 'mean'}))

This same bug seems to be present in subsequent sections as well.

  2. The last 4 values of the rolling window have 4, 3, 2 and 1 rows to calculate over, such that the last row is simply the values of BIDLO, ASKHI, VOL, SHROUT and PRC. If the modification above is included, this occurs at the end of each TICKER's set of rows rather than just once at the end of the file.

  3. For the liq_ratio calculation the numerator is a sum of five 'price x volume' terms, whereas the denominator is a single difference of means, so the ratio comes out roughly five times larger than perhaps it should. Alternatively, means could be used in the numerator as well (denominator unchanged):

liq_ratio.append((liq_vol_all[liq_data.TICKER == j]['PRC'][i+1:i+6].mean() *
                  liq_vol_all[liq_data.TICKER == j]['VOL'][i+1:i+6].mean()) /
  4. When calculating the turnover ratio, if the [liq_data.TICKER == j] modification suggested above is included, the covariance calculation fails at row 233, as there are insufficient rows left for i:i+6 to be compared against i:i+5 (there are only 238 INTC rows in total).

The only fix I can suggest is to skip calculating the roll value for the last 5 rows of each TICKER, but this doesn't seem very acceptable as it leaves missing values in the dataset:

for j in liq_vol_all.TICKER.unique():
    for i in range(len(liq_vol_all[liq_vol_all.TICKER == j]) - 5):

  5. In the Lhh calculation we have:

(liq_vol_all[liq_data.TICKER == j]['VOL'][i:i+5].sum() /

To keep the last few rows on the same order of magnitude as the previous values, given that fewer rows are available towards the end of the dataset, an option might be:

((liq_vol_all[liq_data.TICKER == j]['VOL'][i:i+5].mean()*5) /

I stopped here with this chapter, based on the comments above.
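
More generally, a pandas groupby/rolling formulation might sidestep the per-ticker index arithmetic altogether. A minimal sketch, assuming liq_data carries a TICKER column plus the five price/volume columns used above; note that these rolling windows are trailing rather than leading, so the result would need shifting to line up with the book's indexing:

rolling_five = (liq_data.groupby('TICKER')
                        .rolling(5)   # pass min_periods=1 to keep the partial windows at each ticker's end
                        .agg({'BIDLO': 'min', 'ASKHI': 'max', 'VOL': 'sum',
                              'SHROUT': 'mean', 'PRC': 'mean'})
                        .reset_index())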

Chapter 5 - Kernel Density Parameters and CSV Datapath

  1. kde = KernelDensity(bandwidth, kernel='gaussian') gives:

TypeError: __init__() takes 1 positional argument but 2 positional arguments (and 1 keyword-only argument) were given

Suggest:

kde = KernelDensity(bandwidth=bandwidth, kernel='gaussian')

  2. Suggest modifying the CSV reader to include the datasets file path:

bid_ask = pd.read_csv('datasets/bid_ask.csv')
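
For reference, a minimal runnable check of the keyword-argument form, using toy data rather than the chapter's bid-ask series:

import numpy as np
from sklearn.neighbors import KernelDensity

X = np.random.randn(100, 1)                             # toy data, for the call only
kde = KernelDensity(bandwidth=0.05, kernel='gaussian')  # bandwidth passed by keyword
kde.fit(X)
log_dens = kde.score_samples(X)                         # log-density estimates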

Chapter 9 - Stock Price Crash Measures

This section of code is not needed, as crash_dataw is not referenced further:

std = crash_data.groupby('TICKER')['RET'].resample('W').std().reset_index()
crash_dataw['std'] = pd.DataFrame(std['RET'])

The count values are not used:

merge_all = merge_grouped.groupby('TICKER').resample('1Y').agg({'down': ['sum', 'count'],
                                                                'up': ['sum', 'count']}).reset_index()
merge_all.head()
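
If the counts are indeed unused, the aggregation could be simplified; a sketch, assuming merge_grouped is indexed by date as in the chapter:

merge_all = (merge_grouped.groupby('TICKER')
                          .resample('1Y')
                          .agg({'down': 'sum', 'up': 'sum'})
                          .reset_index())
merge_all.head()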

Dataset missing

I am trying to rework the code and find that the bs_v.3 and Fraudtrain data are missing.
Does anyone know where to find the missing data? Thanks in advance.

Chapter 10 - Cannot import CTGANSynthesizer from ctgan

As the error message below shows, CTGANSynthesizer cannot be imported from the ctgan library.

This code was run in Google Colab, and the same error occurred in my Anaconda environment (Python 3.8).


from ctgan import CTGANSynthesizer

ctgan = CTGANSynthesizer(epochs=10)
ctgan.fit(california_housing_df)
synt_sample = ctgan.sample(len(california_housing_df))


ImportError                               Traceback (most recent call last)
<cell line: 1>()
----> 1 from ctgan import CTGANSynthesizer
      2
      3 ctgan = CTGANSynthesizer(epochs=10)
      4 ctgan.fit(california_housing_df)
      5 synt_sample = ctgan.sample(len(california_housing_df))

ImportError: cannot import name 'CTGANSynthesizer' from 'ctgan' (/usr/local/lib/python3.10/dist-packages/ctgan/__init__.py)



Chapter 3

  1. Had to install tensorflow and yfinance

  2. Erroneous trailing > in:

model.add(Flatten())>

This should be model.add(Flatten()).
