
equitycharacteristics's Introduction


For financial research, we need equity characteristics. This repository is a toolkit for calculating asset characteristics at the individual-equity and portfolio levels.

Prerequisite

  • Read the listed papers
  • WRDS account with subscriptions to CRSP, Compustat, and IBES
  • Python

Files

Main Files

  • accounting_60_hxz.py -- most annual, quarterly, and monthly frequency characteristics
  • functions.py -- imputation and ranking functions
  • merge_chars.py -- merge all the characteristics from different pickle files into one pickle file
  • impute_rank_output_bchmk.py -- impute the missing values and standardize the raw data
  • iclink.py -- preparation for IBES
  • pkl_to_csv.py -- convert the pickle file to csv

Single Characteristic Files

  • beta.py -- 3-month rolling CAPM beta
  • rvar_capm.py, rvar_ff3.py -- residual variance of the CAPM and the Fama-French three-factor model, rolling window is 3 months
  • rvar_mean.py -- variance of returns, rolling window is 3 months
  • abr.py -- cumulative abnormal returns around earnings announcement dates
  • myre.py -- revisions in analysts’ earnings forecasts
  • sue.py -- unexpected quarterly earnings
  • ill.py -- illiquidity, rolling window is 3 months
  • maxret_d.py -- maximum daily returns, rolling window is 3 months
  • std_dolvol.py -- std of dollar trading volume, rolling window is 3 months
  • std_turn.py -- std of share turnover, rolling window is 3 months
  • bid_ask_spread.py -- bid-ask spread, rolling window is 3 months
  • zerotrade.py -- number of zero-trading days, rolling window is 3 months

How to use

  1. run accounting_60_hxz.py
  2. run all the single characteristic files (you can run them in parallel; see the sketch after this list)
  3. run merge_chars.py
  4. run impute_rank_output_bchmk.py (you may want to comment out the sp1500 part of this file if you just need the all-stocks version)
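
A minimal sketch of step 2's parallel run, assuming each single-characteristic file can be launched as a standalone script from one folder (the file list mirrors the Single Characteristic Files section; adjust paths to your layout):

    # run_single_chars.py -- launch every single-characteristic script in parallel
    import subprocess

    scripts = ['beta.py', 'rvar_capm.py', 'rvar_ff3.py', 'rvar_mean.py',
               'abr.py', 'myre.py', 'sue.py', 'ill.py', 'maxret_d.py',
               'std_dolvol.py', 'std_turn.py', 'bid_ask_spread.py', 'zerotrade.py']

    # start all scripts at once, then wait for each to finish
    procs = [subprocess.Popen(['python', s]) for s in scripts]
    for p in procs:
        p.wait()

Note that each script opens its own WRDS connection and worker pool, so running all of them at once multiplies the memory and database-connection load.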

Outputs

Data

The date range is 1972 to 2019. The stock universe is the three major US exchanges (NYSE/AMEX/NASDAQ).

The timing convention is $ret_t = chars_{t-1}$: the return in month $t$ is matched with the characteristics observed at the end of month $t-1$.
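
A minimal pandas sketch of this alignment, assuming a long panel with one row per permno-month (column names here are illustrative):

    import pandas as pd

    # panel: one row per (permno, date) with the month-t return and same-month characteristics
    panel = panel.sort_values(['permno', 'date'])
    char_cols = ['beta', 'me']  # illustrative characteristic columns
    # lag characteristics one month within each stock, so row t holds ret_t and chars_{t-1}
    panel[char_cols] = panel.groupby('permno')[char_cols].shift(1)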

  1. chars_raw_no_impute.feather (all data, with missing values left as-is)
  2. chars_raw_imputed.feather (missing values imputed with industry median/mean values)
  3. chars_rank_no_imputed.feather (standardized version of chars_raw_no_impute.feather)
  4. chars_rank_imputed.feather (standardized version of chars_raw_imputed.feather)

Information Variables:

  • stock indicator: gvkey, permno
  • time: datadate, date, year ('datadate' is when the data becomes available and 'date' is the date of the return)
  • industry: sic, ffi49
  • exchange info: exchcd, shrcd
  • return: ret (we also provide the original return and the return without dividends; you can keep them by modifying impute_rank_output_bchmk.py)
  • market equity: me/rank_me

Method

Equity Characteristics

This topic is surveyed by Green, Hand, and Zhang and by Hou, Xue, and Zhang.

Portfolio Characteristics

A portfolio characteristic is the equal-weighted / value-weighted average of that characteristic across all equities in the portfolio.
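
A minimal pandas sketch of the value-weighted version, assuming a long panel with one row per stock-month and columns for the date, a portfolio label, market equity, and the characteristic (all names here are illustrative):

    import pandas as pd

    def vw_portfolio_char(df, char, weight='me'):
        # value-weighted average of one characteristic within each portfolio-month
        out = df.groupby(['portfolio', 'date']).apply(
            lambda g: (g[char] * g[weight]).sum() / g[weight].sum())
        return out.rename(char).reset_index()

The equal-weighted version is simply df.groupby(['portfolio', 'date'])[char].mean().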

The portfolios include, but are not limited to:

Reference

Papers

Many papers have contributed to this repository; apologies for listing only the following.

Codes

All comments are welcome.

equitycharacteristics's People

Contributors

ericma4, velonisa, xinhe97


equitycharacteristics's Issues

Out of memory when running chars60/beta.py

When running python char60/beta.py, my machine runs for about 37 minutes before running out of memory and crashing with the following standard output in my Ubuntu 22 terminal:

[1] 7949 killed python char60/beta.py

I am not sure if this is a memory-related error, but some aspects make this seem the most likely conclusion to me:

  1. The memory usage of Python after about 10 minutes is still only 9.5 GB and keeps increasing.
  2. After about 15-20 minutes, no network traffic is detected anymore, so I assume that the call to WRDS was successful. CPU usage increases significantly when network traffic falls to zero.
  3. Memory usage keeps increasing steadily, then hovers around 24 GB after about 30 minutes.

Questions

A) I'm using a computer with 32 GB RAM - is this too little for running the scripts from this repository?
B) Am I doing something wrong when running this script?
C) Should I even run the script or should I only run pychars/beta.py?
  D) If this is a bug, would you like me to look into it? In issue #11, williamjin1992 mentions that the script is unnecessarily slow.

Thank you!

Possible faster method for calculating Beta

In the file char60/beta.py, I noticed that there is a TODO for a faster way to get the rolling beta estimate. The original function get_beta uses the matrix form of the OLS formula to estimate beta; I think the covariance representation can be faster. Here is my suggestion:

    def get_beta_VCV(df):
        temp = crsp.loc[df.index, :]
        vcv = temp.loc[:, ['exret', 'mktrf']].cov(min_periods=None, ddof=1).values
        beta = vcv[0, 1] / vcv[1, 1]
        return beta
I've tested it on my computer, and it is faster than the matrix operation approach.
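
A self-contained check on synthetic data that the covariance representation matches the OLS slope (names and sizes here are illustrative):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({'mktrf': rng.normal(size=60)})
    df['exret'] = 1.2 * df['mktrf'] + rng.normal(scale=0.5, size=60)

    # OLS slope via the normal equations (intercept included)
    X = np.column_stack([np.ones(len(df)), df['mktrf']])
    beta_ols = np.linalg.solve(X.T @ X, X.T @ df['exret'])[1]

    # covariance representation: beta = cov(exret, mktrf) / var(mktrf)
    vcv = df[['exret', 'mktrf']].cov().values
    beta_cov = vcv[0, 1] / vcv[1, 1]

    assert np.isclose(beta_ols, beta_cov)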

Add a Constant to allow clearer indication where data is written

The output from the char files is written to the current directory. This may not be desirable if the directory is network-mounted or synced across multiple computers.

Proposed solution: add a constant at the top of affected files as follows:

# Output directory, e.g. the current directory './' or 'c:/temp/'
OUT_DIR = 'c:/temp/'

Change the code to use OUT_DIR:

    with open(OUT_DIR + 'zerotrade.feather', 'wb') as f:
        feather.write_feather(crsp, f)

Files and lines impacted:

abr.py:238:with open('abr.feather', 'wb') as f:
accounting_100.py:1637:with open('chars_a_60.pkl', 'wb') as f:
accounting_100.py:1640:with open('chars_q_60.pkl', 'wb') as f:
accounting_60_hxz.py:1231:with open('chars_a_60.feather', 'wb') as f:
accounting_60_hxz.py:1234:with open('chars_q_60.feather', 'wb') as f:
accounting_60.py:1214:with open('chars_a_60.feather', 'wb') as f:
accounting_60.py:1217:with open('chars_q_60.feather', 'wb') as f:
beta.py:181:with open('beta.feather', 'wb') as f:
bid_ask_spread.py:160:with open('baspread.feather', 'wb') as f:
feather_to_csv.py:5:# with open('chars60_raw_imputed.feather', 'rb') as f:
feather_to_csv.py:8:with open('chars60_rank_imputed.feather', 'rb') as f:
iclink.py:243:with open('iclink.feather', 'wb') as f:
ill.py:174:with open('ill.feather', 'wb') as f:
impute_rank_output_bchmk_60.py:11:with open('chars_q_raw.feather', 'rb') as f:
impute_rank_output_bchmk_60.py:19:with open('chars_a_raw.feather', 'rb') as f:
impute_rank_output_bchmk_60.py:96:with open('chars60_raw_no_impute.feather', 'wb') as f:
impute_rank_output_bchmk_60.py:118:with open('chars60_raw_imputed.feather', 'wb') as f:
impute_rank_output_bchmk_60.py:131:with open('chars60_rank_no_impute.feather', 'wb') as f:
impute_rank_output_bchmk_60.py:143:with open('chars60_rank_imputed.feather', 'wb') as f:
impute_rank_output_bchmk_60.py:150:# with open('/home/jianxinma/chars/data/sp1500_impute_benchmark.feather', 'rb') as f:
impute_rank_output_bchmk_60.py:160:# with open('sp1500_impute_60.feather', 'wb') as f:
impute_rank_output_bchmk_60.py:166:# with open('sp1500_rank_60.feather', 'wb') as f:
maxret_d.py:158:with open('maxret.feather', 'wb') as f:
merge_chars_60.py:9:with open('chars_a_60.feather', 'rb') as f:
merge_chars_60.py:17:with open('beta.feather', 'rb') as f:
merge_chars_60.py:27:with open('rvar_capm.feather', 'rb') as f:
merge_chars_60.py:37:with open('rvar_mean.feather', 'rb') as f:
merge_chars_60.py:47:with open('rvar_ff3.feather', 'rb') as f:
merge_chars_60.py:57:with open('sue.feather', 'rb') as f:
merge_chars_60.py:67:with open('myre.feather', 'rb') as f:
merge_chars_60.py:77:with open('abr.feather', 'rb') as f:
merge_chars_60.py:87:with open('baspread.feather', 'rb') as f:
merge_chars_60.py:97:with open('maxret.feather', 'rb') as f:
merge_chars_60.py:107:with open('std_dolvol.feather', 'rb') as f:
merge_chars_60.py:117:with open('ill.feather', 'rb') as f:
merge_chars_60.py:127:with open('std_turn.feather', 'rb') as f:
merge_chars_60.py:137:with open('zerotrade.feather', 'rb') as f:
merge_chars_60.py:148:with open('chars_a_raw.feather', 'wb') as f:
merge_chars_60.py:155:with open('chars_q_60.feather', 'rb') as f:
merge_chars_60.py:163:with open('beta.feather', 'rb') as f:
merge_chars_60.py:173:with open('rvar_capm.feather', 'rb') as f:
merge_chars_60.py:183:with open('rvar_mean.feather', 'rb') as f:
merge_chars_60.py:193:with open('rvar_ff3.feather', 'rb') as f:
merge_chars_60.py:203:with open('sue.feather', 'rb') as f:
merge_chars_60.py:213:with open('myre.feather', 'rb') as f:
merge_chars_60.py:223:with open('abr.feather', 'rb') as f:
merge_chars_60.py:233:with open('baspread.feather', 'rb') as f:
merge_chars_60.py:243:with open('maxret.feather', 'rb') as f:
merge_chars_60.py:253:with open('std_dolvol.feather', 'rb') as f:
merge_chars_60.py:263:with open('ill.feather', 'rb') as f:
merge_chars_60.py:273:with open('std_turn.feather', 'rb') as f:
merge_chars_60.py:283:with open('zerotrade.feather', 'rb') as f:
merge_chars_60.py:294:with open('chars_q_raw.feather', 'wb') as f:
myre.py:23:with open('iclink.feather', 'rb')as f:
myre.py:120:with open('myre.feather', 'wb') as f:
rvar_capm.py:185:with open('rvar_capm.feather', 'wb') as f:
rvar_ff3.py:218:with open('rvar_ff3.feather', 'wb') as f:
rvar_mean.py:167:with open('rvar_mean.feather', 'wb') as f:
std_dolvol.py:158:with open('std_dolvol.feather', 'wb') as f:
std_turn.py:158:with open('std_turn.feather', 'wb') as f:
sue.py:106:with open('sue.feather', 'wb') as f:
zerotrade.py:161:with open('zerotrade.feather', 'wb') as f:

Characteristic .py files open too many connections

A connection to WRDS is created in the main Python process and also in each pool process. With a large pool (e.g., 20 workers), PostgreSQL "too many database connections" errors can occur. Other redundant code is also executed unnecessarily.

The scripts already test for __name__ == '__main__'; this guard needs to be extended to the top and bottom sections of code that only need to execute in the main process.

A judicious conn.close() is also warranted, to free the connection(s) for other characteristic scripts running in parallel.
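
A minimal sketch of the guard, assuming the wrds package and a connection that only the main process should own (the query is hypothetical; the real scripts pull full CRSP data):

    import wrds

    def load_crsp(conn):
        # hypothetical query standing in for the scripts' real SQL
        return conn.raw_sql("select permno, date, ret from crsp.msf")

    if __name__ == '__main__':
        conn = wrds.Connection()      # created only in the main process
        try:
            crsp = load_crsp(conn)
        finally:
            conn.close()              # release the PostgreSQL connection promptly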

Use of mp.Pool() Needs Correction.

In many characteristic .py files there is an attempt to split the dataframe and process it according to a specific CPU configuration.

For example zerotrade.py line 153:

if __name__ == '__main__':
    crsp = main(0, 1, 0.05)

This leads to zerotrade.py line 137:

    pool = mp.Pool()

However, the Python documentation describes the processes argument of Pool() as the number of worker processes to use; if processes is None, the number returned by os.cpu_count() is used.

This is inefficient both when debugging (with limited cores and shorter SQL date ranges) and on large machines with many cores -- especially if the README.md advice to run the characteristic files in parallel is followed.

I see a 107,616 K working set and 703,800 K private bytes for each of my 20 cores, even if I change the main function to select 1 core.

Suggested changes to all impacted .py files: add a constant at the top, call Pool() with an explicit value, and change the main() call to show the intent more clearly.

# Number of CPU cores to use. Usually use 20 or more. 1 for debugging.
CPU_CORE_COUNT = 10
# ...
    pool = mp.Pool(CPU_CORE_COUNT)
# ...
    crsp = main(0, 1, 1/CPU_CORE_COUNT)

Files to be changed:
beta.py
bid_ask_spread.py
ill.py
maxret_d.py
rvar_capm.py
rvar_ff3.py
rvar_mean.py
std_dolvol.py
std_turn.py
zerotrade.py

Need a utility to show characteristic .feather stats

I propose a utility that reports any missing characteristic .feather files before merge_chars is run.

In addition, a .feather file can report its own row, column, and byte counts.

The file would be named char_file_stats.py. The output would be:

WARNING: File ./abr.feather does not exist
WARNING: File ./sue.feather does not exist
file: baspread.feather rows: 4,547,622, cols: 3, bytes: 90,952,440
file: beta.feather rows: 4,631,954, cols: 3, bytes: 92,639,080
file: chars_a_60.feather rows: 2,969,208, cols: 90, bytes: 2,196,523,973
file: chars_q_60.feather rows: 2,697,915, cols: 62, bytes: 1,359,411,950
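
A minimal sketch of such a utility, assuming pyarrow is installed and that the expected file list mirrors what merge_chars reads (the list below is illustrative):

    # char_file_stats.py -- report missing .feather files and basic stats
    import os
    import pyarrow.feather as feather

    EXPECTED = ['abr.feather', 'sue.feather', 'baspread.feather', 'beta.feather',
                'chars_a_60.feather', 'chars_q_60.feather']

    for name in EXPECTED:
        path = './' + name
        if not os.path.exists(path):
            print(f'WARNING: File {path} does not exist')
            continue
        table = feather.read_table(path)
        print(f'file: {name} rows: {table.num_rows:,}, '
              f'cols: {table.num_columns:,}, bytes: {table.nbytes:,}')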

Many characteristics runs generate Pandas FutureWarnings

Many characteristic runs generate pandas FutureWarnings. As an example, let's look at beta.py.

Line 195 is
crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
and warns as
FutureWarning: Series.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html and https://www.geeksforgeeks.org/python-pandas-dataframe-bfill/ for more information.
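
A hedged one-line replacement for that call, using the GroupBy bfill() method the warning recommends:

    crsp['month_count'] = crsp.groupby('permno')['month_count'].bfill()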

When fillna(method=...) is removed from pandas, the impact on EquityCharacteristics will be significant. A grep of fillna gives:

abr.py:48:ccm['linkenddt'] = ccm['linkenddt'].fillna(pd.to_datetime('today'))
accounting_100.py:160:crsp['me'] = np.where(crsp['permno'] == crsp['permno'].shift(1), crsp['me'].fillna(method='ffill'), crsp['me'])
accounting_100.py:198:ccm['linkenddt'] = ccm['linkenddt'].fillna(pd.to_datetime('today'))
accounting_100.py:248:data_rawa['txditc'] = data_rawa['txditc'].fillna(0)
accounting_100.py:373:data_rawa['noa'] = ((data_rawa['at']-data_rawa['che']-data_rawa['ivao'].fillna(0))-
accounting_100.py:374: (data_rawa['at']-data_rawa['dlc'].fillna(0)-data_rawa['dltt'].fillna(0)-data_rawa['mib'].fillna(0)
accounting_100.py:375: -data_rawa['pstk'].fillna(0)-data_rawa['ceq'])/data_rawa['at_l1'])
accounting_100.py:582:data_rawa['ffi49'] = data_rawa['ffi49'].fillna(49)
accounting_100.py:1326:crsp_mom['dlret'] = crsp_mom['dlret'].fillna(0)
accounting_100.py:1327:crsp_mom['ret'] = crsp_mom['ret'].fillna(0)
accounting_100.py:1448:data_rawa['datadate'] = data_rawa.groupby(['permno'])['datadate'].fillna(method='ffill')
accounting_100.py:1449:data_rawa = data_rawa.groupby(['permno', 'datadate'], as_index=False).fillna(method='ffill')
accounting_100.py:1456:data_rawq['datadate'] = data_rawq.groupby(['permno'])['datadate'].fillna(method='ffill')
accounting_100.py:1457:data_rawq = data_rawq.groupby(['permno', 'datadate'], as_index=False).fillna(method='ffill')
accounting_60_hxz.py:158:crsp['me'] = np.where(crsp['permno'] == crsp['permno'].shift(1), crsp['me'].fillna(method='ffill'), crsp['me'])
accounting_60_hxz.py:196:ccm['linkenddt'] = ccm['linkenddt'].fillna(pd.to_datetime('today'))
accounting_60_hxz.py:247:data_rawa['txditc'] = data_rawa['txditc'].fillna(0)
accounting_60_hxz.py:363:data_rawa['noa'] = ((data_rawa['at']-data_rawa['che']-data_rawa['ivao'].fillna(0))-
accounting_60_hxz.py:364: (data_rawa['at']-data_rawa['dlc'].fillna(0)-data_rawa['dltt'].fillna(0)-data_rawa['mib'].fillna(0)
accounting_60_hxz.py:365: -data_rawa['pstk'].fillna(0)-data_rawa['ceq'])/data_rawa['at_l1'])
accounting_60_hxz.py:574:data_rawa['ffi49'] = data_rawa['ffi49'].fillna(49)
accounting_60_hxz.py:1038:crsp_mom['dlret'] = crsp_mom['dlret'].fillna(0)
accounting_60_hxz.py:1039:crsp_mom['ret'] = crsp_mom['ret'].fillna(0)
accounting_60_hxz.py:1106:data_rawa['datadate'] = data_rawa.groupby(['permno'])['datadate'].fillna(method='ffill')
accounting_60_hxz.py:1108:data_rawa = data_rawa.groupby(['permno1', 'datadate1'], as_index=False).fillna(method='ffill')
accounting_60_hxz.py:1115:data_rawq['datadate'] = data_rawq.groupby(['permno'])['datadate'].fillna(method='ffill')
accounting_60_hxz.py:1117:data_rawq = data_rawq.groupby(['permno1', 'datadate1'], as_index=False).fillna(method='ffill')
accounting_60.py:158:crsp['me'] = np.where(crsp['permno'] == crsp['permno'].shift(1), crsp['me'].fillna(method='ffill'), crsp['me'])
accounting_60.py:196:ccm['linkenddt'] = ccm['linkenddt'].fillna(pd.to_datetime('today'))
accounting_60.py:247:data_rawa['txditc'] = data_rawa['txditc'].fillna(0)
accounting_60.py:363:data_rawa['noa'] = ((data_rawa['at']-data_rawa['che']-data_rawa['ivao'].fillna(0))-
accounting_60.py:364: (data_rawa['at']-data_rawa['dlc'].fillna(0)-data_rawa['dltt'].fillna(0)-data_rawa['mib'].fillna(0)
accounting_60.py:365: -data_rawa['pstk'].fillna(0)-data_rawa['ceq'])/data_rawa['at_l1'])
accounting_60.py:574:data_rawa['ffi49'] = data_rawa['ffi49'].fillna(49)
accounting_60.py:1021:crsp_mom['dlret'] = crsp_mom['dlret'].fillna(0)
accounting_60.py:1022:crsp_mom['ret'] = crsp_mom['ret'].fillna(0)
accounting_60.py:1089:data_rawa['datadate'] = data_rawa.groupby(['permno'])['datadate'].fillna(method='ffill')
accounting_60.py:1091:data_rawa = data_rawa.groupby(['permno1', 'datadate1'], as_index=False).fillna(method='ffill')
accounting_60.py:1098:data_rawq['datadate'] = data_rawq.groupby(['permno'])['datadate'].fillna(method='ffill')
accounting_60.py:1100:data_rawq = data_rawq.groupby(['permno1', 'datadate1'], as_index=False).fillna(method='ffill')
beta.py:56:crsp['dlret'] = crsp['dlret'].fillna(0)
beta.py:57:crsp['ret'] = crsp['ret'].fillna(0)
beta.py:80:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
bid_ask_spread.py:63:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
functions.py:708:def fillna_atq(df_q, df_a):
functions.py:733:def fillna_ind(df, method, ffi):
functions.py:765: df['%s' % na_column] = df['%s' % na_column].fillna(df['%s_mean' % na_column])
functions.py:768: df['%s' % na_column] = df['%s' % na_column].fillna(df['%s_median' % na_column])
functions.py:775:def fillna_all(df, method):
functions.py:805: df['%s' % na_column] = df['%s' % na_column].fillna(df['%s_mean' % na_column])
functions.py:808: df['%s' % na_column] = df['%s' % na_column].fillna(df['%s_median' % na_column])
functions.py:832: df = df.fillna(0)
ill.py:54:crsp['dlret'] = crsp['dlret'].fillna(0)
ill.py:55:crsp['ret'] = crsp['ret'].fillna(0)
ill.py:77:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
impute_rank_output_bchmk_60.py:105:df_impute['ffi49'] = df_impute['ffi49'].fillna(49) # we treat na in ffi49 as 'other'
impute_rank_output_bchmk_60.py:109:df_impute = fillna_ind(df_impute, method='median', ffi=49)
impute_rank_output_bchmk_60.py:111:df_impute = fillna_all(df_impute, method='median')
impute_rank_output_bchmk_60.py:112:df_impute['re'] = df_impute['re'].fillna(0) # re use IBES database, there are lots of missing data
maxret_d.py:61:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
rvar_capm.py:56:crsp['dlret'] = crsp['dlret'].fillna(0)
rvar_capm.py:57:crsp['ret'] = crsp['ret'].fillna(0)
rvar_capm.py:80:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
rvar_ff3.py:56:crsp['dlret'] = crsp['dlret'].fillna(0)
rvar_ff3.py:57:crsp['ret'] = crsp['ret'].fillna(0)
rvar_ff3.py:80:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
rvar_mean.py:47:crsp['dlret'] = crsp['dlret'].fillna(0)
rvar_mean.py:48:crsp['ret'] = crsp['ret'].fillna(0)
rvar_mean.py:71:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
std_dolvol.py:61:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
std_turn.py:61:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')
sue.py:47:ccm['linkenddt'] = ccm['linkenddt'].fillna(pd.to_datetime('today'))
zerotrade.py:61:crsp['month_count'] = crsp.groupby(['permno'])['month_count'].fillna(method='bfill')

Possible little bug & typo in accounting.py

Line 118, filling nan with 0

comp['xsga0'] = np.where(comp['xsga'].isnull, 0, 0)

might be

comp['xsga0'] = np.where(comp['xsga'].isnull(), 0, comp['xsga'])

Line 166-175, dealing with multiple "permno" under the same "permco"

crsp1 = pd.merge(crsp, crsp_maxme, how='inner', on=['monthend', 'permco', 'me'])

Because there can be a few different "permno" values with the same "me", the merge returns a slightly larger DataFrame than expected. Though this has little influence on the empirical results, we should be cautious when merging on numeric columns (they are probably not unique).

I think the following procedure may be better:

crsp_summe = crsp.groupby(['monthend', 'permco'])['me'].sum().reset_index()
crsp1 = crsp.sort_values(by=['permco', 'monthend', 'me'], ascending=[True, True, False]).drop_duplicates(['monthend', 'permco'])
crsp1 = crsp1.drop(['me'], axis=1)
crsp2 = pd.merge(crsp1, crsp_summe, how='left', on=['monthend', 'permco'])

Line 231-235, dealing with the duplicates: we should drop the "temp" column or generate distinct temp columns, like

data_rawa.loc[data_rawa.groupby(['datadate', 'permno', 'linkprim'], as_index=False).nth([0]).index, 'temp1'] = 1
data_rawa = data_rawa[data_rawa['temp1'].notna()]
data_rawa.loc[data_rawa.groupby(['permno', 'yearend', 'datadate'], as_index=False).nth([-1]).index, 'temp2'] = 1
data_rawa = data_rawa[data_rawa['temp2'].notna()]

Otherwise, the last two lines will filter nothing out.

README.md Update for char_file_stats.py

Add a line to README.md in the How to use section, after item #2, reading:

  3. run char_file_stats.py and check for missing .feather files. (Run anytime to see .feather stats for rows, cols, bytes)

iclink.py fails to run with current version of Pandas

Near the end of the iclink.py file is this line:
iclink = _link1_2.append(_link2_3)

The DataFrame append() method was deprecated in pandas 1.4 and removed in pandas 2.0. Use the pandas pd.concat() function instead.

A correction:
iclink = pd.concat([_link1_2, _link2_3])

Note that the single parameter to concat() is a Python list of the objects to combine.

Question regarding the README: Clarification on which files need to be run

I suggest a small change to the README so that it is clearer and easier to understand.

From reading the README, it is unclear which folders the executable files are located in.

I would therefore suggest adding the relative file path to the README.

Old README Section

  1. run accounting_60_hxz.py
  2. run all the single characteristic files (you can run them in parallel)
  3. run merge_chars.py
  4. run impute_rank_output_bckmk.py (you may want to comment the part of sp1500 in this file if you just need the all stocks version)

New README Section

  1. run char60/accounting_60_hxz.py by running python char60/accounting_60_hxz.py.
  2. run all the single characteristic files (you can run them in parallel) by running all files in the char60 folder.
  3. run pychars/merge_chars.py.
  4. run pychars/impute_rank_output_bchmk.py (you may want to comment out the sp1500 part of this file if you just need the all-stocks version).

Next Steps

Should I make a PR with the above changes?

ValueError with misaligned shapes in pychars/beta.py

When running python pychars/beta.py, I get the following error:

/EquityCharacteristics/pychars/beta.py", line 59, in <module>
    crsp_temp = crsp.groupby('permno').rolling(rolling_window).apply(get_beta, raw=False)

[...]

/EquityCharacteristics/pychars/beta.py", line 54, in get_beta
    beta = (X.T.dot(M).dot(X)).I.dot((X.T.dot(M).dot(Y)))
ValueError: shapes (1,600) and (60,60) not aligned: 600 (dim 1) != 60 (dim 0)

Is this a known issue, and can it be mitigated? Maybe I'm doing something wrong. Please let me know if you consider this a bug; I can try to find a solution and make a PR.
