ngyb / stocks
Programs for stock prediction and evaluation
License: Apache License 2.0
Hi @yibinng,
I love reading your articles and the time series prediction code you have published. All of it is wonderful and I enjoy playing with your projects very much.
While experimenting with XGBoost Stock Prediction, I noticed that in the Final Model section, model.predict takes X_sample_scaled as its input. If you trace the assignment of X_sample_scaled (= test_scaled[features]), it holds the features of the Test dataframe, whose labels are exactly what we intend to predict.
Since test_scaled[features] is derived from lag data, it is generated from the Test dataframe itself, i.e. from data that lies in the future relative to the dev/cv dataframe! That explains why the predictions lag the actual data by a few cycles.
We can confirm the issue by removing X_sample_scaled from the equation and relying solely on y_sample (= test[target]) as the golden result.
The fix is simple: in the final model, replace X_sample_scaled with X_cv_scaled, where X_cv_scaled is the most recent data available before the Test dataframe exists in a real scenario.
With this change, the prediction still looks similar to the expected y_sample, but with larger RMSE and MAPE values.
Please let me know if you think otherwise.
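To check what a lag feature actually contains, it can help to build one explicitly. The following is a minimal sketch, not the notebook's code: the helper add_lag_features and the toy prices are hypothetical. Each lag column for row t holds only values from earlier rows. Whether that counts as leakage depends on the forecast horizon: for a one-day-ahead forecast, the previous day's price is known at prediction time; for a longer horizon it is not.

```python
import pandas as pd

def add_lag_features(df, col, n_lags):
    """Return a copy of df with columns <col>_lag_1..<col>_lag_n (hypothetical helper)."""
    out = df.copy()
    for k in range(1, n_lags + 1):
        # shift(k) moves values down k rows, so row t sees the value from row t-k
        out[f"{col}_lag_{k}"] = out[col].shift(k)
    return out

# Toy data, made up for illustration only
prices = pd.DataFrame({"adj_close": [10.0, 11.0, 12.0, 13.0, 14.0]})
lagged = add_lag_features(prices, "adj_close", n_lags=2)
print(lagged)
```

The first rows of each lag column are NaN because no earlier value exists for them.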
My name is Luis. I'm a big-data and machine-learning developer, a fan of your work, and I regularly check your updates.
I was afraid that my savings would be eaten by inflation, so I created a tool based on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators).
It covers all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DeMark, Japanese candlesticks, Ichimoku, Fibonacci, Williams %R, balance of power, Murrey math, etc.) and more than 200 others.
The tool builds models that predict trading points (buy and sell signals), so that each stock is traded at the right time and in the right direction.
For data collection and calculation I used Python data tools such as pandas and stock market libraries such as tablib, TAcharts and pandas_ta.
For modeling I used machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow and Google TensorFlow LSTM.
With models trained on a selection of the best technical indicators, the tool predicts trading points (where to buy, where to sell) and sends real-time alerts to Telegram or email. The points are learned from the correct trading points of the last 2 years (including the change to a bear market after the rate hike).
I think it could be useful to you. I would like to share it with you, and if you are interested in improving it and collaborating, I am willing. If not, feel free to file it away.
Dear author,
I am following your directions.
My data has 567 rows (train_size = 400, val_size = 167, H = 30).
When I reach step 140 in your notebook, I get multiple errors because the predictions fall in the horizon H (after train_val_size) and have no corresponding actual data.
Another error I get is a mismatch in dataframe sizes.
preds_list = forecast['yhat'][train_val_size:train_val_size+H]
print("For forecast horizon %d, predicting on day %d, date %s, the RMSE is %f" % (H, i, df['date'][i-1]+ timedelta(days = 1), get_rmse(df[i:i+H]['y'], preds_list)))
print("For forecast horizon %d, predicting on day %d, date %s, the mean MAPE is %f" % (H, i, df['date'][i-1]+ timedelta(days = 1), get_mape(df[i:i+H]['y'], preds_list)))
print("For forecast horizon %d, predicting on day %d, date %s, the mean MAE is %f" % (H, i, df['date'][i-1]+ timedelta(days = 1), get_mae(df[i:i+H]['y'], preds_list)))
Your graph shows the predictions falling within your actual data, and you tune the parameters accordingly.
Should it be preds_list = forecast['yhat'][train_size:train_val_size] instead?
Thank you for your time and consideration.
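One way to see where the size mismatch comes from is to slice the actuals and the predictions over the same window and keep only the overlap. The sketch below assumes hypothetical helpers slice_for_eval and rmse and uses synthetic numbers matching the sizes quoted above (567 rows, H = 30). It is not the notebook's get_rmse. If the window starts at or beyond the last row, no actuals remain and no metric can be computed.

```python
import numpy as np
import pandas as pd

def slice_for_eval(actual, preds, start, H):
    """Return equal-length (y_true, y_pred) arrays for positions [start, start+H)."""
    y_true = np.asarray(actual, dtype=float)[start:start + H]
    y_pred = np.asarray(preds, dtype=float)[start:start + H]
    n = min(len(y_true), len(y_pred))  # keep only the overlapping part
    return y_true[:n], y_pred[:n]

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-ins: 567 actual rows, forecast extends 30 rows further
actual = pd.Series(np.arange(567, dtype=float))
preds = pd.Series(np.arange(597, dtype=float) + 1.0)

# Window entirely past the data: nothing to compare against
y_true, y_pred = slice_for_eval(actual, preds, start=567, H=30)
print(len(y_true))  # 0

# Window partly inside the data: only the overlap is evaluated
y_true, y_pred = slice_for_eval(actual, preds, start=550, H=30)
print(len(y_true), rmse(y_true, y_pred))
```

Checking the length before calling a metric makes the "no actual data" case explicit instead of surfacing as a shape error.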
When running
est_list = get_preds_mov_avg(df, 'adj_close', N_opt, 0, num_train+num_cv)
test['est' + '_N' + str(N_opt)] = est_list
print("RMSE = %0.3f" % math.sqrt(mean_squared_error(est_list, test['adj_close'])))
print("MAPE = %0.3f%%" % get_mape(test['adj_close'], est_list))
test.head()
The following warnings occur:
ipykernel_launcher.py:21: RuntimeWarning: invalid value encountered in less
ipykernel_launcher.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
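The SettingWithCopyWarning typically appears when a column is assigned on a DataFrame that was itself obtained by slicing another one. A minimal sketch of the usual remedy, assuming test was created as a slice of df (the toy data, split sizes and column name est_N2 are made up): take an explicit copy, then assign with .loc as the warning suggests.

```python
import pandas as pd

# Toy data and split sizes, for illustration only
df = pd.DataFrame({"adj_close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})
num_train, num_cv = 3, 1

# .copy() makes test an independent DataFrame rather than a slice of df
test = df[num_train + num_cv:].copy()

est_list = [12.5, 13.5]  # placeholder predictions, one per test row
test.loc[:, "est_N2"] = est_list
print(test.head())
```

The assignment then affects only test, and df is left unchanged.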
Hello sir, when I try to run the algorithm with a different dataset, I get this error:
[Errno 2] No such file or directory: './out/v6d_val_rmse_bef_tuning_2016-02-09.pickle'
The code that raises the error is the following:
results = defaultdict(list)
ests = {} # the predictions
date_list = ['2016-01-04',
'2016-02-09',
'2016-06-07',
'2016-08-22',
'2016-11-07',
'2017-01-23',
'2017-04-10',
'2017-09-07',
'2017-11-29',
'2018-03-05',
'2018-05-07',
'2018-09-04']
for date in date_list:
    results['date'].append(date)
    results['val_rmse_bef_tuning'].append(pickle.load(open( "./out/v6d_val_rmse_bef_tuning_" + date + ".pickle", "rb")))
    results['val_rmse_aft_tuning'].append(pickle.load(open( "./out/v6d_val_rmse_aft_tuning_" + date + ".pickle", "rb")))
    results['test_rmse_bef_tuning'].append(pickle.load(open( "./out/v6d_test_rmse_bef_tuning_" + date + ".pickle", "rb")))
    results['test_rmse_aft_tuning'].append(pickle.load(open( "./out/v6d_test_rmse_aft_tuning_" + date + ".pickle", "rb")))
    results['test_mape_bef_tuning'].append(pickle.load(open( "./out/v6d_test_mape_bef_tuning_" + date + ".pickle", "rb")))
    results['test_mape_aft_tuning'].append(pickle.load(open( "./out/v6d_test_mape_aft_tuning_" + date + ".pickle", "rb")))
    results['test_mae_bef_tuning'].append(pickle.load(open( "./out/v6d_test_mae_bef_tuning_" + date + ".pickle", "rb")))
    results['test_mae_aft_tuning'].append(pickle.load(open( "./out/v6d_test_mae_aft_tuning_" + date + ".pickle", "rb")))
    ests[date] = pickle.load(open( "./out/v6d_test_est_aft_tuning_" + date + ".pickle", "rb"))
results = pd.DataFrame(results)
results_>
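The loop above reads pickle files that an earlier run is expected to have written to ./out, so with a different dataset those files may simply not exist yet for the listed dates. As a hedged sketch (the helper load_pickles is hypothetical, not part of the repository), the loading step could report missing files instead of stopping at the first one:

```python
import pickle
from pathlib import Path

def load_pickles(out_dir, prefix, dates):
    """Load <out_dir>/<prefix><date>.pickle per date; return (loaded, missing)."""
    loaded, missing = {}, []
    for date in dates:
        path = Path(out_dir) / f"{prefix}{date}.pickle"
        if not path.is_file():
            # Record the path so the caller can see which runs are absent
            missing.append(str(path))
            continue
        with path.open("rb") as f:  # the with-block closes the file after loading
            loaded[date] = pickle.load(f)
    return loaded, missing

loaded, missing = load_pickles("./out", "v6d_val_rmse_bef_tuning_",
                               ["2016-01-04", "2016-02-09"])
print("missing files:", missing)
```

Any path listed as missing points to a date for which the result files have not been generated for the new dataset.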