enzoampil / fastquant
fastquant — Backtest and optimize your ML trading strategies with only 3 lines of code!
License: MIT License
I can add a financial network analysis example, following Dr. Legara's notebook.
We may need to add code-formatter and checker tests to our contributing procedure, e.g. using black and flake8, similar to this.
Add an nlp module to analyze the output of the disclosures module.
Examples:
A link to examples, e.g. on nbviewer, can be added to the README.
Problem:
The current Bollinger Band Strategy is naive, since it simply treats the upper and lower bands as resistance and support lines, respectively.
Solution:
We can make this more robust by adding a trend dimension that, e.g., recommends a buy only when a current uptrend also exists, and vice versa.
For companies that have relevant hashtags, we can listen to the tweets about them which can serve as a financial indicator.
Please fix the bug report template below, specifically the 2nd to the last line:
- lightkurve version (e.g. 1.0b6): <-- We need to take note of fastquant versioning
to something like
- fastquant version (e.g. 1.0)
It'll be better if the version can be printed via the standard means:
import fastquant
fastquant.__version__
but I'm not sure how. Maybe by adding a version.py and importing it in __init__?
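The common single-source pattern for this looks like the fragment below (file names and the version string are placeholders for whatever the project settles on):

```python
# fastquant/version.py -- the single source of truth for the version
__version__ = "0.1.0"

# fastquant/__init__.py -- re-export it so fastquant.__version__ works
from .version import __version__

# setup.py can then read version.py (e.g. exec its contents) instead of
# hard-coding the version a second time.
```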
import fastquant
# insert code here ...
The old phisix API endpoint:
http://phisix-api2.appspot.com/stocks/
has been replaced with a new one:
http://1.phisix-api.appspot.com/stocks/
The fix is implemented in #58 .
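For reference, building quote URLs against the new host can be as small as the helper below. The `<SYMBOL>.json` suffix follows the public phisix API convention; verify it still holds on the new endpoint:

```python
# Base of the replacement phisix endpoint (see #58)
PHISIX_BASE = "http://1.phisix-api.appspot.com/stocks"

def phisix_url(symbol=None):
    """Return the all-stocks URL, or a single-stock URL if a symbol is given.
    Fetch with e.g. requests.get(phisix_url("JFC")).json()."""
    if symbol:
        return f"{PHISIX_BASE}/{symbol.upper()}.json"
    return f"{PHISIX_BASE}.json"
```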
Problem: it takes a while to pull disclosures from a lot of companies
Solution: store the data in a DB for easy access
Quantopian, mlfinlab, etc. offer advanced quant tools. Make sure to check which strategies/functionalities can be readily implemented here.
Currently the backtest function only takes one set of arguments at a time. In practice we actually want to run results on multiple combinations (e.g. multiple possible values for the slow and fast moving averages). Utilize cerebro.optstrategy to perform the above (reference).
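backtrader's cerebro.optstrategy handles this natively; the pure-Python sketch below shows the same grid-search idea with a hypothetical backtest_fn (its name, signature, and the result shape are assumptions):

```python
from itertools import product

def grid_backtest(backtest_fn, strategy, param_grid):
    """Run backtest_fn once per parameter combination and return results
    sorted best-first by final portfolio value."""
    results = []
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        value = backtest_fn(strategy, **params)
        results.append({"params": params, "final_value": value})
    return sorted(results, key=lambda r: r["final_value"], reverse=True)
```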
Plotting data takes a lot of steps and this can be painful for beginners:
from fastquant import get_stock_data
import pandas as pd
df = get_stock_data('JFC', '2018-01-01', '2019-01-01', format='dc')
# set dt as a datetime object
df['dt'] = pd.to_datetime(df.dt)
# set dt as the index
df = df.set_index('dt')
df.plot()
I propose to make dt column a pandas datetime object and set it as index by default in get_stock_data
so the above code can be simplified into:
from fastquant import get_stock_data
df = get_stock_data('JFC', '2018-01-01', '2019-01-01')
df.plot()
I think this should eventually live in a tutorial section on the Sphinx webpage, but this should be good for now. Related to #24.
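The proposed post-processing inside get_stock_data would amount to something like this helper (the function name is illustrative; the logic is just the two steps from the snippet above):

```python
import pandas as pd

def with_datetime_index(df):
    """Parse the 'dt' column and promote it to a DatetimeIndex,
    so df.plot() works immediately on the returned frame."""
    df = df.copy()
    df["dt"] = pd.to_datetime(df["dt"])
    return df.set_index("dt")
```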
Some company disclosures have attachment(s) that provide more details, e.g.
https://edge.pse.com.ph/openDiscViewer.do?edge_no=d01aed5ca14a1ab20de8473cebbd6407
Scraping these attachments is not urgent but would be useful in the future.
Please read through the references below and feel free to ask for help in the issues or slack channel!
Tutorials on the fastquant website:
https://enzoampil.github.io/fastquant-blog/
fastquant example notebooks:
https://nbviewer.jupyter.org/github/enzoampil/fastquant/tree/master/examples/
Intro to backtrader Reference:
https://algotrading101.com/learn/backtrader-for-backtesting/
Quickstart backtrader guide:
Test that each of the strategies in the module (classes inheriting from BaseStrategy) will work if they're called by the backtest function.
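One way to keep such a test exhaustive is to enumerate BaseStrategy subclasses instead of hard-coding a list, so new strategies are picked up automatically. The classes below are stand-ins for the real ones in fastquant's strategies module:

```python
# Stand-in class hierarchy; in the real test, import BaseStrategy
# from fastquant and call backtest() on each discovered class.
class BaseStrategy:
    pass

class SMACStrategy(BaseStrategy):
    pass

class BBandsStrategy(BaseStrategy):
    pass

def all_strategy_classes(base=BaseStrategy):
    """Collect direct and indirect subclasses recursively."""
    found = []
    for cls in base.__subclasses__():
        found.append(cls)
        found.extend(all_strategy_classes(cls))
    return found
```

A pytest parametrized over all_strategy_classes() then gives one pass/fail per strategy.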
Current sample in README still reflects the old DCV default.
Check why investagrams returns empty html:
fastquant/fastquant/disclosures.py
Line 584 in 9286b55
Current dataset includes weekends and non-trading days and tags them as rows all with N/A values. Data could be treated either by removing the non-trading days, or filling these days with the last value.
The current implementation is naive since it triggers a buy or sell at the moment the band is exceeded. An implementation with allowance that's configurable would be preferable.
get_company_disclosures() currently returns a dataframe. It would be useful to have another function that downloads and parses its contents, e.g. a particular press release, by supplying, say, the document's Circular Number. Then it would be easy to run sentiment analysis, etc.
The URL format of company disclosures on PSE is cryptic:
e.g. https://edge.pse.com.ph/openDiscViewer.do?edge_no=8571ce07732abd9643ca035510b6ec2b
Posting a request to https://edge.pse.com.ph/announcements/form.do seems to return only tables, not links to the actual document.
What do you think?
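Even if form.do only returns tables, the viewer URL can be reconstructed from an edge_no value like the one in the example above; a tiny assumed helper:

```python
def disclosure_url(edge_no):
    """Build the PSE Edge viewer URL for a disclosure, given its edge_no.
    Assumes edge_no is available in the get_company_disclosures output."""
    return f"https://edge.pse.com.ph/openDiscViewer.do?edge_no={edge_no}"
```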
Newspaper seems like a good tool to scrape and curate articles related to PSE-listed stocks.
I can imagine using a different tool to search for recent news related to a company and using newspaper to scrape that article.
What do you think?
Edit:
ERROR: fastquant 0.1.2.5 has requirement pandas==0.25.3, but you'll have pandas 1.0.0 which is incompatible.
I just upgraded to pandas 1.0 and found out that the fastquant package is not yet compatible with it. Not a severe issue, since I can just roll back to version 0.25, and pandas 1.0 is very new.
This makes it a lot easier to perform EDAs with numerical columns, e.g. age.
Some notebooks are mature enough to be turned into a blog article. I propose to add a blog page in our gh-pages.
Here is a running list of articles:
I installed psequant using pip install psequant and it worked alright.
When you pushed an update in get_company_disclosures(), I tried pip install psequant --upgrade but it didn't seem to have the latest commit.
So I tried to clone your repo and install locally in develop mode using pip install -e . so that I only have to git pull to get the recent updates.
Doing so threw an error:
Building wheel for psequant (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/jp/miniconda3/envs/py3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-mhaabhka/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-mhaabhka/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-tn2238pz --python-tag cp37
cwd: /tmp/pip-req-build-mhaabhka/
Complete output (11 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/psequant
copying psequant/psequant.py -> build/lib/psequant
copying psequant/__init__.py -> build/lib/psequant
running build_scripts
creating build/scripts-3.7
error: [Errno 21] Is a directory: 'psequant'
----------------------------------------
ERROR: Failed building wheel for psequant
I figured that the path to the script in setup.py should be psequant/psequant instead. Since that script is still a template, I recommend removing it from setup.py in the meantime.
As seen in the edge query form, only the first 50 results are shown by default. We need to fix get_company_disclosures to fetch the entire result set.
@enzoampil can you check this? I really don't know how to solve this.
PSE Edge / Investagrams may change their APIs anytime. We should archive all disclosure information in a private database.
Will create functions that pull from and listen to specified PSE-related accounts.
Examples are the official PSE and stock brokerage accounts:
https://twitter.com/phstockexchange?lang=en
https://twitter.com/colfinancial?lang=en
https://twitter.com/firstmetrosec?lang=en
https://twitter.com/BPItrade
https://twitter.com/Philstocks_
https://twitter.com/itradeph
https://twitter.com/UTradePH
https://twitter.com/wealthsec
For COL, listening to #COLResearch specifically will filter to the analyst reports.
Note that acronyms preceded by a "$" are stock tickers, so we can use this to identify the company that a tweet is about (e.g. $MWC).
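The "$" convention can be exploited directly to tag tweets by company; a minimal sketch (the 1–6 uppercase letter pattern is an assumption about PSE ticker shapes):

```python
import re

# "$" followed by 1-6 uppercase letters, ending at a word boundary
CASHTAG = re.compile(r"\$([A-Z]{1,6})\b")

def extract_tickers(tweet):
    """Return the stock tickers mentioned as cashtags in a tweet."""
    return CASHTAG.findall(tweet)
```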
We need to implement smart caching for get_pse_data, similar to load_disclosures. The former only checks for an exact filename match of saved stock data before loading; otherwise it re-downloads everything from scratch, even if there's only a 1-day difference between the old and new query. The latter finds any saved disclosures data for that company and appends older and/or newer data depending on the query, so no data is downloaded twice.
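The core of the smarter cache is computing which date ranges are actually missing; a sketch (function name and tuple return shape are illustrative):

```python
import pandas as pd

def missing_ranges(cached_start, cached_end, want_start, want_end):
    """Given the date span already on disk and the span requested, return
    only the sub-ranges that still need downloading, so nothing is
    fetched twice (mirrors the load_disclosures behaviour)."""
    gaps = []
    if want_start < cached_start:
        gaps.append((want_start, cached_start - pd.Timedelta(days=1)))
    if want_end > cached_end:
        gaps.append((cached_end + pd.Timedelta(days=1), want_end))
    return gaps
```

get_pse_data would then download only the gap ranges and append them to the cached frame.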
Disclosures can be found on PSE Edge.
I'm unable to run the "three line code" from lesson 1
Initially, I installed fastquant in the Anaconda terminal using
pip install git+git://github.com/enzoampil/fastquant.git
Proceeded to open my Jupyter notebook to run the following codes
from fastquant import get_pse_data
df = get_pse_data('JFC', '2018-01-01', '2019-01-01')
df.head()
This error showed:
AssertionError                            Traceback (most recent call last)
in
      1 from fastquant import get_pse_data
----> 2 df = get_pse_data('JFC', '2018-01-01', '2019-01-01')
      3 df.head()

C:\ProgramData\Anaconda3\lib\site-packages\fastquant\fastquant.py in get_pse_data(symbol, start_date, end_date, save, max_straight_nones, format)
    404     )
    405 else:
--> 406     cache = get_pse_data_cache(symbol=symbol)
    407     cache = cache.reset_index()
    408     # oldest_date = cache["dt"].iloc[0]

C:\ProgramData\Anaconda3\lib\site-packages\fastquant\fastquant.py in get_pse_data_cache(symbol, cache_fp, update, verbose)
    303     print("Loaded: ", cache_fp)
    304     errmsg = "Cache does not exist! Try update=True"
--> 305     assert cache_fp.exists(), errmsg
    306     df = pd.read_csv(cache_fp, index_col=0, header=[0, 1])
    307     df.index = pd.to_datetime(df.index)

AssertionError: Cache does not exist! Try update=True
I downloaded the "data" folder from the fastquant repo and pasted it into my installation directory
C:\ProgramData\Anaconda3\Lib\site-packages\fastquant
Restarted my Jupyter notebook and re-ran
from fastquant import get_pse_data
df = get_pse_data('JFC', '2018-01-01', '2019-01-01')
df.head()
Same error message.
fastquant/fastquant/strategies.py
Line 536 in 36fc7ca
After running backtest, it would be better to return the cerebro object in case the user needs to access its properties (at least I do).
@enzoampil Do you agree? I will do this if so.
We can use the following sources as alternative data sources.
get_stock_data(..., format='dcv') throws an error / quits prematurely when the start_date input is much earlier than when data is available. This happens because we set 10 consecutive NaNs as the threshold for escaping the loop. The ideal solution is either to figure out that the date inputs are wrong and prescribe the correct date, or to just return the available data even if it is far from either date, without raising an exception. I prefer the first solution, since it notifies the user that the requested data on that date is unavailable.
Note: We can also refer to the listing date column for the earliest possible start_date input, especially for new companies listed in the last 5 years. Some companies were listed decades ago, though, so why is stock data available only recently?
from fastquant import get_stock_data
df = get_stock_data("MAH", start_date="2018-01-01", end_date="2020-01-01", format="dcv")
throws an error, whereas the one below does not:
df = get_stock_data("MAH", start_date="2019-01-01", end_date="2020-01-01", format="dcv")
I found from the data cache that MAH has data only from "2018-06-04". Probably a fix is to assert start_date > first date entry in the cache, if it exists, and additionally add a note telling the user to use later dates when the exception arises. We may need to add a custom exception for this.
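The custom-exception idea could look like the sketch below. The class name, helper name, and message wording are assumptions for illustration:

```python
import pandas as pd

class DataUnavailableError(ValueError):
    """Raised when the requested start_date precedes the available data."""

def check_start_date(symbol, start_date, earliest_available):
    """Compare the requested start_date against the earliest date found
    in the cache and fail loudly with guidance instead of looping on NaNs."""
    start = pd.to_datetime(start_date)
    earliest = pd.to_datetime(earliest_available)
    if start < earliest:
        raise DataUnavailableError(
            f"{symbol} data starts on {earliest.date()}; "
            f"use start_date >= '{earliest.date()}'."
        )
```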