
zipline's Introduction

Zipline



Zipline is a Pythonic algorithmic trading library. It is an event-driven system for backtesting. Zipline is currently used in production as the backtesting and live-trading engine powering Quantopian -- a free, community-centered, hosted platform for building and executing trading strategies. Quantopian also offers a fully managed service for professionals that includes Zipline, Alphalens, Pyfolio, FactSet data, and more.

Features

  • Ease of Use: Zipline tries to get out of your way so that you can focus on algorithm development. See below for a code example.
  • "Batteries Included": many common statistics like moving average and linear regression can be readily accessed from within a user-written algorithm.
  • PyData Integration: Input of historical data and output of performance statistics are based on Pandas DataFrames to integrate nicely into the existing PyData ecosystem.
  • Statistics and Machine Learning Libraries: You can use libraries like matplotlib, scipy, statsmodels, and sklearn to support development, analysis, and visualization of state-of-the-art trading systems.

Installation

Zipline currently supports Python 2.7, 3.5, and 3.6, and may be installed via either pip or conda.

Note: Installing Zipline is slightly more involved than the average Python package. See the full Zipline Install Documentation for detailed instructions.

For a development installation (used to develop Zipline itself), create and activate a virtualenv, then run the etc/dev-install script.

Quickstart

See our getting started tutorial.

The following code implements a simple dual moving average algorithm.

from zipline.api import order_target, record, symbol

def initialize(context):
    context.i = 0
    context.asset = symbol('AAPL')


def handle_data(context, data):
    # Skip first 300 days to get full windows
    context.i += 1
    if context.i < 300:
        return

    # Compute averages
    # data.history() returns a pandas DataFrame. The 300-bar long
    # window matches the warm-up period skipped above.
    short_mavg = data.history(context.asset, 'price', bar_count=100, frequency="1d").mean()
    long_mavg = data.history(context.asset, 'price', bar_count=300, frequency="1d").mean()

    # Trading logic
    if short_mavg > long_mavg:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(context.asset, 100)
    elif short_mavg < long_mavg:
        order_target(context.asset, 0)

    # Save values for later inspection
    record(AAPL=data.current(context.asset, 'price'),
           short_mavg=short_mavg,
           long_mavg=long_mavg)

You can then run this algorithm using the Zipline CLI. First, you must download some sample pricing and asset data:

$ zipline ingest
$ zipline run -f dual_moving_average.py --start 2014-1-1 --end 2018-1-1 -o dma.pickle --no-benchmark

This will download asset pricing data sourced from Quandl and stream it through the algorithm over the specified time range. The resulting performance DataFrame is saved as dma.pickle, which you can load and analyze from within Python.
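The pickle file is an ordinary pandas DataFrame, so a short analysis session might look like the sketch below. The helper names (`load_results`, `summarize`) are illustrative, not part of the Zipline API:

```python
import pandas as pd

def load_results(path='dma.pickle'):
    # `zipline run -o` writes a pickled pandas DataFrame,
    # one row per trading day.
    return pd.read_pickle(path)

def summarize(perf):
    # portfolio_value is always present; values passed to record()
    # (AAPL, short_mavg, long_mavg above) appear as extra columns.
    first = perf['portfolio_value'].iloc[0]
    last = perf['portfolio_value'].iloc[-1]
    return {'days': len(perf), 'total_return': last / first - 1}
```

From there, `perf.portfolio_value.plot()` is usually the first thing to look at.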

You can find other examples in the zipline/examples directory.

Questions?

If you find a bug, feel free to open an issue and fill out the issue template.

Contributing

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. Details on how to set up a development environment can be found in our development guidelines.

If you are looking to start working with the Zipline codebase, navigate to the GitHub issues tab and start looking through interesting issues. Sometimes there are issues labeled as Beginner Friendly or Help Wanted.

Feel free to ask questions on the mailing list or on Gitter.

Note

Please note that Zipline is not a community-led project. Zipline is maintained by the Quantopian engineering team, and we are quite small and often busy.

Because of this, we want to warn you that we may not attend to your pull request, issue, or direct mention in months, or even years. We hope you understand, and we hope that this note might help reduce any frustration or wasted time.


zipline's Issues

Open source set_universe feature from quantopian.com

quantopian.com provides a set_universe function that allows an algorithm to select stocks based on a criterion, e.g. percentile range of dollar volume rankings.

Example algorithm, https://www.quantopian.com/posts/a-very-basic-set-universe-example-to-refer-to-with-a-zipline-issue

The majority of the code that provides set_universe is currently within the internal/closed code that runs backtests on Quantopian. (With exception of the supplemental_data object within batch transforms, which handles an edge case of set_universe)

Besides providing the modules that load the data for a universe, there will need to be some universe "creation" tools developed for standalone Zipline.
i.e. if the dollar volume rankings were to be the universe open sourced, there will need to be a corresponding way to create a local database/file that contains the rankings.
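A standalone "universe creation" tool of the kind described could be sketched as follows. This is a hypothetical helper, not existing zipline or Quantopian code; `dollar_volume_universe` and its parameters are made up for illustration:

```python
import pandas as pd

def dollar_volume_universe(prices, volumes, lower_pct=0.9, upper_pct=1.0):
    # Rank sids by average daily dollar volume and keep a percentile
    # band, mimicking set_universe's dollar-volume percentile selection.
    # `prices` and `volumes` are DataFrames indexed by date with one
    # column per sid.
    dollar_volume = (prices * volumes).mean()   # per-sid average
    ranks = dollar_volume.rank(pct=True)        # percentile in (0, 1]
    return sorted(ranks[(ranks > lower_pct) & (ranks <= upper_pct)].index)
```

The resulting sid list (or a file derived from it) would play the role of the local database of rankings mentioned above.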

Can't install zipline with pip

Downloading/unpacking zipline
Downloading zipline-0.5.7.tar.gz (83Kb): 83Kb downloaded
Running setup.py egg_info for package zipline
Traceback (most recent call last):
File "", line 14, in
File "/home/peter/build/zipline/setup.py", line 32, in
LONG_DESCRIPTION = doc.rst
File "/usr/local/lib/python2.7/dist-packages/pandoc/core.py", line 35, in
(lambda x, fmt=fmt: cls._output(x, fmt)), # fget
File "/usr/local/lib/python2.7/dist-packages/pandoc/core.py", line 48, in _output
stdout=subprocess.PIPE
File "/usr/lib/python2.7/subprocess.py", line 679, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1259, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory


Add option to filter data through Winsorisation

Filter out extreme values which are assumed to be spurious because of their extremity.

As requested by Jessica Stauth on Quantopian forums, https://www.quantopian.com/posts/feature-requests-what-changes-would-you-like-to-see

Quoted from that post:

  1. add an option to 'winsorise' returns for outlier handling - a notorious issue with backtests is
    hidden outliers in returns data - sometimes they are obvious, you trade a stock and it makes
    10,000% in 1 day (oops pricing error, currency issue etc) - but sometimes these errors can be
    hidden. Winsorizing your returns data allows you to set sanity bounds on what returns you think
    a stock can achieve, so you might say, clip my returns data at -99% and + 2 standard
    deviations from the mean returns for that time period. Better explained here:
    http://en.wikipedia.org/wiki/Winsorising
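The clipping described in the quote can be sketched as a small NumPy helper. This is illustrative only, not a proposed zipline API; the bounds (hard floor at -99%, cap at two standard deviations above the mean) follow the example in the post:

```python
import numpy as np

def winsorize_returns(returns, lower=-0.99, n_std=2.0):
    # Clip returns at a fixed floor and at `n_std` standard deviations
    # above the mean, so a spurious 10,000% day cannot dominate results.
    returns = np.asarray(returns, dtype=float)
    upper = returns.mean() + n_std * returns.std(ddof=1)
    return np.clip(returns, lower, upper)
```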

Investigate: Nans in algorithm_returns

If, in risk.py, we change (line 586 and 589):

self.algorithm_returns = self.algorithm_returns_cont.valid()
self.benchmark_returns = self.benchmark_returns_cont.valid()

to:

self.algorithm_returns = self.algorithm_returns_cont[:dt]
self.benchmark_returns = self.benchmark_returns_cont[:dt]

to do explicit slicing rather than implicit using valid(), I get the following test error:

======================================================================
ERROR: test_risk_metrics_returns (tests.test_risk_compare_batch_iterative.RiskCompareIterativeToBatch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/whyking/working/projects/quant/zipline/tests/test_risk_compare_batch_iterative.py", line 104, in test_risk_metrics_returns
    self.all_benchmark_returns[todays_return_obj.date])
  File "/home/whyking/working/projects/quant/zipline/zipline/finance/risk.py", line 624, in update
    self.beta.append(self.calculate_beta()[0])
  File "/home/whyking/working/projects/quant/zipline/zipline/finance/risk.py", line 463, in calculate_beta
    eigen_values = la.eigvals(C)
  File "/home/whyking/envs/zipline/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 767, in eigvals
    _assertFinite(a)
  File "/home/whyking/envs/zipline/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 165, in _assertFinite
    raise LinAlgError, "Array must not contain infs or NaNs"
LinAlgError: Array must not contain infs or NaNs

Dropping into a debugger it seems that there is a NaN in front of the last dt which seems very odd:

>>> self.algorithm_returns
2006-01-03 00:00:00+00:00       NaN
2006-01-04 00:00:00+00:00    0.0093

Use numpy.vdot for performance.PerformancePeriod.calculate_positions_value

For a performance speed up, use numpy's vdot method to calculate the total position value.

On my machine, using the timeit module with a count of 1e5 times and 400 positions:

  • A function that mimics calculate_position_value takes 17.42 seconds.
  • A function that uses np.vdot on two np.arrays representing the amounts and last_sale_prices takes 0.25 seconds.
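The proposed replacement amounts to a one-line vectorization. A minimal sketch (the function name is illustrative, not the actual method):

```python
import numpy as np

def positions_value(amounts, last_sale_prices):
    # Equivalent to summing amount * last_sale_price over all
    # positions, but computed in a single vectorized dot product.
    return np.vdot(np.asarray(amounts, dtype=float),
                   np.asarray(last_sale_prices, dtype=float))
```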

Compatibility with app.quantopian.com?

Are there plans to make this release of zipline compatible with app.quantopian.com's implementation? It would be awesome to test and build strategies from the site against unsupported instruments like forex, or even to backtest real estate housing market data or Intrade quotes. Not to mention strategy development in a comfortable environment. Bringing in the extra data available inside handle_data() would be welcome, I'm sure.

Thoughts?

I'd post this on the zipline Google group but it isn't open for business just yet.

nosetests is failing

Running nosetests gives:

======================================================================
ERROR: Failure: ImportError (No module named nose_parameterized)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/bmccann/src/ziplineclean/tests/test_perf_tracking.py", line 19, in <module>
    from nose_parameterized import parameterized
ImportError: No module named nose_parameterized

----------------------------------------------------------------------
Ran 66 tests in 41.324s

FAILED (errors=1)

test_lse_calendar_vs_environment fails when cached yahoo data is more than a day old

Run at 23:22 US/Eastern:

FAIL: test_lse_calendar_vs_environment (tests.test_tradingcalendar.TestTradingCalendar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jik/repo/zipline/tests/test_tradingcalendar.py", line 56, in test_lse_calendar_vs_environment
self.check_days(env_days, cal_days)
  File "/home/jik/repo/zipline/tests/test_tradingcalendar.py", line 63, in check_days
"{diff} should be empty".format(diff=diff)
AssertionError: <class 'pandas.tseries.index.DatetimeIndex'>
[2013-02-19 00:00:00]
Length: 1, Freq: None, Timezone: UTC should be empty
'1 != 0' = '%s != %s' % (safe_repr(1), safe_repr(0))
"<class 'pandas.tseries.index.DatetimeIndex'>\n[2013-02-19 00:00:00]\nLength: 1, Freq: None, Timezone: UTC should be empty" = self._formatMessage("<class 'pandas.tseries.index.DatetimeIndex'>\n[2013-02-19 00:00:00]\nLength: 1, Freq: None, Timezone: UTC should be empty", '1 != 0')
>>  raise self.failureException("<class 'pandas.tseries.index.DatetimeIndex'>\n[2013-02-19 00:00:00]\nLength: 1, Freq: None, Timezone: UTC should be empty")

load_from_yahoo

load_from_yahoo might have a problem handling the last day(s) where data on Yahoo might not yet be available. As an example, consider the data file generated by the following snippet (as of today 9-Jan-2013 10:18 UTC)

import datetime as dt

start = dt.datetime(2010, 1, 1)
end = dt.datetime(2013, 1, 9)
data = load_from_yahoo(stocks=['^GSPC'], start=start, end=end)
data.save('SPX 20100101 to 20130109.dat')

Reading the file and running some trading algorithm in zipline through

data = pd.load('SPX 20100101 to 20130109.dat')
print data
aaa = AAA()
results = aaa.run(data)

will produce the following chain of error messages:

Traceback (most recent call last):
  File "C:\xx\yy\zz.py", line 48, in <module>
    results = aaa.run(data)
  File "C:\Python27\lib\site-packages\zipline\algorithm.py", line 195, in run
    perfs = list(self.gen)
  File "C:\Python27\lib\site-packages\zipline\gens\tradesimulation.py", line 113, in simulate
    for message in performance_messages:
  File "C:\Python27\lib\site-packages\zipline\gens\tradesimulation.py", line 202, in transform
    for date, snapshot in stream_in:
  File "C:\Python27\lib\site-packages\zipline\finance\performance.py", line 218, in transform
    event.perf_message = self.process_event(event)
  File "C:\Python27\lib\site-packages\zipline\finance\performance.py", line 259, in process_event
    message = self.handle_market_close()
  File "C:\Python27\lib\site-packages\zipline\finance\performance.py", line 289, in handle_market_close
    self.todays_performance.returns)
  File "C:\Python27\lib\site-packages\zipline\finance\risk.py", line 444, in update
    self.all_benchmark_returns.pop(0).returns)
IndexError: pop from empty list

If you limit the time window to

end = dt.datetime(2013, 1, 7)

the error does not occur when processing the appropriate file 'SPX 20100101 to 20130107.dat'

The data files mentioned above are available on request.

Use 10 year treasury as benchmark

The 10-year Treasury should be used as the benchmark. E.g., Investopedia states: "The 30-year Treasury used to be the bellwether U.S. bond but now most consider the 10-year Treasury to be the benchmark."

Besides the 10-year rate being the more common benchmark, it's also far easier to get good data for. Beginning February 18, 2002, Treasury ceased publication of the 30-year constant maturity series. On February 9, 2006, Treasury reintroduced the 30-year constant maturity. These four years of missing data make the 30-year rate difficult to use as a benchmark. Right now zipline's logging is complaining vociferously about this missing data.

I would recommend using the daily 10-Year Treasury Constant Maturity Rate data available from the Federal Reserve on this page instead of the XML feed that is currently being used.

log (ln) returns in volatility calculation and annualization of metrics

It's common practice in quantitative finance to use log returns (natural logarithm) when calculating an asset's volatility. See: http://en.wikipedia.org/wiki/Volatility_(finance)

Log returns are primarily used because asset price paths are usually modelled as following geometric Brownian motion, which avoids the problem of negative prices and brings other useful attributes. See: http://www.risklatte.com/Articles/QuantitativeFinance/QF79.php

From the code below, it looks like simple daily returns are being used in the volatility function, which probably needs to be changed:

def calculate_volatility(self, daily_returns):        
    return np.std(daily_returns, ddof=1) * math.sqrt(self.trading_days)

Also, I think

math.sqrt(self.trading_days)

in the volatility function also needs to be changed. It is common to present volatilities as annualised metrics, regardless of the periodicity used in calculating the returns. The function is using daily returns, therefore if we wish to present annualised volatility we should correctly use:

math.sqrt(number_of_trading_days_in_year)

where number_of_trading_days_in_year is usually assumed to be ~252, but can be calculated accurately with correct use of holiday calendars. Alternatively, if you wished the volatility to be a monthly volatility, it would similarly require the calculation to be:

math.sqrt(number_of_trading_days_in_month)

Likewise, if we had monthly returns, we could annualise them by multiplying by sqrt(12). In general, we can annualise interval standard deviations by multiplying by the square root of the number of intervals in a year.

Finally, this 'annualisation' is usually carried over into the calculation of Sharpe ratios. Therefore, (ignoring risk free rates) if we have daily returns and the stdev of these daily returns, the ratio needs to be multiplied by sqrt(252). If we have monthly returns and the stdev of these monthly returns, the ratio needs to be multiplied by sqrt(12).

Sharpe himself proposes this standardisation (to aid comparison) below point (10) here: http://www.stanford.edu/~wfsharpe/art/sr/sr.htm
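Putting the two suggestions together, the corrected calculations would look roughly like this. This is a sketch of the proposed change, not the current zipline code; the function names are illustrative and 252 is the usual approximation that a holiday calendar would refine:

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252  # conventional approximation

def annualized_volatility(daily_returns):
    # Log returns, as suggested above, annualised with sqrt(252)
    # regardless of how many observations are in the sample.
    log_returns = np.log1p(np.asarray(daily_returns, dtype=float))
    return np.std(log_returns, ddof=1) * np.sqrt(TRADING_DAYS_PER_YEAR)

def annualized_sharpe(daily_returns, daily_risk_free=0.0):
    # Daily Sharpe scaled by sqrt(252); monthly returns would
    # instead be scaled by sqrt(12).
    excess = np.asarray(daily_returns, dtype=float) - daily_risk_free
    return excess.mean() / excess.std(ddof=1) * np.sqrt(TRADING_DAYS_PER_YEAR)
```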

Suggested Updates To README.md

A couple of minor 'enhancement' thoughts after just installing zipline for the first time.

(i) I'm using Python 2.7, which had python-dateutil 1.5 installed by default. The README.md could state that python-dateutil 2.1 is a requirement and maybe give the link to https://pypi.python.org/pypi/python-dateutil

(ii) The README.md might be better using the dual_moving_average.py from https://github.com/quantopian/zipline/blob/master/zipline/examples/dual_moving_average.py as this produces a plot and is more friendly for beginners.

I'm not using the Enthought Python Distribution, or virtualenv or pip and there must be others like me. (I'll probably reinstall.)

Regards,

Peter

Weird portfolio behaviour in latest zipline

In the latest zipline code the initial orders appear to get stuck in limbo. The following modifications to the olmar example shows it better than I could explain:

diff --git a/zipline/examples/olmar.py b/zipline/examples/olmar.py
index 9d05579..6f11c37 100644
--- a/zipline/examples/olmar.py
+++ b/zipline/examples/olmar.py
@@ -16,7 +16,8 @@ zipline_logging = logbook.NestedSetup([
 ])
 zipline_logging.push_application()

-STOCKS = ['AMD', 'CERN', 'COST', 'DELL', 'GPS', 'INTC', 'MMM']
+#STOCKS = ['AMD', 'CERN', 'COST', 'DELL', 'GPS', 'INTC', 'MMM']
+STOCKS = ['AAPL', 'GOOG']


 class OLMAR(TradingAlgorithm):
@@ -101,6 +102,10 @@ class OLMAR(TradingAlgorithm):
             positions_value = self.portfolio.positions_value + \
                 self.portfolio.cash

+        print self.datetime
+        print ' -> VALUE', positions_value
+        print ' -> CASH', self.portfolio.cash
+
         for i, stock in enumerate(self.stocks):
             current_amount[i] = self.portfolio.positions[stock].amount
             prices[i] = data[stock].price
@@ -153,12 +158,14 @@ def simplex_projection(v, b=1):

 if __name__ == '__main__':
     import pylab as pl
-    start = datetime(2004, 1, 1, 0, 0, 0, 0, pytz.utc)
-    end = datetime(2008, 1, 1, 0, 0, 0, 0, pytz.utc)
+    start = datetime(2010, 2, 9, 0, 0, 0, 0, pytz.utc)
+    end = datetime(2010, 2, 20, 0, 0, 0, 0, pytz.utc)
+    #start = datetime(2004, 1, 1, 0, 0, 0, 0, pytz.utc)
+    #end = datetime(2008, 1, 1, 0, 0, 0, 0, pytz.utc)
     data = load_from_yahoo(stocks=STOCKS, indexes={}, start=start,
                            end=end)
     data = data.dropna()
     olmar = OLMAR()
     results = olmar.run(data)
     results.portfolio_value.plot()
-    pl.show()
+    #pl.show()

The output from the modified example is

(py27)[kieran@smallbang zipline]$ PYTHONPATH=. python zipline/examples/olmar.py > olmar.log && head -20 olmar.log 
AAPL
GOOG
[2013-05-04 10:58] INFO: Transform: Running StatefulTransform [mavg]
2010-02-16 00:00:00+00:00
 -> VALUE 100000.0
 -> CASH 100000.0
2010-02-17 00:00:00+00:00
 -> VALUE 434.222783112
 -> CASH 434.222783112
2010-02-18 00:00:00+00:00
 -> VALUE 99855.8131957
 -> CASH 99312.5931957
[2013-05-04 10:58] INFO: Transform: Finished StatefulTransform [mavg]
2010-02-19 00:00:00+00:00
 -> VALUE 99542.8218917
 -> CASH 49857.8218917
[2013-05-04 10:58] INFO: Performance: Simulated 8 trading days out of 8.
[2013-05-04 10:58] INFO: Performance: first open: 2010-02-09 14:31:00+00:00
[2013-05-04 10:58] INFO: Performance: last close: 2010-02-19 21:00:00+00:00

Notice the portfolio value on the 17th after our first order. The orders have been placed, money has been taken from the cash account but the portfolio value isn't reflecting the purchase of the stocks. This mucks up the olmar algorithm as it incorrectly believes it only has ~ $400 to play with.

Note the same behaviour appears to occur for any selection of stocks and time period, it's just easier to see with a couple of stocks and shorter time-frame.

With the same setup zipline v0.5.9 produces:

(py27)[kieran@smallbang zipline]$ PYTHONPATH=. python zipline/examples/olmar.py > olmar.log && head -20 olmar.log 
AAPL
GOOG
[2013-05-04 11:07] INFO: Transform: Running StatefulTransform [mavg]
2010-02-16 00:00:00+00:00
 -> VALUE 100000.0
 -> CASH 100000.0
2010-02-17 00:00:00+00:00
 -> VALUE 100000.0
 -> CASH 784.28
2010-02-18 00:00:00+00:00
 -> VALUE 100555.54
 -> CASH -483.38
[2013-05-04 11:07] INFO: Transform: Finished StatefulTransform [mavg]
2010-02-19 00:00:00+00:00
 -> VALUE 100097.98
 -> CASH 50412.98
[2013-05-04 11:07] INFO: Performance: Simulated 8 trading days out of 8.
[2013-05-04 11:07] INFO: Performance: first open: 2010-02-09 14:30:00+00:00
[2013-05-04 11:07] INFO: Performance: last close: 2010-02-19 21:00:00+00:00

The portfolio value on the 17th reflects the full amount, $99,215.72 invested in stocks and $784.28 remaining in cash. (There's also the question of why the cash amount is different in both versions, note the order amounts placed by olmar on the 16th are the same in each case).

Is this intended, is there some new behaviour I'm not aware of?

Add support for bid/ask specification.

Currently zipline requires a price (and volume) field to be present for each sid. From this price we calculate the bid/ask spread when orders are placed.

The order function should be adapted so that when bid and ask fields are present, it uses those as the price.

Need unit tests that isolate tradesimulation module

When implementing d21b500, it was difficult to isolate the results emitted from tradesimulation.AlgorithmSimulator and to check whether the extra daily_perf performance result was emitted.

We should consider adding unit tests for that module where events can be piped through without having to go through an entire algorithm.

Documentation and Understanding the Simulation Environment

Hi,

First off, let me say well done on a great project and for making this Open Source. I'm quite excited about the prospects this opens up and would like to use this for my own work and contribute where possible.

In order to assess whether it can meet my needs in a short enough time period I'm trying to understand how the Trade Simulator works to see if I can use it for my purposes. In particular I'd like to understand the class model and how the classes interact, i.e. who emits which messages and what exactly they contain.

However the documentation in the docs subdirectory still seems quite thin and out of date. For example I found a couple of references to zmq although I don't see it being used anywhere in the code. There are also some references to "QBT" which I can only assume is the previous name of zipline?

I have some experience with Sphinx and I am happy to document as I go along during my explorations. I just want to check that this isn't at odds with any of your own efforts as I noticed that there is a docs branch.

Also what would be the best way to interact with you guys? For example I'm wondering whether any of your developers are available on IRC or G+ for interactive feedback?

Add tests to verify risk/performance on a simple algo

We have a bunch of unit tests that exercise performance and risk module objects, which is good, but we should add some tests that exercise the 'full' stack: take a small set of events, pipe them through the entire system, and check the algorithm returns, etc.

The goal is to have a set of events with 'pen and paper'ed results, and derivations shown in the test.

batch_transform lagging behind by 1?

@jlowin reports that the batch_transform when calling the transform does not contain the most recent event which would obviously be a bug. Example:

"Let's say today's price is 100, yesterday's is 99 and the day before is 98. If I am running my batch transform today with a window size 2, I expect it to contain [99, 100] rather than [98, 99] so that I am running my calculation on the most recent information available at the time I run it."

It's also possible that the porting to the rolling window inadvertently fixed this. In fact, I'd be surprised if it existed there.
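The expected semantics match pandas' trailing windows, which always end at the most recent observation; a quick sanity check of the behaviour described in the quote (variable names here are illustrative):

```python
import pandas as pd

# Prices for the day before yesterday, yesterday, and today.
prices = pd.Series([98.0, 99.0, 100.0])

# A trailing window of size 2 ending "today" should cover [99, 100],
# i.e. include the most recent price, so its mean is 99.5 -- not the
# lagged [98, 99] window the bug report describes.
last_window_mean = prices.rolling(window=2).mean().iloc[-1]
```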

Leverage Control

Restricting borrowing on cash is also possible. I lean toward giving the algorithm access to the data and leaving responsibility for such controls to the algorithm, but I could be convinced of the convenience of a set_leverage_limit function.

from https://www.quantopian.com/posts/my-overdraft

My initial thoughts were to follow the commission/slippage pattern and add zipline/finance/leverage.py with say PercentLeverage and AbsoluteLeverage.

Leverage control would then work in a way similar to VolumeShareSlippage.simulate - checking available cash and returning None if the portfolio had insufficient funds.

Is this likely to work, though?

The event from the transaction is partially processed in creating the event (1) and then completed in PerformancePeriod.execute_transaction (2).

At (1) the original PerformancePeriod object - with canonical cash balance - is not available. At (2) canonical cash balance is available but is it too late to return None, as the event is already partially processed?

John
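One way the commission/slippage pattern mentioned above could carry over is a small control object consulted before a transaction is created. This is purely a hypothetical sketch for discussion; MaxLeverage and allows() are not zipline API:

```python
class MaxLeverage(object):
    # Hypothetical leverage control in the spirit of the
    # commission/slippage classes discussed above.
    def __init__(self, max_leverage=1.0):
        self.max_leverage = max_leverage

    def allows(self, order_value, cash, positions_value):
        # Reject an order that would push gross exposure past the
        # limit, analogous to VolumeShareSlippage.simulate returning
        # None for an unfillable order.
        equity = cash + positions_value
        if equity <= 0:
            return False
        return (positions_value + order_value) / equity <= self.max_leverage
```

Checking at this point would sidestep the "partially processed event" problem, since nothing has been committed to the PerformancePeriod yet.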

Zipline is not to be used as an installed python module?

While trying to reproduce the example on my local machine:

http://nbviewer.ipython.org/3962843/

I installed zipline from the github repo with

sudo pip install git+https://github.com/quantopian/zipline.git

which causes loader.py dump_benchmark() to try to write benchmark.msgpack in

'/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/zipline/data/benchmark.msgpack'

which fails of course with permission denied. It works as expected when run within a project but not installed in site libs. Can or should the msgpack files be stored in a configurable location?

Thank you for releasing this. :)
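One possible shape for a configurable location is to resolve data files against a user-writable root, overridable by an environment variable. This is a sketch of the idea, not the current loader behaviour; the helper name and the ZIPLINE_ROOT variable are assumptions here:

```python
import os

def zipline_data_path(filename, environ=None):
    # Resolve data files (e.g. benchmark.msgpack) to a user-writable
    # directory instead of the installed package directory, so a
    # system-wide pip install still works without root permissions.
    environ = os.environ if environ is None else environ
    root = environ.get('ZIPLINE_ROOT', os.path.expanduser('~/.zipline'))
    return os.path.join(root, 'data', filename)
```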

Warning issued: Risk: No rate within 1 trading day of end date

[2013-04-25 17:08] WARNING: Risk: No rate within 1 trading day of end date = 2006-02-08 00:00:00+00:00 and term = 30year. Using 2002-02-15 00:00:00+00:00. Check that date doesn't exceed treasury history range.

How to reproduce:
run dual_moving_average.py example.

Remove market_aware flags.

There is no good reason to set market_aware=False. I think the reason we have it is that it makes some unittests easier. However, a unittest should really test the real-world case.

ToDo: Remove market_aware everywhere and only use code for market_aware=True.

Add streaming of treasury and benchmark

These will get picked up by EventWindow and saved in the queue.

Certain transforms (e.g. Sharpe) require these.

Implementation wise, maybe we can add a new generator that gets placed after the source that tags these on.

Issue left to resolve: how to make sure that they are in sync.

Add an option to turn off benchmark/treasury updating

So that Zipline can be run offline, i.e. so that a downloaded dataset can be used when there is no internet connection.

Without the option to disable benchmarks Zipline will currently crash on a socket exception while calling data.loader.update_benchmarks

test_transforms fails for BatchTransformAlgorithm

While writing unit tests for a proposed new transform I stumbled over a potential need to update the unit test for the BatchTransformAlgorithm. The reason I raise this issue is that my own unit test comes up with a similar error: AttributeError

Where in the code do I have to make modifications to account for a new transform not previously in the code?

  • __init__.py for transforms
  • anywhere else?

Sorry if this sounds naïve...

======================================================================
ERROR: test_event_window (__main__.TestBatchTransform)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:/xxx/test_transforms.py", line 411, in test_event_window
    for data in algo.history_return_sid_filter[wl:]:
AttributeError: 'BatchTransformAlgorithm' object has no attribute 'history_return_sid_filter'

ImportError: No module named requests

>>python -c "import zipline"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/zipline/__init__.py", line 10, in <module>
    import data
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/__init__.py", line 1, in <module>
    import loader
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/loader.py", line 26, in <module>
    from treasuries import get_treasury_data
  File "/usr/local/lib/python2.7/dist-packages/zipline/data/treasuries.py", line 1, in <module>
    import requests
ImportError: No module named requests

Not sure if that is a standalone package?

Better defaults for examples using load_from_yahoo

The examples show loading data using load_from_yahoo. The defaults are really strange though:

if start is None:
    start = pd.datetime(1993, 1, 1, 0, 0, 0, 0, pytz.utc)
if end is None:
    end = pd.datetime(2002, 1, 1, 0, 0, 0, 0, pytz.utc)

Why not load all the data up to today by default? The class comment says "Factory functions to prepare useful data for tests." If this function is meant to be used only in tests, then perhaps we should rewrite the examples using pandas.io.data.DataReader instead of load_from_yahoo, so that we get more sensible examples.
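A minimal sketch of friendlier defaults (the helper name `default_dates` is hypothetical, not zipline's API; the original code uses `pytz.utc`, while this sketch uses the standard library's `datetime.timezone.utc` to stay self-contained):

```python
import datetime

def default_dates(start=None, end=None):
    # Hypothetical helper: same shape as load_from_yahoo's defaults,
    # but `end` falls back to "now" instead of a hard-coded 2002.
    utc = datetime.timezone.utc
    if start is None:
        start = datetime.datetime(1993, 1, 1, tzinfo=utc)
    if end is None:
        end = datetime.datetime.now(utc)
    return start, end
```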

Add unit test covering intraday performance with AlgorithmSimulator

Commit eb42d4b fixes a bug when AlgorithmSimulator uses an emission_rate of 'minute'.

The test coverage for emission_rate only covered the use of emission_rate
directly in PerformanceTracker.

There should be a test that runs an AlgorithmSimulator with a
'minute' emission_rate, so that the integration between PerformanceTracker
and AlgorithmSimulator in that mode is covered.

Add short code prefixes for commits to the style guide

As recommended by Fabian Braennstroem on the Google group.

Add to the style guide the use of prefixes on git commits indicating the commit "type".

Here's the list that SciPy/NumPy uses (pulled from http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html):

API: an (incompatible) API change
BLD: change related to building numpy
BUG: bug fix
DEP: deprecate something, or remove a deprecated object
DEV: development tool or utility
DOC: documentation
ENH: enhancement
MAINT: maintenance commit (refactoring, typos, etc.)
REV: revert an earlier commit
STY: style fix (whitespace, PEP8)
TST: addition or modification of tests
REL: related to releasing numpy  

Is there a standard out there that other projects are settling on?

Move quantopian.com's algorithm proxy class to zipline

Currently, it's fairly onerous to go back and forth between a zipline algorithm using the TradingAlgorithm class and the 'scriptable' style of algorithm provided by the interface at https://www.quantopian.com/

To alleviate this pain, we could move the algorithm proxy class that is used internally by https://www.quantopian.com into zipline, so that users of zipline in standalone mode can write algorithms in the same style as the app's interface.

Cannot backtest before 1-2-1990

Edit zipline/examples/buyapple.py to start from an earlier date than the default:

data = load_from_yahoo(stocks=['AAPL'], indexes={}, start=datetime.datetime(1985, 1, 1, tzinfo=pytz.utc))

Then you will get:

Traceback (most recent call last):
  File "./buyapple.py", line 34, in <module>
    results = simple_algo.run(data)
  File "/home/bmccann/src/zipline/zipline/algorithm.py", line 249, in run
    perfs = list(self.gen)
  File "/home/bmccann/src/zipline/zipline/gens/tradesimulation.py", line 134, in simulate
    for message in performance_messages:
  File "/home/bmccann/src/zipline/zipline/gens/tradesimulation.py", line 258, in transform
    for date, snapshot in stream:
  File "/home/bmccann/src/zipline/zipline/finance/performance.py", line 212, in transform
    messages = self.process_event(event)
  File "/home/bmccann/src/zipline/zipline/finance/performance.py", line 265, in process_event
    messages.append(self.handle_market_close())
  File "/home/bmccann/src/zipline/zipline/finance/performance.py", line 314, in handle_market_close
    self.todays_performance.returns)
  File "/home/bmccann/src/zipline/zipline/finance/risk.py", line 585, in update
    self.end_date
  File "/home/bmccann/src/zipline/zipline/finance/risk.py", line 254, in choose_treasury
    search_dist = search_day_distance(end_date, prev_day)
  File "/home/bmccann/src/zipline/zipline/finance/risk.py", line 199, in search_day_distance
    assert tdd >= 0
AssertionError

Time zone conversions: is Delorean necessary?

I've looked at the various places where time zone conversions are being done, e.g.:

def get_next_trading_dt(current, interval):
    naive = current.replace(tzinfo=None)
    delo = Delorean(naive, pytz.utc.zone)
    ex_tz = trading.environment.exchange_tz
    next_dt = delo.shift(ex_tz).datetime

    while True:
        next_dt = next_dt + interval
        next_delo = Delorean(next_dt.replace(tzinfo=None), ex_tz)
        next_utc = next_delo.shift(pytz.utc.zone).datetime
        if trading.environment.is_market_hours(next_utc):
            break

    return next_utc

I'm wondering whether you might be better served by the pandas.Timestamp object and its tz_convert method. Notably, adding timedeltas to a Timestamp shifts the underlying UTC timestamp and respects DST transitions, etc. It would be faster and simpler than this mucky business with Delorean, and you could drop the extra library dependency.
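A hypothetical rewrite along those lines (not zipline's actual code; `is_market_hours` is injected here as a stand-in for trading.environment.is_market_hours, to keep the sketch self-contained):

```python
import pandas as pd

def get_next_trading_dt(current, interval, exchange_tz='US/Eastern',
                        is_market_hours=lambda dt: True):
    # Convert once to exchange time; Timestamp arithmetic operates on
    # the underlying UTC instant, so DST transitions are handled.
    next_dt = pd.Timestamp(current).tz_convert(exchange_tz)
    while True:
        next_dt = next_dt + interval
        next_utc = next_dt.tz_convert('UTC')
        if is_market_hours(next_utc):
            return next_utc
```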

Optionally speed up batch transform performance with an iterative version

The following is as described by @jkoelker in this forum post: https://www.quantopian.com/posts/iterative-batch-transforms


Batch Transforms are awesome for computing a series of values, but with large changing datasets performance can be quite spotty. If the transform can be written in an iterative fashion with EventWindows, performance is more consistent, but you lose the batch nature and speed when iterating over the list of securities updating all the windows. I've written a little decorator that combines the two so my algorithm doesn't have to keep track in the context or wherever the previous batch result. Sharing in case others might find it useful.
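A minimal illustration of the idea (this is a hypothetical sketch, not @jkoelker's actual decorator): maintain a rolling window incrementally, EventWindow style, and run the batch-style computation only on the current window instead of re-slicing the full history on every event.

```python
from collections import deque
import functools

def iterative_window(window_length):
    # Hypothetical decorator: the wrapped function sees only the
    # current window, which is updated one event at a time.
    def decorator(func):
        window = deque(maxlen=window_length)

        @functools.wraps(func)
        def wrapper(event):
            window.append(event)            # O(1) incremental update
            if len(window) < window_length:
                return None                 # window not yet full
            return func(list(window))       # batch-style computation
        return wrapper
    return decorator

@iterative_window(3)
def moving_average(values):
    return sum(values) / len(values)
```

Each call pays only for the window-sized computation, which keeps performance consistent as the dataset grows.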


Problem running with data outside of provided benchmark date range

This is using a local copy of zipline instead of the site-packages one.

The data is OHLC plus other indicators, exported as CSV from Metatrader 4. The timestamps are then munged to be the index, similar to fast-data-mining-with-pytables-and-pandas.pdf, and also localized:

from datetime import datetime
import pandas as pd

data = pd.read_csv(data_file)
data['time'] = [
    datetime.strptime(d + " " + t + ":00", '%m-%d-%Y %H:%M:%S')
    for d, t in zip(data['Date'], data['Time'])
]
data.index = pd.DatetimeIndex(data['time']).tz_localize('US/Eastern')
del data['time']

and when trying out the algo

class TestAlgo(TradingAlgorithm):
    def handle_data(self,data):
        print data

my_algo = TestAlgo()
results = my_algo.run(data)

this results in

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-b9b16310508f> in <module>()
     24 
     25 my_algo = TestAlgo()
---> 26 results = my_algo.run(data)
     27 print results.portfolio_value

/Users/michael/Downloads/PandasTutorialFiles/zipline/algorithm.pyc in run(self, source, start, end)
    177         # loop through simulated_trading, each iteration returns a
    178         # perf ndict
--> 179         perfs = list(self.gen)
    180 
    181         # convert perf ndict to pandas dataframe

/Users/michael/Downloads/PandasTutorialFiles/zipline/gens/tradesimulation.pyc in simulate(self, stream_in)
    113         # day.  It will also yield a risk report at the end of the
    114         # simulation.
--> 115         for message in performance_messages:
    116             yield message
    117 

/Users/michael/Downloads/PandasTutorialFiles/zipline/gens/tradesimulation.pyc in transform(self, stream_in)
    202             # Group together events with the same dt field. This depends on the
    203             # events already being sorted.
--> 204             for date, snapshot in groupby(stream_in, attrgetter('dt')):
    205                 # Set the simulation date to be the first event we see.
    206                 # This should only occur once, at the start of the test.

/Users/michael/Downloads/PandasTutorialFiles/zipline/finance/performance.pyc in transform(self, stream_in)
    217                 yield event
    218             else:
--> 219                 event.perf_message = self.process_event(event)
    220                 event.portfolio = self.get_portfolio()
    221                 del event['TRANSACTION']

/Users/michael/Downloads/PandasTutorialFiles/zipline/finance/performance.pyc in process_event(self, event)
    249 
    250         if(event.dt >= self.market_close):
--> 251             message = self.handle_market_close()
    252 
    253         if event.TRANSACTION:

/Users/michael/Downloads/PandasTutorialFiles/zipline/finance/performance.pyc in handle_market_close(self)
    278         #update risk metrics for cumulative performance
    279         self.cumulative_risk_metrics.update(
--> 280             self.todays_performance.returns, datetime.timedelta(days=1))
    281 
    282         # increment the day counter before we move markers forward.

/Users/michael/Downloads/PandasTutorialFiles/zipline/finance/risk.pyc in update(self, returns_in_period, dt)
    417         self.algorithm_volatility.append(
    418             self.calculate_volatility(self.algorithm_returns))
--> 419         self.treasury_period_return = self.choose_treasury()
    420         self.excess_returns.append(
    421             self.algorithm_period_returns[-1] - self.treasury_period_return)

/Users/michael/Downloads/PandasTutorialFiles/zipline/finance/risk.pyc in choose_treasury(self)
    344             term=self.treasury_duration
    345         )
--> 346         raise Exception(message)
    347 
    348 

Exception: no rate for end date = 2012-04-17 00:00:00-04:00 and term = 1month. Check that date doesn't exceed treasury history range.

At this point the basic test does work using the local copy of zipline, which was my sanity check.

[edit: the IPython notebook's traceback is clearer]

Change number representations from float to decimal

We are currently using Python floats for, e.g., calculating commission costs, which is a problem for some strategies since Python uses machine representation (e.g. 1.1 * 1.1 != 1.21; Python returns 1.2100000000000002). So in most financial applications floats are represented as ints or as objects of some class (Decimal or cDecimal in Python).

As such, I think we should replace float computations with Decimal where possible. An outstanding concern is performance regressions.
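The representation issue is easy to demonstrate with the standard library's decimal module:

```python
from decimal import Decimal

# Binary floats cannot represent 1.1 exactly, so the product drifts:
float_product = 1.1 * 1.1
# Decimal keeps exact decimal arithmetic (construct from strings,
# not floats, to avoid importing the binary rounding error):
decimal_product = Decimal('1.1') * Decimal('1.1')

assert float_product != 1.21
assert decimal_product == Decimal('1.21')
```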

Add a way to update benchmark data

Currently, the only way to update benchmark data is to delete the benchmark file contained in "$HOME/.zipline", so that the code that detects missing data pulls in the full data set up to the most recently available date.

One solution is to check the latest date in the file and attempt to fill in the data if it's not the previous market day.

Or we could add a utility script that can be run to update the data.
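A sketch of the latest-date check from the first proposal (the function and argument names are hypothetical, and "previous market day" is approximated here by the previous weekday, ignoring holidays):

```python
import datetime

def benchmark_needs_update(cached_dates, today):
    # Hypothetical check: the cached benchmark file is stale unless
    # its newest date is at least the previous weekday.
    prev = today - datetime.timedelta(days=1)
    while prev.weekday() >= 5:              # skip Saturday/Sunday
        prev -= datetime.timedelta(days=1)
    return max(cached_dates) < prev
```

The utility-script option could simply run this check and trigger the existing missing-data download path when it returns True.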
