skfolio / skfolio

Python library for portfolio optimization built on top of scikit-learn

Home Page: https://skfolio.org

License: BSD 3-Clause "New" or "Revised" License

Topics: asset-allocation, asset-management, convex-optimization, cvar-optimization, cvxpy, efficient-frontier, hierarchical-clustering, machine-learning, portfolio, portfolio-optimization, quantitative-finance, quantitative-investment, risk-parity, scikit-learn, trading-strategies

skfolio's Introduction


skfolio is a Python library for portfolio optimization built on top of scikit-learn. It offers a unified interface and tools compatible with scikit-learn to build, fine-tune, and cross-validate portfolio models.

It is distributed under the open source 3-Clause BSD license.



Installation

skfolio is available on PyPI and can be installed with:

pip install -U skfolio

Dependencies

skfolio requires:

  • python (>= 3.10)
  • numpy (>= 1.23.4)
  • scipy (>= 1.8.0)
  • pandas (>= 1.4.1)
  • cvxpy (>= 1.4.1)
  • scikit-learn (>= 1.3.2)
  • joblib (>= 1.3.2)
  • plotly (>= 5.15.0)

Key Concepts

Since the development of modern portfolio theory by Markowitz (1952), mean-variance optimization (MVO) has received considerable attention.

Unfortunately, it faces a number of shortcomings, including high sensitivity to the input parameters (expected returns and covariance), weight concentration, high turnover, and poor out-of-sample performance.

It is well known that naive allocation (1/N, inverse-vol, etc.) tends to outperform MVO out-of-sample (DeMiguel, 2007).

Numerous approaches have been developed to alleviate these shortcomings (shrinkage, additional constraints, regularization, uncertainty set, higher moments, Bayesian approaches, coherent risk measures, left-tail risk optimization, distributionally robust optimization, factor model, risk-parity, hierarchical clustering, ensemble methods, pre-selection, etc.).

Given this large number of methods, and the fact that they can be composed together, there is a need for a unified framework with a machine learning approach to perform model selection, validation, and parameter tuning while reducing the risk of data leakage and overfitting.

This framework is built on scikit-learn's API.

Available models

  • Portfolio Optimization:
    • Naive:
      • Equal-Weighted
      • Inverse-Volatility
      • Random (Dirichlet)
    • Convex:
      • Mean-Risk
      • Risk Budgeting
      • Maximum Diversification
      • Distributionally Robust CVaR
    • Clustering:
      • Hierarchical Risk Parity
      • Hierarchical Equal Risk Contribution
      • Nested Clusters Optimization
    • Ensemble Methods:
      • Stacking Optimization
  • Expected Returns Estimator:
    • Empirical
    • Exponentially Weighted
    • Equilibrium
    • Shrinkage
  • Covariance Estimator:
    • Empirical
    • Gerber
    • Denoising
    • Detoning
    • Exponentially Weighted
    • Ledoit-Wolf
    • Oracle Approximating Shrinkage
    • Shrunk Covariance
    • Graphical Lasso CV
    • Implied Covariance
  • Distance Estimator:
    • Pearson Distance
    • Kendall Distance
    • Spearman Distance
    • Covariance Distance (based on any of the above covariance estimators)
    • Distance Correlation
    • Variation of Information
  • Prior Estimator:
    • Empirical
    • Black & Litterman
    • Factor Model
  • Uncertainty Set Estimator:
    • On Expected Returns:
      • Empirical
      • Circular Bootstrap
    • On Covariance:
      • Empirical
      • Circular Bootstrap
  • Pre-Selection Transformer:
    • Non-Dominated Selection
    • Select K Extremes (Best or Worst)
    • Drop Highly Correlated Assets
  • Cross-Validation and Model Selection:
    • Compatible with all sklearn methods (KFold, etc.)
    • Walk Forward
    • Combinatorial Purged Cross-Validation
  • Hyper-Parameter Tuning:
    • Compatible with all sklearn methods (GridSearchCV, RandomizedSearchCV)
  • Risk Measures:
    • Variance
    • Semi-Variance
    • Mean Absolute Deviation
    • First Lower Partial Moment
    • CVaR (Conditional Value at Risk)
    • EVaR (Entropic Value at Risk)
    • Worst Realization
    • CDaR (Conditional Drawdown at Risk)
    • Maximum Drawdown
    • Average Drawdown
    • EDaR (Entropic Drawdown at Risk)
    • Ulcer Index
    • Gini Mean Difference
    • Value at Risk
    • Drawdown at Risk
    • Entropic Risk Measure
    • Fourth Central Moment
    • Fourth Lower Partial Moment
    • Skew
    • Kurtosis
  • Optimization Features:
    • Minimize Risk
    • Maximize Returns
    • Maximize Utility
    • Maximize Ratio
    • Transaction Costs
    • Management Fees
    • L1 and L2 Regularization
    • Weight Constraints
    • Group Constraints
    • Budget Constraints
    • Tracking Error Constraints
    • Turnover Constraints

Quickstart

The code snippets below are designed to introduce the functionality of skfolio so you can start using it quickly. skfolio follows the same API as scikit-learn.

Imports

from sklearn import set_config
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    RandomizedSearchCV,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from scipy.stats import loguniform

from skfolio import RatioMeasure, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
)
from skfolio.moments import (
    DenoiseCovariance,
    DetoneCovariance,
    EWMu,
    GerberCovariance,
    ShrunkMu,
)
from skfolio.optimization import (
    MeanRisk,
    NestedClustersOptimization,
    ObjectiveFunction,
    RiskBudgeting,
)
from skfolio.pre_selection import SelectKExtremes
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, EmpiricalPrior, FactorModel
from skfolio.uncertainty_set import BootstrapMuUncertaintySet

Load Dataset

prices = load_sp500_dataset()

Train/Test split

X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

Minimum Variance

model = MeanRisk()  # default objective: minimum variance

Fit on Training Set

model.fit(X_train)

print(model.weights_)

Predict on Test Set

portfolio = model.predict(X_test)

print(portfolio.annualized_sharpe_ratio)
print(portfolio.summary())

Maximum Sortino Ratio

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.SEMI_VARIANCE,
)

Denoised Covariance & Shrunk Expected Returns

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=EmpiricalPrior(
        mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
    ),
)

Uncertainty Set on Expected Returns

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    mu_uncertainty_set_estimator=BootstrapMuUncertaintySet(),
)

Weight Constraints & Transaction Costs

model = MeanRisk(
    min_weights={"AAPL": 0.10, "JPM": 0.05},
    max_weights=0.8,
    transaction_costs={"AAPL": 0.0001, "RRC": 0.0002},
    groups=[
        ["Equity"] * 3 + ["Fund"] * 5 + ["Bond"] * 12,
        ["US"] * 2 + ["Europe"] * 8 + ["Japan"] * 10,
    ],
    linear_constraints=[
        "Equity <= 0.5 * Bond",
        "US >= 0.1",
        "Europe >= 0.5 * Fund",
        "Japan <= 1",
    ],
)
model.fit(X_train)

Risk Parity on CVaR

model = RiskBudgeting(risk_measure=RiskMeasure.CVAR)

Risk Parity & Gerber Covariance

model = RiskBudgeting(
    prior_estimator=EmpiricalPrior(covariance_estimator=GerberCovariance())
)

Nested Cluster Optimization with Cross-Validation and Parallelization

model = NestedClustersOptimization(
    inner_estimator=MeanRisk(risk_measure=RiskMeasure.CVAR),
    outer_estimator=RiskBudgeting(risk_measure=RiskMeasure.VARIANCE),
    cv=KFold(),
    n_jobs=-1,
)

Randomized Search of the L2 Norm

randomized_search = RandomizedSearchCV(
    estimator=MeanRisk(),
    cv=WalkForward(train_size=252, test_size=60),
    param_distributions={
        "l2_coef": loguniform(1e-3, 1e-1),
    },
)
randomized_search.fit(X_train)

best_model = randomized_search.best_estimator_

print(best_model.weights_)

Grid Search on Embedded Parameters

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(mu_estimator=EWMu(alpha=0.2)),
)

print(model.get_params(deep=True))

gs = GridSearchCV(
    estimator=model,
    cv=KFold(n_splits=5, shuffle=False),
    n_jobs=-1,
    param_grid={
        "risk_measure": [
            RiskMeasure.VARIANCE,
            RiskMeasure.CVAR,
            RiskMeasure.CDAR,
        ],
        "prior_estimator__mu_estimator__alpha": [0.05, 0.1, 0.2, 0.5],
    },
)
gs.fit(X)

best_model = gs.best_estimator_

print(best_model.weights_)

Black & Litterman Model

views = ["AAPL - BBY == 0.03", "CVX - KO == 0.04", "MSFT == 0.06"]
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=BlackLitterman(views=views),
)

Factor Model

factor_prices = load_factors_dataset()

X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)

model = MeanRisk(prior_estimator=FactorModel())
model.fit(X_train, y_train)

print(model.weights_)

portfolio = model.predict(X_test)

print(portfolio.calmar_ratio)
print(portfolio.summary())

Factor Model & Covariance Detoning

model = MeanRisk(
    prior_estimator=FactorModel(
        factor_prior_estimator=EmpiricalPrior(covariance_estimator=DetoneCovariance())
    )
)

Black & Litterman Factor Model

factor_views = ["MTUM - QUAL == 0.03", "VLUE == 0.06"]
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        factor_prior_estimator=BlackLitterman(views=factor_views),
    ),
)

Pre-Selection Pipeline

set_config(transform_output="pandas")
model = Pipeline(
    [
        ("pre_selection", SelectKExtremes(k=10, highest=True)),
        ("optimization", MeanRisk()),
    ]
)
model.fit(X_train)

portfolio = model.predict(X_test)

K-fold Cross-Validation

model = MeanRisk()
mmp = cross_val_predict(model, X_test, cv=KFold(n_splits=5))
# mmp is the predicted MultiPeriodPortfolio object composed of 5 Portfolios (1 per testing fold)

mmp.plot_cumulative_returns()
print(mmp.summary())

Combinatorial Purged Cross-Validation

model = MeanRisk()

cv = CombinatorialPurgedCV(n_folds=10, n_test_folds=2)

print(cv.get_summary(X_train))

population = cross_val_predict(model, X_train, cv=cv)

population.plot_distribution(
    measure_list=[RatioMeasure.SHARPE_RATIO, RatioMeasure.SORTINO_RATIO]
)
population.plot_cumulative_returns()
print(population.summary())

Recognition

We would like to thank all contributors behind our direct dependencies, such as scikit-learn and cvxpy, but also the contributors of the following resources that were a source of inspiration:

  • PyPortfolioOpt
  • Riskfolio-Lib
  • scikit-portfolio
  • microprediction
  • statsmodels
  • rsome
  • gautier.marti.ai

Citation

If you use skfolio in a scientific publication, we would appreciate citations:

Bibtex entry:

@misc{skfolio,
  author = {Delatte, Hugo and Nicolini, Carlo},
  title = {skfolio},
  year  = {2023},
  url   = {https://github.com/skfolio/skfolio}
}

skfolio's People

Contributors

8w9ag, carlonicolini, hugodelatte, lsattolo, matteoettam09, microprediction, rriski, vacarme


skfolio's Issues

Differently Sized Series

Securities often have different lengths; how would you deal with this? Thanks for the package.

Import Dependencies

Is there any reason the dependency version floors are set so high? When installing skfolio alongside other libraries, this often leads to installation failures. Starting from Python 3.10, the following dependency specifications should work fine, adhering to minimal sufficient versioning. We can probably just test these to see if they work:

python = "^3.10"
numpy = ">=1.20"
pandas = ">=1.0"
plotly = ">=5.0"
scipy = ">=1.0"
scikit-learn = ">=1.0"

Date Matched Composition

With a multi-period walk-forward portfolio, we are able to grab the compositions over the multiple rebalancing periods. It would be great to have a date-matched composition, i.e., portfolio weights indexed by date, even if the weights repeat for multiple days. I am thinking of a days × assets dataframe, where the NaN return dates have zero weight for the respective asset. I have around 5 more suggestions to make; I will open a new issue for each. This is what the composition looks like currently:

[screenshot of the current composition output]

Maybe .portfolio_weights or .daily_weights, something like that, could distinguish it from the above.
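
A rough sketch of the kind of helper I have in mind, with hypothetical names (rebalance_weights is a rebalance-date × asset DataFrame, dates is the full daily index):

import pandas as pd

def daily_weights(rebalance_weights: pd.DataFrame, dates: pd.DatetimeIndex) -> pd.DataFrame:
    # Forward-fill each rebalancing's weights until the next rebalance;
    # dates before an asset carries any weight become 0.0.
    return rebalance_weights.reindex(dates, method="ffill").fillna(0.0)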

prices_to_returns cuts off dataframe to last asset launch [BUG]

Describe the bug

prices_to_returns promises to take a dataframe of prices and convert it to returns. When I have a dataframe of prices with different starting dates (such as VTI and BITI), this function cuts the rows to the start of BITI.

To Reproduce

from skfolio.preprocessing import prices_to_returns
import yfinance as yf
import pandas as pd

STOCKS = ["VTI", "BITI"]
return_dfs = []
for ticker in STOCKS:
  ticker_yahoo = yf.Ticker(ticker)
  df = ticker_yahoo.history(period="1d", start="2010-01-01")
  df.index = df.index.date
  return_dfs.append(df[["Close"]].rename(columns={"Close": ticker}))
return_df = pd.concat(return_dfs, axis=1).sort_index().fillna(0.0)

prices_to_returns(return_df)

Expected behavior

Returns for BITI before 2022-06-21 should be 0.0. Instead, the dataframe is cut off.

Additional context

This occurs due to the dropna. I believe this should use fillna with 0.0 after the pct_change call to mark the earlier periods as 0.0 returns. I would be more than happy to make this change; I just want to be sure the maintainers consider this the right move.
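
In the meantime, a user-side sketch of the proposed behavior with plain pandas (not skfolio internals; prices is a hypothetical wide price DataFrame):

# Compute returns ourselves, keeping all dates: pre-launch periods
# produce NaN from pct_change and are marked as 0.0 instead of dropped.
returns = prices.pct_change().iloc[1:].fillna(0.0)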

Versions

0.2.1

[ENH] Drop black for ruff embedded formatter.

Is your feature request related to a problem? Please describe.
FYI ruff now includes a formatter faster than black and with very similar behavior.

Describe the solution you'd like
Drop black in favor of ruff to slim skfolio's optional-dependencies.

Additional context
a format step could be added to the tests job to warn if code is not properly formatted.

[BUG] "Portfolios number" and "Avg nb of assets per portfolio" are wrong in the Population.summary() report.

Describe the bug
The last two lines "Portfolios number" and "Avg nb of assets per portfolio" report the wrong dimension in the resulting dataframe of the Population.summary() when objects resulting from a cross_val_predict are used to initialize the Population object.

To Reproduce

from skfolio.model_selection import WalkForward

from skfolio import RiskMeasure, RatioMeasure
from skfolio.optimization import MeanRisk, HierarchicalRiskParity, EqualWeighted
from skfolio.population import Population

from skfolio.model_selection import cross_val_predict

estimators = {
    "minvol": MeanRisk(risk_measure=RiskMeasure.VARIANCE, transaction_costs=0),
    "mad": MeanRisk(
        risk_measure=RiskMeasure.MEAN_ABSOLUTE_DEVIATION, transaction_costs=0
    ),
    "ew": EqualWeighted(),
    "hrp": HierarchicalRiskParity(risk_measure=RiskMeasure.VARIANCE,transaction_costs=0),
}

cv_preds = {}
for name, estimator in estimators.items():
    cv_preds[name] = cross_val_predict(
        estimator=estimator,
        X=train_returns,
        cv=WalkForward(test_size=252, train_size=9 * 252, expend_train=False),
        n_jobs=1,
    )

cv_pop = Population(list(cv_preds.values()))
cv_pop.summary()

Expected behavior
The "Avg nb of assets per portfolio" could be the average of the effective number of assets (like described in PR #17).
The portfolios number is not clear to me though.

Versions
0.0.9

MultiPeriodPortfolio Contribution

We currently have MultiPeriodPortfolio.composition, which is helpful. It would be great if we could do the same for MultiPeriodPortfolio.contribution, which would again give us something similar to composition over the different test sets.

Or, as recommended in #69 (comment), a date-type contribution that rolls forward.

[BUG] Not sure the plot_train_test_folds 0-train 1-test encoding is correct in CombinatorialPurgedCV

Describe the bug
The title of the CombinatorialPurgedCV.plot_train_test_folds() is probably misleading.
Otherwise, it would mean the portfolio is fitted on very small training sets and tested on very long ones.

To Reproduce

from skfolio.model_selection import CombinatorialPurgedCV
CombinatorialPurgedCV().plot_train_test_folds()

Expected behavior
I believe train is 1 and test is 0.


Versions
0.1.1

I also did an analysis based on this encoding, and the results are similar.

[BUG] IndexError when using HRP for a small set of indexes

Describe the bug

IndexError is raised when computing the maximum number of clusters while the number of columns in the returns dataframe is low.

/Users/xxx/dev/xxx/venv/lib/python3.11/site-packages/skfolio/optimization/cluster/hierarchical/_hrp.py:326 in fit
    self.hierarchical_clustering_estimator_.fit(distance)

/Users/xxx/dev/xxx/venv/lib/python3.11/site-packages/skfolio/cluster/_hierarchical.py:196 in fit
    if max_clusters is None:
        max_clusters = compute_optimal_n_clusters(
            distance=X,

/Users/xxx/dev/xxx/venv/lib/python3.11/site-packages/skfolio/utils/stats.py:454 in compute_optimal_n_clusters
    for k in range(max_clusters):
        level = cut_tree[:, n - k - 1]
        cluster_density = []

IndexError: index -4 is out of bounds for axis 1 with size 3

To Reproduce

X_before.parquet.zip

import pandas as pd
from skfolio.optimization import HierarchicalRiskParity
from skfolio.moments import DenoiseCovariance
from skfolio.distance import CovarianceDistance
from skfolio import RiskMeasure

df = pd.read_parquet("X_before.parquet")
model = HierarchicalRiskParity(
    risk_measure=RiskMeasure.MEAN_ABSOLUTE_DEVIATION,
    portfolio_params={"name": "HRP-MAD-Ward-DenoisedPearson"},
    distance_estimator=CovarianceDistance(covariance_estimator=DenoiseCovariance()),
)
model.fit(df)

Expected behavior

I believe this comes from the code here, which sets the maximum number of clusters to at least 8, whereas the number of columns could be less than 8.

Additional context

If HRP shouldn't be used with fewer than 8 return columns, this could be exposed and made checkable by the consumer, to avoid having to catch the IndexError. Otherwise, I would suggest removing the hard-coded 8 and capping it at the number of columns.
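
For illustration, a sketch of that suggestion (hypothetical names; not skfolio's actual code in utils/stats.py):

import numpy as np

# Never search more cluster levels than the tree actually provides:
n_assets = distance.shape[0]
max_clusters = min(max(8, int(np.sqrt(n_assets))), n_assets - 1)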

Versions
0.2.1

[BUG] pandas plotting backend not set before plot call

Describe the bug

skfolio calls the pandas plot function to create its graphs. It sets the backend in places like this when skfolio is imported. However, the backend may be changed later by other libraries or user code, so when it comes time to plot, issues like this can arise:

/Users/xxx/dev/xxx/venv/lib/python3.11/site-packages/skfolio/population/_population.py:620 in plot_cumulative_returns
    fig = df.plot()
    fig.update_layout(
        title=title,

AttributeError: 'Axes' object has no attribute 'update_layout'

In this case the backend has been changed to matplotlib by other libraries, while skfolio expects the backend to be plotly.

To Reproduce

from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.optimization import EqualWeighted
from skfolio import Population
import pandas as pd

prices = load_sp500_dataset()
X = prices_to_returns(prices)
model = EqualWeighted()
model.fit(X)
pop = Population([model.predict(X)])
pd.options.plotting.backend = "matplotlib"
pop.plot_cumulative_returns()

Expected behavior

No crash: during the plot_cumulative_returns call, the backend should be set to plotly and then restored at the end of the call.

Additional context

I would be more than happy to make this change, just want to check that maintainers agree with the diagnosis.
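
One way to implement that, as a sketch rather than skfolio's actual code, is a small context manager around the plotting calls:

import contextlib
import pandas as pd

@contextlib.contextmanager
def plotly_backend():
    # Temporarily force the plotly backend, then restore whatever
    # backend the user (or another library) had set.
    previous = pd.options.plotting.backend
    pd.options.plotting.backend = "plotly"
    try:
        yield
    finally:
        pd.options.plotting.backend = previous

# Inside plot_cumulative_returns:
# with plotly_backend():
#     fig = df.plot()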

Versions
0.2.1

GINI_MEAN_DIFFERENCE Risk Measure for MeanRisk model [BUG]

The GINI_MEAN_DIFFERENCE risk measure for the MeanRisk model is not working. I have tried the SCIP and CLARABEL solvers. It looks like I am missing something; appreciate any help. Thanks!

model = MeanRisk(
    risk_measure=RiskMeasure.GINI_MEAN_DIFFERENCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe"),
    solver="CLARABEL",
)
model.fit(X)
model.weights_

[DOC] Type Hints Suggestion.

Describe the issue linked to the documentation

Let's see type hints as documentation for the IDE :)

Let me suggest some improvements to some type hints encountered throughout the code.

I will upload this issue alongside my PR.

The goal of this issue is not to discuss design choices and their implications on the hints but to help with minor suggestions.

Type issues:

  • use of any, the Python built-in, instead of Any, the type.

[ENH] sktime integration?

Very nice package, stringently designed!

I was wondering whether you were thinking about sktime integration?

  • there are a lot of strategies for portfolio optimization that are based on forecasts, and sktime has native support for hierarchical data, pairwise distance transformers for time series, etc.
  • further, I see you are looking into uncertainty set estimates, including cyclic boosting, which would integrate well with the companion package skpro.
  • thinking longer ahead, combining covariance estimators, forecasters, etc. into a strategy would likely become compositional across packages. For this, we have built scikit-base, which is similar to your "import BaseEstimator" but imo more architecturally stringent, especially if you want unified interface tests that go beyond sklearn's check_estimator.

Opening an issue since I'm not quite sure what the best way is to get in touch.

[DOC] how to customize datasets?

Describe the issue linked to the documentation

https://skfolio.org/user_guide/datasets.html
The user is requesting the addition of documentation on how to customize datasets. Currently, the native datasets provided only include a portion of stocks from the S&P 500 and NASDAQ. In order to construct a portfolio of ETFs or stocks from other countries' markets, there is a need for custom datasets.

Suggest a potential alternative/fix

Provide information on how to create and customize datasets, ideally with relevant code examples.
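
Until then, a minimal sketch (my_prices.csv is a hypothetical file with a date index and one price column per asset):

import pandas as pd
from skfolio.preprocessing import prices_to_returns

# Any wide DataFrame of prices works: DatetimeIndex, one column per asset.
prices = pd.read_csv("my_prices.csv", index_col=0, parse_dates=True)
X = prices_to_returns(prices)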

[ENH] Integrate sklearn Metadata Routing

Scikit-learn released Metadata Routing in v1.4.
Integrating this feature into skfolio will make it cleaner and easier to use exogenous datasets in meta-estimators.
For example, creating a covariance estimator that uses the assets' implied volatility time series in addition to the assets' returns time series, and then using such an estimator in meta-estimators such as optimization estimators, pipelines, and cross-validators.
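
For illustration, a minimal sketch of what metadata routing enables (ImpliedVolCovariance here is hypothetical, not a skfolio estimator):

from sklearn import set_config
from sklearn.base import BaseEstimator

set_config(enable_metadata_routing=True)

class ImpliedVolCovariance(BaseEstimator):
    # Hypothetical covariance estimator consuming extra metadata.
    def fit(self, X, y=None, implied_vol=None):
        # ...combine the asset returns X with the implied_vol series...
        return self

# Requesting the metadata lets meta-estimators route implied_vol from
# fit(..., implied_vol=...) down to this estimator:
estimator = ImpliedVolCovariance().set_fit_request(implied_vol=True)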

Rolling Measure for Population

Currently, the rolling measure plot only works for portfolios and not for populations. This could be a helpful addition. Expected behaviour:

population.plot_rolling_measure(measure=RatioMeasure.SHARPE_RATIO)

Then, if there are 4 different portfolios in the population, they would all be plotted.
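
In the meantime, an approximate sketch with plain pandas (assumes each portfolio in the population exposes .returns and .name, as fitted skfolio portfolios do):

import numpy as np
import pandas as pd

window = 252
df = pd.DataFrame({p.name: p.returns for p in population})
rolling_sharpe = df.rolling(window).mean() / df.rolling(window).std() * np.sqrt(252)
rolling_sharpe.plot()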

GraphicalLassoCV Example


Is it possible to have an example where the precision matrix obtained with GraphicalLassoCV is employed?
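
For reference, a sketch of what I mean, assuming the GraphicalLassoCV covariance estimator exposed in skfolio.moments:

from skfolio.moments import GraphicalLassoCV
from skfolio.optimization import MeanRisk
from skfolio.prior import EmpiricalPrior

model = MeanRisk(
    prior_estimator=EmpiricalPrior(covariance_estimator=GraphicalLassoCV())
)
model.fit(X_train)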

[BUG] Population.plot_cumulative_returns() does not show all portfolios

Describe the bug

When putting two portfolios predicted from fitted models into the Population class together with a third portfolio created directly with the Portfolio class, only one portfolio is shown when I use the plot_cumulative_returns() method.

To Reproduce

X = pd.DataFrame(np.random.rand(365,3), columns=["SPY", "GOVT", "QQQ"], 
                 index=pd.date_range(start='2018-01-01', end='2018-12-31'))

model = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe"),
    min_weights={"SPY": 0.3, "GOVT": 0.1},
    max_weights={"SPY": 0.30001, "GOVT": 0.10001}
)

benchmark_inv = InverseVolatility(portfolio_params=dict(name="Inverse Vol"))
benchmark = Portfolio(X["SPY"].to_numpy().reshape(-1, 1), weights=[1.0], name="OnlySPY")

benchmark_inv.fit(X)
model.fit(X)

pred_model = model.predict(X)
pred_bench = benchmark_inv.predict(X)

fig = Population([pred_model, benchmark, pred_bench]).plot_cumulative_returns()
fig.show()

Expected behavior

The image should show the three portfolio cumulative returns.

Versions

import skfolio; print(skfolio.__version__)
0.1.2

[ENH] Renaming prior_estimator to distribution_estimator

Is your feature request related to a problem? Please describe.
The name prior_estimator and PriorModel could be confusing for some users as it's not clear what model the prior refers to.

Describe the solution you'd like
A solution would be to rename prior_estimator to distribution_estimator and PriorModel to Distribution.

Describe alternatives you've considered
Alternatives could be dist_estimator, Dist, or DistModel.

Additional context
This will impact the API of all optimization estimators by renaming the parameter prior_estimator.

Dynamic Portfolio Weights

from skfolio import Portfolio

X = [
    [0.003, -0.001],
    [-0.001, 0.002],
    [0.0015, 0.004],
]

weights = [0.6, 0.4]

portfolio = Portfolio(X=X, weights=weights)

print(portfolio.returns)
>>> array([0.0014, 0.0002, 0.0025])

Starting from here, should there not be a weights argument that allows 2D input? So instead of a static portfolio, a dynamic one.

For example, if I rebalance every month:

X = [
    [0.003, -0.001],
    # ...new months
    [-0.001, 0.002],
    [0.0015, 0.004],
    # ...
]

weights = [
    [0.90, 0.10],
    # ...new month
    [0.45, 0.55],
    [0.45, 0.55],
    # ...
]

portfolio = Portfolio(X=X, weights=weights)
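
For reference, with a 2D weights array the portfolio return on each date would just be the row-wise weighted sum, as in this plain numpy sketch:

import numpy as np

X = np.array([[0.003, -0.001],
              [-0.001, 0.002],
              [0.0015, 0.004]])
weights = np.array([[0.90, 0.10],
                    [0.45, 0.55],
                    [0.45, 0.55]])

# One portfolio return per date: sum over assets of weight * return.
returns = (X * weights).sum(axis=1)
print(returns)  # [0.0026, 0.00065, 0.002875]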

[ENH] Weighted group constraints

Funds may contain multiple sectors. For example, the A fund may invest 40% in banking, 30% in tech, and 30% in real estate, while the B fund may invest 20% in banking and 80% in industry. On the group-and-linear-constraints page, asset groups can be assigned.

Let weights be assigned to these groups, such as {A: {banking: 0.4, tech: 0.3, estate: 0.3}}, so that a constraint on banking counts A at 0.4 and B at 0.2.

It would be very useful for a portfolio composed of ETFs.
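
For example, with fund weights w_A and w_B, the effective banking exposure would be 0.4 * w_A + 0.2 * w_B; a plain-Python sketch of the requested behavior (not an existing skfolio API):

# Hypothetical look-through sector loadings per fund:
sector_loadings = {
    "A": {"banking": 0.4, "tech": 0.3, "estate": 0.3},
    "B": {"banking": 0.2, "industry": 0.8},
}
weights = {"A": 0.5, "B": 0.5}

banking_exposure = sum(
    weights[fund] * loadings.get("banking", 0.0)
    for fund, loadings in sector_loadings.items()
)
print(banking_exposure)  # 0.4 * 0.5 + 0.2 * 0.5 = 0.3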

Add support for python3.9

Can skfolio add support for Python 3.9? Some commonly used quant packages, like qlib, support only up to Python 3.9.

[ENH] Implement the effective number of assets metric

Is your feature request related to a problem? Please describe.
Effective number of assets is a metric related to portfolio concentration.

Describe the solution you'd like
Having a new metric taking a weights array and returning the inverse of the sum of squared weights.
See the Wikipedia page.
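
The metric itself is one line; a sketch of the requested function:

import numpy as np

def effective_number_of_assets(weights) -> float:
    # Inverse of the sum of squared weights (inverse Herfindahl index).
    weights = np.asarray(weights)
    return 1.0 / np.sum(weights**2)

print(effective_number_of_assets(np.full(10, 0.1)))  # 10.0 for equal weights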

[DOC] jupyterlite kernel & notebooks paths

Describe the issue linked to the documentation

Wow, super cool library & docs.
I had not seen sphinx-gallery docs before; they look nice. It looks like the gallery includes a JupyterLite environment where you can run notebooks like this; I hadn't seen that before either.

Suggest a potential alternative/fix

Given the docs are hosted on GitHub Pages for now, I guess you probably can't install and run a kernel in that environment. It seemed that it couldn't quite find the notebooks, for example https://skfolio.org/lite/lab/?path=auto_examples/1_mean_risk/plot_1_maximum_sharpe_ratio.ipynb, but maybe that can be fixed in conf.py via these docs. Maybe the notebook can still be displayed nicely in JupyterLite, just not run, which would probably still be cool.
Either way, maybe you'd want to temporarily remove allusions to the JupyterLite pages from your docs, e.g. plot_1_maximum_sharpe_ratio.html#L399 & plot_1_maximum_sharpe_ratio.html#L836, with a script or something, or maybe it can just be done in the sphinx-gallery build.

[BUG] Ruff *pyproject* Configuration is Deprecated.

Describe the bug
Since Ruff v0.2.0 the configuration of ruff using the pyproject.toml file changed.

To Reproduce
Any ruff command will echo a warning encouraging the user to update the ruff configuration.

ruff check

Expected behavior
No warning should be displayed.

Versions
0.1.2

[BUG] risk_free_rate argument does not seem to work

It seems the MeanRisk class ignores the input value of risk_free_rate and keeps it fixed at 0, no matter what the user inputs.
You can run the example below and notice that the risk-free rate is printed as 0 even when it is input as 0.02/252.

from yfinance import download
DATA = download(["TSLA","SBUX","CAKE"], start="2019-5-1", end="2024-5-1")
ADJUSTED = DATA["Adj Close"]
from skfolio.preprocessing import prices_to_returns
RETURNS = prices_to_returns(ADJUSTED)
from skfolio.optimization import MeanRisk, ObjectiveFunction
MODEL = MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_RATIO , risk_free_rate=0.02/252)
P = MODEL.fit_predict(RETURNS)
print(P.risk_free_rate)

Expected behavior

The printed risk free rate should not equal 0.

[ENH] Add Exposure Stacking as final_estimator in StackingOptimization

Is your feature request related to a problem? Please describe.
Add equation (3) from the Exposure Stacking article https://ssrn.com/abstract=4709317 as final_estimator in StackingOptimization.

Describe the solution you'd like
Adding the inner sum from equation (3) as a final_estimator objective in StackingOptimization. To adapt the method to the case without parameter uncertainty, I suggest that the data is split into K folds (L in the article) on which optimal portfolio exposures are computed for each estimator and then combined using the Exposure Stacking objective.

If the functionality is at some point used in combination with resampled parameter uncertainty as in the article, the number of folds parameter L = 2 seems to be a good default value, while it will be interesting to experiment with that one as well. Especially if multiple portfolio estimators are applied to the parameter samples.

Describe alternatives you've considered
Alternatives are the current implementation with portfolio optimization objectives that are implemented elegantly in this package already.

Additional context
Exposure Stacking is a method to stack the results of different portfolio optimizations that is independent of any investment risk measure and, hence, a good candidate when portfolios have been optimized using different portfolio optimization objectives. For the resampled parameter uncertainty case, it alleviates the need for adding additional constraints on risk and return to avoid significant unintended drift from the average risk and return of the optimized portfolios.

[BUG] Python Version Request: Reduce to python >= 3.9

Thanks for the great-looking package!

I have hopefully a small request. Currently you have:

requires-python = ">=3.10"

Is it possible to change this to 3.9 to reduce the restrictiveness?

Sklearn doesn't actually specify a Python version in their pyproject.toml. And I don't see a need for the restriction given the current dependencies:

dependencies = [
    "numpy>=1.23.4",
    "scipy>=1.8.0",
    "pandas>=1.4.1",
    "cvxpy>=1.4.1",
    "scikit-learn>=1.3.2",
    "joblib>=1.3.2",
    "plotly>=5.15.0"
]

[BUG] Annualization factor with 252 business days in a year

Describe the bug
In all metrics the 255 business days standard is used.
On average, a US-based calendar has 252 business days in a year.
While I understand this depends on the country and on the specific year, most annualization metrics should at least be clear about it.

Expected behavior
Annualization should be based on the typical 252 business days for a US-based user.
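
For context, the annualization factor enters metrics such as the annualized Sharpe ratio; a sketch assuming daily returns:

import numpy as np

def annualized_sharpe(daily_returns, periods_per_year: int = 252) -> float:
    # Annualize by sqrt of the number of periods per year; using 255
    # instead of 252 inflates the result by sqrt(255 / 252), about 0.6%.
    daily_returns = np.asarray(daily_returns)
    return daily_returns.mean() / daily_returns.std() * np.sqrt(periods_per_year)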

Date-centered Test Sets (Rebalancing)

As of now, the CV is valuable, but to bring it closer to asset management (as opposed to data science), note that rebalancing ordinarily doesn't occur after a fixed number of days, but perhaps on the 15th of every month or every 4th Friday.

Expected behaviour:


# polars nomenclature
self.cv = WalkForwardDate(period=252, every="4w", offset="-3d")  # This would run every 4th Friday.

# asset management nomenclature
self.cv = WalkForwardDate(lookback=252, rebalance="4w", offset="-3d")  # This would run every 4th Friday.

Here I am taking inspiration from the group_by_dynamic from polars.

df_signal_long.group_by_dynamic(
    index_column=date_col,
    every=every,
    period=f"{lookback}{every[-1]}" if isinstance(lookback, int) else lookback,
    by=entity_col,
    closed="left",
    label="left",
    offset="-3d",  # Offset by 3 days to end on Friday
    include_boundaries=False,
)
