Comments (20)
Will do augment_fourier (discussed with Justin Kurland) :)
from pytimetk.
Awesome that's much appreciated!
As long as it works as intended I'm Ok. Thanks!
I will do ts_summary at the same time because I need it.
I updated checks.py like this (not yet pushed):
def check_data_type(data, authorized_dtypes: list, error_str=None):
    if not error_str:
        error_str = f'Input type must be one of {authorized_dtypes}'
    if not sum(map(lambda dtype: isinstance(data, dtype), authorized_dtypes)) > 0:
        raise TypeError(error_str)

def check_dataframe_or_groupby(data: Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy]) -> None:
    check_data_type(
        data,
        authorized_dtypes=[
            pd.DataFrame,
            pd.core.groupby.generic.DataFrameGroupBy
        ],
        error_str='`data` is not a Pandas DataFrame or GroupBy object.')

def check_dataframe_or_groupby_polar(data: Union[pl.DataFrame, pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy]) -> None:
    check_data_type(
        data,
        authorized_dtypes=[
            pl.DataFrame,
            pd.DataFrame,
            pd.core.groupby.generic.DataFrameGroupBy
        ])
It seems more Pythonic to me, if you agree. I ran the tests and everything is working :)
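To illustrate how the helper behaves, here is a quick sketch (reproducing check_data_type from the snippet above so the example is self-contained):

```python
import pandas as pd

# same helper as in the snippet above, reproduced so this example runs standalone
def check_data_type(data, authorized_dtypes: list, error_str=None):
    if not error_str:
        error_str = f'Input type must be one of {authorized_dtypes}'
    if not sum(map(lambda dtype: isinstance(data, dtype), authorized_dtypes)) > 0:
        raise TypeError(error_str)

# a DataFrame passes silently
check_data_type(pd.DataFrame({'a': [1]}), authorized_dtypes=[pd.DataFrame])

# anything else raises TypeError with the supplied message
try:
    check_data_type([1, 2, 3], authorized_dtypes=[pd.DataFrame],
                    error_str='`data` is not a Pandas DataFrame or GroupBy object.')
except TypeError as e:
    print(e)  # `data` is not a Pandas DataFrame or GroupBy object.
```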
I am doing a polars version of augment_fourier, then if possible I plan to merge the polars version with augment_fourier_v2: converting pandas dtypes to polars dtypes, doing the computations, then converting back. Is that what you intended to do?
@GTimothee yes, that is correct: pandas -> polars -> pandas, where the conversions occur inside the function. There may be some functions at the moment where polars dataframes are accepted directly. Do not use that pattern; those have to be refactored to only accept pandas.
Understood :) Sorry, I am a little short on time, but I am on it!
Ok sounds good. I plan to release 0.2.0 tomorrow. Let me know if there is anything I can do to help.
I think that's just an example of the time zone
No, I believe it can be different time zones. That comment is just an example.
Can I take ceil_date? @JustinKurland
Absolutely @seyf97. I had begun working on this to figure out what it looked like for polars dataframes and series. I actually finished figuring this out for most dates, but did not start on datetimes. This code should help you start quickly.
Dataframes
import polars as pl
# Create a DataFrame with a datetime column
df = pl.DataFrame({
'date': ['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04', '2024-02-26'],
'value': [1, 2, 3, 4, 5]
})
# Convert the date column to datetime
df = df.with_columns(pl.col('date').str.strptime(pl.Date, format="%Y-%m-%d"))#.cast(pl.Datetime)
# week
(df.with_columns(
(pl.col('date')
.dt.offset_by('1w')
.dt.truncate('1w')
.dt.offset_by('-1d'))
.alias('ceil_W'))
)
# month
(df.with_columns(
(pl.col('date')
.dt.offset_by('1mo')
.dt.truncate('1mo')
.dt.offset_by('-1d'))
.alias('ceil_M')
)
)
# or you can use this, though given the pattern above it probably makes more
# sense to stay consistent and use the offset/truncate pattern instead
df.with_columns(pl.col("date").dt.month_end())
# quarter
(df.with_columns(
(pl.col('date')
.dt.offset_by('1q')
.dt.truncate('1q')
.dt.offset_by('-1d'))
.alias('ceil_Q')))
# year
(df.with_columns(
(pl.col('date')
.dt.offset_by('1y')
.dt.truncate('1y')
.dt.offset_by('-1d'))
.alias('ceil_Y')))
# So the missing ceiling now for the dataframe pattern all relates to the time component like hour, minute, and
# second and whatever other `pandas` frequency we have included to ensure alignment.
Series
pl_series = pl.Series('date', ['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04', '2024-02-26'])
pl_series = pl_series.str.strptime(pl.Date, format="%Y-%m-%d")
# Week
pl_series.dt.offset_by('1w_saturating').dt.truncate('1w').dt.offset_by('-1d')
# Month - for months I recommend using this, since the offset pattern does
# not give consistent results but .month_end() does
pl_series.dt.month_end()
# Quarter
pl_series.dt.offset_by('1q_saturating').dt.truncate('1q').dt.offset_by('-1d')
# Year
pl_series.dt.offset_by('1y_saturating').dt.truncate('1y').dt.offset_by('-1d')
# So the missing ceiling now for the series pattern, like with the dataframes, all relates to the time component like hour,
# minute, and second and whatever other `pandas` frequency we have included to ensure alignment.
Hopefully this helps jump start your effort quickly!
Will do get_frequency_summary
I will check what's possible there
Adding this as a running checklist for tracking what has been completed and by whom. There are plenty of functions to work on, so if you wish to contribute to this issue, just tag @JustinKurland here and I will add you to the function you are working on. When it is completed, make sure it is listed here, both for tracking ongoing efforts and so you get some credit for helping out!
Polars Backend Functions

Wrangling Pandas Time Series DataFrames
- summarize_by_time @JustinKurland
- apply_by_time
- pad_by_time @JustinKurland
- future_frame

Anomaly Detection
- anomalize

Adding Features to Time Series DataFrames (Augmenting)
- augment_timeseries_signature @JustinKurland
- augment_holiday_signature @JustinKurland
- augment_lags @JustinKurland
- augment_leads @JustinKurland
- augment_rolling @alexriggio
- augment_rolling_apply
- augment_expanding @alexriggio
- augment_expanding_apply
- augment_fourier @GTimothee
- augment_hilbert @tackes
- augment_wavelet

TS Features
- ts_features
- ts_summary @GTimothee

Finance Module
- augment_ewn

Time Series for Pandas Series
- make_future_timeseries
- make_weekday_sequence @JustinKurland
- make_weekend_sequence @JustinKurland
- get_date_summary @JustinKurland
- get_frequency_summary
- get_diff_summary
- get_frequency
- get_seasonal_frequency
- get_trend_frequency
- get_timeseries_signature @JustinKurland
- get_holiday_signature @JustinKurland

Date Utilities
- floor_date @JustinKurland
- ceil_date @seyf97
- is_holiday @JustinKurland
- week_of_month @JustinKurland
- time_series_unit_frequency_table @JustinKurland
- time_scale_template @JustinKurland

Extra Pandas Helpers
- glimpse @JustinKurland
- parallel_apply
- progress_apply
- flatten_multiindex_column_names

13 Datasets
- get_available_datasets @JustinKurland
- load_dataset @JustinKurland
I think we can check off augment_fourier, no?
I am now starting to add polars support to ts_summary.
About the speed improvement on calc_fourier: I found a bug in my new implementation, so I will have to experiment a bit more and check again that my idea is sound. I will be in touch with Justin K about this.
Actually the main problem I have is with checking my results. I am trying %timeit in a notebook cell, but every time I run it, it gives me different results. There is also a difference between running my experiments notebook locally and in Colab: not the same output. I am not sure what I am doing wrong.
But I guess my experimental function is not good enough anyway, because in general, even with the variations, the current implementation is faster. I had an implementation leveraging itertools.permutations which was faster, but I found that it does not give correct results. I switched to itertools.product and now it is slower :/
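On the unstable timings: benchmark numbers always jitter with machine load and caching, and one common trick is to repeat the measurement and keep the minimum. A sketch with the stdlib timeit module (calc_something is a stand-in workload, not the real calc_fourier):

```python
import timeit

def calc_something(n: int) -> int:
    # stand-in workload for the function being benchmarked
    return sum(i * i for i in range(n))

# Repeat the measurement several times and keep the best run: background
# load and cold caches only ever make a run slower, never faster, so the
# minimum is the most stable estimate across runs and machines.
times = timeit.repeat(lambda: calc_something(10_000), number=100, repeat=5)
print(f'best of 5: {min(times):.4f}s for 100 calls')
```

%timeit does something similar internally, but comparing the reported minima across repeated runs is more stable than comparing single invocations.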
In this function: https://github.com/business-science/pytimetk/blob/master/src/pytimetk/core/ts_summary.py#L398 why is there the comment # "America/New_York"?
I was wondering if you were expecting this particular time zone.
Actually the main problem I have is with checking my results. I am trying %timeit in a notebook cell, but every time I run it, it gives me different results. There is also a difference between running my experiments notebook locally and in Colab: not the same output. I am not sure what I am doing wrong.
There are many reasons that running something, even just locally, could generate different results; I would not expect them to be identical. In fact, you may see instances where the time goes down as a function of caching. Do not get thrown off by this. Relatedly, I would not expect your results in Colab to be the same. Also, I do not know what your Colab setup is, but you can choose to take advantage of GPUs there. You can check disk information with !df -h, CPU specs with !cat /proc/cpuinfo, and memory with !cat /proc/meminfo.
But I guess my experimental function is not good enough anyway, because in general, even with the variations, the current implementation is faster. I had an implementation leveraging itertools.permutations which was faster, but I found that it does not give correct results. I switched to itertools.product and now it is slower :/
Maybe we can connect. I am not sure why you would be using itertools for pretty much anything we are doing, so I am deeply curious how you are using this.
Yes, I will submit my experiments to you ASAP to get some feedback :) I was using itertools to generate the permutations of order x period; that is how I would replace the loops.
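For what it's worth, if the orders and periods were being drawn from one combined list, permutations and product genuinely differ, which could explain the incorrect results. A small sketch of the difference (the example values are made up):

```python
from itertools import permutations, product

orders = [1, 2]
periods = [7, 30]

# product pairs every order with every period -- one Fourier term per pair
pairs = list(product(orders, periods))
print(pairs)  # [(1, 7), (1, 30), (2, 7), (2, 30)]

# permutations over the concatenated values also pairs orders with orders
# and periods with periods, producing invalid pairs like (1, 2) and (7, 30)
mixed = list(permutations(orders + periods, 2))
print(len(mixed))  # 12 ordered pairs, most of them meaningless here
```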