goldencheetah / sweatpy

Endurance sports analysis library for Python

Home Page: http://sweatpy.gssns.io

License: MIT License

Makefile 0.70% Python 99.30%

cycling python3 science

sweatpy's Issues

Odd results from PowerDurationRegressor(model="2 param")

I've been estimating CP with the "2 param" model and found the following oddity. Athlete A has a 3 min MMP of 440 and athlete B one of 400. Both have more or less the same 10 min MMP.

If I feed that to the regressor with the code below, I get values I did not expect. Am I using it wrongly, do I have a general misunderstanding, or is it a bug?

$ cat cpcalctest.py 
import sweat

durations = [180., 600.] # 600
X = sweat.array_1d_to_2d(durations)

Y = [440, 361.3333333, ]
pdmreg = sweat.PowerDurationRegressor(model="2 param")
pdmreg.fit(X, Y)

print(Y)
print(pdmreg.cp_)
print(pdmreg.w_prime_)

Y = [400, 361.5555556, ]
pdmreg = sweat.PowerDurationRegressor(model="2 param")
pdmreg.fit(X, Y)

print(Y)
print(pdmreg.cp_)
print(pdmreg.w_prime_)

$ python3 cpcalctest.py 
[440, 361.3333333]
327.61904757142855
20228.57143714286
[400, 361.5555556]
345.0793651428571
9885.71427428572
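
These numbers can be checked by hand. The sketch below assumes the "2 param" model is the usual hyperbolic form P = CP + W'/t (the issue does not spell out the formula), in which case two data points determine both parameters exactly. The helper name `two_point_cp` is hypothetical and not part of sweatpy.

```python
def two_point_cp(t_short, p_short, t_long, p_long):
    """Solve P = CP + W'/t exactly for two (duration, power) points."""
    # Multiplying the model by t gives a straight line: work = CP * t + W'
    work_short = p_short * t_short
    work_long = p_long * t_long
    # Slope of the work-time line is CP, intercept is W'
    cp = (work_long - work_short) / (t_long - t_short)
    w_prime = work_short - cp * t_short
    return cp, w_prime


cp_a, w_prime_a = two_point_cp(180.0, 440.0, 600.0, 361.3333333)
cp_b, w_prime_b = two_point_cp(180.0, 400.0, 600.0, 361.5555556)
print(cp_a, w_prime_a)
print(cp_b, w_prime_b)
```

Under that assumption the hand calculation agrees with the regressor output above (roughly 327.6 and 20229 for athlete A, 345.1 and 9886 for athlete B). With the 10 min point nearly fixed, a higher 3 min power makes the curve steeper, so the fit assigns more to W' and less to CP.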

Discussion: project structure

(involving @sladkovm)
A lot of different algorithms, tools and models will be added, so I think it is good to think now about where everything goes so we don't end up with a labyrinth. I like this guide on structuring Python projects.

I think the structure below could work, but maybe I'm missing things or you know a better structure.

.
+-- sweat
|   +-- __init__.py
|   +-- algorithms
|   |   +-- __init__.py
|   |   +-- pdm (power duration models)
|   |   |   +-- __init__.py
|   |   |   +-- critical_power.py
|   |   |   +-- w_prime_balance.py
|   |   +-- metrics
|   |       +-- __init__.py
|   |       +-- power.py
|   |       +-- heartrate.py
|   |       +-- speed.py
|   |       +-- location.py
|   +-- io
|   |   +-- __init__.py
|   |   +-- strava.py
|   |   +-- goldencheetah.py
|   |   +-- fitfile.py
|   +-- models
|   |   +-- __init__.py
|   |   +-- dataframes.py
|   |   +-- base.py
|   |   +-- mixins.py
|   |   +-- utils.py
|   +-- utils.py
+-- tests

Discussion: make 100% test coverage part of automated build checks?

.tcx file: ParseError: XML or text declaration not at start of entity

I am having the following problem with .tcx files:

ParseError: XML or text declaration not at start of entity: line 1, column 10

It seems that all my .tcx files have 10 blank spaces before the <?xml declaration.

Is there any possible workaround for reading such .tcx files?

Here's an example: test.txt (file extension was changed to .txt due to GitHub limitations and needs to be changed to .tcx).

Thanks!
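
One possible workaround, sketched here with the standard library only, is to strip the leading whitespace before the data reaches the XML parser. The function name `parse_tcx_bytes` and the inline sample are made up for illustration; whether sweatpy's own reader accepts pre-cleaned data is not something this issue confirms.

```python
import xml.etree.ElementTree as ET


def parse_tcx_bytes(raw):
    """Parse XML bytes after dropping whitespace in front of the declaration."""
    # The parser requires "<?xml" to be the very first bytes of the document
    return ET.fromstring(raw.lstrip())


# Made-up minimal document with 10 leading spaces, mimicking the reported files
sample = (
    b'          <?xml version="1.0" encoding="UTF-8"?>'
    b"<TrainingCenterDatabase><Activities/></TrainingCenterDatabase>"
)
root = parse_tcx_bytes(sample)
print(root.tag)
```

For a file on disk, the bytes could be read with `pathlib.Path(...).read_bytes()`, cleaned the same way, and written back to a new .tcx file before loading it.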

Duplicate of mean_max() algorithm

There are currently two versions:

algorithms.main.mean_max_power() - relies on a rolling mean calculation
algorithms.metrics.power_duration_curve() - relies on the difference of accumulated energy and is about 4x faster

Proposal:

Rename the algorithm to mean_max(), because it can be applied not only to power but to pace as well
Keep the faster implementation, with data type casting resolved according to the outcome of #5
Move it to main, which will serve as a module for general-purpose calculations, or elsewhere according to the outcome of #7
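
To illustrate the "difference of accumulated energy" idea, here is a minimal pure-Python sketch. It is not the code from either module; the signature and the assumption of evenly spaced samples are choices made for this example.

```python
from itertools import accumulate


def mean_max(values, window):
    """Best average over any `window` consecutive samples.

    Uses a running total, so each window sum is a single subtraction
    instead of a fresh sum over the window.
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    # cumulative[i] holds the sum of the first i samples
    cumulative = [0.0, *accumulate(values)]
    best_sum = max(
        cumulative[i + window] - cumulative[i]
        for i in range(len(values) - window + 1)
    )
    return best_sum / window


power = [100, 200, 300, 200, 100]
print(mean_max(power, 2))
print(mean_max(power, 3))
```

Because nothing in the function is specific to power, the same call works for any evenly sampled stream where a higher windowed average is better.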

Add PEP8 checks to automatic builds

I made a start with this here: https://github.com/AartGoossens/sweatpy/tree/feature/isort_pylint
Any ideas on what to check are welcome (e.g. max line length).

Using list comprehensions instead of lambda can be shorter

sweatpy/sweat/io/strava.py

Line 193 in 2513172

y = list(map(lambda x: x['min'], zones[type]["zones"]))

For example:
y = [ x['min'] for x in zones[type]['zones'] ]

Duplicate of weighted_average_power()

There are 2 + 1 functions:

algorithms.main.weighted_average_power() - should be leading, because the name is right and it is already in the right module.

algorithms.metrics.normalized_power() - currently implements both the NP and xPower calculations. The proposal is to convert it to a single-purpose algorithms.main.x_power().

Another aspect: the metrics functions allow masking with replacement to be specified. This is a useful feature for data streams with sporadic readings during a standstill. Strava, for example, provides a boolean array indicating moving/standstill that could be used as a mask.

algorithms.metrics.relative_intensity() - should move to algorithms.main and mention RI only.
algorithms.metrics.stress_score() - should be split into stress_score and bike_score.
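
As an illustration of masking with replacement, the sketch below combines it with the commonly described weighted average power recipe (30-sample rolling mean, fourth power, mean, fourth root). It assumes 1 Hz samples and a mask where True means "moving"; the signature is hypothetical and is not taken from either module.

```python
from itertools import accumulate


def weighted_average_power(power, window=30, mask=None, replacement=0.0):
    """Fourth-power weighted average of a rolling mean, with optional masking."""
    if mask is not None:
        # Replace samples recorded during standstill (mask is False there)
        power = [p if keep else replacement for p, keep in zip(power, mask)]
    if len(power) < window:
        raise ValueError("need at least `window` samples")
    cumulative = [0.0, *accumulate(power)]
    rolling = [
        (cumulative[i + window] - cumulative[i]) / window
        for i in range(len(power) - window + 1)
    ]
    return (sum(r ** 4 for r in rolling) / len(rolling)) ** 0.25


steady = [200.0] * 60
moving = [True] * 30 + [False] * 30
print(weighted_average_power(steady))
print(weighted_average_power(steady, mask=moving))
```

With every sample kept, a steady effort returns its own power; replacing the masked half with zeros pulls the result down, which is the effect the mask is meant to control.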

Discussion: input and output argument type casting

@sladkovm This is the discussion we started before. Shall we continue here?

For algorithms it is possible to accept multiple input argument types and cast them to a desirable type and cast the output to a type similar to the input type (example of a method that does this in the vmpy source code).

Benefits:

Usage of the algorithms is more straightforward when you don't have to worry about the input types
...

Drawbacks:

Input argument type is part of the interface specification. Allowing multiple input types might make the interface vague
It is not possible to accept all input types. How do we communicate what is/is not possible?
Unit testing for multiple input types is a hassle
...

I have a preference for accepting (i.e. developing for) one input type.

Alternative solution: It might be possible to add input and output argument type casting by decorating the algorithms "on the fly", by calling a helper function after import (sort of similar to this). It's not the cleanest way to do it, but it might work; I'll try to make a POC for this. For me this would solve the drawbacks while still offering input/output type casting to users who want it.
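
A rough idea of what such a helper could look like, using only the standard library. Every name here (`cast_like_input`, `enable_type_casting`, the stand-in `algorithms` namespace) is hypothetical, and the sketch only handles list-like inputs.

```python
import functools
import types


def cast_like_input(func):
    """Wrap an algorithm written for lists so it accepts other sequences."""
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        input_type = type(data)
        result = func(list(data), *args, **kwargs)
        # Cast list results back to the caller's type; leave scalars alone
        if isinstance(result, list) and input_type is not list:
            return input_type(result)
        return result
    return wrapper


def enable_type_casting(module, names):
    """Opt-in helper: replace the named functions with casting versions."""
    for name in names:
        setattr(module, name, cast_like_input(getattr(module, name)))


# Stand-in for an imported algorithms module that only supports lists
algorithms = types.SimpleNamespace(double=lambda values: [2 * v for v in values])

enable_type_casting(algorithms, ["double"])
print(algorithms.double((1, 2, 3)))
print(algorithms.double([1, 2, 3]))
```

The algorithms themselves stay written and tested for a single input type, and casting only happens for users who call the helper.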

Calculate athlete Critical Power?

I've been playing with this library for a little while now, and it seems that the majority of the models I am trying to run require an athlete to be defined for the df.

I've been able to do this, but I can't figure out how to calculate the CP for that athlete.

Is calculating the CP for an athlete from previous workout data something this library supports, or do you have to define the CP for the athlete when assigning it to a df?

The code shows cp as an optional argument, but leaving it blank causes a runtime error:

session, engine = db_connect()
# May need to filter on max power grouped by interval
df = pd.read_sql(
    sql=session.query(stravaBestSamples.interval.label('time'),
                      stravaBestSamples.heartrate,
                      stravaBestSamples.cadence,
                      stravaBestSamples.velocity_smooth.label('speed'),
                      func.max(stravaBestSamples.best_power).label('power')).
        group_by(stravaBestSamples.interval).
        filter(or_(stravaBestSamples.interval == 180,
                   stravaBestSamples.interval == 420,
                   stravaBestSamples.interval == 720),
               stravaBestSamples.type == 'Ride',
               # stravaBestSamples.timestamp_local >= '2020-02-01'
               ).statement, con=engine)
engine.dispose()
session.close()

wdf = WorkoutDataFrame(df)
wdf.athlete = Athlete(name='Ethan')  # , cp=175, w_prime=20000)
wdf.compute_w_prime_balance()
print(wdf.athlete.cp)

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Ethan/PycharmProjects/fitness/index.py", line 254, in <module>
    test()
  File "C:\Users\Ethan\PycharmProjects\fitness\pages\pandasTesting.py", line 37, in test
    wdf.compute_w_prime_balance()
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\io\models\dataframes.py", line 35, in compute_w_prime_balance
    self.athlete.w_prime, algorithm, *args, **kwargs)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 112, in w_prime_balance
    return method(power, cp, w_prime, *args, **kwargs)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 46, in w_prime_balance_waterworth
    tau = get_tau_method(power, cp, tau_dynamic, tau_value)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 25, in get_tau_method
    static_tau = tau_w_prime_balance(power, cp)
  File "C:\Users\Ethan\PycharmProjects\fitness\venv\lib\site-packages\sweat\pdm\w_prime_balance.py", line 14, in tau_w_prime_balance
    delta_cp = cp - avg_power_below_cp
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
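
The traceback ends in `cp - avg_power_below_cp`, which suggests `cp` is still None when the W' balance is computed. One way to get values to pass in, sketched independently of sweatpy, is a least-squares fit of the linear work-time form of the two-parameter model (work = CP * t + W') to the best efforts. The function name and the synthetic powers below are made up for illustration.

```python
def fit_cp_least_squares(durations, powers):
    """Fit work = CP * t + W' by ordinary least squares."""
    work = [p * t for p, t in zip(powers, durations)]
    n = len(durations)
    mean_t = sum(durations) / n
    mean_w = sum(work) / n
    s_tt = sum((t - mean_t) ** 2 for t in durations)
    s_tw = sum((t - mean_t) * (w - mean_w) for t, w in zip(durations, work))
    cp = s_tw / s_tt                 # slope of the work-time line
    w_prime = mean_w - cp * mean_t   # intercept of the work-time line
    return cp, w_prime


# Synthetic best efforts for the three intervals queried above,
# generated from CP = 250 and W' = 20000 so the fit can be checked
durations = [180, 420, 720]
powers = [250 + 20000 / t for t in durations]
cp, w_prime = fit_cp_least_squares(durations, powers)
print(cp, w_prime)
```

The fitted values could then be supplied through the `cp` and `w_prime` arguments that are commented out in the `Athlete(...)` call above.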

GoldenCheetah OpenData Project compatibility

It would be nice if sweatpy supported working with data from the GoldenCheetah OpenData Project right out of the box. This could probably be a separate module in sweat.io, similar to sweat.io.strava and sweat.io.fit.

Resources:
https://osf.io/6hfpz/
https://github.com/GoldenCheetah/OpenData

Add from_strava(activity_id) method to WDF

It would be nice to be able to populate WDF from multiple sources. Strava is one very obvious pick.

To make this work efficiently, a few things must be agreed upon:

Access the Strava API directly using requests and json (currently implemented in strava.py and very easy to reason about), or rely on stravalib with all the benefits of error handling, but with the overhead of response-object mapping.
Decide what to do with the @requires decorator. Strava, for example, calls power "watts". One option is to get rid of the decorator and allow methods to take the column name as an input argument (as seaborn or pandas.plot do, for example). Another option is to rename the columns on the fly to canonical names so @requires doesn't get confused. I prefer the first option.
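
The two options for column names can be compared side by side. This sketch uses a plain dict of lists as a stand-in for a data frame; the function names and the mapping constant are hypothetical, and only the "watts" to "power" pair comes from the text above.

```python
def mean_power(data, column="power"):
    """Option 1: the caller says which column holds the power stream."""
    values = data[column]
    return sum(values) / len(values)


# Option 2: map source-specific names to canonical ones before computing
STRAVA_TO_CANONICAL = {"watts": "power"}


def rename_columns(data, mapping):
    """Return a copy of `data` with keys renamed according to `mapping`."""
    return {mapping.get(name, name): values for name, values in data.items()}


strava_streams = {"watts": [100, 200, 300], "heartrate": [120, 130, 140]}

option_1 = mean_power(strava_streams, column="watts")
option_2 = mean_power(rename_columns(strava_streams, STRAVA_TO_CANONICAL))
print(option_1, option_2)
```

Option 1 keeps the source data untouched and makes the dependency explicit at the call site; option 2 keeps call sites short but needs a mapping for every data source.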

Discussion: documentation

I'd like to start with proper documentation asap. I think updating/fixing the readme could be a quick start but after that I want to start with more extensive documentation. This issue is for discussing the approach.

Examples of documentation

Good examples

Not so good examples

https://docs.scipy.org/doc/numpy/

Tools

I like the combination of Sphinx to write/generate the documentation and Read the Docs to host it. Information on how to get started with this combination can be found here.
An alternative is to use the GitHub wiki or GitHub Pages to write and host the documentation.
In general I like having the documentation in the same repository as the code: new features and their documentation can be part of one commit/PR.

Discussion points

Are there other tools for writing/generating documentation?
Do we want the docs in the same repo as the code?
Do we prefer plain api documentation (more like numpy/pandas) or more written documentation with examples (like vcrpy)?