felixvanoost / stravalyse

A Python tool to analyze and display Strava activity data.

License: MIT License

Python 100.00%

cycling here-xyz running sports strava

stravalyse's Issues

Ignore virtual activities when generating a geo data file

Summary
When generating a geo data file from a set of activity data that contains activities of type VirtualRide or VirtualRun, the tool will include these activities in the file. Virtual activities contain valid polylines and time information and are otherwise indistinguishable from non-virtual ones, but they should not be included in the geo data file as they did not take place in the real world.

Steps to reproduce

Run the tool with the command line argument -gu (generate and upload geospatial data) with a set of activity data that contains activities of type VirtualRide or VirtualRun.

Observed behaviour
The tool generates and uploads a geo data file that includes the virtual activities.

Expected behaviour
The tool should generate and upload a geo data file that excludes the virtual activities.
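
One way to get the expected behaviour is to drop virtual activities before the geo data is built. The sketch below is illustrative only: it assumes each activity is a dictionary with a "type" key, and the function name and sample records are hypothetical rather than taken from the tool.

```python
# Activity types named in the issue that have no real-world track.
VIRTUAL_TYPES = {"VirtualRide", "VirtualRun"}


def exclude_virtual(activities):
    """Return only the activities whose type is not virtual."""
    return [a for a in activities if a.get("type") not in VIRTUAL_TYPES]


# Hypothetical sample data for illustration.
sample = [
    {"name": "Morning ride", "type": "Ride"},
    {"name": "Trainer session", "type": "VirtualRide"},
    {"name": "Treadmill session", "type": "VirtualRun"},
]
real_world = exclude_virtual(sample)
print([a["name"] for a in real_world])
```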

Replace the formatted date string with an ISO 8601 object when generating a geo data file

Currently, each activity (LineString) in a generated geo data file contains a formatted text string with the activity start date and time. These text strings should be replaced by ISO 8601-compatible DateTime objects, which the HERE CLI supports from release 1.4.0 onwards. This would allow activities to be filtered or grouped by date and time in HERE XYZ Studio.
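
As a rough sketch of the conversion, the snippet below turns a UTC timestamp into an ISO 8601 string with an explicit offset. It assumes the start time arrives as a string in the form 'YYYY-MM-DDTHH:MM:SSZ'. The function name is hypothetical, and how the tool actually stores dates may differ.

```python
from datetime import datetime, timezone


def to_iso8601(timestamp):
    """Convert a 'YYYY-MM-DDTHH:MM:SSZ' string to ISO 8601 with a UTC offset."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # Attach the UTC time zone so the output carries an explicit offset.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


iso_value = to_iso8601("2019-04-01T07:30:00Z")
print(iso_value)
```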

Remove the use of the shell in subprocess

The tool currently passes shell=True in all subprocess calls, which are used mainly to interface with the HERE CLI. This presents a security risk and causes issues with cross-platform behaviour. The calls should be refactored to avoid using the shell entirely.
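
A minimal sketch of a shell-free call is shown below: the command is passed as a list of arguments, so no shell parsing takes place. The helper name is hypothetical. The sketch runs the current Python interpreter as a stand-in so that it works without the HERE CLI installed.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments, without invoking a shell."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# The version check mentioned in these issues, expressed as an argument list.
here_version_args = ["here", "-V"]

# Stand-in command so the sketch runs even without the HERE CLI.
output = run_command([sys.executable, "-c", "print('no shell needed')"])
print(output)
```

On Windows, a CLI installed as a script wrapper may need its full path resolved first (for example with shutil.which) because no shell is there to do the lookup.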

Ignore indoor activities when generating a geo data file

Summary
When generating a geo data file from a set of activity data that contains activities with polylines that are marked as indoor / trainer (trainer = "true"), the tool will include these activities in the file. Activities recorded with certain third-party apps (e.g. Wahoo Fitness) will still enable the GPS and record coordinates for indoor activities, which causes the corresponding polylines to be non-null. The geospatial data for these activities is typically of exceptionally poor quality and should therefore be excluded from the geo data file.

Steps to reproduce

Run the tool with the command line argument -gu (generate and upload geospatial data) with a set of activity data that contains activities with polylines that are marked as trainer = "true".

Observed behaviour
The tool generates and uploads a geo data file that includes the indoor / trainer activities.

Expected behaviour
The tool should generate and upload a geo data file that excludes the indoor / trainer activities.
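
The check could look something like the sketch below. Since the issue writes the flag as trainer = "true", the hypothetical helper accepts both a boolean and a string value. The dictionary layout and sample records are assumptions for illustration.

```python
def is_indoor(activity):
    """Treat both the boolean True and the string "true" as an indoor flag."""
    flag = activity.get("trainer", False)
    if isinstance(flag, str):
        return flag.strip().lower() == "true"
    return bool(flag)


# Hypothetical sample data for illustration.
rides = [
    {"name": "Canal path", "trainer": False},
    {"name": "Turbo intervals", "trainer": True},
    {"name": "Rollers", "trainer": "true"},
]
outdoor = [a for a in rides if not is_indoor(a)]
print([a["name"] for a in outdoor])
```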

Parse formatted tables returned by HERE CLI version 1.1.0+

Summary
The nicely-formatted data tables returned by release 1.1.0 onwards of the HERE CLI contain UTF-8 characters that are not handled by the tool, causing an error when trying to upload a geo data file to HERE XYZ. The tool should be updated to handle the formatting used in newer releases of the HERE CLI.

Steps to reproduce

Ensure that the currently installed HERE CLI is version 1.1.0 or newer using the command here -V in a terminal window.
Run the tool with the command line argument -gu (generate and upload geospatial data).

Observed behaviour
The HERE XYZ upload process will fail with the following traceback:

Traceback (most recent call last):
  File "run.py", line 83, in <module>
    main()
  File "run.py", line 74, in main
    here_xyz.upload_geo_data(STRAVA_GEO_DATA_FILE)
  File "C:\Users\Felix\Documents\GitHub\Strava-Heatmap-Tool\here_xyz.py", line 76, in upload_geo_data
    space_id = _get_space_id()
  File "C:\Users\Felix\Documents\GitHub\Strava-Heatmap-Tool\here_xyz.py", line 35, in _get_space_id
    line = process.stdout.readline()
  File "C:\Users\Felix\Anaconda3\envs\Felix\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 359: character maps to <undefined>

Expected behaviour
The tool should correctly parse the output from newer releases of the HERE CLI and upload the geo data file as normal.
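
The traceback shows the output being decoded with cp1252, in which byte 0x90 is undefined. That byte does occur in the UTF-8 encoding of some box-drawing characters (U+2510 encodes as E2 94 90), although the issue does not confirm which character was involved. One possible fix, sketched below under that assumption, is to decode the subprocess output as UTF-8 explicitly. A child Python process stands in for the HERE CLI.

```python
import subprocess
import sys

# Child process that writes UTF-8 encoded box-drawing characters, standing in
# for the formatted tables printed by the CLI.
child_code = (
    "import sys; "
    "sys.stdout.buffer.write('\\u250c\\u2500\\u2510'.encode('utf-8'))"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    encoding="utf-8",  # decode explicitly instead of using the locale default
    errors="replace",  # substitute a marker rather than raise on a bad byte
)
table_border = result.stdout
print(len(table_border))
```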

Incorrect date parsing causing exception when generating geo data file

Summary
When generating a geo data file from a set of activity data that contains activities with certain names that appear like dates or times, the tool incorrectly converts these names into DateTime objects and a ValueError exception occurs when attempting to write the corresponding activities to the geo data file.

Steps to reproduce

Run the tool with the command line argument -g (generate geo data file) with a set of activity data that contains activities with names that appear like dates or times. Two examples that caused this issue are "name": "10/09/2014" and "name": "Sun sun sun sun sun" (interpreted by the DateTime parser as 'Sunday').

Observed behaviour
The following exception occurs when attempting to generate the geo data file:

Traceback (most recent call last):
  File "strava_analysis_tool.py", line 113, in <module>
    main()
  File "strava_analysis_tool.py", line 90, in main
    geo.export_geo_data_file(config['paths']['geo_data_file'], activity_dataframe)
  File "C:\Users\Felix\Documents\GitHub\Strava-Heatmap-Tool\geo.py", line 114, in export_geo_data_file
    activity_map_geodataframe.to_file(file_path, driver='GeoJSON', encoding='utf8')
  File "C:\Users\Felix\Anaconda3\lib\site-packages\geopandas\geodataframe.py", line 504, in to_file
    to_file(self, filename, driver, schema, **kwargs)
  File "C:\Users\Felix\Anaconda3\lib\site-packages\geopandas\io\file.py", line 130, in to_file
    colxn.writerecords(df.iterfeatures())
  File "C:\Users\Felix\Anaconda3\lib\site-packages\fiona\collection.py", line 342, in writerecords
    self.session.writerecs(records, self)
  File "fiona/ogrext.pyx", line 1195, in fiona.ogrext.WritingSession.writerecs
  File "fiona/ogrext.pyx", line 412, in fiona.ogrext.OGRFeatureBuilder.build
ValueError: Invalid field type <class 'datetime.datetime'>

Expected behaviour
The tool should correctly identify and parse only valid ISO 8601 strings into DateTime objects when reading from the activity data file.
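
A strict parser along these lines would leave activity names untouched. This is a sketch only: it assumes timestamps use the form 'YYYY-MM-DDTHH:MM:SSZ', and the helper name is hypothetical.

```python
import re
from datetime import datetime

# Matches timestamps such as 2014-09-10T08:15:00Z and nothing looser.
ISO_8601_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def parse_if_iso8601(value):
    """Return a datetime for strict ISO 8601 UTC strings, else the value unchanged."""
    if isinstance(value, str) and ISO_8601_UTC.fullmatch(value):
        try:
            return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            # Right shape but not a real date (e.g. month 13): keep as text.
            return value
    return value


print(parse_if_iso8601("10/09/2014"))
print(parse_if_iso8601("Sun sun sun sun sun"))
print(parse_if_iso8601("2014-09-10T08:15:00Z"))
```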

Update pandas float display format

Update the pandas float display format to use 2 decimal places with commas to make larger tables more readable.
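
The format in question can be expressed as a standard Python format specification. The sketch below demonstrates it without pandas. Registering it with pandas, as noted in the comment, is one possible way to apply it and not necessarily how the tool does so.

```python
# Format specification: thousands separators and two decimal places.
float_format = "{:,.2f}".format

# With pandas installed, the same callable could be registered via
#     pandas.options.display.float_format = float_format
formatted = float_format(1234567.891)
print(formatted)
```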

Add an option to filter results by time

Add an option to select the timeframe of the results, e.g. last month, last 3 weeks, April 2019 - September 2019.
This timeframe would then be used to produce the summary and graphs.

As the dates are already stored in a pandas DataFrame, filtering them by date should be possible. I'm not sure where the user would input the requested date - another CL argument?

What do you think?

Oh and PS: Thank you for writing this tool! I was hoping to find something similar. I think this app has only scratched the surface of its potential - GUI, more customizable graphs, you name it (you did, in other issues :) ) and this could be really, really cool.
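
On the question of where the timeframe would be entered, a pair of command line arguments is one option. The sketch below is hypothetical: the argument names --start-date and --end-date are invented for illustration, and a plain list of dictionaries stands in for the DataFrame.

```python
import argparse
from datetime import date


def parse_args(argv):
    """Parse optional start and end dates given as YYYY-MM-DD."""
    parser = argparse.ArgumentParser(description="Filter activities by date.")
    parser.add_argument("--start-date", type=date.fromisoformat, default=date.min)
    parser.add_argument("--end-date", type=date.fromisoformat, default=date.max)
    return parser.parse_args(argv)


def filter_by_date(activities, start, end):
    """Keep activities whose date falls within the inclusive range."""
    return [a for a in activities if start <= a["date"] <= end]


# Hypothetical sample data for illustration.
log = [
    {"name": "Spring ride", "date": date(2019, 4, 14)},
    {"name": "Winter run", "date": date(2019, 12, 1)},
]
args = parse_args(["--start-date", "2019-04-01", "--end-date", "2019-09-30"])
selected = filter_by_date(log, args.start_date, args.end_date)
print([a["name"] for a in selected])
```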

Rename the main module to be more descriptive

The current main module name run.py is generic and doesn't provide any useful information about its functionality. It should be renamed to something more descriptive, like strava_analysis_tool.py instead.

Add an option to generate a plot of activity counts over time

Add an option to generate a bar plot of activity counts over time for each activity type in the dataset.

Use stream option when uploading a geo data file

The HERE XYZ CLI offers a command-line option to upload GeoJSON files to the server using a streaming method, which results in significantly reduced upload times (3-4x faster) when the geo data file is large (>1000 activities). The tool should use this streaming method by default.

Create a simple GUI

Create a simple GUI for the tool to improve its usability.

Add an option to selectively export the geospatial activity data

The tool currently generates the activity geo data file by default, which may not be necessary every time it is run. An option should be added to choose whether the file is generated or not.

Automatically upload the activity geo data file to the HERE XYZ platform

The tool currently generates a file containing LineStrings and relevant metadata for all Strava activities with geospatial data in GeoJSON format. Using an online mapping platform like HERE XYZ, this file can be used to produce an interactive map of the activities in a similar fashion to the paid 'personal heatmaps' feature on Strava.

HERE XYZ offers both a CLI and an API to automate this (for now) manual uploading process. As an initial step, the CLI should be used to create an XYZ project (if one doesn't already exist) and automatically upload the file after generation.

Check for empty polylines when generating a geo data file

Summary
When generating a geo data file from a set of activity data that contains empty polyline strings, the tool will include the activities with empty polylines (i.e. no geospatial data) in the file. Attempting to upload this file to HERE XYZ via the CLI returns the error coordinates must have at least two elements.

Activities recorded directly with the Strava app and tagged as 'indoor' or 'trainer' have the polyline string value null. However, indoor activities uploaded as FIT files from other devices can cause the polyline string to be empty ("") instead - a case the tool does not currently check for.

Steps to reproduce

Run the tool with the command line argument -gu (generate and upload geospatial data) with a set of activity data that contains one or more empty polyline strings.

Observed behaviour
The tool generates a geo data file that includes the activities with empty polyline strings (no geospatial data):

"geometry": { "type": "LineString", "coordinates": [ ] } },

Uploading this file to HERE XYZ using the command line option -gu will return the following error:

HERE XYZ: Error uploading geospatial data to space ID "xxxxxxxx"

Expected behaviour
The tool should check for the presence of empty polyline strings when generating a geo data file and prevent the corresponding activities from being included.
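
A check that covers both the null and the empty-string case might look like the sketch below. The flat "polyline" key, the helper name, and the sample records are assumptions for illustration.

```python
def has_geo_data(activity):
    """True only when the polyline is a non-empty string (not None and not "")."""
    polyline = activity.get("polyline")
    return isinstance(polyline, str) and polyline.strip() != ""


# Hypothetical sample data for illustration.
recorded = [
    {"name": "Outdoor ride", "polyline": "_p~iF~ps|U_ulLnnqC"},
    {"name": "Indoor (Strava app)", "polyline": None},
    {"name": "Indoor (FIT upload)", "polyline": ""},
]
mappable = [a for a in recorded if has_geo_data(a)]
print([a["name"] for a in mappable])
```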

Add an option to generate a scatter plot of activity speed vs. distance

Add an option to generate a scatter plot of activity speed vs. distance for each activity type in the dataset.

ValueError with data without commutes

When there is no commute data, the correct summary is shown but the application then raises a ValueError.

*Correct summary output*

Analysis: No commutes found
/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py:1099: UserWarning: Converting to PeriodArray/Index representation will drop timezone information.
  warnings.warn(
Traceback (most recent call last):
  File "strava_analysis_tool.py", line 101, in <module>
    main()
  File "strava_analysis_tool.py", line 93, in main
    analysis.display_commute_plots(activity_dataframe)
  File "/home/jh/SAT/analysis.py", line 331, in display_commute_plots
    _generate_commute_count_plot(commute_data, ax3, colours)
  File "/home/jh/SAT/analysis.py", line 99, in _generate_commute_count_plot
    sns.barplot(x=data.index.to_period('M'),
  File "/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/seaborn/categorical.py", line 3147, in barplot
    plotter = _BarPlotter(x, y, hue, data, order, hue_order,
  File "/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/seaborn/categorical.py", line 1616, in __init__
    self.establish_colors(color, palette, saturation)
  File "/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/seaborn/categorical.py", line 316, in establish_colors
    lum = min(light_vals) * .6
ValueError: min() arg is an empty sequence

Steps to reproduce

$ python strava_analysis_tool.py
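
The traceback suggests the plotting code still runs after the "No commutes found" message is printed. A guard clause along the lines below would skip the plots in that case. The function and its parameters are hypothetical, and a plain list stands in for the commute DataFrame (for which the equivalent check would be its empty attribute).

```python
def plot_commutes_if_any(commutes, plot_function):
    """Call plot_function only when there is commute data; report whether it ran."""
    if not commutes:
        print("Analysis: No commutes found")
        return False
    plot_function(commutes)
    return True


# A list's append method stands in for the real plotting routine.
plotted = []
ran_without_data = plot_commutes_if_any([], plotted.append)
ran_with_data = plot_commutes_if_any([{"name": "Ride to work"}], plotted.append)
print(ran_without_data, ran_with_data)
```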

Use new HERE platform

Migrate to HERE's new platform and remove calls to the now deprecated 'Data Hub'. The tool should create a catalog and Interactive Mapping Layer (IML) to store all uploaded geospatial activity data.

For now, the tool will continue to use a naive / wasteful uploading approach by always uploading all the activities (even if they already exist in the IML).

Refactor code to comply with PEP 8

The tool should be refactored in accordance with the naming conventions, formatting, and file structure outlined in PEP 8 - Style Guide for Python Code.

Add an option to generate a set of plots to visualise commute data

Add an option to generate bar plots of the following:

Number of commuting days per year
Total and average commuting distance per year
Number of commutes per month

Replace usage of the HERE CLI with the xyz-spaces-python package

Until recently, the HERE CLI was the easiest way to upload geospatial data to a HERE XYZ space for viewing. Earlier this year, HERE began development of a native Python library for HERE XYZ, xyz-spaces-python, which should make both development and user installation much easier. The tool should be updated to replace usage of the HERE CLI with xyz-spaces-python with basic feature parity.

Add an option to generate a plot of mean activity distance over time

Add an option to generate a bar plot of mean activity distance over time for each activity type in the dataset.

Define a consistent colour palette across plots

The tool currently generates plots using a hard-coded and inconsistent set of colours. A colour palette should be defined and used when generating plots to create a consistent visual theme.

Add an option to refresh the activity data

The tool currently updates the activity data file with any new activities, but does not store any changes made to existing ones. This situation is prone to occur whenever an activity is modified through the Strava platform - for instance, when a follower gives kudos or the athlete adds a description - after it has already been stored locally in the file.

To allow the activity data file to always reflect these latest changes, the tool should have an option to 'refresh' the data by wiping the file and re-requesting it from scratch.

Add an option to generate a histogram of activity distance

Add an option to generate a histogram of activity distance for each activity type in the dataset.

Exclude stationary activity types from the mean activity distance plot

When generating a plot of mean activity distance over time, the tool should ignore activity types that are stationary by nature (CrossFit, rock climbing, weight training, workout, and yoga). This prevents the activities from being included in the plot legend and displaying a distance of 0, which provides no useful information and clutters up the plot.
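
The exclusion can be applied before the means are computed, as in the sketch below. The type names in the set are assumed spellings of the activity types listed above, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

# Assumed type names for the stationary activities listed in the issue.
STATIONARY_TYPES = {"Crossfit", "RockClimbing", "WeightTraining", "Workout", "Yoga"}


def mean_distance_by_type(activities):
    """Return the mean distance per activity type, skipping stationary types."""
    distances = defaultdict(list)
    for activity in activities:
        if activity["type"] in STATIONARY_TYPES:
            continue
        distances[activity["type"]].append(activity["distance"])
    return {kind: sum(values) / len(values) for kind, values in distances.items()}


# Hypothetical sample data for illustration (distances in metres).
history = [
    {"type": "Ride", "distance": 20000.0},
    {"type": "Ride", "distance": 30000.0},
    {"type": "Yoga", "distance": 0.0},
    {"type": "Run", "distance": 5000.0},
]
means = mean_distance_by_type(history)
print(means)
```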

Add an option to generate a plot of top commutes by name

Add an option to generate a plot of top commutes based on their name.

Display information on dataset health

There is currently no easy way to see an overview of the 'health' of the activity dataset (e.g. how many activities are manual or are flagged) or how many activities contain extended sensor data (e.g. heart rate, cadence, or measured power). The tool should display the following information in a similar format to the summary statistics:

Number of manual activities
Number of flagged activities
Number of activities with heart rate data
Number of cycling activities with cadence data
Number of cycling activities with measured power data
Number of activities with temperature data
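
The counts listed above could be gathered roughly as follows. All field names ("manual", "flagged", "has_heartrate", "average_cadence", "device_watts", "average_temp") and the set of cycling types are assumptions about the activity data made for this sketch, not confirmed details of the tool.

```python
# Assumed set of activity types that count as cycling.
CYCLING_TYPES = {"Ride"}


def dataset_health(activities):
    """Count activities matching each of the dataset health criteria."""
    def count(predicate):
        return sum(1 for a in activities if predicate(a))

    def is_cycling(a):
        return a.get("type") in CYCLING_TYPES

    return {
        "manual": count(lambda a: a.get("manual", False)),
        "flagged": count(lambda a: a.get("flagged", False)),
        "heart_rate": count(lambda a: a.get("has_heartrate", False)),
        "cycling_cadence": count(
            lambda a: is_cycling(a) and a.get("average_cadence") is not None
        ),
        "cycling_power": count(lambda a: is_cycling(a) and a.get("device_watts", False)),
        "temperature": count(lambda a: a.get("average_temp") is not None),
    }


# Hypothetical sample data for illustration.
dataset = [
    {"type": "Ride", "has_heartrate": True, "average_cadence": 85.0,
     "device_watts": True, "average_temp": 18},
    {"type": "Run", "manual": True},
    {"type": "Ride", "flagged": True, "has_heartrate": True, "device_watts": False},
]
health = dataset_health(dataset)
print(health)
```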

Add an option to generate a stacked bar plot of commute vs. non-commute riding distance over time

Add an option to generate a stacked bar plot or area plot of commute vs. non-commute riding distance by month.

Fix average speed calculation in summary statistics

Fix the incorrect average speed calculation reported in the summary statistics. The current calculation is as follows:

Mean speed (m/s) * 3.6

This is incorrect because it calculates the mean of the mean speeds reported for each activity, which weights short and long activities equally. It should instead be:

Total distance (m) / total moving time (sec) * 3.6
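
A small worked example shows how far the two calculations can diverge. The two activities below are invented for illustration: a short fast one and a long slow one.

```python
# Invented sample activities: distance in metres, moving time in seconds.
efforts = [
    {"distance_m": 1000.0, "moving_time_s": 100.0},   # 10 m/s
    {"distance_m": 9000.0, "moving_time_s": 1800.0},  # 5 m/s
]

# Current approach: mean of the per-activity mean speeds, converted to km/h.
per_activity_speeds = [e["distance_m"] / e["moving_time_s"] for e in efforts]
mean_of_means_kmh = sum(per_activity_speeds) / len(per_activity_speeds) * 3.6

# Proposed approach: total distance over total moving time, converted to km/h.
total_distance_m = sum(e["distance_m"] for e in efforts)
total_time_s = sum(e["moving_time_s"] for e in efforts)
overall_kmh = total_distance_m / total_time_s * 3.6

print(round(mean_of_means_kmh, 2), round(overall_kmh, 2))
```

Here the mean of means gives 27.0 km/h, while the distance-over-time figure is about 18.95 km/h, because most of the moving time was spent at the slower speed.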

Add an option to generate a plot of ride distance vs. bike used over time

Add an option to generate a plot of ride distance vs. bike used by month for each activity type in the dataset.

Pass geopandas dataframe to HERE XYZ directly

The tool currently generates a .geojson file containing the geospatial activity data and passes it to the add_features_geojson function in the xyz-spaces-python package to upload the data to HERE Studio. This is a holdover from the HERE CLI that was previously being used, with which a file was the only way to upload data to the server.

xyz-spaces-python allows data to be uploaded directly from a GeoPandas dataframe using the add_features_geopandas function. This saves the tool from generating a .geojson file unnecessarily, which should significantly reduce overall processing and uploading times.

The option to generate a .geojson file should be broken out into a separate command line argument, so that it is only created when requested by the user.

Add an option to generate a plot of activity start locations

Add an option to generate a plot of reverse-geocoded activity start locations (country + city).

Create a setup guide for HERE XYZ Studio

Create a setup guide for HERE XYZ Studio describing the basic procedure for:

Browsing through the data space created by the tool
Creating a new project and adding a data space to it
Changing the base map style
Applying conditional formatting to the line types and colours (e.g. based on activity type)

Implement the use of a tool configuration file

The tool currently relies on a mix of environment variables (for the Strava client ID and secret) and hard-coded values (e.g. file paths) for configuration, which somewhat restricts its flexibility. These configurable values should be imported via a user-modifiable file instead.
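
As one possible shape for such a file, the sketch below reads an INI-style configuration with the standard library. The file format and the example paths are assumptions. Only the config['paths']['geo_data_file'] lookup mirrors an access that appears in a traceback elsewhere in these issues.

```python
import configparser

# Hypothetical configuration contents. A real file would be read from disk
# with ConfigParser.read() instead of read_string().
EXAMPLE_CONFIG = """
[paths]
activity_data_file = data/activities.json
geo_data_file = data/activities.geojson
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)

geo_path = config["paths"]["geo_data_file"]
print(geo_path)
```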

Check for an empty DataFrame when generating commute statistics

Summary
When generating commute statistics from a set of activity data that contains no activities marked as a commute on Strava (i.e. only activities with the attribute "commute" = false), the tool displays an empty DataFrame instead of indicating that no commutes are present.

Steps to reproduce

Run the tool with a set of activity data that contains no activities marked as a commute.

Observed behaviour
The tool displays an empty DataFrame:

Commute statistics:

Empty DataFrame
Columns: []
Index: []

Expected behaviour
The tool displays a message indicating that no commutes are present in the activity data:

Analysis: No commutes found

Update to Python 3.8

Update the tool to run on Python 3.8 and generate the necessary requirements.txt (for Pip) and environment.yml (for Anaconda) files.

Add an option to generate a heatmap plot of activity moving time by month

Add an option to generate a heatmap plot of activity moving time by month for each activity type in the dataset.

Store activity data using pandas directly

Store the Strava activity data directly from a pandas DataFrame to a file in JSON format using the built-in pandas to_json and read_json methods. This should help reduce loading and processing times, which will better support larger datasets and allow additional metadata (e.g. addresses obtained by reverse geocoding) to be stored without impacting the responsiveness of the tool.

Delete all HERE IML features when refreshing data

Delete all the features (activities) stored in the HERE interactive mapping layer (IML) when the --refresh-data option is selected. This ensures that the activity data stored on HERE is always up to date.
