felixvanoost / stravalyse
A Python tool to analyze and display Strava activity data.
License: MIT License
Summary
When generating a geo data file from a set of activity data that contains activities of type VirtualRide or VirtualRun, the tool includes these activities in the file. Virtual activities contain valid polylines and time information and are otherwise indistinguishable from non-virtual ones, but they should not be included in the geo data file because they did not take place in the real world.
Steps to reproduce
Generate a geo data file from a set of activity data that contains activities of type VirtualRide or VirtualRun.
Observed behaviour
The tool generates and uploads a geo data file that includes the virtual activities.
Expected behaviour
The tool should generate and upload a geo data file that excludes the virtual activities.
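A minimal sketch of the expected filtering step, assuming each activity is a dict shaped like a Strava API activity with a type field. The function name exclude_virtual_activities is hypothetical and not part of the tool.

```python
from typing import Dict, Iterable, List

# Strava activity types recorded on virtual platforms rather than in the real world
VIRTUAL_TYPES = {"VirtualRide", "VirtualRun"}


def exclude_virtual_activities(activities: Iterable[Dict]) -> List[Dict]:
    """Return only the activities whose type is not a virtual one (hypothetical helper)."""
    return [a for a in activities if a.get("type") not in VIRTUAL_TYPES]
```

Filtering on the type field is the only option here, since (as noted above) the polylines and timestamps of virtual activities look like those of real ones.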
Currently, each activity (LineString) in a generated geo data file contains a formatted text string with the activity start date and time. These text strings should be replaced by ISO 8601-compatible DateTime objects, which are supported from release 1.4.0 of the HERE CLI onwards. This allows activities to be filtered / grouped by date and time in HERE XYZ Studio.
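GeoJSON (like JSON in general) has no native date type, so a DateTime property ends up in the file as an ISO 8601 string. A minimal sketch of one feature, assuming a hypothetical start_date property name and treating naive datetimes as UTC (also an assumption). Coordinates follow the GeoJSON [longitude, latitude] order.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def build_activity_feature(coordinates: List[Tuple[float, float]], start: datetime) -> Dict:
    """Build a GeoJSON LineString feature with an ISO 8601 start time (hypothetical helper)."""
    if start.tzinfo is None:
        # Assumption for this sketch: naive timestamps are in UTC
        start = start.replace(tzinfo=timezone.utc)
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [list(c) for c in coordinates]},
        # isoformat() gives e.g. '2019-05-12T08:30:00+00:00' instead of a free-form string
        "properties": {"start_date": start.isoformat()},
    }
```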
The tool currently relies on the shell=True argument in all subprocess calls, which are used mainly to interface with the HERE CLI. This is both a security risk and a source of cross-platform issues, so the calls should be refactored to avoid using the shell entirely.
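A minimal sketch of a shell-free call, assuming the external command (e.g. the HERE CLI) is on PATH. The helper name run_without_shell is hypothetical. Resolving the executable with shutil.which first matters on Windows, where npm-installed CLIs are typically .cmd files that a bare command name would not find without shell=True.

```python
import shutil
import subprocess


def run_without_shell(command: str, *args: str) -> subprocess.CompletedProcess:
    """Run an external command directly, without an intermediate shell (hypothetical helper)."""
    # Resolve the executable up front; on Windows this uses PATHEXT and so also finds .cmd files.
    executable = shutil.which(command)
    if executable is None:
        raise FileNotFoundError(f"{command!r} was not found on PATH")
    # Arguments are passed as a list, so Python never builds or parses a shell command string.
    return subprocess.run([executable, *args], capture_output=True, text=True, check=True)
```

An argument such as "x; echo injected" is then passed through as a single literal argument instead of being interpreted by a shell.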
Summary
When generating a geo data file from a set of activity data that contains activities with polylines that are marked as indoor / trainer (trainer = "true"), the tool will include these activities in the file. Activities recorded with certain third-party apps (e.g. Wahoo Fitness) will still enable the GPS and record coordinates for indoor activities, which causes the corresponding polylines to be non-null. The geospatial data for these activities is typically of exceptionally poor quality and should therefore be excluded from the geo data file.
Steps to reproduce
Run the tool with -gu (generate and upload geospatial data) using a set of activity data that contains activities with polylines that are marked as trainer = "true".
Observed behaviour
The tool generates and uploads a geo data file that includes the indoor / trainer activities.
Expected behaviour
The tool should generate and upload a geo data file that excludes the indoor / trainer activities.
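A minimal sketch of the trainer check, assuming activities are dicts with a trainer flag that may be stored either as a boolean or as the string "true" (as written above). Both helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def is_trainer(activity: Dict) -> bool:
    """Interpret the trainer flag whether it is stored as a bool or as a 'true' string."""
    value = activity.get("trainer", False)
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def exclude_trainer_activities(activities: Iterable[Dict]) -> List[Dict]:
    """Drop indoor / trainer activities even if they carry a (low-quality) polyline."""
    return [a for a in activities if not is_trainer(a)]
```

The check deliberately ignores the polyline, since the problem case is exactly the one where the polyline is non-null.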
Summary
The nicely-formatted data tables returned by release 1.1.0 onwards of the HERE CLI contain UTF-8 characters that are not handled by the tool, causing an error when trying to upload a geo data file to HERE XYZ. The tool should be updated to handle the formatting used in newer releases of the HERE CLI.
Steps to reproduce
Check that release 1.1.0 or later of the HERE CLI is installed by running here -V in a terminal window.
Run the tool with -gu (generate and upload geospatial data).
Observed behaviour
The HERE XYZ upload process will fail with the following traceback:
Traceback (most recent call last):
File "run.py", line 83, in <module>
main()
File "run.py", line 74, in main
here_xyz.upload_geo_data(STRAVA_GEO_DATA_FILE)
File "C:\Users\Felix\Documents\GitHub\Strava-Heatmap-Tool\here_xyz.py", line 76, in upload_geo_data
space_id = _get_space_id()
File "C:\Users\Felix\Documents\GitHub\Strava-Heatmap-Tool\here_xyz.py", line 35, in _get_space_id
line = process.stdout.readline()
File "C:\Users\Felix\Anaconda3\envs\Felix\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 359: character maps to <undefined>
Expected behaviour
The tool should correctly parse the output from newer releases of the HERE CLI and upload the geo data file as normal.
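A minimal sketch of reading CLI output with an explicit encoding, assuming the HERE CLI writes UTF-8 to stdout. The helper name read_cli_output is hypothetical; the original code reads line by line from a Popen object, and the same encoding argument can be passed to Popen instead.

```python
import subprocess
from typing import List


def read_cli_output(args: List[str]) -> List[str]:
    """Run a CLI command and return its stdout lines, decoded explicitly as UTF-8."""
    result = subprocess.run(
        args,
        capture_output=True,
        # Without an explicit encoding, text mode uses the locale's preferred encoding
        # (cp1252 on the Windows setup in this report), which fails on some UTF-8 bytes.
        encoding="utf-8",
        errors="replace",  # substitute U+FFFD instead of raising on stray bytes
        check=True,
    )
    return result.stdout.splitlines()
```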
Summary
When generating a geo data file from a set of activity data that contains activities whose names look like dates or times, the tool incorrectly converts these names into DateTime objects, and a ValueError exception occurs when attempting to write the corresponding activities to the geo data file.
Steps to reproduce
Run the tool with -g (generate geo data file) using a set of activity data that contains activities whose names look like dates or times. Two examples that caused this issue are "name": "10/09/2014" and "name": "Sun sun sun sun sun" (interpreted by the DateTime parser as 'Sunday').
Observed behaviour
The following exception occurs when attempting to generate the geo data file:
Traceback (most recent call last):
File "strava_analysis_tool.py", line 113, in <module>
main()
File "strava_analysis_tool.py", line 90, in main
geo.export_geo_data_file(config['paths']['geo_data_file'], activity_dataframe)
File "C:\Users\Felix\Documents\GitHub\Strava-Heatmap-Tool\geo.py", line 114, in export_geo_data_file
activity_map_geodataframe.to_file(file_path, driver='GeoJSON', encoding='utf8')
File "C:\Users\Felix\Anaconda3\lib\site-packages\geopandas\geodataframe.py", line 504, in to_file
to_file(self, filename, driver, schema, **kwargs)
File "C:\Users\Felix\Anaconda3\lib\site-packages\geopandas\io\file.py", line 130, in to_file
colxn.writerecords(df.iterfeatures())
File "C:\Users\Felix\Anaconda3\lib\site-packages\fiona\collection.py", line 342, in writerecords
self.session.writerecs(records, self)
File "fiona/ogrext.pyx", line 1195, in fiona.ogrext.WritingSession.writerecs
File "fiona/ogrext.pyx", line 412, in fiona.ogrext.OGRFeatureBuilder.build
ValueError: Invalid field type <class 'datetime.datetime'>
Expected behaviour
The tool should correctly identify and parse only valid ISO 8601 strings into DateTime objects when reading from the activity data file.
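A minimal sketch of strict parsing with the standard library, which accepts ISO 8601 timestamps but rejects free-form strings such as the activity names above. The helper name is hypothetical, and applying it only to known date fields (for example start_date) rather than to every string value is an assumed design choice.

```python
from datetime import datetime
from typing import Optional


def parse_iso8601(value) -> Optional[datetime]:
    """Return a datetime only when value is an ISO 8601 timestamp; otherwise None."""
    if not isinstance(value, str):
        return None
    try:
        # fromisoformat() only accepts a trailing 'Z' from Python 3.11, so map it to +00:00
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
```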
Update the pandas float display format to use 2 decimal places with commas as thousands separators to make larger tables more readable.
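A minimal sketch using the pandas display.float_format option, which accepts a callable that formats each float; the constant name FLOAT_FORMAT is hypothetical.

```python
import pandas as pd

# Thousands separator and two decimal places: 12345.678 is displayed as 12,345.68
FLOAT_FORMAT = "{:,.2f}".format
pd.set_option("display.float_format", FLOAT_FORMAT)
```

The option only changes how DataFrames are printed; the stored values keep their full precision.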
Add an option to select the timeframe of the results, e.g. last month, last 3 weeks, or April 2019 - September 2019.
This timeframe would then be used to output summary and graphs.
As the dates are already stored in a pandas DataFrame, filtering them by date should be possible. I'm not sure where the user would input the requested dates: another command-line argument?
What do you think?
Oh, and PS: thank you for writing this tool! I was hoping to find something similar. I think this app has only scratched the surface of its potential (a GUI, more customizable graphs, you name it; you did, in other issues :) ) and this could be really, really cool.
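One possible answer to the question above is a pair of command-line options. The sketch below assumes hypothetical --from-date / --to-date options and a start_date_local column holding tz-naive datetimes; none of these are confirmed names in the tool.

```python
import argparse
from typing import List, Optional

import pandas as pd


def parse_timeframe(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hypothetical --from-date / --to-date options into pandas Timestamps."""
    parser = argparse.ArgumentParser(description="Strava analysis timeframe")
    parser.add_argument("--from-date", type=pd.Timestamp, default=None)
    parser.add_argument("--to-date", type=pd.Timestamp, default=None)
    return parser.parse_args(argv)


def filter_by_timeframe(df: pd.DataFrame, start=None, end=None,
                        column: str = "start_date_local") -> pd.DataFrame:
    """Keep only activities whose start date falls within [start, end]."""
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= df[column] >= start
    if end is not None:
        mask &= df[column] <= end
    return df[mask]
```

For example, `python strava_analysis_tool.py --from-date 2019-04-01 --to-date 2019-09-30` would keep April to September 2019. A date-only --to-date is parsed as midnight, so activities later that day are excluded unless a time is given. Relative ranges like "last 3 weeks" could be built on top of this with pd.Timedelta.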
The current main module name, run.py, is generic and doesn't provide any useful information about its functionality. It should be renamed to something more descriptive, like strava_analysis_tool.py, instead.
Add an option to generate a bar plot of activity counts over time for each activity type in the dataset.
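The data behind such a plot is a count per month and activity type. A minimal sketch, assuming a DataFrame with a start_date_local datetime column and a type column (assumed column names; the function name is hypothetical):

```python
import pandas as pd


def monthly_activity_counts(df: pd.DataFrame, date_column: str = "start_date_local") -> pd.DataFrame:
    """Count activities per calendar month (rows) and activity type (columns)."""
    months = df[date_column].dt.to_period("M")
    # Months without an activity of a given type get a count of 0 instead of NaN
    return df.groupby([months, df["type"]]).size().unstack(fill_value=0)
```

The result can be passed straight to DataFrame.plot(kind="bar", stacked=True) to get one bar per month.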
The HERE XYZ CLI offers a command-line option to upload GeoJSON files to the server using a streaming method, which results in significantly reduced upload times (3-4x faster) when the geo data file is large (>1000 activities). The tool should use this streaming method by default.
Create a simple GUI for the tool to improve its usability.
The tool currently generates the activity geo data file by default, which may not be necessary every time it is run. An option should be added to choose whether the file is generated or not.
The tool currently generates a file containing LineStrings and relevant metadata for all Strava activities with geospatial data in GeoJSON format. Using an online mapping platform like HERE XYZ, this file can be used to produce an interactive map of the activities in a similar fashion to the paid 'personal heatmaps' feature on Strava.
HERE XYZ offers both a CLI and an API to automate this (for now) manual uploading process. As an initial step, the CLI should be used to create an XYZ project (if one doesn't already exist) and automatically upload the file after generation.
Summary
When generating a geo data file from a set of activity data that contains empty polyline strings, the tool includes the activities with empty polylines (i.e. no geospatial data) in the file. Attempting to upload this file to HERE XYZ via the CLI returns the error "coordinates must have at least two elements".
Activities recorded directly with the Strava app and tagged as 'indoor' or 'trainer' have the polyline string value null. However, indoor activities uploaded as FIT files from other devices can cause the polyline string to be empty ('""') instead, a case the tool does not currently check for.
Steps to reproduce
Run the tool with -gu (generate and upload geospatial data) using a set of activity data that contains one or more empty polyline strings.
Observed behaviour
The tool generates a geo data file that includes the activities with empty polyline strings (no geospatial data):
"geometry": { "type": "LineString", "coordinates": [ ] } },
Uploading this file to HERE XYZ using the command-line option -gu returns the following error:
HERE XYZ: Error uploading geospatial data to space ID "xxxxxxxx"
Expected behaviour
The tool should check for the presence of empty polyline strings when generating a geo data file and prevent the corresponding activities from being included.
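A minimal sketch of the check, assuming activities are dicts with the polyline under map.summary_polyline as in the Strava API; if the data has been flattened into a DataFrame column instead, the lookup would change accordingly. Both helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def has_geospatial_data(activity: Dict) -> bool:
    """True only if the summary polyline is neither null nor an empty string."""
    polyline = (activity.get("map") or {}).get("summary_polyline")
    return bool(polyline and polyline.strip())


def exclude_empty_polylines(activities: Iterable[Dict]) -> List[Dict]:
    """Keep only activities that actually carry geospatial data."""
    return [a for a in activities if has_geospatial_data(a)]
```

This treats null (Strava app indoor activities) and "" (FIT uploads) the same way. A polyline that decodes to a single point would still trip the "at least two elements" error, so a stricter check could count the decoded coordinates.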
Add an option to generate a scatter plot of activity speed vs. distance for each activity type in the dataset.
ValueError with data without commutes
When the activity data contains no commutes, the correct summary is shown, but the application then raises a ValueError:
*Correct summary output*
Analysis: No commutes found
/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py:1099: UserWarning: Converting to PeriodArray/Index representation will drop timezone information.
warnings.warn(
Traceback (most recent call last):
File "strava_analysis_tool.py", line 101, in <module>
main()
File "strava_analysis_tool.py", line 93, in main
analysis.display_commute_plots(activity_dataframe)
File "/home/jh/SAT/analysis.py", line 331, in display_commute_plots
_generate_commute_count_plot(commute_data, ax3, colours)
File "/home/jh/SAT/analysis.py", line 99, in _generate_commute_count_plot
sns.barplot(x=data.index.to_period('M'),
File "/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/seaborn/categorical.py", line 3147, in barplot
plotter = _BarPlotter(x, y, hue, data, order, hue_order,
File "/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/seaborn/categorical.py", line 1616, in __init__
self.establish_colors(color, palette, saturation)
File "/home/jh/Strava-Analysis-Tool/venv/lib/python3.8/site-packages/seaborn/categorical.py", line 316, in establish_colors
lum = min(light_vals) * .6
ValueError: min() arg is an empty sequence
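A minimal sketch of a guard that stops before any plotting call when there are no commutes, assuming a boolean commute column. The function name and the injected plot callable are hypothetical stand-ins for the tool's own plotting code.

```python
from typing import Callable

import pandas as pd


def plot_commutes_if_any(activities: pd.DataFrame,
                         plot: Callable[[pd.DataFrame], None]) -> bool:
    """Call plot() only when at least one commute exists; return whether it was called."""
    commutes = activities[activities["commute"].astype(bool)]
    if commutes.empty:
        # The seaborn barplot call in the traceback fails on empty data
        # ("min() arg is an empty sequence"), so return before plotting anything.
        print("Analysis: No commutes found")
        return False
    plot(commutes)
    return True
```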
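Steps to reproduce
Display the commute plots using a set of activity data that contains no activities marked as a commute.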
Migrate to HERE's new platform and remove calls to the now deprecated 'Data Hub'. The tool should create a catalog and Interactive Mapping Layer (IML) to store all uploaded geospatial activity data.
For now, the tool will continue to use a naive / wasteful uploading approach by always uploading all the activities (even if they already exist in the IML).
The tool should be refactored in accordance with the naming conventions, formatting, and file structure outlined in PEP 8 - Style Guide for Python Code.
Add an option to generate bar plots of the following:
Until recently, the HERE CLI was the easiest way to upload geospatial data to a HERE XYZ space for viewing. Earlier this year, HERE began development of a native Python library for HERE XYZ, xyz-spaces-python, which should make both development and user installation much easier. The tool should be updated to replace usage of the HERE CLI with xyz-spaces-python with basic feature parity.
Add an option to generate a bar plot of mean activity distance over time for each activity type in the dataset.
The tool currently generates plots using a hard-coded and inconsistent set of colours. A colour palette should be defined and used when generating plots to create a consistent visual theme.
The tool currently updates the activity data file with any new activities, but does not store any changes made to existing ones. This happens whenever an activity is modified through the Strava platform (for instance, when a follower gives kudos or the athlete adds a description) after it has already been stored locally in the file.
To allow the activity data file to always reflect these latest changes, the tool should have an option to 'refresh' the data by wiping the file and re-requesting it from scratch.
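A minimal sketch of the refresh option, using the --refresh-data flag name that appears elsewhere in this list. The helper names and the file path used in practice are hypothetical; the download step that rebuilds the file is left to the tool.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a --refresh-data flag (other options omitted)."""
    parser = argparse.ArgumentParser(description="Strava analysis tool")
    parser.add_argument(
        "--refresh-data",
        action="store_true",
        help="wipe the local activity data file and request all activities again",
    )
    return parser


def reset_activity_data_file(path: Path) -> None:
    """Delete the local activity data file so the next download starts from scratch."""
    path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```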
Add an option to generate a histogram of activity distance for each activity type in the dataset.
When generating a plot of mean activity distance over time, the tool should ignore activity types that are stationary by nature (CrossFit, rock climbing, weight training, workout, and yoga). This prevents the activities from being included in the plot legend and displaying a distance of 0, which provides no useful information and clutters up the plot.
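A minimal sketch of the exclusion, assuming the type column uses Strava's activity type strings (Crossfit, RockClimbing, WeightTraining, Workout, Yoga) and that distance and start_date_local columns exist; the function name is hypothetical.

```python
import pandas as pd

# Activity types that cover no distance (assumed to match Strava's type strings)
STATIONARY_TYPES = {"Crossfit", "RockClimbing", "WeightTraining", "Workout", "Yoga"}


def monthly_mean_distance(df: pd.DataFrame, date_column: str = "start_date_local") -> pd.DataFrame:
    """Mean distance per month (rows) and activity type (columns), without stationary types."""
    moving = df[~df["type"].isin(STATIONARY_TYPES)]
    months = moving[date_column].dt.to_period("M")
    return moving.groupby([months, moving["type"]])["distance"].mean().unstack()
```

Because stationary types are removed before grouping, they never become columns and so never appear in the plot legend.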
Add an option to generate a plot of top commutes based on their name.
There is currently no easy way to see an overview of the 'health' of the activity dataset (e.g. how many activities are manual or are flagged) or how many activities contain extended sensor data (e.g. heart rate, cadence, or measured power). The tool should display the following information in a similar format to the summary statistics:
Add an option to generate a stacked bar plot or area plot of commute vs. non-commute riding distance by month.
Fix the incorrect average speed calculation reported in the summary statistics. The current calculation is as follows:
Mean speed (m/s) * 3.6
This is incorrect because it takes the mean of the per-activity mean speeds, which gives a short activity the same weight as a long one. It should instead be:
Total distance (m) / total moving time (sec) * 3.6
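A small worked comparison of the two calculations, assuming activity dicts with the Strava fields distance (m), moving_time (s), and average_speed (m/s). The function names are hypothetical.

```python
from typing import Dict, List


def mean_of_means_kmh(activities: List[Dict]) -> float:
    """Current (incorrect) calculation: the mean of per-activity mean speeds, in km/h."""
    return sum(a["average_speed"] for a in activities) / len(activities) * 3.6


def overall_mean_speed_kmh(activities: List[Dict]) -> float:
    """Corrected calculation: total distance (m) / total moving time (s), in km/h."""
    total_distance = sum(a["distance"] for a in activities)
    total_moving_time = sum(a["moving_time"] for a in activities)
    return total_distance / total_moving_time * 3.6


activities = [
    {"distance": 10000.0, "moving_time": 3600, "average_speed": 10000.0 / 3600},
    {"distance": 1000.0, "moving_time": 100, "average_speed": 10.0},
]
```

For these two activities (10 km in one hour, then 1 km in 100 s), the first function returns about 23.0 km/h while the second returns about 10.7 km/h: the short, fast activity dominates the mean of means even though it covers less than a tenth of the distance.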
Add an option to generate a plot of ride distance vs. bike used by month for each activity type in the dataset.
The tool currently generates a .geojson file containing the geospatial activity data and passes it to the add_features_geojson function in the xyz-spaces-python package to upload the data to HERE Studio. This is a holdover from the HERE CLI that was previously being used, with which a file was the only way to upload data to the server.
xyz-spaces-python allows data to be uploaded directly from a GeoPandas dataframe using the add_features_geopandas function instead. This stops the tool from having to generate a .geojson file unnecessarily, which should significantly reduce overall processing and uploading times.
The option to generate a .geojson file should be broken out into a separate command-line argument, so that it is only created when requested by the user.
Add an option to generate a plot of reverse-geocoded activity start locations (country + city).
Create a setup guide for HERE XYZ Studio describing the basic procedure for:
The tool currently relies on a mix of environment variables (for the Strava client ID and secret) and hard-coded values (e.g. file paths) for configuration, which somewhat restricts its flexibility. These configurable values should be imported via a user-modifiable file instead.
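A minimal sketch using the standard-library configparser. Only the [paths] section and the geo_data_file key mirror the config['paths']['geo_data_file'] lookup in a traceback elsewhere in this list; the other section and key names, the values, and the use of configparser itself are assumptions.

```python
import configparser

# Hypothetical contents of a user-modifiable config file
EXAMPLE_CONFIG = """
[strava]
client_id = 12345
client_secret = replace-me

[paths]
activity_data_file = data/activity_data.json
geo_data_file = data/activity_geo_data.geojson
"""


def load_config(text: str) -> configparser.ConfigParser:
    """Parse configuration text; a real tool would call config.read() on a file path."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config
```

Values are strings by default; typed accessors such as getint() convert them when needed.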
Summary
When generating commute statistics from a set of activity data that contains no activities marked as a commute on Strava (i.e. only activities with the attribute "commute" = false), the tool displays an empty DataFrame instead of indicating that no commutes are present.
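Steps to reproduce
Generate commute statistics from a set of activity data that contains no activities marked as a commute.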
Observed behaviour
The tool displays an empty DataFrame:
Commute statistics:
Empty DataFrame
Columns: []
Index: []
Expected behaviour
The tool displays a message indicating that no commutes are present in the activity data:
Analysis: No commutes found
Update the tool to run on Python 3.8 and generate the necessary requirements.txt (for pip) and environment.yml (for Anaconda) files.
Add an option to generate a heatmap plot of activity moving time by month for each activity type in the dataset.
Store the Strava activity data directly from a pandas DataFrame to a file in JSON format using the built-in pandas to_json method and read_json function. This should help reduce loading and processing times, which will better support larger datasets and allow additional metadata (e.g. addresses obtained by reverse geocoding) to be stored without impacting the responsiveness of the tool.
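A minimal sketch of the round trip, assuming hypothetical helper names and start_date / start_date_local column names. Setting keep_default_dates=False stops pandas from guessing date columns from their names (for example columns ending in _time, such as moving_time), so only the listed columns are converted.

```python
import pandas as pd


def save_activities(df: pd.DataFrame, path: str) -> None:
    """Write the activity DataFrame to a JSON file, one record per activity."""
    df.to_json(path, orient="records", date_format="iso")


def load_activities(path: str, date_columns=("start_date", "start_date_local")) -> pd.DataFrame:
    """Read the activity DataFrame back, converting only the named date columns."""
    return pd.read_json(
        path,
        orient="records",
        convert_dates=list(date_columns),  # listed names missing from the file are ignored
        keep_default_dates=False,  # don't infer date columns from column names
    )
```

Name strings such as "10/09/2014" stay as plain text because the name column is not in the list of date columns.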
Delete all the features (activities) stored in the HERE interactive mapping layer (IML) when the --refresh-data option is selected. This ensures that the activity data stored on HERE is always up to date.