The trajminer from trajminer

Adapt haversine distance function to support arrays of points

Adapt the haversine distance function to support two arrays of locations as input, so that the paired distances are computed similarly to how the Euclidean distance works.
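
A minimal sketch of what a vectorized version could look like, assuming NumPy is available and that points are given as (lat, lon) pairs in degrees. The function name haversine_pairs and the Earth radius constant are illustrative and not part of the library:

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)


def haversine_pairs(a, b):
    """Paired great-circle distances, in meters, between rows of a and b.

    a and b are array-likes of shape (n, 2) holding (lat, lon) in degrees.
    A single point of shape (2,) is broadcast against the other array.
    """
    a = np.radians(np.asarray(a, dtype=float).reshape(-1, 2))
    b = np.radians(np.asarray(b, dtype=float).reshape(-1, 2))
    dlat = b[:, 0] - a[:, 0]
    dlon = b[:, 1] - a[:, 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[:, 0]) * np.cos(b[:, 0]) * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding pushing h slightly outside [0, 1].
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
```

The result has one distance per row pair, so it can be used wherever a paired Euclidean distance over two arrays is used.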

Reduce space complexity (memory consumption) of LCSS and EDR

Fix LCSS and EDR so that they do not store the whole matrix of distances/similarities between points.
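
One possible approach, sketched here for LCSS only: the dynamic program only ever reads the previous row, so two rows are enough. The names lcss_length and lcss_similarity and the match predicate are hypothetical; the library's actual signatures may differ.

```python
def lcss_length(t1, t2, match):
    """Length of the longest common subsequence of t1 and t2.

    match(p, q) returns True when two points are considered similar.
    Only two rows of the DP table are kept, so memory is O(len(t2)).
    """
    prev = [0] * (len(t2) + 1)
    for p in t1:
        curr = [0]
        for j, q in enumerate(t2, start=1):
            if match(p, q):
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcss_similarity(t1, t2, match):
    """LCSS length normalized by the shorter trajectory length."""
    if len(t1) == 0 or len(t2) == 0:
        return 0.0
    return lcss_length(t1, t2, match) / min(len(t1), len(t2))
```

EDR follows the same recurrence pattern (each cell depends on the previous row and the current row), so the same two-row trick should apply to it.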

Manhattan distance (adjusted)

For distances within cities, the Manhattan distance may be a more accurate approximation for trajectory lengths due to the rectangular shapes of the blocks.

There's a challenge, though: the streets may not be aligned with the x/y axes and the Manhattan distance would be distorted if computed without applying a rotation first.

Example: New York City

Notice how the components of the Manhattan distance would match the distance along the streets most of the time if we applied a counter-clockwise rotation of approximately 30° to the image.

This could be done if the user provided a base vector, which indicates the direction of a straight line on the map. By default, the base vector could simply be (0, 1) (north).
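
A rough sketch of the idea, assuming points are planar (x, y) coordinates (for example, already projected) rather than raw lat/lon. Instead of rotating the points, it projects the displacement onto the base vector and onto its perpendicular. The function name rotated_manhattan is hypothetical:

```python
import math


def rotated_manhattan(p, q, base=(0.0, 1.0)):
    """Manhattan distance measured along `base` and its perpendicular.

    p and q are planar (x, y) points; base is the direction of a
    straight street on the map. With base=(0, 1) this reduces to the
    ordinary Manhattan distance.
    """
    norm = math.hypot(base[0], base[1])
    if norm == 0:
        raise ValueError("base vector must be non-zero")
    ux, uy = base[0] / norm, base[1] / norm  # unit vector along the streets
    dx, dy = q[0] - p[0], q[1] - p[1]
    along = dx * ux + dy * uy    # component parallel to the base vector
    across = dx * uy - dy * ux   # component perpendicular to it
    return abs(along) + abs(across)
```

For example, with base=(1, 1) a move from (0, 0) to (1, 1) lies entirely along the street direction, so the distance equals the straight-line length instead of 2.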

Implement sample distance functions for similarity measure attributes

Implement some pre-defined distance functions to be used in trajectory similarity measures. For instance:

Discrete (lambda x, y: 0 if x == y else 1);
Euclidean;
Haversine;
Etc.

Add Foursquare datasets

Add Foursquare NYC and Tokyo datasets from https://sites.google.com/site/yangdingqi/home/foursquare-dataset.

Change similarity measures to better handle attribute thresholds and distance functions

It is not straightforward to pass thresholds, distance functions, and weights to similarity measures when they are matched only by the order of the attributes in the dataset. Some suggested improvements:

Passing thresholds, distances, and weights as dictionaries, in which the keys are the attribute names;
Ignoring attributes for which no threshold/distance/weight has been given. Maybe a warning could be issued in this case, as there's a good chance that the user forgot to define something.
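
The two suggestions above could be sketched roughly as follows. The helper name resolve_attribute_params, the default weight of 1.0, and the decision to treat weights as optional are assumptions made for illustration:

```python
import warnings


def resolve_attribute_params(attributes, thresholds, distances, weights=None):
    """Match per-attribute settings by name instead of by position.

    Returns {name: (distance, threshold, weight)} for every attribute
    that has both a threshold and a distance function. Attributes with
    a missing setting are skipped and a warning is issued.
    """
    weights = weights or {}
    resolved = {}
    for name in attributes:
        if name not in thresholds or name not in distances:
            warnings.warn(
                f"attribute '{name}' ignored: no threshold/distance given",
                UserWarning,
            )
            continue
        # Assumed default: attributes without an explicit weight get 1.0.
        resolved[name] = (distances[name], thresholds[name],
                          weights.get(name, 1.0))
    return resolved
```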

Add tool for plotting trajectory data on map

Add a utility for plotting data on a map. Maybe a heatmap as well. See which library/tool is the best option.

Add MUITAS similarity measure

Add the code for MUITAS, from a paper recently accepted in Transactions in GIS.

Implement UMS similarity

See:

https://www.tandfonline.com/doi/abs/10.1080/13658816.2017.1372763?casa_token=Q1cWMruMn24AAAAA:A5iA4IjPLRkFCJ8cZvhcpJf2ElCM3Tv8SoWkgjUXLp2dlD7vg-ujx3VbB2CLY0wfZBbgoQ-0O0Ty

Implement "Revealing the physics of movement: Comparing the similarity of movement characteristics of different types of moving objects" from [Dodge et al., 2009]

See:

https://www.sciencedirect.com/science/article/pii/S0198971509000556

Improve code quality

Please see all reported quality issues at https://www.codacy.com/app/trajminer/trajminer?utm_source=github.com&utm_medium=referral&utm_content=trajminer/trajminer&utm_campaign=Badge_Grade

Configure circle.ci build and test run

Fix circle.ci configuration for running tests.

Implement agglomerative clustering

Implement the agglomerative clustering algorithm. See:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering

Implement utility for computing statistics from trajectory datasets

Implement utility for computing statistics from datasets, such as:

number of points
number of trajectories
number of classes
average trajectory length
average number of trajectories per class
etc
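
A small sketch of such a utility, assuming the dataset is available as a list of trajectories (each a sequence of points) plus a parallel list of class labels. The function name dataset_stats and the dictionary keys are hypothetical:

```python
from collections import Counter


def dataset_stats(trajectories, labels):
    """Basic statistics for a labeled trajectory dataset."""
    n_trajectories = len(trajectories)
    n_points = sum(len(t) for t in trajectories)
    per_class = Counter(labels)  # number of trajectories per class
    n_classes = len(per_class)
    return {
        "points": n_points,
        "trajectories": n_trajectories,
        "classes": n_classes,
        "avg_length": n_points / n_trajectories if n_trajectories else 0.0,
        "avg_trajectories_per_class":
            n_trajectories / n_classes if n_classes else 0.0,
    }
```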

Create new landing page for the library and docs

Add treatment for spatial attributes when loading a trajectory dataset

Identify longitude and latitude (or x and y) columns and treat them as a single attribute when loading a trajectory dataset.

Add sample datasets

Include samples of well-known trajectory datasets in the library. See https://scikit-learn.org/stable/datasets/index.html for reference.

Add usage examples for existing library features

Implement "Incorporating Duration Information for Trajectory Classification" from [Patel et al., 2012]

See:

https://ieeexplore.ieee.org/abstract/document/6228162/

Add contribute guide

Add CONTRIBUTE file containing guidelines for contributions.

Implement K-Medoids clustering

Implement the K-Medoids clustering algorithm. See:

Fix CSVTrajectoryLoader to load only required columns in memory

Currently, CSVTrajectoryLoader loads all the columns of the input file into memory. It would be better to load only the columns required by the user.
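
As an illustration only, here is a sketch using the standard csv module that keeps just the requested columns for each row. The function name load_columns is hypothetical and this is not how CSVTrajectoryLoader is written; if the loader is built on pandas, passing usecols to pandas.read_csv achieves the same effect.

```python
import csv
import io


def load_columns(csv_file, columns):
    """Read an open CSV file, keeping only the requested columns."""
    reader = csv.DictReader(csv_file)
    missing = set(columns) - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in file: {sorted(missing)}")
    # Unrequested fields are discarded row by row, never accumulated.
    return [{c: row[c] for c in columns} for row in reader]


# Usage with an in-memory file; the column names are made up.
sample = io.StringIO("tid,lat,lon,weather\n1,10.0,20.0,sunny\n")
rows = load_columns(sample, ["tid", "lat", "lon"])
```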

We need tests!

Implement tests, especially for similarity measures.

Improve docs/code of CSVTrajectoryLoader regarding lat/lon

The docs of the CSVTrajectoryLoader class are not very clear about how the latitude and longitude attributes of datasets are loaded. Maybe we need to change the code/docs to make them more user-friendly.

Add methods for POI recommendation

See the works for Point of Interest recommendation (or also next location prediction):

Implement KNN Classifier with similarity measure

Implement other loaders for trajectory data

Load a folder containing files, where each file corresponds to a trajectory in the CSV-like format.
Load data from a JSON file.

Create object for storing loaded trajectory data

Adjust other functions and classes accordingly (trajectory stats, segmentation, etc).

Example CSV

Hi! I am interested in using your library for analyzing animal trajectories. However, I haven't been able to load my trajectory CSVs using "CSVTrajectoryLoader". I suspect I am not loading a correctly formatted file. Could I see an example file you used to test the loader function?

Thank you for all your work in this library!

Implement trajectory segmentation utility

Implement utility for segmenting trajectories based on given columns:

Segment every time a value changes;
Define a threshold for the difference tolerated in the segmentation process.
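
Both modes could be covered by one splitting routine that takes a predicate over consecutive points. A minimal sketch, assuming points are dictionaries keyed by column name; the names segment_trajectory, value_changed, and exceeds are hypothetical:

```python
def segment_trajectory(points, should_split):
    """Split points into segments wherever should_split(prev, curr) is True."""
    segments = []
    for point in points:
        if segments and not should_split(segments[-1][-1], point):
            segments[-1].append(point)
        else:
            segments.append([point])  # first point, or a split was requested
    return segments


def value_changed(column):
    """Split every time the value of `column` changes."""
    return lambda prev, curr: prev[column] != curr[column]


def exceeds(column, threshold):
    """Split when the numeric difference in `column` exceeds `threshold`."""
    return lambda prev, curr: abs(curr[column] - prev[column]) > threshold
```

For instance, segment_trajectory(points, exceeds("t", 30)) starts a new segment whenever two consecutive points are more than 30 time units apart.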

Add USA AIS dataset

See:
https://marinecadastre.gov/ais/

Implement DBSCAN clustering

Implement the DBSCAN clustering algorithm.

Add NBA players trajectory dataset

Add NBA dataset of players from https://github.com/sealneaward/nba-movement-data

Question about the repository

This repository looks cool :)

I'd love to participate and help, but it's kind of hard to understand what it's really about. Is it a collection of functions? A set of interconnected classes with a major purpose? How can your code help people, exactly?

Best regards.

Optimize MSM similarity

Fix MSM to compute only half of the score matrix.

TrajectorySegmenter does not work as expected

I tried to segment the Starkey dataset whenever the haversine distance between points was greater than 100 m. After segmentation, the stats of the new dataset are wrong. For instance, the attribute count was 6 before segmenting (min, max, count), and after segmenting the min became 1 and the max 36.

Implement feature extraction utility for raw trajectories

Utility for extracting features such as speed, direction change, etc.
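
A minimal sketch of two such features, assuming each point is a (t, x, y) tuple with planar coordinates and timestamps in consistent units. The function names are hypothetical:

```python
import math


def speeds(points):
    """Speed between consecutive (t, x, y) points; 0.0 if time does not advance."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        out.append(math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    return out


def direction_changes(points):
    """Absolute change in heading (radians, in [0, pi]) between consecutive moves."""
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for (_, x0, y0), (_, x1, y1) in zip(points, points[1:])
    ]
    changes = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        delta = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        changes.append(abs(delta))
    return changes
```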

Implement algorithm(s) for stop and moves detection

See:

Implement TraClus algorithm for raw trajectory clustering

See:

https://www.ideals.illinois.edu/bitstream/handle/2142/11301/Trajectory%20Clustering%20A%20Partition-and-Group%20Framework.pdf?sequence=2&isAllowed=y

Implement utility for filtering/preprocessing trajectories

Implement utility with features such as:

Remove duplicate trajectory points (e.g. equal attributes within a time interval);
Remove noise points based on different criteria (e.g. an abrupt speed increase);
Remove trajectories that are too short or too long;
Etc.
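
Two of the filters listed above could be sketched as follows, assuming each point is a (timestamp, attributes) pair and each trajectory is a list of such points. The function names and parameters are hypothetical:

```python
def drop_duplicates(points, max_gap):
    """Drop a point whose attributes equal those of the last kept point
    and whose timestamp is within max_gap of it."""
    kept = []
    for t, attrs in points:
        if kept and attrs == kept[-1][1] and t - kept[-1][0] <= max_gap:
            continue
        kept.append((t, attrs))
    return kept


def filter_by_length(trajectories, min_points=1, max_points=None):
    """Keep trajectories whose number of points is within the given bounds."""
    return [
        t for t in trajectories
        if len(t) >= min_points
        and (max_points is None or len(t) <= max_points)
    ]
```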

Implement MOVELETS for trajectory classification

See:

https://dl.acm.org/citation.cfm?id=3167225

Implement trajectory data loader

Implement TraClass algorithm for raw trajectory classification

See:

http://www.vldb.org/pvldb/1/1453972.pdf

Create standard interface for filters and transformations

Currently, preprocessing tools do not follow a standard API. It would be good practice to create a wrapper with methods like fit, transform, and set_params, and to make the appropriate changes to existing functions/objects.

Besides improving the code quality, a standard API for preprocessing tools makes it possible to design a pipeline for stacking transformations.
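
One way such an interface and pipeline might look, loosely modeled on the scikit-learn convention. All class names here (Transformer, MinLengthFilter, Truncate, Pipeline) are hypothetical and only illustrate the shape of the API:

```python
class Transformer:
    """Base interface: fit learns state, transform applies it."""

    def set_params(self, **params):
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, data):
        return self  # stateless by default

    def transform(self, data):
        raise NotImplementedError

    def fit_transform(self, data):
        return self.fit(data).transform(data)


class MinLengthFilter(Transformer):
    def __init__(self, min_points=2):
        self.min_points = min_points

    def transform(self, data):
        return [t for t in data if len(t) >= self.min_points]


class Truncate(Transformer):
    def __init__(self, max_points=100):
        self.max_points = max_points

    def transform(self, data):
        return [t[:self.max_points] for t in data]


class Pipeline(Transformer):
    """Applies the given transformers in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit(self, data):
        for step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data
```

Because Pipeline is itself a Transformer, pipelines can be nested or passed anywhere a single preprocessing step is expected.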

Handle exceptions/validate parameters of most of the methods

There's no exception handling for most of the classes in the library. It would be good to enumerate classes, methods, and parameters that need some sort of validation so that the proper exception handlers are added to them.

Implement Geohash codification for lat/lon

See:

https://en.wikipedia.org/wiki/Geohash
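
For reference, a compact sketch of the standard Geohash encoding: bits alternate between longitude and latitude, each bit halves the current interval, and every five bits become one base-32 character. The function name geohash_encode and the default precision are choices made for this example:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a (lat, lon) pair in degrees as a Geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # the first bit encodes longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)
```

Points that share a long Geohash prefix fall in the same cell, so a prefix of the hash can serve as a discrete spatial attribute.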

Add method for mining trajectory sequential patterns

See:

https://dl.acm.org/citation.cfm?id=1281230

Add support for inputting distance matrix directly in fit_predict() clustering

Optimize numpy calls

Where an equivalent array method exists, calling obj.function() is usually faster than calling numpy.function(obj). For instance:

$ python -m timeit -s "import numpy as np; m=np.ones((1000, 1000))" "np.transpose(m)"
500000 loops, best of 5: 419 nsec per loop
$ python -m timeit -s "import numpy as np; m=np.ones((1000, 1000))" "m.transpose()"
2000000 loops, best of 5: 128 nsec per loop

I was checking out your repo and I saw that you used the slower call a few times (e.g.: np.transpose). So I think you'd like to change those.

Implement persistence method for TrajectoryData

Implement a method (e.g. to_csv) for persisting TrajectoryData.
There could be methods for persisting data in other formats (e.g. json, kml, etc). Maybe it would be a good idea to create a method called to_file and have the data type passed as a parameter.
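
The to_file idea could be sketched as a small dispatcher over format-specific writers. This example assumes the data has already been flattened into a list of dictionaries (one per point) and writes to an open text stream. The names to_file, _write_csv, and _write_json are hypothetical, and TrajectoryData itself is not modeled here:

```python
import csv
import io
import json


def _write_csv(rows, stream):
    fieldnames = list(rows[0]) if rows else []
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def _write_json(rows, stream):
    json.dump(rows, stream)


_WRITERS = {"csv": _write_csv, "json": _write_json}


def to_file(rows, stream, fmt="csv"):
    """Write rows (a list of dicts) to an open text stream in the given format."""
    try:
        writer = _WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    writer(rows, stream)


# Usage with an in-memory stream; the column names are made up.
buffer = io.StringIO()
to_file([{"tid": 1, "lat": 10.0, "lon": 20.0}], buffer, fmt="json")
```

Supporting a new format (for example KML) would then only require registering another writer in _WRITERS.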
