trajminer / trajminer
Trajectory Mining Library
Home Page: http://trajminer.github.io
License: MIT License
Adapt the haversine distance function to support two arrays of locations as input, so that the paired distances are computed similarly to how the Euclidean distance works.
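A minimal sketch of what a paired haversine could look like, assuming both inputs are arrays of (latitude, longitude) rows in degrees and that row i of the first array is compared with row i of the second. The name `haversine_paired` and the constant `EARTH_RADIUS_M` are hypothetical and not part of the library.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)


def haversine_paired(a, b):
    """Great-circle distance in meters between a[i] and b[i].

    a, b: array-likes of shape (n, 2) holding (lat, lon) in degrees.
    A single point of shape (2,) is also accepted and broadcast.
    """
    a = np.radians(np.atleast_2d(np.asarray(a, dtype=float)))
    b = np.radians(np.atleast_2d(np.asarray(b, dtype=float)))
    dlat = b[:, 0] - a[:, 0]
    dlon = b[:, 1] - a[:, 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[:, 0]) * np.cos(b[:, 0]) * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point overshoots before sqrt/arcsin.
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
```

Because every step is a vectorized NumPy operation, the whole batch of pairs is computed without a Python-level loop.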
Fix LCSS and EDR so that they do not store the whole matrix of distances/similarities between points.
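One possible way to avoid the full matrix, sketched here for LCSS only: evaluate the point match on the fly and keep just two rows of the dynamic-programming table. The function names and the `match` callback are hypothetical, and the normalization by the shorter length is the common LCSS convention, assumed here rather than taken from the library.

```python
def lcss_length(a, b, match):
    """Length of the longest common subsequence of a and b.

    match(x, y) returns True when points x and y are considered equal.
    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if match(x, y):
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcss_similarity(a, b, match):
    """LCSS length normalized by the shorter sequence (assumed convention)."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, match) / min(len(a), len(b))
```

The same two-row idea should carry over to EDR, since its recurrence also depends only on the previous and current rows.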
For distances within cities, the Manhattan distance may be a more accurate approximation for trajectory lengths due to the rectangular shapes of the blocks.
There's a challenge, though: the streets may not be aligned with the x/y axes and the Manhattan distance would be distorted if computed without applying a rotation first.
Example: New York City
Notice how the components of the Manhattan distance would match the distance across the streets most of the time if we applied a counter-clockwise rotation of approximately 30° on the image.
This could be done if the user provided a base vector, which indicates the direction of a straight line on the map. As default, the base vector could be simply (0, 1) (north).
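A rough sketch of the base-vector idea, assuming points are given in planar (projected) x/y coordinates. Instead of rotating the points explicitly, the displacement is projected onto the base direction and onto its perpendicular, which is equivalent. The function name `rotated_manhattan` is hypothetical.

```python
import numpy as np


def rotated_manhattan(p, q, base=(0.0, 1.0)):
    """Manhattan distance between p and q measured along a street grid.

    base: vector giving the direction of one family of streets;
    the default (0, 1) points north and yields the usual |dx| + |dy|.
    """
    u = np.asarray(base, dtype=float)
    norm = np.linalg.norm(u)
    if norm == 0:
        raise ValueError("base vector must be non-zero")
    u = u / norm                   # unit vector along the streets
    v = np.array([u[1], -u[0]])    # unit vector perpendicular to u
    d = np.asarray(q, dtype=float) - np.asarray(p, dtype=float)
    return float(abs(d @ u) + abs(d @ v))
```

With the default base vector the result is the plain Manhattan distance; with a base vector aligned to the displacement the result equals the Euclidean distance, as expected for travel along a single street.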
Implement some pre-defined distance functions to be used in trajectory similarity measures. For instance:
Add Foursquare NYC and Tokyo datasets from https://sites.google.com/site/yangdingqi/home/foursquare-dataset.
It is not really straightforward to input thresholds, distance functions, and weights to similarity measures considering only the order of the attributes in the dataset. Some suggestions for improvement include:
Add the code for MUITAS, from a paper recently accepted in Transactions in GIS.
Please see all reported quality issues at https://www.codacy.com/app/trajminer/trajminer?utm_source=github.com&utm_medium=referral&utm_content=trajminer/trajminer&utm_campaign=Badge_Grade
Fix the CircleCI configuration for running tests.
Implement the agglomerative clustering algorithm. See:
Implement utility for computing statistics from datasets, such as:
Identify longitude and latitude (or x and y) columns and treat them as a single attribute when loading a trajectory dataset.
Include samples of well-known trajectory datasets in the library. See https://scikit-learn.org/stable/datasets/index.html for reference.
Add CONTRIBUTE file containing guidelines for contributions.
Implement the K-Medoids clustering algorithm. See:
Currently, CSVTrajectoryLoader loads to memory all the columns in the input file. It would be better to load only the columns required by the user.
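As an illustration only, column selection could look like the sketch below, which uses the standard-library `csv` module rather than the loader's actual code. The helper `read_columns` is hypothetical. Each row is still parsed in full, but only the requested columns are kept in memory.

```python
import csv
import io


def read_columns(file_obj, columns, delimiter=","):
    """Read a CSV with a header row, keeping only the given columns."""
    reader = csv.DictReader(file_obj, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")
    # Discard every field the user did not ask for as each row is read.
    return [{c: row[c] for c in columns} for row in reader]


sample = io.StringIO("tid,lat,lon,weather\n1,0.5,1.5,sunny\n1,0.6,1.6,rainy\n")
rows = read_columns(sample, ["tid", "lat", "lon"])
```

If the loader is built on pandas, the `usecols` parameter of `pandas.read_csv` offers the same selection at parse time.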
Implement tests, especially for similarity measures.
The docs of the CSVTrajectoryLoader class are not very clear about how the latitude and longitude attributes of datasets are loaded. Maybe we need to change the code/docs to make it more user-friendly.
See the works for Point of Interest recommendation (or also next location prediction):
Adjust other functions and classes accordingly (trajectory stats, segmentation, etc).
Hi! I am interested in using your library for analyzing animal trajectories. However, I haven't been able to load my trajectory CSVs using "CSVTrajectoryLoader". I suspect I am not loading a correctly formatted file. Could I see an example file you used to test the loader function?
Thank you for all your work in this library!
Implement utility for segmenting trajectories based on given columns:
Implement the DBSCAN clustering algorithm.
Add NBA dataset of players from https://github.com/sealneaward/nba-movement-data
This repository looks cool :)
I'd love to participate and help, but it's kind of hard to understand what it's really about. Is it a collection of functions? A set of interconnected classes with a major purpose? How can your code help people, exactly?
Best regards.
Fix MSM to compute only half of the score matrix.
I tried to segment the Starkey dataset whenever the haversine distance between points was greater than 100 m. After segmentation, the stats of the new dataset are wrong. For instance, the attribute count before segmenting was 6 (min, max, count), and after segmenting the min became 1 and the max 36.
Utility for extracting features such as speed, direction change, etc.
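A small sketch of two such features, assuming planar x/y coordinates and strictly increasing numeric timestamps. The function name `speed_and_turning` is hypothetical. Speed is computed per segment, and direction change is the signed difference between consecutive headings, wrapped to [-pi, pi).

```python
import numpy as np


def speed_and_turning(xy, t):
    """Per-segment speed and per-vertex direction change of one trajectory.

    xy: array-like of shape (n, 2); t: array-like of n timestamps.
    Returns (speed, turn) with n - 1 and n - 2 entries respectively.
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    d = np.diff(xy, axis=0)                  # displacement of each segment
    speed = np.hypot(d[:, 0], d[:, 1]) / np.diff(t)
    heading = np.arctan2(d[:, 1], d[:, 0])   # angle from the x axis, radians
    turn = np.diff(heading)
    turn = (turn + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return speed, turn
```

For latitude/longitude data, the segment lengths would need a geodesic distance such as haversine instead of `np.hypot`.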
Implement utility with features such as:
Currently, preprocessing tools do not follow a standard API. It would be a good practice to create a wrapper with methods like fit, transform, set_params, and make the appropriate changes to existing functions/objects.
Besides improving the code quality, a standard API for preprocessing tools makes it possible to design a pipeline for stacking transformations.
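To make the proposal concrete, here is one possible shape for such an API, loosely modeled on the scikit-learn convention. All class names (`BaseTransformer`, `MinLengthFilter`, `Subsampler`, `Pipeline`) are hypothetical, and trajectories are represented as plain lists purely for illustration.

```python
class BaseTransformer:
    """Common interface: set_params, fit, transform, fit_transform."""

    def set_params(self, **params):
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, X):
        return self  # stateless by default

    def transform(self, X):
        raise NotImplementedError

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class MinLengthFilter(BaseTransformer):
    """Drop trajectories with fewer than min_length points."""

    def __init__(self, min_length=2):
        self.min_length = min_length

    def transform(self, X):
        return [t for t in X if len(t) >= self.min_length]


class Subsampler(BaseTransformer):
    """Keep every step-th point of each trajectory."""

    def __init__(self, step=2):
        self.step = step

    def transform(self, X):
        return [t[::self.step] for t in X]


class Pipeline(BaseTransformer):
    """Apply a list of transformers in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit(self, X):
        for step in self.steps:
            X = step.fit_transform(X)  # each step is fit on the previous output
        return self

    def transform(self, X):
        for step in self.steps:
            X = step.transform(X)
        return X


data = [[1, 2, 3, 4], [5], [6, 7, 8]]
result = Pipeline([MinLengthFilter(2), Subsampler(2)]).fit_transform(data)
```

Since every tool exposes the same three methods, the pipeline needs no knowledge of what each step does, which is what makes stacking possible.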
There's no exception handling for most of the classes in the library. It would be good to enumerate classes, methods, and parameters that need some sort of validation so that the proper exception handlers are added to them.
Calling numpy.function(obj) is usually slower than calling obj.function(), when the method form is available. For instance:
$ python -m timeit -s "import numpy as np; m=np.ones((1000, 1000))" "np.transpose(m)"
500000 loops, best of 5: 419 nsec per loop
$ python -m timeit -s "import numpy as np; m=np.ones((1000, 1000))" "m.transpose()"
2000000 loops, best of 5: 128 nsec per loop
I was checking out your repo and I saw that you used the slower call a few times (e.g.: np.transpose). So I think you'd like to change those.
Implement a method (e.g. to_csv) for persisting TrajectoryData.
There could be methods for persisting data in other formats (e.g. json, kml, etc). Maybe it would be a good idea to create a method called to_file and have the data type passed as a parameter.