Code Monkey home page Code Monkey logo

pointpats's Issues

Infrastructure

pointpats currently don't have standardised PySAL infrastructure (CI, release action) and I think that the RTD build will also fail because the setup has not been updated to recent versions of sphinx and its extensions.

Since it looks that geopandas may optionally depend on pointpats, I think this needs to be fixed.

I'll give it a first go but may leave some bits to @jGaboardi.

`README` in markdown instead of `.rst`?

While working on the Notebooks book project, I've realised that the README file for this project is structured in .rst. Is there a reason for this? If not, would it be possible to convert it markdown? I'm currently converting with pandoc but would probably be better to be consistent across de federation?

release 2.4.0

Hey,

I see that you are actively working on the new stuff (and then removing it) but I wanted to check if it would be okay to cut 2.4.0 this week? I got new Knox and plot_density in my course starting next Tuesday and I'd prefer if students just created an env in the beginning and did not have to touch it (e.g. updating pointpats before we get to point patterns).

Obviously assuming that Know will make it back before the release :)

Jenv binsize correction is incorrect.

Using the book tokyo data for user 95795770@N00, I can get envelopes for G and F, but not J. I think this means whatever shape fix used here is wrong.

import pandas, pointpats
df = pandas.read_csv('https://raw.githubusercontent.com/gdsbook/book/master/'
                     'data/tokyo/tokyo_clean.csv')
coordinates = df.query('user_id == "95795770@N00"')[['x','y']].values
pattern = pointpats.PointPattern(coordinates)
realizations = pointpats.PoissonPointProcess(pattern.window, pattern.n, samples=1000, asPP=True)

k = 40
genv = pointpats.Genv(pattern, intervals=k, realizations=realizations) # works
fenv = pointpats.Fenv(pattern, intervals=k, realizations=realizations) # works
jenv = pointpats.Jenv(pattern, intervals=k, realizations=realizations) # fails for all k>1 I've checked.

fwiw there are 419 coordinates, and I can compute the Jenv of a random pattern with 419 points.

geopandas-centric api

pointpats predates geopandas and was originally designed around (n,2) arrays of coordinates. It hasnt been updated much over time like the rest of the pysal stack, but today it's much more common to work in geodataframes rather than numpy arrays (even though you can reformat into the structure pointpats expects fairly easily). It might be nice to have something that consumes geodataframes/series and outputs the same.

e.g. instead of (in addition to?) the current weighted_mean_center which takes and returns arrays, i end up using something like

def attribute_weighted_center(gdf, column):
    if column not in gdf.columns:
        raise ValueError(
            f"`attribute` {column} was passed but is not present in the dataframe"
        )
    pt = Point(
        weighted_mean_center(
            df.centroid.get_coordinates()[["x", "y"]].values,
            df[column].fillna(0).values,
        )
    )
    return gpd.GeoSeries([pt], crs=gdf.crs)

would that be of general use? If so,

  • do we want to allow the existing functions to take/return multiple types (probably not? but we do kind of have that pattern with Graph that can accept geodataframes or arrays).
  • do we want a complementary _from_gdf set of functions or something?

Enhancement wishlist

This is a running list of enhancements burbling up of when writing the gdsbook/book chapter on point pattern analysis.

additional hulling measures

  • hull() should be general, with a type argument that supports alpha shapes, minimum bounding rectangles, minimum area rectangles, and minimum bounding circles. This would require putting a dedicated convex_hull function elsewhere. Since we could set hull's default to be the convex hull, this can actually be made API-transparent 😄
  • mbr should be expanded to minimum_bounding_rectangle
  • we should make an import-safe minimum_area_rectangle and borrow/import the opencv implementation of the minimum area rectangle.
  • we should rename skyum to minimum_bounding_circle, or use it as a fallback if we cannot use opencv's minimum bounding circle
  • docstring for skyum needs to be brought in line with other functions/classes (my bad 😥)

API consistency

  • Elsewhere in the library, we avoid exposing everything at root now. Here, though, everything in centrography is available directly in pointpats... is this intentional?
  • skyum should return in the same manner as ellipse. I think they both should give some kind of namedtuple return value that has (center, radius) or (center, semimajor, semiminor, rotation).

Performance

  • mbr does a loop in python through all points and explicitly finds the minimum. We should probably use a sort for a faster solution?
  • skyum can be easily numba-ized, but I'm not sure if it'd matter much.

New statistics

  • local K/F functions

poisson point process with delaunay missing first triangle

was looking at the code and found an unexpected behaviour here:

return shape.find_simplex((x, y)) > 0

I'd expect that it should check for simplex >= 0.

example:

from pointpats.random import poisson
from scipy.spatial import delaunay_plot_2d, Delaunay, ConvexHull
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(10)
points = rng.uniform(0, 10, size=(30, 2))
hull = ConvexHull(points)
vxs = hull.points[hull.vertices]
deln = Delaunay(vxs)

ppp = poisson(hull, size=(100, 1))
fig, ax = plt.subplots(1, 1)
ax.scatter(
    points[..., 0],
    points[..., 1],
    edgecolor="b",
    facecolor="none",
    alpha=0.5,
)
ax.scatter(ppp[:, 0], ppp[:, 1], edgecolor="r", facecolor="none", alpha=0.5)
delaunay_plot_2d(deln)

image

looks like the first triangle is never covered, cause it is indexed as 0:

deln.find_simplex((2,8))
>>> array(0, dtype=int32)

intensity estimates for new ripley functions are very incorrect sometimes

I noticed this problem when writing tests for the new numpy-based performant Ripley statistics. When using a shapely polygon's convex hull, I was getting basically the correct intensity, but when I was using scipy, I was getting extreme overestimates of the intensity.

I narrowed it down to an issue with how scipy.spatial.ConvexHull.area works scipy/scipy#12290.

Depending on the fix there we need to either:

  1. move to using hull.volume when getting the area from a scipy convex hull if upstream decides to adopt docstring clarifications
  2. keep using area if upstream decides to adopt the proposed changes for area/volume of 2-d shapes.

Simulation Envelopes, Low & High

The way the low and high functions are calculated, https://github.com/pysal/pointpats/blob/master/pointpats/distance_statistics.py#L627, it appears to be confusing since it depends on the pct parameter whereas it shouldn't. If pct=1 then the low and high envelopes are the same as can be seen below.

realizations = PoissonPointProcess( spp.window, spp.n, 10, asPP = True)
kenv = Kenv( spp, intervals = 20, realizations = realizations, pct = 1)
kenv.plot()

image

If pct = 0.05 having the same realizations, then,

image

If there is an intention to incorporate some sort of p - value in the graphs, then from what I know this p - value is determined exclusively by the amount realizations and nothing else. Finally, I would suggest to have an assertion about pct valid values.

Thanks,

Vasilis

L or other distance modules are not recognized

Hello,

Thank you very much for creating this package, it is amazing, but I have encountered a problem with it. I imported it as said in the documentation, I checked the requirements, all packages are satisfying them, and I am working with Python 3.7.7.. But still while calling pointpats.L(pp) Python says that there is no attribute L. Have someone encountered this problem? I have attached my import section and the code (not very long). Thank you for your help.

Greetings
Bartosz

import scipy.spatial import pointpats import libpysal as ps

pp=pointpats.PointPattern(points) L_Test=pointpats.L(pp)

Result:

`L_Test=pointpats.L(pp) :

AttributeError: module 'pointpats' has no attribute 'L'`

PoissonPointProcess - ValueError: Length mismatch

Hi, I am occasionally running into the following error (I am not sure yet what causes it exactly ...). Any ideas?

pointpats.version = '2.2.0'
pandas.version = '1.1.3'

random_samples_lambda = PoissonPointProcess(points.window, points.n, conditioning=True,  samples=1000, asPP=True)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pointpats/process.py", line 249, in __init__
    super(PoissonPointProcess, self).__init__(window, n, samples, asPP)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pointpats/process.py", line 113, in __init__
    self.realizations[sample] = PP(points, window=self.window)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pointpats/pointpattern.py", line 90, in __init__
    self.df.columns = col_names
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pandas/core/generic.py", line 5152, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pandas/core/generic.py", line 564, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 227, in set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

Building docs on 3.11 is broken

± make github

make[1]: Entering directory '/home/serge/para/1_projects/code-pysal-pointpats/pointpats/docs'

Running Sphinx v7.2.6



Exception occurred:

  File "/home/serge/miniconda3/envs/pointpats/lib/python3.11/dataclasses.py", line 815, in _get_field

    raise ValueError(f'mutable default {type(f.default)} for field '

ValueError: mutable default <class 'pybtex.richtext.Text'> for field other is not allowed: use default_factory

Related?

ripley.py vs distance_statistics.py duplication?

Hey,

I wanted to remove old deprecated classes Ripley's functions that are now in _deprecated_distance_statistics.py`.

While doing that, I noticed that the new implementation seems to live in distance_statistics.py but then we have nearly exactly the same code also in ripley.py.

distance_statistics.py has more complete docstrings and is included in API docs, while ripley.py has some additional utils.

Is it safe to remove ripley.py from the codebase? It is not tested (0% coverage) but it is used in the distance_statistics-numpy-oriented.ipynb notebook.

The distance_statistics.ipynb notebooks will also need to be updated to point to new functions from the deprecated classes.

@ljwolf I suppose you will know the answers. I thought that the contents of ripley.py has been moved to distance_statistics.py and never removed from the original location but there are more recent comments to ripley.py than to distance_statistics.py so I am a bit confused.

Deprecated statistics included in the manual

Hello,

thanks for the great package!

I just had to struggle a bit figuring out how to make it work according to the manual:
[https://pointpats.readthedocs.io/_/downloads/en/v2.0.0]

The manual in the "Distance Based Statistics" chapter, as well as most of the examples online that I found, still include the classes present in the "_deprecated_distance_statistics.py" file that is not in the init.py script.
I included the:
"from ._deprecated_distance_statistics import *"
of that package to make it work with the deprecated functions

Do you suggest using the new distance functions that are in the script "distance_statistics.py" even if not indicated in the manual or it is the same?

Thanks!

Qstatistic.plot does not return axes nor take axes

It's challenging to use QStatistic.plot alongside contextily, since the method

  1. does not allow you to pass an ax
  2. it constructs an axis (in self.pp.plot()), rather than requesting an axis using plt.gca()
  3. it calls ffigure.show() rather than returning figure/axis components that can be modified after plotting.

This means we can't add basemaps, as far as I can tell.

How to calculate areas of grid in Quadrat Statistics?

Thank you for developing this excellent package. I have a question regarding the generation of a 3x3 grid based on decimal degree latitude and longitude. Is there a method available to calculate the area of each grid cell?
q_r = qs.QStatistic(pp_juv,shape= "rectangle",nx = 3, ny = 3)

Thanks

`seed` keyword for random distributions?

I would like to gauge interest in adding a seed keyword for the random distributions in random.py. While results are reproducible by setting a seed manually prior to call a distribution function, it would be beneficial to do so in a more automated fashion. With the initial inclusion of pointpats random distribution in geopandas (geopandas/geopandas#2860), the may be of particular use. Within the sample_points() method over there, a seed keyword can be passed in, but is not used when generating points with pointpats.

cc @martinfleis @ljwolf @knaaptime

Kenv is very memory inefficient

Running Kenv on the data as in #51, I exhaust 16GB of ram & peg my CPU at n=419.

I believe this is because of the memory pressure exerted in kdtree.query_pairs

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di)) * 2 / den ) for di in d])

All told, this is a very inefficient way to compute the K function. Each iteration has to re-discover the neighbors from every previous iteration. At the maximum distance, you have to repeatedly re-discover nearly the entire dataset. Doing this for simulations means this is done for thousands of simulations.

The documentation for scipy.spatial.KDTree explicitly warns against using it for this purpose:

The tree also supports all-neighbors queries, both with arrays of points and with other kd-trees. These do use a reasonably efficient algorithm, but the kd-tree is not necessarily the best data structure for this sort of calculation.

A better algorithm would:

  1. compute the distance matrix for the input. (too big for memory? bootstrap it. also, doesn't even have to be the squareform distance matrix)
  2. find the number of distances less than the cutoff ((distances < d).sum(axis=1) using numpy vectorization & broadcasting) and multiply by 2 (?) to capture pairs

pointpats on conda-forge fails pip check

Conda-forge now recommends adding pip check in the recipe. The pointpats package on the conda-forge channel fails pip check with an error pointpats 2.2.0 requires opencv-contrib-python, which is not installed. This is causing issues for all downstream packages that might be submitted to the conda-forge channel in the future. It appears that opencv-contrib-python in the setup.py, but it is not available on the conda-forge channel.

https://github.com/conda-forge/pointpats-feedstock/blob/b433cecfc45129bae5fe956f11f0b7b20efa339d/recipe/meta.yaml#L21

Question on Ripleys stats

Hi,

Just discovered this really cool package!
I have a question regarding the implementation of Ripleys statistics. What I am interested in is computing e.g. Ripley's K for n subsets of points that are sampled from the same area. To do this, I am interested in computing the moving distance threshold across the full area, and not only restricting it on points subsets.
For instance, with pointpats I would do this:

import numpy as np
from sklearn.metrics import pairwise_distances
import seaborn as sns

from pointpats import ripley
from astropy.stats import RipleysKEstimator

sample = np.random.uniform(low=5, high=10, size=(300, 2))
sub1 = sample[0:100,:]
sub2 = sample[100:300,:]

dist1 = pairwise_distances(sub1, metric="euclidean")
dist2 = pairwise_distances(sub2, metric="euclidean")

rip1 = ripley.k_function(sub1, distances=dist1, support=10,)
rip2 = ripley.k_function(sub2, distances=dist2, support=10,)

sns.lineplot(
    rip1[0],
    rip1[1],
)
sns.lineplot(
    rip2[0],
    rip2[1],
)

image

But as you can see the statistics is truncated for one subset. The astropy implementation seems to account for this, and allow to pass the full area.
E.g.

Kest = RipleysKEstimator(area=25, x_max=10, y_max=10, x_min=5, y_min=5)
r = np.linspace(0, 2.5, 100)

sns.lineplot(
    r, 
    Kest(data=sub1, radii=r, mode='none')
)
sns.lineplot(
    r, 
    Kest(data=sub2, radii=r, mode='none')
)

image

I saw that in the pointpats implementation there is a "hull" argument that could be used for that? If not, is it maybe planned to allow such operation, or is it not good practice? (am really naive wrt spatial stats, sorry)

Also, I was wondering whether it's planned to implement the edge corrections features for Ripleys stats.

Thank you!

Giovanni

skyum raises NameError when numba not installed

Calling pointpaths.skyum raises NameError if the user does not have Numba installed. This is caused by a check for Numba in centrography.py on lines 402-405

try:
    from numba import njit, boolean

    HAS_NUMBA = True

which, if the numba import fails, leaves HAS_NUMBA undefined.

Suggested fix: define HAS_NUMBA = False in the except ModuleNotFoundError statement that begins on line 445.

Release v2.3.0

  • bump version to v2.3.0
  • git tag and trigger github actions for package release and github page building
  • update zenodo citation in README

Z coordinates and custom study areas?

Hey,

Your package is awesome, I'm just missing the feature of using z coordinates. Is this planned for the future?

Here is an implementation of Ripley's K that uses xyz coordinates, I wish we could join the best of both packages!

Also, how do your functions deal with study areas for Monte Carlo Simulations? Do they assume a size and shape? Is it possible to give a custom one?

Cheers,
Ricardo

HAS_NUMBA left undefined

In centrography.py, there's a check for if the numba module is available:

try:
    from numba import njit, boolean

    HAS_NUMBA = True

Etc. But if numba isn't installed, then HAS_NUMBA is undefined, which can cause problems when using functions, e.g. minimum_bounding_circle.

I'm not sure, but potentially the fix is as simple as adding HAS_NUMBA = False in the Exception to the above try statement:

except ModuleNotFoundError:

    def njit(func, **kwargs):
        return func

Handle SciPy and libpysal deprecations

The CI is currently showing a deprecations from SciPy, libpysal and a few others that would be good to tackle. See https://github.com/pysal/pointpats/actions/runs/4287451301/jobs/7468323178

/Users/runner/work/pointpats/pointpats/pointpats/geometry.py:134: DeprecationWarning: Please use `ConvexHull` from the `scipy.spatial` namespace, the `scipy.spatial.qhull` namespace is deprecated.
/Users/runner/micromamba-root/envs/test/lib/python3.11/site-packages/libpysal/cg/shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.

Ripley's K function calculation in PointPats:

Dear Professor Rey,

Our names are Max Ratzenboeck and Michael Ecker, MA SCM students at WU Vienna University of Economics and Business. Currently we are writing our master thesis about „Open software and open data for geospatial point pattern analysis in Supply Chain Management“ supervised by Prof. Petra Staufer-Steinnocher (Deputy Head, Institute for Economic Geography and GIScience,; Associate Editor, Journal of Geographical Systems). As the focus of our thesis is on open software, we are using Python, especially your packages pysal/libpysal in order to conduct the analysis. Regarding this, some questions occurred during the usage of PySAL which we allow ourselves to ask:

To calculate Ripley’s K function, your script „distance_statistics.py“ (pysal/pointpats, version 2.1.0, also attached to this email, changed/notes inserted at lines 478-482) is used where we got some strange error regarding the output: The values only seemed to be exactly 1/4 of the results of our manual calculations and calculations in R. This lead us to inspecting the code and we changed.

den = pp.lambda_window * pp.n * [2]
distance_statistics.txt

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))/den) for di in d])

to

note: changed original code

den = pp.lambda_window * pp.n

den = pp.lambda_window * pp.n *2

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))*2/den) for di in d])

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))/den) for di in d])

which then delivered exactly the results what we got in other calculations. Our question is: Is our way of thinking correct and/or was there a bug in your source code or is your approach a different? If your approach is different, can you please tell us the literature in order that we understand this issue better?

A second question about KDE will be posted in a different post.

Thanks for your help in advance.

Documentation for `pointpats` version 2.2.0 in Google Colab

Hi,

I noticed that Google Colab imports by default version 2.2.0 of pointpats. However, I couldn't find any documentation for that version on the pysal site. Could this default be changed? Or is there anywhere where I can find the corresponding documentation?

Thank you!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.