pysal / pointpats Goto Github PK

Planar Point Pattern Analysis in PySAL

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 97.31% Python 2.69%

pointpats's Issues

Infrastructure

pointpats currently don't have standardised PySAL infrastructure (CI, release action) and I think that the RTD build will also fail because the setup has not been updated to recent versions of sphinx and its extensions.

Since it looks that geopandas may optionally depend on pointpats, I think this needs to be fixed.

I'll give it a first go but may leave some bits to @jGaboardi.

Move to shapely 2 for minimum bounding rectangle (rotated) and minimum bounding circle

This would address #70, #69

`README` in markdown instead of `.rst`?

While working on the Notebooks book project, I've realised that the README file for this project is structured in .rst. Is there a reason for this? If not, would it be possible to convert it markdown? I'm currently converting with pandoc but would probably be better to be consistent across de federation?

release 2.4.0

Hey,

I see that you are actively working on the new stuff (and then removing it) but I wanted to check if it would be okay to cut 2.4.0 this week? I got new Knox and plot_density in my course starting next Tuesday and I'd prefer if students just created an env in the beginning and did not have to touch it (e.g. updating pointpats before we get to point patterns).

Obviously assuming that Know will make it back before the release :)

Tree types is missing now from Ripley.py

from Giulia, the TREE_TYPES needs to be imported into ripley.py

Cannot import name 'G' from 'pointpats' (/usr/local/lib/python3.7/dist-packages/pointpats/init.py)

Hi, I have an issue. I am working Google Colab and when I try to import 'G' module from 'pointpats' there is an error. I do not have idea why this is happening. Can you help me please?

hexbins are very slow for quadrat plots

Is it just me, or is the hexagonal binning very slow to plot?

Jenv binsize correction is incorrect.

Using the book tokyo data for user 95795770@N00, I can get envelopes for G and F, but not J. I think this means whatever shape fix used here is wrong.

import pandas, pointpats
df = pandas.read_csv('https://raw.githubusercontent.com/gdsbook/book/master/'
                     'data/tokyo/tokyo_clean.csv')
coordinates = df.query('user_id == "95795770@N00"')[['x','y']].values
pattern = pointpats.PointPattern(coordinates)
realizations = pointpats.PoissonPointProcess(pattern.window, pattern.n, samples=1000, asPP=True)

k = 40
genv = pointpats.Genv(pattern, intervals=k, realizations=realizations) # works
fenv = pointpats.Fenv(pattern, intervals=k, realizations=realizations) # works
jenv = pointpats.Jenv(pattern, intervals=k, realizations=realizations) # fails for all k>1 I've checked.

fwiw there are 419 coordinates, and I can compute the Jenv of a random pattern with 419 points.

geopandas-centric api

pointpats predates geopandas and was originally designed around (n,2) arrays of coordinates. It hasnt been updated much over time like the rest of the pysal stack, but today it's much more common to work in geodataframes rather than numpy arrays (even though you can reformat into the structure pointpats expects fairly easily). It might be nice to have something that consumes geodataframes/series and outputs the same.

e.g. instead of (in addition to?) the current weighted_mean_center which takes and returns arrays, i end up using something like

def attribute_weighted_center(gdf, column):
    if column not in gdf.columns:
        raise ValueError(
            f"`attribute` {column} was passed but is not present in the dataframe"
        )
    pt = Point(
        weighted_mean_center(
            df.centroid.get_coordinates()[["x", "y"]].values,
            df[column].fillna(0).values,
        )
    )
    return gpd.GeoSeries([pt], crs=gdf.crs)

would that be of general use? If so,

do we want to allow the existing functions to take/return multiple types (probably not? but we do kind of have that pattern with Graph that can accept geodataframes or arrays).
do we want a complementary _from_gdf set of functions or something?

Enhancement wishlist

This is a running list of enhancements burbling up of when writing the gdsbook/book chapter on point pattern analysis.

additional hulling measures

hull() should be general, with a type argument that supports alpha shapes, minimum bounding rectangles, minimum area rectangles, and minimum bounding circles. This would require putting a dedicated convex_hull function elsewhere. Since we could set hull's default to be the convex hull, this can actually be made API-transparent 😄
mbr should be expanded to minimum_bounding_rectangle
we should make an import-safe minimum_area_rectangle and borrow/import the opencv implementation of the minimum area rectangle.
we should rename skyum to minimum_bounding_circle, or use it as a fallback if we cannot use opencv's minimum bounding circle
docstring for skyum needs to be brought in line with other functions/classes (my bad 😥)

API consistency

Elsewhere in the library, we avoid exposing everything at root now. Here, though, everything in centrography is available directly in pointpats... is this intentional?
skyum should return in the same manner as ellipse. I think they both should give some kind of namedtuple return value that has (center, radius) or (center, semimajor, semiminor, rotation).

Performance

mbr does a loop in python through all points and explicitly finds the minimum. We should probably use a sort for a faster solution?
skyum can be easily numba-ized, but I'm not sure if it'd matter much.

New statistics

local K/F functions

poisson point process with delaunay missing first triangle

was looking at the code and found an unexpected behaviour here:

pointpats/pointpats/ripley.py

Line 129 in 0ca9085

return shape.find_simplex((x, y)) > 0

I'd expect that it should check for simplex >= 0.

example:

from pointpats.random import poisson
from scipy.spatial import delaunay_plot_2d, Delaunay, ConvexHull
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(10)
points = rng.uniform(0, 10, size=(30, 2))
hull = ConvexHull(points)
vxs = hull.points[hull.vertices]
deln = Delaunay(vxs)

ppp = poisson(hull, size=(100, 1))
fig, ax = plt.subplots(1, 1)
ax.scatter(
    points[..., 0],
    points[..., 1],
    edgecolor="b",
    facecolor="none",
    alpha=0.5,
)
ax.scatter(ppp[:, 0], ppp[:, 1], edgecolor="r", facecolor="none", alpha=0.5)
delaunay_plot_2d(deln)

looks like the first triangle is never covered, cause it is indexed as 0:

deln.find_simplex((2,8))
>>> array(0, dtype=int32)

intensity estimates for new ripley functions are very incorrect sometimes

I noticed this problem when writing tests for the new numpy-based performant Ripley statistics. When using a shapely polygon's convex hull, I was getting basically the correct intensity, but when I was using scipy, I was getting extreme overestimates of the intensity.

I narrowed it down to an issue with how scipy.spatial.ConvexHull.area works scipy/scipy#12290.

Depending on the fix there we need to either:

move to using hull.volume when getting the area from a scipy convex hull if upstream decides to adopt docstring clarifications
keep using area if upstream decides to adopt the proposed changes for area/volume of 2-d shapes.

Simulation Envelopes, Low & High

The way the low and high functions are calculated, https://github.com/pysal/pointpats/blob/master/pointpats/distance_statistics.py#L627, it appears to be confusing since it depends on the pct parameter whereas it shouldn't. If pct=1 then the low and high envelopes are the same as can be seen below.

realizations = PoissonPointProcess( spp.window, spp.n, 10, asPP = True)
kenv = Kenv( spp, intervals = 20, realizations = realizations, pct = 1)
kenv.plot()

If pct = 0.05 having the same realizations, then,

If there is an intention to incorporate some sort of p - value in the graphs, then from what I know this p - value is determined exclusively by the amount realizations and nothing else. Finally, I would suggest to have an assertion about pct valid values.

Thanks,

Vasilis

L or other distance modules are not recognized

Hello,

Thank you very much for creating this package, it is amazing, but I have encountered a problem with it. I imported it as said in the documentation, I checked the requirements, all packages are satisfying them, and I am working with Python 3.7.7.. But still while calling pointpats.L(pp) Python says that there is no attribute L. Have someone encountered this problem? I have attached my import section and the code (not very long). Thank you for your help.

Greetings
Bartosz

import scipy.spatial import pointpats import libpysal as ps

pp=pointpats.PointPattern(points) L_Test=pointpats.L(pp)

Result:

`L_Test=pointpats.L(pp) :

AttributeError: module 'pointpats' has no attribute 'L'`

add pypi maintainer

@weikang9009 or @shaohu can you add @sjsrey as a maintainer on pypi so he can do releases there?

PoissonPointProcess - ValueError: Length mismatch

Hi, I am occasionally running into the following error (I am not sure yet what causes it exactly ...). Any ideas?

pointpats.version = '2.2.0'
pandas.version = '1.1.3'

random_samples_lambda = PoissonPointProcess(points.window, points.n, conditioning=True,  samples=1000, asPP=True)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pointpats/process.py", line 249, in __init__
    super(PoissonPointProcess, self).__init__(window, n, samples, asPP)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pointpats/process.py", line 113, in __init__
    self.realizations[sample] = PP(points, window=self.window)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pointpats/pointpattern.py", line 90, in __init__
    self.df.columns = col_names
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pandas/core/generic.py", line 5152, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pandas/core/generic.py", line 564, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/ubuntu/dj36/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 227, in set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

usage of `intensity` within `random.py` distributions?

Within random.parse_size_and_intensity() the intensity variable is returned when called within each available distribution (poisson, normal, cluster_poisson, and cluster_normal). However, intensity does not appear to used outside parse_size_and_intensity().

Does intensity actually need to be returned from parse_size_and_intensity()?

Building docs on 3.11 is broken

± make github

make[1]: Entering directory '/home/serge/para/1_projects/code-pysal-pointpats/pointpats/docs'

Running Sphinx v7.2.6



Exception occurred:

  File "/home/serge/miniconda3/envs/pointpats/lib/python3.11/dataclasses.py", line 815, in _get_field

    raise ValueError(f'mutable default {type(f.default)} for field '

ValueError: mutable default <class 'pybtex.richtext.Text'> for field other is not allowed: use default_factory

Related?

ripley.py vs distance_statistics.py duplication?

Hey,

I wanted to remove old deprecated classes Ripley's functions that are now in _deprecated_distance_statistics.py`.

While doing that, I noticed that the new implementation seems to live in distance_statistics.py but then we have nearly exactly the same code also in ripley.py.

distance_statistics.py has more complete docstrings and is included in API docs, while ripley.py has some additional utils.

Is it safe to remove ripley.py from the codebase? It is not tested (0% coverage) but it is used in the distance_statistics-numpy-oriented.ipynb notebook.

The distance_statistics.ipynb notebooks will also need to be updated to point to new functions from the deprecated classes.

@ljwolf I suppose you will know the answers. I thought that the contents of ripley.py has been moved to distance_statistics.py and never removed from the original location but there are more recent comments to ripley.py than to distance_statistics.py so I am a bit confused.

Deprecated statistics included in the manual

Hello,

thanks for the great package!

I just had to struggle a bit figuring out how to make it work according to the manual:
[https://pointpats.readthedocs.io/_/downloads/en/v2.0.0]

The manual in the "Distance Based Statistics" chapter, as well as most of the examples online that I found, still include the classes present in the "_deprecated_distance_statistics.py" file that is not in the init.py script.
I included the:
"from ._deprecated_distance_statistics import *"
of that package to make it work with the deprecated functions

Do you suggest using the new distance functions that are in the script "distance_statistics.py" even if not indicated in the manual or it is the same?

Thanks!

(BUG) L function and its Simulation Envelope under the null (CSR) not close to 0

This issue is raised by a user of pointpats - Wei Xie.

The Simulation Envelope under the null of CSR built for the L function is nowhere near 0 (see the following figure), which is not right based on its definition.

BUG: statistics are now lower case. Notebooks and tutorials need to be updated

f98fe66

Convert notebooks for new import structure

Currently the notebooks are using imports assuming this is a contrib module. We need to convert them to use the reorganized import structure.

Qstatistic.plot does not return axes nor take axes

It's challenging to use QStatistic.plot alongside contextily, since the method

does not allow you to pass an ax
it constructs an axis (in self.pp.plot()), rather than requesting an axis using plt.gca()
it calls ffigure.show() rather than returning figure/axis components that can be modified after plotting.

This means we can't add basemaps, as far as I can tell.

operands could not be broadcast together with shapes (16,) (18,)

Hi,
Running ripley.j_test(points, support=20) give rise to a value error like below where other tests (G, F, etc.) works well:

ValueError: operands could not be broadcast together with shapes (16,) (18,)

Thanks for your consideration.

How to calculate areas of grid in Quadrat Statistics?

Thank you for developing this excellent package. I have a question regarding the generation of a 3x3 grid based on decimal degree latitude and longitude. Is there a method available to calculate the area of each grid cell?
q_r = qs.QStatistic(pp_juv,shape= "rectangle",nx = 3, ny = 3)

Thanks

CI: TypeError: float() argument must be a string or a number, not 'PointPattern'

Our CI started failing today with

TypeError: float() argument must be a string or a number, not 'PointPattern'

across all envs except minimal. https://github.com/pysal/pointpats/actions/runs/5407366393

`seed` keyword for random distributions?

I would like to gauge interest in adding a seed keyword for the random distributions in random.py. While results are reproducible by setting a seed manually prior to call a distribution function, it would be beneficial to do so in a more automated fashion. With the initial inclusion of pointpats random distribution in geopandas (geopandas/geopandas#2860), the may be of particular use. Within the sample_points() method over there, a seed keyword can be passed in, but is not used when generating points with pointpats.

cc @martinfleis @ljwolf @knaaptime

release 2.2.0 to incorporate space time interaction tests

the new release should happen before Dec 31, 2019

Kenv is very memory inefficient

Running Kenv on the data as in #51, I exhaust 16GB of ram & peg my CPU at n=419.

I believe this is because of the memory pressure exerted in kdtree.query_pairs

pointpats/pointpats/distance_statistics.py

Line 486 in bceba7f

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di)) * 2 / den ) for di in d])

All told, this is a very inefficient way to compute the K function. Each iteration has to re-discover the neighbors from every previous iteration. At the maximum distance, you have to repeatedly re-discover nearly the entire dataset. Doing this for simulations means this is done for thousands of simulations.

The documentation for scipy.spatial.KDTree explicitly warns against using it for this purpose:

The tree also supports all-neighbors queries, both with arrays of points and with other kd-trees. These do use a reasonably efficient algorithm, but the kd-tree is not necessarily the best data structure for this sort of calculation.

A better algorithm would:

compute the distance matrix for the input. (too big for memory? bootstrap it. also, doesn't even have to be the squareform distance matrix)
find the number of distances less than the cutoff ((distances < d).sum(axis=1) using numpy vectorization & broadcasting) and multiply by 2 (?) to capture pairs

Move testing to github actions

Follow libpysal

cannot import name 'G' from 'pointpats'

Hi there,

Importing G, F, J, K, L, I get an import error like below:
ImportError: cannot import name 'G' from 'pointpats'

Thanks.

pointpats on conda-forge fails pip check

Conda-forge now recommends adding pip check in the recipe. The pointpats package on the conda-forge channel fails pip check with an error pointpats 2.2.0 requires opencv-contrib-python, which is not installed. This is causing issues for all downstream packages that might be submitted to the conda-forge channel in the future. It appears that opencv-contrib-python in the setup.py, but it is not available on the conda-forge channel.

https://github.com/conda-forge/pointpats-feedstock/blob/b433cecfc45129bae5fe956f11f0b7b20efa339d/recipe/meta.yaml#L21

Question on Ripleys stats

Hi,

Just discovered this really cool package!
I have a question regarding the implementation of Ripleys statistics. What I am interested in is computing e.g. Ripley's K for n subsets of points that are sampled from the same area. To do this, I am interested in computing the moving distance threshold across the full area, and not only restricting it on points subsets.
For instance, with pointpats I would do this:

import numpy as np
from sklearn.metrics import pairwise_distances
import seaborn as sns

from pointpats import ripley
from astropy.stats import RipleysKEstimator

sample = np.random.uniform(low=5, high=10, size=(300, 2))
sub1 = sample[0:100,:]
sub2 = sample[100:300,:]

dist1 = pairwise_distances(sub1, metric="euclidean")
dist2 = pairwise_distances(sub2, metric="euclidean")

rip1 = ripley.k_function(sub1, distances=dist1, support=10,)
rip2 = ripley.k_function(sub2, distances=dist2, support=10,)

sns.lineplot(
    rip1[0],
    rip1[1],
)
sns.lineplot(
    rip2[0],
    rip2[1],
)

But as you can see the statistics is truncated for one subset. The astropy implementation seems to account for this, and allow to pass the full area.
E.g.

Kest = RipleysKEstimator(area=25, x_max=10, y_max=10, x_min=5, y_min=5)
r = np.linspace(0, 2.5, 100)

sns.lineplot(
    r, 
    Kest(data=sub1, radii=r, mode='none')
)
sns.lineplot(
    r, 
    Kest(data=sub2, radii=r, mode='none')
)

I saw that in the pointpats implementation there is a "hull" argument that could be used for that? If not, is it maybe planned to allow such operation, or is it not good practice? (am really naive wrt spatial stats, sorry)

Also, I was wondering whether it's planned to implement the edge corrections features for Ripleys stats.

Thank you!

Giovanni

skyum raises NameError when numba not installed

Calling pointpaths.skyum raises NameError if the user does not have Numba installed. This is caused by a check for Numba in centrography.py on lines 402-405

try:
    from numba import njit, boolean

    HAS_NUMBA = True

which, if the numba import fails, leaves HAS_NUMBA undefined.

Suggested fix: define HAS_NUMBA = False in the except ModuleNotFoundError statement that begins on line 445.

Update docs for 2.2.0

migrate docs over to http://pysal.org/pointpats
turn off rtd (or redirect)

distance_statistics.ipynb is corrupted

The distance_statistics.ipynb seems to be corrupted and can't be opened from neigther Github or my IDE

Release v2.3.0

bump version to v2.3.0
git tag and trigger github actions for package release and github page building
update zenodo citation in README

Z coordinates and custom study areas?

Hey,

Your package is awesome, I'm just missing the feature of using z coordinates. Is this planned for the future?

Here is an implementation of Ripley's K that uses xyz coordinates, I wish we could join the best of both packages!

Also, how do your functions deal with study areas for Monte Carlo Simulations? Do they assume a size and shape? Is it possible to give a custom one?

Cheers,
Ricardo

HAS_NUMBA left undefined

In centrography.py, there's a check for if the numba module is available:

try:
    from numba import njit, boolean

    HAS_NUMBA = True

Etc. But if numba isn't installed, then HAS_NUMBA is undefined, which can cause problems when using functions, e.g. minimum_bounding_circle.

I'm not sure, but potentially the fix is as simple as adding HAS_NUMBA = False in the Exception to the above try statement:

except ModuleNotFoundError:

    def njit(func, **kwargs):
        return func

spacetime.py requires geopandas and shapely that are not among requirements

conda-forge feedstock is failing due to missing geopandas and shapely in the dependencies while being imported on top of spacetime.py.

Are we fine with the hard dependency on geopandas here? Or shall I make those optional?

Handle SciPy and libpysal deprecations

The CI is currently showing a deprecations from SciPy, libpysal and a few others that would be good to tackle. See https://github.com/pysal/pointpats/actions/runs/4287451301/jobs/7468323178

/Users/runner/work/pointpats/pointpats/pointpats/geometry.py:134: DeprecationWarning: Please use `ConvexHull` from the `scipy.spatial` namespace, the `scipy.spatial.qhull` namespace is deprecated.

/Users/runner/micromamba-root/envs/test/lib/python3.11/site-packages/libpysal/cg/shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.

Ripley's K function calculation in PointPats:

Dear Professor Rey,

Our names are Max Ratzenboeck and Michael Ecker, MA SCM students at WU Vienna University of Economics and Business. Currently we are writing our master thesis about „Open software and open data for geospatial point pattern analysis in Supply Chain Management“ supervised by Prof. Petra Staufer-Steinnocher (Deputy Head, Institute for Economic Geography and GIScience,; Associate Editor, Journal of Geographical Systems). As the focus of our thesis is on open software, we are using Python, especially your packages pysal/libpysal in order to conduct the analysis. Regarding this, some questions occurred during the usage of PySAL which we allow ourselves to ask:

To calculate Ripley’s K function, your script „distance_statistics.py“ (pysal/pointpats, version 2.1.0, also attached to this email, changed/notes inserted at lines 478-482) is used where we got some strange error regarding the output: The values only seemed to be exactly 1/4 of the results of our manual calculations and calculations in R. This lead us to inspecting the code and we changed.

den = pp.lambda_window * pp.n * [2]
distance_statistics.txt

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))/den) for di in d])

note: changed original code

den = pp.lambda_window * pp.n

den = pp.lambda_window * pp.n *2

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))*2/den) for di in d])

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))/den) for di in d])

which then delivered exactly the results what we got in other calculations. Our question is: Is our way of thinking correct and/or was there a bug in your source code or is your approach a different? If your approach is different, can you please tell us the literature in order that we understand this issue better?

A second question about KDE will be posted in a different post.

Thanks for your help in advance.

pointpats.random not in the API docs

While working on geopandas/geopandas#2363, I've noticed that the random module is not part of the API docs. Can we make a quick PR and a release to ensure it is there and geopandas can link to it?

update yaml and travis post reorganization

currently the travis tests are failing due to the yaml file not being correct.

argument of type 'builtin_function_or_method' is not iterable

Hi,

we met this error when using the sample data in notebooks/distance_statistics-numpy-oriented.ipynb, thanks!

Typo in `random.normal` method that results in errors

There is a small typo in the random.normal method that prevents the use of a 2x2 numpy array for the cov parameter.

When checking if cov is a numpy array, isinstance is misspelled as isnstance.

[ENH] Notebooks/polymorphism for shapely shapes?

geopandas/geopandas#811 prompts this. I'll need to look at the code, but using libpysal.cg.asShape should do the trick at the simplest.

Documentation for `pointpats` version 2.2.0 in Google Colab

Hi,

I noticed that Google Colab imports by default version 2.2.0 of pointpats. However, I couldn't find any documentation for that version on the pysal site. Could this default be changed? Or is there anywhere where I can find the corresponding documentation?

Thank you!

Relative links to notebooks on pypi landing page are broken

https://pypi.python.org/pypi/pointpats/1.0.0

Might have to update the readme to have absolute links so notebook links work on both the GitHub homepage and on pypi

pysal / pointpats Goto Github PK

pointpats's Issues

additional hulling measures

API consistency

Performance

New statistics

note: changed original code

den = pp.lambda_window * pp.n *2

kcdf = np.asarray([(di, len(pp.tree.query_pairs(di))/den) for di in d])

Recommend Projects

Recommend Topics

Recommend Org