sahilm89 / lhsmdu

This is an implementation of Deutsch and Deutsch, "Latin hypercube sampling with multidimensional uniformity", Journal of Statistical Planning and Inference 142 (2012) , 763-772

License: MIT License

Python 1.05% Jupyter Notebook 44.61% HTML 54.34%

latin-hypercube latin-hypercube-sampling python python3 sampling statistics

lhsmdu's Issues

lhsmdu to work with scipy distributions (and ranges)

Hi there,

When creating samples with lhsmdu, is there a way to sample not uniformly but according to some or all of the scipy distributions (e.g., normal, alpha, beta, lognormal; scipy has about 100 different distributions)?

Ideally, one would set the distribution and range for each variable, plus a number of samples (the same for all variables), and then draw Latin hypercube or Monte Carlo samples on that basis. I'm just thinking out loud, by the way, as someone who uses this for simulation analyses rather than as a programmer.

Kind regards
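
One common way to get non-uniform samples from a uniform sampler is inverse transform sampling: each uniform value in (0, 1) is passed through the inverse CDF of the target distribution, which preserves the stratification. The sketch below uses only the Python standard library; the helper names `scale_to_range` and `to_normal` and the hand-made `uniform_row` are illustrative assumptions, not part of lhsmdu.

```python
import random
from statistics import NormalDist


def scale_to_range(u, low, high):
    """Map a uniform sample in [0, 1) onto the interval [low, high)."""
    return low + (high - low) * u


def to_normal(u, mean, stdev):
    """Map a uniform sample in (0, 1) onto a normal distribution via its inverse CDF."""
    return NormalDist(mean, stdev).inv_cdf(u)


# Stand-in for one row of stratified uniform samples (one point per stratum).
# In practice this would be one row of the matrix returned by the sampler.
rng = random.Random(0)
n_samples = 5
uniform_row = [(i + rng.random()) / n_samples for i in range(n_samples)]

lengths = [scale_to_range(u, 2.0, 5.0) for u in uniform_row]           # uniform on [2, 5)
weights = [to_normal(u, mean=10.0, stdev=2.0) for u in uniform_row]    # normal(10, 2)

print(lengths)
print(weights)
```

Frozen scipy distributions expose the inverse CDF as `ppf` (for example `scipy.stats.norm(loc=10, scale=2).ppf(u)`), so the same pattern should carry over to any scipy distribution.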

numpy.AxisError: axis -1 is out of bounds for array of dimension 0 after package update

Hi S,

I updated the lhsmdu package today and tested the code from the 'basics' section you provided, but I immediately got the following error (this never happened with the previously installed version of your package):

Traceback (most recent call last):
File "C:\Program Files\Python 3.5\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
runfile('C:/Users/myvhove/Documents/CE_70/Python/LHS_MDU_test.py', wdir='C:/Users/myvhove/Documents/CE_70/Python')
File "C:\Program Files\JetBrains\PyCharm 2019.2.3\helpers\pydev_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2019.2.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/myvhove/Documents/CE_70/Python/LHS_MDU_test.py", line 6, in
k = lhsmdu.sample(2, 20) # Latin Hypercube Sampling with multi-dimensional uniformity
File "C:\Program Files\Python 3.5\lib\site-packages\lhsmdu\__init__.py", line 109, in sample
matrixOfStrata = eliminateRealizationsToStrata(distance_1D, matrixOfRealizations, numSamples)
File "C:\Program Files\Python 3.5\lib\site-packages\lhsmdu\__init__.py", line 45, in eliminateRealizationsToStrata
realizations = sort(averageDistance.keys())
File "<array_function internals>", line 6, in sort
File "C:\Program Files\Python 3.5\lib\site-packages\numpy\core\fromnumeric.py", line 970, in sort
a.sort(axis=axis, kind=kind, order=order)
numpy.AxisError: axis -1 is out of bounds for array of dimension 0

The problem is in the sample-function:
k = lhsmdu.sample(2, 20) # Latin Hypercube Sampling with multi-dimensional uniformity 

Incremental/Nested latin hypercube sampling

I have another question about the sampling. In your sample code, you resample within the same strata as the previous sample to obtain additional nested samples.

As far as I can tell, this function just generates a new list of sampling points within the same selection of Latin hypercube squares as in the example (2 variables).
Is there a way to create an incremental nested sample set instead? The first time you sample, you get a set of LHS-MDU points (e.g. 20). You then repeat this several times, but each new batch of 20 points takes the positions of the previously sampled points into account, so that together you have 40, 60, 80 sampled points.

In this way, a single generated set of 40, 60 or 80 LHS-MDU points should be the same as an incremental nested set of 2, 3 or 4 batches of 20 points. The advantage is that you can split the calculations into steps of, for example, 20 points and stop as soon as convergence is achieved, instead of running the whole sampled set at once. Is there a way to do this with this package?

Attached is an example.
Two sample sets were generated with the same seed. The first (blue) contains 120 samples. The second (red) contains 240 samples, but it includes the data points of the first set in such a way that the first 120 sample points of the red data set are identical to the blue data set. Is this something that can be generated with this package?
Nested sample.pdf

lhsmdu crashes for large samples (500, 1000) and/or many variables (10, 50, 100)

Hi there,

I was wondering what causes the lhsmdu function to crash sometimes for larger samples and/or many variables. At other times it works but takes a long time (e.g., 2 variables, 500 samples in 20 min.).
see: https://stackoverflow.com/questions/26137195/latin-hypercube-sampling-with-python/59049597#59049597

Kind regards

Implementation has exponential runtime (unusable for real-world cases)

The LHS implementation in this package has exponential runtime (depending on the number of samples). This makes the package unusable if one needs > 20 samples. I am posting this here as a warning for anyone who wants to use it with larger sample sizes: it will not work.

From my understanding of the LHS algorithm, it should scale as O(n) in the number of samples.
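
For reference, the O(n) figure applies to plain Latin hypercube sampling, which needs only one random point per stratum and one shuffle per dimension. The sketch below is a standard-library illustration of that plain variant; the function name `plain_lhs` is hypothetical and this is not the algorithm lhsmdu implements. As far as I understand the Deutsch and Deutsch paper, the MDU variant additionally eliminates candidate realizations based on pairwise distances, so linear scaling should not be expected of it.

```python
import random


def plain_lhs(n_dims, n_samples, seed=None):
    """Plain Latin hypercube sample: result[d][s] is sample s in dimension d."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each stratum [i/n, (i+1)/n).
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return columns


points = plain_lhs(n_dims=2, n_samples=8, seed=1)
for column in points:
    # Every stratum index 0..7 appears exactly once per dimension.
    print(sorted(int(x * 8) for x in column))
```

The work per dimension is one pass plus one shuffle, which is why the cost of this plain variant grows linearly with the number of samples.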

import time

import lhsmdu
import matplotlib.pyplot as plt
import pandas as pd


def lhsmdu_runtime():
    """Time lhsmdu.sample for 2 variables and 1, 2, 4, ..., 256 samples."""
    runtimes = []
    for k in range(0, 9):
        samples = 2**k
        ts = time.time()
        lhsmdu.sample(2, samples)  # Latin hypercube sampling of two variables, `samples` samples each
        te = time.time()
        res = {'samples': samples, 'time': te - ts}
        print(res)
        runtimes.append(res)

    # Plot the same data twice: linear axes (top) and log-log axes (bottom).
    df = pd.DataFrame(runtimes)
    fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(5, 10))
    for ax in axes:
        ax.plot(df["samples"], df["time"], '-o', markersize=10)
        ax.set_xlabel("sample size")
        ax.set_ylabel("runtime [s]")
        ax.grid(True)

    axes[1].set_xscale("log")
    axes[1].set_yscale("log")
    fig.savefig("lhsmdu_runtime.png")
    plt.show()


if __name__ == "__main__":
    lhsmdu_runtime()

So basically it takes 5 minutes to create 256 samples, around 1 hr for 500 samples. The current implementation is not usable for more than 20 samples.

{'samples': 1, 'time': 0.0008461475372314453}
{'samples': 2, 'time': 0.0018656253814697266}
{'samples': 4, 'time': 0.007106781005859375}
{'samples': 8, 'time': 0.030531644821166992}
{'samples': 16, 'time': 0.1478424072265625}
{'samples': 32, 'time': 0.8095762729644775}
{'samples': 64, 'time': 5.116601467132568}
{'samples': 128, 'time': 36.78028750419617}
{'samples': 256, 'time': 283.7775731086731}

Process finished with exit code 0
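
A quick way to characterize these timings is to fit a straight line on log-log axes: the slope estimates the exponent p in time ≈ c · n^p. The sketch below applies an ordinary least-squares fit to the four largest measurements listed above; the helper name `loglog_slope` is an assumption made for illustration.

```python
import math

# (samples, seconds) pairs copied from the output above.
timings = [
    (32, 0.8095762729644775),
    (64, 5.116601467132568),
    (128, 36.78028750419617),
    (256, 283.7775731086731),
]


def loglog_slope(points):
    """Least-squares slope of log(time) against log(samples)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


slope = loglog_slope(timings)
print(f"estimated exponent: {slope:.2f}")
```

For these numbers the fitted exponent falls between 2.5 and 3, which would suggest steep polynomial (roughly cubic) growth over this range rather than strictly exponential growth. Each doubling of the sample size still costs about seven to eight times more runtime.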

PendingDeprecationWarning

pytest returns this:

XXX/lib/python3.10/site-packages/lhsmdu/__init__.py:90:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
return matrix(matrixOfSamples)
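
The warning refers to NumPy's `matrix` class, whose rows stay two-dimensional when indexed. The sketch below uses a hand-made matrix as a stand-in for the returned samples (an assumption, so that the example runs without lhsmdu) and shows how `numpy.asarray` yields a regular `ndarray` on the caller's side.

```python
import warnings

import numpy as np

# Hand-made stand-in for a 2 x 3 sample matrix; constructing np.matrix emits
# the same PendingDeprecationWarning, so it is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    as_matrix = np.matrix([[0.1, 0.6, 0.3], [0.8, 0.2, 0.5]])

# Convert to a plain ndarray; the values are unchanged.
as_array = np.asarray(as_matrix)

print(as_matrix[0].shape)  # (1, 3): a matrix row is still 2-D
print(as_array[0].shape)   # (3,): an ndarray row is 1-D
```

Plotting functions that expect 1-D inputs, such as matplotlib's `scatter`, generally work with the `ndarray` rows but may reject the 2-D matrix rows.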

Resampling of realization matrix is incorrect

lhsmdu/lhsmdu/__init__.py

Line 81 in 925efa8

sortedIndicesOfStrata = argsort(ravel(matrixOfStrata[row,:]))

Resampling should be done by determining (per dimension) the index of a realization after sorting all realizations in that dimension, that is, its rank. However, this is not what argsort does. For example, [0.62, 0.19, 0.92, 0.22, 0.07] should result in [3, 1, 4, 2, 0], but argsort gives [4, 1, 3, 0, 2] instead. Apply argsort again on sortedIndicesOfStrata to get the correct vector of indices. In this example, argsort([4, 1, 3, 0, 2]) gives the correct result [3, 1, 4, 2, 0].

The effect of this incorrect resampling is that the resulting Latin hypercube samples are far from uniformly distributed, as intended by LHSMDU.
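
The difference between the sorting order and the ranks can be checked with the numbers from the example. This is a pure-Python sketch with a hypothetical `argsort` helper that mimics the behaviour of `numpy.argsort` on a 1-D list; it is not the package's code.

```python
def argsort(values):
    """Return the indices that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


realizations = [0.62, 0.19, 0.92, 0.22, 0.07]

order = argsort(realizations)  # position k holds the index of the k-th smallest value
ranks = argsort(order)         # position i holds the rank of realizations[i]

print(order)  # [4, 1, 3, 0, 2]
print(ranks)  # [3, 1, 4, 2, 0]
```

With NumPy the same ranks are obtained by nesting the call, `numpy.argsort(numpy.argsort(x))`.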

Sample code needs to be updated.

The example in the Readme.md does not work with Python 3.6. k and l are 2-D numpy matrices that need to be converted to numpy arrays before being passed to matplotlib. In addition, the 'col' parameter name needs to be replaced with 'c' or 'color'.

import lhsmdu
import matplotlib.pyplot as plt
import numpy as np

k = lhsmdu.sample(2, 20) # Latin Hypercube Sampling with multi-dimensional uniformity
print(k)
l = lhsmdu.createRandomStandardUniformMatrix(2, 20) # Monte Carlo sampling

k = np.array(k)
l = np.array(l)

fig = plt.figure()
ax = fig.gca()
ax.set_xticks(np.arange(0,1,0.1))
ax.set_yticks(np.arange(0,1,0.1))
plt.scatter(k[0], k[1], color="g", label="LHS-MDU")
plt.scatter(l[0], l[1], color="r", label="MC")
plt.grid()
plt.show()

m = lhsmdu.resample()
n = lhsmdu.resample()
o = lhsmdu.resample()
m = np.array(m)
n = np.array(n)
o = np.array(o)

fig = plt.figure()
ax = fig.gca()
ax.set_xticks(np.arange(0,1,0.1))
ax.set_yticks(np.arange(0,1,0.1))
plt.title("LHS-MDU")
plt.scatter(k[0], k[1], c="g", label="sample 1")
plt.scatter(m[0], m[1], c="r", label="resample 2")
plt.scatter(n[0], n[1], c="b", label="resample 3")
plt.scatter(o[0], o[1], c="y", label="resample 4")
plt.grid()
plt.show()

lhsmdu does not work in Python 3

Hi there. I am trying to run the exact sample code included in readme.rst (in Python 3) and am getting various errors. Does lhsmdu only support Python 2?

error:
ValueError Traceback (most recent call last)
in
10 ax.set_xticks(numpy.arange(0,1,0.1))
11 ax.set_yticks(numpy.arange(0,1,0.1))
---> 12 plt.scatter(k[0], k[1], color="b", label="LHS-MDU")
13 plt.scatter(l[0], l[1], color="r", label="MC")
14 plt.grid()

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, data, **kwargs)
2860 vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,
2861 verts=verts, edgecolors=edgecolors, **({"data": data} if data
-> 2862 is not None else {}), **kwargs)
2863 sci(__ret)
2864 return __ret

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
1808 "the Matplotlib list!)" % (label_namer, func.__name__),
1809 RuntimeWarning, stacklevel=2)
-> 1810 return func(ax, *args, **kwargs)
1811
1812 inner.__doc__ = _add_data_doc(inner.__doc__,

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
4260 x, y, s, c, colors, edgecolors, linewidths =
4261 cbook.delete_masked_points(
-> 4262 x, y, s, c, colors, edgecolors, linewidths)
4263
4264 scales = s # Renamed for readability below.

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py in delete_masked_points(*args)
1030 if isinstance(x, np.ma.MaskedArray):
1031 if x.ndim > 1:
-> 1032 raise ValueError("Masked arrays must be 1-D")
1033 else:
1034 x = np.asarray(x)

ValueError: Masked arrays must be 1-D

`None` seed is not working

lhsmdu.setRandomSeed(None) leads to

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-70-04ad35276bcd> in <module>
----> 1 lhsmdu.setRandomSeed(None)

AttributeError: module 'lhsmdu' has no attribute 'setRandomSeed'

in Python 3.

Repeated sampling

Hi Sahil,

I am using your (extremely useful) script for sampling from parameter space, but this needs to be done identically on repeated simulations. I noticed that supplying randomSeed to sample() has no effect, so have modified my version to have randomSeed = None by default, and include

if randomSeed is not None: random.seed(randomSeed)

within sample()

This solved my issue. Reporting here in case it should be included into the main commit.

Apologies if this is the wrong place to log this issue; I'm new to github.

Thanks,

Sam
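
The behaviour described in this report (seed only when a value is given, otherwise stay random) can be sketched as follows. The function `uniform_matrix` and its `random_seed` parameter are hypothetical stand-ins written with the standard library; they are not lhsmdu's actual code.

```python
import random


def uniform_matrix(n_dims, n_samples, random_seed=None):
    """Return n_dims rows of n_samples uniform values in [0, 1)."""
    # random.Random(None) seeds itself from system entropy, so omitting the
    # seed keeps the output random, while a fixed seed makes it reproducible.
    rng = random.Random(random_seed)
    return [[rng.random() for _ in range(n_samples)] for _ in range(n_dims)]


first = uniform_matrix(2, 5, random_seed=42)
second = uniform_matrix(2, 5, random_seed=42)
other = uniform_matrix(2, 5, random_seed=43)

print(first == second)  # identical seeds reproduce the same matrix
print(first == other)   # different seeds give a different matrix
```

A local generator is used here instead of seeding the global one, so that calling the function does not change the random state of the rest of a simulation. Seeding the global generator inside the function, as in the report, achieves the same reproducibility.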

Question: pseudorandom

Hi,
does this package also support creating pseudorandom sequences?
https://en.wikipedia.org/wiki/Low-discrepancy_sequence
Thanks,
Matthis
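
The linked page is about low-discrepancy (quasi-random) sequences. As an illustration of what such a sequence looks like, here is a minimal standard-library sketch of the Halton sequence, one of the classic examples. It is independent of lhsmdu, and the function names are chosen for this example only.

```python
def van_der_corput(index, base):
    """Radical inverse of `index` in `base`: mirror its digits about the radix point."""
    result = 0.0
    denominator = 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denominator *= base
        result += digit / denominator
    return result


def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence, one coprime base per dimension."""
    return [[van_der_corput(i, base) for base in bases]
            for i in range(1, n_points + 1)]


for point in halton(4):
    print(point)
```

Unlike a Latin hypercube sample, such a sequence is deterministic and can be extended point by point without regenerating the earlier points.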