jonghyunharrylee / pypcga Goto Github PK

View Code? Open in Web Editor NEW

23.0 11.0 15.0 196.16 MB

pyPCGA: fast and scalable inverse modeling approach

License: BSD 3-Clause "New" or "Revised" License

Python 99.61% Makefile 0.39%

inversion inversion-method uncertainty-quantification uncertainty-estimation

pypcga's Introduction

pyPCGA

Python library for Principal Component Geostatistical Approach

version 0.1

updates

Exact preconditioner construction (inverse of cokriging/saddle-point matrix) using generalized eigendecomposition [Lee et al., WRR 2016, Saibaba et al, NLAA 2016]
Fast hyperparameter tuning and predictive model validation using cR/Q2 criteria [Kitanidis, Math Geol 1991] ([Lee et al., 2021 in preparation])
Fast posterior variance/std computation using exact preconditioner

version 0.2 will include

automatic covariance model parameter calibration with nearshore application example
link with FMM and HMatrix to support unstructured grids

Installation

python -m pip install git+https://github.com/jonghyunharrylee/pyPCGA.git

Courses

UH CEE696 course on data assimilation

Example Notebooks

1D linear inversion example below will be helpful to understand how pyPCGA can be implemented. Please check Google Colab examples.

1D linear inversion example (from Stanford 362G course) Google Colab example
1D nonlinear inversion example (from Stanford 362G course) Google Colab example
Hydraulic conductivity estimation example using USGS-FloPy (MODFLOW) [Lee and Kitanidis, 2014] Google Colab example
Tracer tomography example using Crunch (with Mahta Ansari from UIUC Druhan Lab)
Bathymetry estimation example using STWAVE (with USACE-ERDC-CHL)
Permeability estimation example using TOUGH2 (with Amalia Kokkianki, USFCA)
Electrical conductivity estimation example using magnetotelluric (MT) survey with MARE2DEM (with Niels Grobbe, UHM)
DNAPL plume estimation using hydraulic head, self-potential (SP) and partitioning tracer data (with Xueyuan Kang et al.)
ERT example using E4D will be completed soon.
MODFLOW-USG/SEAWAT/MODFLOW6 examples coming soon!

Credits

pyPCGA is based on Lee et al. [2016] and currently used for Stanford-USACE ERDC project led by EF Darve and PK Kitanidis and NSF EPSCoR `Ike Wai project.

Code contributors include:

Jonghyun Harry Lee
Matthew Farthing
Ty Hesser (STWAVE example)

FFT-based matvec code is adapted from Arvind Saibaba's work (https://github.com/arvindks/kle).

FMM-based code (https://arxiv.org/abs/1903.02153) will be incorporated in version 0.2

References

J Lee, H Yoon, PK Kitanidis, CJ Werth, AJ Valocchi, "Scalable subsurface inverse modeling of huge data sets with an application to tracer concentration breakthrough data from magnetic resonance imaging", Water Resources Research 52 (7), 5213-5231
AK Saibaba, J Lee, PK Kitanidis, Randomized algorithms for generalized Hermitian eigenvalue problems with application to computing Karhunen–Loève expansion, Numerical Linear Algebra with Applications 23 (2), 314-339
J Lee, PK Kitanidis, "Large‐scale hydraulic tomography and joint inversion of head and tracer data using the Principal Component Geostatistical Approach (PCGA)", WRR 50 (7), 5410-5427
PK Kitanidis, J Lee, Principal Component Geostatistical Approach for large‐dimensional inverse problems, WRR 50 (7), 5428-5443

Applications

T. Kadeethum, D. O'Malley, JN Fuhg, Y. Choi, J. Lee, HS Viswanathan and N. Bouklas, A framework for data-driven solution and parameter estimation of PDEs using conditional generative adversarial networks, Nature Computational Science, 819–829, 2021
J Lee, H Ghorbanidehno, M Farthing, T. Hesser, EF Darve, and PK Kitanidis, Riverine bathymetry imaging with indirect observations, Water Resources Research, 54(5): 3704-3727, 2018
J Lee, A Kokkinaki, PK Kitanidis, Fast large-scale joint inversion for deep aquifer characterization using pressure and heat tracer measurements, Transport in Porous Media, 123(3): 533-543, 2018
PK Kang, J Lee, X Fu, S Lee, PK Kitanidis, J Ruben, Improved Characterization of Heterogeneous Permeability in Saline Aquifers from Transient Pressure Data during Freshwater Injection, Water Resources Research, 53(5): 4444-458, 2017
S. Fakhreddine, J Lee, PK Kitanidis, S Fendorf, M Rolle, Imaging Geochemical Heterogeneities Using Inverse Reactive Transport Modeling: an Example Relevant for Characterizing Arsenic Mobilization and Distribution, Advances in Water Resources, 88: 186-197, 2016

pypcga's People

Contributors

Stargazers

Watchers

Forkers

awais307 rabbl amaliak mahtaw esiwgnahz c8zhuang dasallfolks dagangwood163 leixinma adaiwhu antoinecollet5 wudicool mhjiang99

pypcga's Issues

link stwave example with ERDC stwave scripts

See if we can use ERDC stwave_utils to drive the STWAVE forward problem

problem using linear trend in stwave example

@jonghyunharrylee when I try to specify a linear trend using the 'drift':'linear' flag in example_inv_stwave.py I get the following traceback

***** Iteration 1 ******
computed Jacobian-Matrix products in 189.261032 secs
solve saddle point (co-kriging) systems with Levenberg-Marquardt
Traceback (most recent call last):
  File "example_inv_stwave.py", line 90, in <module>
    s_hat, simul_obs, post_diagv, iter_best = prob.Run()
  File "/Users/rdchlmwf/Public/code/pyPCGA/pyPCGA/pcga.py", line 1254, in Run
    s_hat, simul_obs, post_diagv, iter_best = self.GaussNewton()
  File "/Users/rdchlmwf/Public/code/pyPCGA/pyPCGA/pcga.py", line 1157, in GaussNewton
    s_cur, beta_cur, simul_obs_cur, obj = self.LinearIteration(s_past, simul_obs)
  File "/Users/rdchlmwf/Public/code/pyPCGA/pyPCGA/pcga.py", line 1071, in LinearIteration
    s_hat, beta, simul_obs_new = self.IterativeSolve(s_cur, simul_obs, precond = precond)
  File "/Users/rdchlmwf/Public/code/pyPCGA/pyPCGA/pcga.py", line 902, in IterativeSolve
    x, info = gmres(Afun, b, tol=itertol, restart=restart, maxiter=solver_maxiter, callback=callback, M=P)
  File "<decorator-gen-5>", line 2, in gmres
  File "/Users/rdchlmwf/anaconda2/envs/enkf/lib/python2.7/site-packages/scipy/_lib/_threadsafety.py", line 46, in caller
    return func(*a, **kw)
  File "/Users/rdchlmwf/anaconda2/envs/enkf/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 460, in gmres
    work[slice1] = psolve(work[slice2])
  File "/Users/rdchlmwf/anaconda2/envs/enkf/lib/python2.7/site-packages/scipy/sparse/linalg/interface.py", line 219, in matvec
    y = self._matvec(x)
  File "/Users/rdchlmwf/anaconda2/envs/enkf/lib/python2.7/site-packages/scipy/sparse/linalg/interface.py", line 471, in _matvec
    return self.__matvec_impl(x)
  File "/Users/rdchlmwf/Public/code/pyPCGA/pyPCGA/pcga.py", line 892, in Pmv
    S = np.dot(HX.T, invPsi(HX)) # p by p matrix
  File "/Users/rdchlmwf/Public/code/pyPCGA/pyPCGA/pcga.py", line 878, in invPsi
    return np.multiply(np.multiply((1./alpha[i]),self.invR),v) - np.dot(Psi_U_i, np.multiply(Dvec[:Psi_U_i.shape[1]].reshape(Psi_UTv.shape), Psi_UTv))
ValueError: cannot reshape array of size 69 into shape (69,3)

I get a similar complaint execept the size is (69,2) when I try to manually set X. Do I need to set another flag? Thanks

Decide how we want to provide access to the adh executables?

The pre-compiled adh executables in the adh_red_inset application won't run on our blade server. We may want to point users to AdH website for the binary versions they distribute.

I suppose if we distribute in a docker container we would be ok (either just adh as a docker executable or the whole PCGA application).

No mf.hds output file

Amazing project!
I just compiled mf2005 on debian 11 and can run it just fine but I can't run the inversion in this example, because Modflow doesn't create a head file (I am using Python 3.9.13). All the input files are properly generated, but I think that the simulation doesn't terminate.
Have you noted this before?
Bernard

FileNotFoundError Traceback (most recent call last)
Cell In [2], line 1
----> 1 s_hat, simul_obs, post_diagv, iter_best = prob.Run()

File ~/anaconda3/envs/hlflopy/lib/python3.9/site-packages/pyPCGA/pcga.py:1334, in PCGA.Run(self)
1331 if self.priorU is None or self.priord is None:
1332 self.ComputePriorEig()
-> 1334 s_hat, simul_obs, post_diagv, iter_best = self.GaussNewton()
1335 #start = time()
1336 print("** Total elapsed time is %f secs" % (time()-start))

File ~/anaconda3/envs/hlflopy/lib/python3.9/site-packages/pyPCGA/pcga.py:1219, in PCGA.GaussNewton(self)
1216 print('##### 4. Start PCGA Inversion #####')
1217 print('-- evaluate initial solution')
-> 1219 simul_obs_init = self.ForwardSolve(s_init)
1220 self.simul_obs_init = simul_obs_init
1221 RMSE_init = np.linalg.norm(simul_obs_init-self.obs)/np.sqrt(n)

File ~/anaconda3/envs/hlflopy/lib/python3.9/site-packages/pyPCGA/pcga.py:379, in PCGA.ForwardSolve(self, s)
377 else:
378 with HiddenPrints():
--> 379 simul_obs = self.forward_model(s,par)
381 simul_obs = simul_obs.reshape(-1,1)
383 return simul_obs
...
--> 218 f = open(filename, "rb")
219 f.seek(0, 2)
220 totalbytes = f.tell()

FileNotFoundError: [Errno 2] No such file or directory: '/home/berna/modflow/pyPCGA/examples/modflow_flopy/simul/simul0000/mf.hds'

[ENH] Get reproducible results with PCGA (set the initial vector used by scipy.sparse.linalg.eigsh ...)

Up to now, when running pyPCGA twice or more with the same parameters, the results are slightly different each time.

This is what I attribute to the covariance matrix low-rank approximation relying on scipy.sparse.linalg.eigsh beauce the initial vector v0 is not provided and consequently chosen randomly.

The solution would be to let the possibility for the user set a seed (aka random_state) to generate a reproducible v0.

See: https://stackoverflow.com/a/52403508

add Red River Inset jupyter notebook

Need a notebook for guiding the simulation of the red river inset example

Missing stwave_utils

I have been trying to run the stwave example and I cant find stwave_utils to install. Here is the trace back

Thanks

import stwave as st

File "stwave.py", line 4, in
from stwave_utils.stwave_forward_problem import STWaveProblem

ImportError: No module named stwave_utils.stwave_forward_problem

Release on pypi ?

Hi !

That's a great piece of code you have there. Do you consider adding a tag for v0.1.0 and pushing it to PyPI one day ?

model size

I have a 2d model which produces 11*11 result, I resize it to 1d.
I run and I get the following error

PCGA Inversion

1. Initialize forward and inversion parameters

------------ Inversion Parameters -------------------------
Number of unknowns : 121
Number of observations : 121
Number of principal components (n_pc) : 50
Prior model : def kernel(r): return (prior_std ** 2) * np.exp(-r)

Prior variance : 4.000000e-02
Prior scale (correlation) parameter : [71. 71.]
Posterior cov computation : diag
Posterior variance computation : Direct
Number of CPU cores (n_core) : 4
Maximum GN iterations : 10
machine precision (delta = sqrt(precision)) : 1.000000e-08
Tol for iterations (norm(sol_diff)/norm(sol)) : 1.000000e-02
Levenberg-Marquardt (LM) : True
LM solution range constraints (LM_smin, LM_smax) : None, None
Line search : True

2. Construct Prior Covariance Matrix

time for covariance matrix construction (m = 121) is 0 sec

3. Eigendecomposition of Prior Covariance

time for eigendecomposition with k = 50 is 0 sec
1st eigv : 4.46991, 50-th eigv : 0.000505994, ratio: 0.0001132

4. Start PCGA Inversion

-- evaluate initial solution
obs. RMSE (norm(obs. diff.)/sqrt(nobs)): 18.3563, normalized obs. RMSE (norm(obs. diff./sqrtR)/sqrt(nobs)): 1835.63
***** Iteration 1 ******
Traceback (most recent call last):
File "/xxxx/scripts/inv_GaussPlume.py", line 57, in
s_hat, simul_obs, post_diagv, iter_best = prob.Run()
File "/usr/local/lib/python3.8/dist-packages/pyPCGA-0.1.0-py3.8.egg/pyPCGA/pcga.py", line 1332, in Run
s_hat, simul_obs, post_diagv, iter_best = self.GaussNewton()
File "/usr/local/lib/python3.8/dist-packages/pyPCGA-0.1.0-py3.8.egg/pyPCGA/pcga.py", line 1235, in GaussNewton
s_cur, beta_cur, simul_obs_cur, obj = self.LinearIteration(s_past, simul_obs)
File "/usr/local/lib/python3.8/dist-packages/pyPCGA-0.1.0-py3.8.egg/pyPCGA/pcga.py", line 1149, in LinearIteration
s_hat, beta, simul_obs_new = self.IterativeSolve(s_cur, simul_obs, precond = precond)
File "/usr/local/lib/python3.8/dist-packages/pyPCGA-0.1.0-py3.8.egg/pyPCGA/pcga.py", line 773, in IterativeSolve
HX, HZ, Hs, U_data = self.JacMat(s_cur, simul_obs, Z)
File "/usr/local/lib/python3.8/dist-packages/pyPCGA-0.1.0-py3.8.egg/pyPCGA/pcga.py", line 555, in JacMat
Htemp = self.JacVect(temp,s_cur,simul_obs,precision)
File "/usr/local/lib/python3.8/dist-packages/pyPCGA-0.1.0-py3.8.egg/pyPCGA/pcga.py", line 444, in JacVect
raise ValueError("size of simul_obs_purturbation (%d,%d) is not nruns %d" % (simul_obs_purturbation.shape[0], simul_obs_purturbation.shape[1], nruns))
ValueError: size of simul_obs_purturbation (121,1) is not nruns 52

problem with mf2005 file on Mac

Hi,

I have been trying to run the PCGA modflow example on Mac but have been getting an error. I'm not sure but from what I've seen about similar errors this is because the mf2005 binary is not compatible with my operating system (osx 10.13). Can I get information about which operating systems are compatible with this package?

Here is the traceback that I get.

Traceback (most recent call last):

File "", line 1, in
runfile('/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/example_inv_mf.py', wdir='/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy')

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/example_inv_mf.py", line 106, in
s_hat, simul_obs, post_diagv, iter_best = prob.Run()

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/pyPCGA-0.1.0-py3.6.egg/pyPCGA/pcga.py", line 1309, in Run
s_hat, simul_obs, post_diagv, iter_best = self.GaussNewton()

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/pyPCGA-0.1.0-py3.6.egg/pyPCGA/pcga.py", line 1196, in GaussNewton
simul_obs_init = self.ForwardSolve(s_init)

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/pyPCGA-0.1.0-py3.6.egg/pyPCGA/pcga.py", line 375, in ForwardSolve
simul_obs = self.forward_model(s,par)

File "/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/example_inv_mf.py", line 81, in forward_model
simul_obs = model.run(s, parallelization)

File "/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/mf.py", line 176, in run
simul_obs.append(self(item))

File "/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/mf.py", line 184, in call
return self.run_model(args[0], args[1])

File "/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/mf.py", line 155, in run_model
simul_obs = self.run_model_single(HK, Q_loc, idx)

File "/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/mf.py", line 129, in run_model_single
success, buff = mymf.run_model(silent=True)

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/flopy/mbase.py", line 976, in run_model
normal_msg=normal_msg)

File "/Users/mahtag2/anaconda3/lib/python3.6/site-packages/flopy/mbase.py", line 1401, in run_model
stdout=sp.PIPE, stderr=sp.STDOUT, cwd=model_ws)

File "/Users/mahtag2/anaconda3/lib/python3.6/subprocess.py", line 709, in init
restore_signals, start_new_session)

File "/Users/mahtag2/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)

OSError: [Errno 8] Exec format error: '/Users/mahtag2/Downloads/pyPCGA-master/examples/modflow_flopy/input_files/mf2005'

add post-processing notebook for red river inset example

We could also turn the plot_red_inset.py script into a jupyter notebook.

Wrong packaging as setup.py imports pyPCGA

The packaging seems wrong as setup.py import pyPCGA and consequently its dependencies before they are even installed:

$ pip install git+https://****@github.com/jonghyunharrylee/pyPCGA.git
Collecting git+https://****@github.com/jonghyunharrylee/pyPCGA.git
  Cloning https://****@github.com/jonghyunharrylee/pyPCGA.git to /tmp/pip-req-build-ko5042bo
  Running command git clone --filter=blob:none -q 'https://****@github.com/jonghyunharrylee/pyPCGA.git' /tmp/pip-req-build-ko5042bo
  Resolved https://****@github.com/jonghyunharrylee/pyPCGA.git to commit 8294452a1c67b6cdc0c36d492c7bb2a769f76fb3
  Preparing metadata (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /tmp/test/venv/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-ko5042bo/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-ko5042bo/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-x9bt3f13
       cwd: /tmp/pip-req-build-ko5042bo/
  Complete output (9 lines):
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-req-build-ko5042bo/setup.py", line 8, in <module>
      from pyPCGA import __name__, __author__
    File "/tmp/pip-req-build-ko5042bo/pyPCGA/__init__.py", line 8, in <module>
      from .pcga import PCGA
    File "/tmp/pip-req-build-ko5042bo/pyPCGA/pcga.py", line 3, in <module>
      import numpy as np
  ModuleNotFoundError: No module named 'numpy'
  ----------------------------------------
WARNING: Discarding git+https://****@github.com/jonghyunharrylee/pyPCGA.git. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Workaround: define the author and the version in the setup.py and retrieve it in __init__.py