e-kotov / gridsample Goto Github PK

View Code? Open in Web Editor NEW

This project forked from cran/gridsample

0.0 0.0 0.0 241 KB

This is a fixed version of the latest read-only snapshot of gridsample — Tools for Grid-Based Survey Sampling Design

License: Other

R 100.00%

gridsample's Introduction

I am currently a Doctoral Researcher at the Department of Digital and Computational Demography, Laboratory of Migration and Mobility at the Max Planck Institute for Demographic Research (MPIDR) in Rostock and at Sociodemography Research Group at the Universitat Pompeu Fabra (UPF) in Barcelona.
I analyse spatial digital trace data to study human mobility patterns and the implications for disease dynamics and inequalities.
My latest paper is:
- Kotov, E., & Denecke, E. (2024). Expanding the Lifespan of Software for Demographic Analysis with Containers: An Application of Spatial Sampling. The Denominator. Population Dynamics Lab. .
- See all papers.
I worked on many R packages: I developed the rJavaEnv package to manage Java environment for Java-dependent R packages, wikimapR for getting data using Wikimapia API, enhanced rang for reproducible research, and made contributions to layer, OpenTripPlanner, geofacet, esri2sf and more R packages and other open source software.
I used to teach reproducible data analysis in R and computational spatial morphology to urban researchers and planners.

Research interests

Intra-urban and inter-regional mobility, commuting and migration; determinants, external effects and spillover effects
Implications of human mobility for disease dynamics and inequalities

gridsample's People

Contributors

gridsample's Issues

raster reprojection generates non-integer strata labels and leads to an error

gridsample/R/gs_sample.r

Lines 276 to 279 in d8a7fb9

    
           if (sp::proj4string(strata_raster) != "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") { 
        
             strata_raster <- projectRaster(strata_raster, 
        
               crs = sp::CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) 
        
           }

This reprojection code generates non-integer values that lead to an error.

library(gridsample)
library(raster)
#> Loading required package: sp


population_raster <- raster(system.file("extdata", "RWA_ppp_2010_adj_v2.tif",
                                        package="gridsample"))
plot(population_raster)

data(RWAshp)
proj4string(RWAshp) <- CRS(proj4string(population_raster))
strata_raster <- rasterize(RWAshp,population_raster,field = "STL.2")
plot(strata_raster)

total_pop <- cellStats(population_raster, stat = "sum")
urban_pop_value <- total_pop * .16
pop_df <- data.frame(index = 1:length(population_raster[]),
                     pop = population_raster[])
pop_df <- pop_df[!is.na(pop_df$pop), ]
pop_df <- pop_df[order(pop_df$pop,decreasing = T), ]
pop_df$cumulative_pop <- cumsum(pop_df$pop)
pop_df$urban <- 0
pop_df$urban[which(pop_df$cumulative_pop <= urban_pop_value)] <- 1
urban_raster <- population_raster >= min(subset(pop_df,urban == 1)$pop)
values(urban_raster)[is.na(values(urban_raster))] <- 0 # setting all NA to 0 for the unfixed version of the package where NAs also cause critical error
plot(urban_raster)

psu_polygons <- gs_sample(
  population_raster = population_raster,
  strata_raster = strata_raster,
  urban_raster = urban_raster,
  cfg_desired_cell_size = NA,
  cfg_hh_per_stratum = 416,
  cfg_hh_per_urban = 26,
  cfg_hh_per_rural = 26,
  cfg_min_pop_per_cell = 0.01,
  cfg_max_psu_size = 10,
  cfg_pop_per_psu = 610,
  cfg_sample_rururb = TRUE,
  cfg_sample_spatial = FALSE,
  # cfg_sample_spatial_scale = 100,
  output_path = tempdir(),
  sample_name = "rwanda_psu")

#> [1] "Beginning gs_sample() Process:"
#> [1] "Randomly Stratifying PSU Cell IDs:"
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 1 ..."
#> [1] "Processing stratum ID 1 ..."
#> [1] "Processing stratum ID 1 ..."
#> [1] "Processing stratum ID 1 ..."
#> [1] "Processing stratum ID 4 ..."
#> [1] "Processing stratum ID 4 ..."
#> [1] "Processing stratum ID 4 ..."
#> [1] "Processing stratum ID 1.44650786031469 ..."
#> Error in sample.int(length(x), size, replace, prob): invalid first argument

^{Created on 2023-11-29 by the reprex package (v0.3.0)}

implement sampling with a quadtree algorithm?

"[Population sampling frame using a quadtree algorithm] it allows an accurate calculation of sampling probability weights compared to GridSample". (Qader et al. 2020)

Perhaps worth updating and reimplementing the code (Qader et al. 2020) from the article supplement in gridsample itself or in a separate package:

## Quadtree-based decomposition of rasters (gridded population dataset)
## Required packages
install.packages("raster")
install.packages("maptools")
install.packages("rgdal")
install.packages("sp")
install.packages("doSNOW")
## Set directory
setwd("C:/")
## Define the raster data (Gridded population data)
RasataData=raster("Raster data")
## Define the shape file to identfy the adminstrative regions or settelment types. 
reg=readOGR("shape file")
RasterData.Region1=subset(reg, ID=="First Region name")
RasterData.Region1=subset(reg, grepl("Second Region name", ID))
Awdal.rast=crop(som, som.Awdal)
Banaadir.rast=crop(som, som.Banaadir)
## Regions to process
provs <- as.character(reg$ID)
## help function: split images into quads
splitim <- function(im){
  r <- nrow(im)
  c <- ncol(im)
  dr <- round(r/2)
  dc <- round(c/2)
    quads <- list()
  quads[[1]] <- im[1:dr, 1:dc, drop=FALSE]
  quads[[2]] <- im[(dr+1):r, 1:dc, drop=FALSE]
  quads[[3]] <- im[(dr+1):r, (dc+1):c, drop=FALSE]
  quads[[4]] <- im[1:dr, (dc+1):c, drop=FALSE]
    return(quads)
}
## Pre-define parameters for population size and area
minSize <- 3500 # Maximum population size
gridSize <- 32 # maximum grid dimension
### main loop 
for(p in provs[1:18]){
  print(p)
    grid_c<- crop(som, subset(reg, ID==p))
  grid<-mask(grid_c, subset(reg, ID==p))
  #grid=mask(grid, UR)
    dim(grid)
    iml <- list() # store the list of images to process
  iml[[1]] <- grid # first in the image list is the full grid
  res <- FALSE # results, start false to begin with a split
  res2 <- FALSE
  imS <- list() # output list of split grids
    # main processing loop
  while (prod(res)==0 | prod(res2)==0) {
    iml <- unlist(lapply(iml, splitim))
        res <- unlist(lapply(iml, function(i){ cellStats(i, stat='sum') < minSize } ))
    res2 <- unlist(lapply(iml, function(i){ max(dim(i)) < gridSize } ))
        imS <- c(imS, iml[which(res==1 & res2==1)])
    iml <- iml[which(res==0 | res2==0)]
  }
   ## drop subscenes with no data
  keep <- lapply(1:length(imS), function(i){ cellStats(imS[[i]],'mean')>0 & !is.nan(cellStats(imS[[i]], 'mean')) })
   imS <- imS[unlist(keep)==T]
  
  ## output split grids
  if(!dir.exists(paste0( "subscenes"))){dir.create(paste0( "subscenes"))}
  if(!dir.exists(paste0( "subscenes/", p,"/"))){dir.create(paste0( "subscenes/", p,"/"))}
  
  ## output extent polygons
  plylist <- lapply(1:length(imS), 
                    function(i){ 
                      if(!is.null(imS[[i]])){
                        pop=cellStats(imS[[i]], sum)
                        e <- extent(imS[[i]])
                        sp <- as(e, 'SpatialPolygons')
                        crs(sp) <- crs(imS[[i]])
                        polydf <- SpatialPolygonsDataFrame(sp, data=data.frame(id=i, pop=pop))
                        return(polydf)}
                    })
  ## merge all to a single feature
  joinpoly <- do.call(bind, plylist) 
    if(!dir.exists(paste0( "subshapes"))){dir.create(paste0( "subshapes"))}
  if(!dir.exists(paste0( "subshapes/", p,"/"))){dir.create(paste0( "subshapes/", p,"/"))}
  shapefile(joinpoly, paste0( "subshapes/", p, "/", p, "_ply_all3.shp"), overwrite=T)
  pdf(paste0( "subshapes/", p, "/", p, "_plot.pdf"))
  plot(grid, main=p)
  plot(joinpoly, add=T)
  plot(reg, add=T)
  dev.off()
  }

References

Qader, S.H., Lefebvre, V., Tatem, A.J. et al. Using gridded population and quadtree sampling units to support survey sample design in low-income settings. Int J Health Geogr 19, 10 (2020). https://doi.org/10.1186/s12942-020-00205-5

gs_sample fails if the raster contains NA, even the original vignette is not reproducible

NA values in rasters lead to critical errors:

sessionInfo()

#> R version 4.0.1 (2020-06-06)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.0.1 magrittr_1.5 tools_4.0.1 htmltools_0.5.0
#> [5] yaml_2.2.1 stringi_1.4.6 rmarkdown_2.2 highr_0.8
#> [9] knitr_1.28 stringr_1.4.0 xfun_0.14 digest_0.6.25
#> [13] rlang_0.4.6 evaluate_0.14

library(gridsample)
library(raster)
#> Loading required package: sp
population_raster <- raster(system.file("extdata", "RWA_ppp_2010_adj_v2.tif",
                                        package="gridsample"))
plot(population_raster)

data(RWAshp)
proj4string(RWAshp) <- CRS(proj4string(population_raster))
strata_raster <- rasterize(RWAshp,population_raster,field = "STL.2")
plot(strata_raster)

total_pop <- cellStats(population_raster, stat = "sum")
urban_pop_value <- total_pop * .16
pop_df <- data.frame(index = 1:length(population_raster[]),
                     pop = population_raster[])
pop_df <- pop_df[!is.na(pop_df$pop), ]
pop_df <- pop_df[order(pop_df$pop,decreasing = T), ]
pop_df$cumulative_pop <- cumsum(pop_df$pop)
pop_df$urban <- 0
pop_df$urban[which(pop_df$cumulative_pop <= urban_pop_value)] <- 1
urban_raster <- population_raster >= min(subset(pop_df,urban == 1)$pop)
plot(urban_raster)

psu_polygons <- gs_sample(
  population_raster = population_raster,
  strata_raster = strata_raster,
  urban_raster = urban_raster,
  cfg_desired_cell_size = NA,
  cfg_hh_per_stratum = 416,
  cfg_hh_per_urban = 26,
  cfg_hh_per_rural = 26,
  cfg_min_pop_per_cell = 0.01,
  cfg_max_psu_size = 10,
  cfg_pop_per_psu = 610,
  cfg_sample_rururb = TRUE,
  cfg_sample_spatial = FALSE,
  # cfg_sample_spatial_scale = 100,
  output_path = tempdir(),
  sample_name = "rwanda_psu"
)

#> [1] "Beginning gs_sample() Process:"
#> [1] "Randomly Stratifying PSU Cell IDs:"
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> [1] "Processing stratum ID 6 ..."
#> Error in while (total_stratum < cfg_hh_per_stratum) {: missing value where TRUE/FALSE needed

^{Created on 2023-11-29 by the reprex package (v0.3.0)}

The problematic code part is

gridsample/R/gs_sample.r

Lines 411 to 418 in d8a7fb9

    
           if (total_stratum < cfg_hh_per_stratum) { 
        
             while (total_stratum < cfg_hh_per_stratum) { 
        
               newid <- sample(psu_data[selected == 0]$id, 1) 
        
               total_stratum <- total_stratum + 
        
                 (urban_vec[newid] == 0) * cfg_hh_per_rural + 
        
                 (urban_vec[newid] == 1) * cfg_hh_per_urban 
        
               psu_data[id == newid]$selected <- 1 
        
             }

When urban_vec contains NAs, the code breaks, because NA is then written into total_stratum.

e-kotov / gridsample Goto Github PK

gridsample's Introduction

gridsample's People

Contributors

gridsample's Issues

raster reprojection generates non-integer strata labels and leads to an error

implement sampling with a quadtree algorithm?

References

gs_sample fails if the raster contains NA, even the original vignette is not reproducible

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	if (sp::proj4string(strata_raster) != "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") {
	strata_raster <- projectRaster(strata_raster,
	crs = sp::CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
	}

	if (total_stratum < cfg_hh_per_stratum) {
	while (total_stratum < cfg_hh_per_stratum) {
	newid <- sample(psu_data[selected == 0]$id, 1)
	total_stratum <- total_stratum +
	(urban_vec[newid] == 0) * cfg_hh_per_rural +
	(urban_vec[newid] == 1) * cfg_hh_per_urban
	psu_data[id == newid]$selected <- 1
	}