
gpkg's Issues

dynamic GeoPackage file writing

Creation of a geopackage object should collect or create all the information needed to turn the associated R data objects and user input into a GeoPackage file. There is currently no mechanism to do the whole process in one step; the user has to build a new file incrementally with gpkg_write(<object>, ...), possibly appending to an existing file.

Currently you can also create geopackage S3 objects that are not backed by (or derived from) any file. This is not currently useful: the object is effectively read-only by the time the R user sees it, and no changes they make to the R object can be applied to the GeoPackage without going through the SQLiteConnection.

I would like to have a gpkg_write(<geopackage>) method that figures out the sequence of layer writes, table updates, etc. necessary to create a (possibly complex) GeoPackage directly from an R object (i.e. a list of layers).

  • This could potentially be invoked automatically, creating a geopackage that exists only as a temporary file.
    • The user should not necessarily have to be concerned with how the file is put together or where it lives, just that they had some R objects that they wanted to be able to get back out of it.
  • It also should be possible (at least in theory) to "roundtrip" arbitrary geopackages into the S3 object representation and back out as a new geopackage file with identical contents. I am sure there are some possible wrinkles here, but it should be achievable for simple cases.
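Independent of the gpkg API, the round-trip idea can be sketched with plain {DBI}/{RSQLite}, since a GeoPackage is just an SQLite database. The helper name below is hypothetical, and this copies raw table contents only:

```r
library(DBI)

# Sketch (not the gpkg implementation) of the round-trip idea: copy
# every table of one GeoPackage/SQLite file into a new file. Note this
# copies raw table contents only -- views, triggers, and RTree spatial
# indexes would need separate handling. The helper name is hypothetical.
copy_gpkg_tables <- function(src_path, dst_path) {
  src <- dbConnect(RSQLite::SQLite(), src_path)
  dst <- dbConnect(RSQLite::SQLite(), dst_path)
  on.exit({
    dbDisconnect(src)
    dbDisconnect(dst)
  })
  for (tbl in dbListTables(src)) {
    dbWriteTable(dst, tbl, dbReadTable(src, tbl))
  }
  invisible(dst_path)
}
```

Comparing dbListTables() output (and row counts) between the two files would be a first check that the round trip preserved contents.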

add spatial views

Add a convenience method for the creation of spatial views: https://gdal.org/drivers/vector/gpkg.html#spatial-views

For example:

CREATE VIEW my_view AS SELECT foo.fid AS OGC_FID, foo.geom, ... FROM foo JOIN another_table ON foo.some_id = another_table.other_id;
INSERT INTO gpkg_contents (table_name, identifier, data_type, srs_id) VALUES ('my_view', 'my_view', 'features', 4326);
INSERT INTO gpkg_geometry_columns (table_name, column_name, geometry_type_name, srs_id, z, m) VALUES ('my_view', 'my_geom', 'GEOMETRY', 4326, 0, 0);
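A convenience wrapper could simply execute those three statements over an open connection. A minimal sketch with {DBI} (the function name is hypothetical, and it assumes the gpkg_contents and gpkg_geometry_columns tables already exist):

```r
library(DBI)

# Hypothetical convenience wrapper: register a spatial view in an open
# GeoPackage connection by running the three statements shown above.
gpkg_create_spatial_view <- function(con, view_name, view_sql,
                                     geom_column = "geom", srs_id = 4326) {
  dbExecute(con, paste("CREATE VIEW", view_name, "AS", view_sql))
  dbExecute(con,
    "INSERT INTO gpkg_contents (table_name, identifier, data_type, srs_id)
       VALUES (?, ?, 'features', ?)",
    params = list(view_name, view_name, srs_id))
  dbExecute(con,
    "INSERT INTO gpkg_geometry_columns
       (table_name, column_name, geometry_type_name, srs_id, z, m)
       VALUES (?, ?, 'GEOMETRY', ?, 0, 0)",
    params = list(view_name, geom_column, srs_id))
  invisible(view_name)
}
```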

GeoPackage validation routine

  • check for required tables
  • check for consistency between tables (foreign key relationships)
  • check for duplicate entries within tables (violated primary key constraints)
  • ?

    gpkg/R/gpkg-validate.R

    Lines 1 to 12 in c813561

    #' Validate a GeoPackage
    #'
    #' Checks for presence of required tables, valid values and other constraints.
    #'
    #' @param x Path to GeoPackages
    #' @param diagnostics Return a list containing diagnostics (missing table names, invalid values, other errors)
    #'
    #' @return `TRUE` if valid. `FALSE` if one or more problems are found. For full diagnostics run with `diagnostics = TRUE` to return a list containing results for each input GeoPackage.
    #' @export
    gpkg_validate <- function(x, diagnostics = FALSE) {
      stop("This is not implemented yet")
    }
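A first cut at the required-table check could look like this (a sketch, not the package implementation; per the GeoPackage specification the strictly required tables are gpkg_contents and gpkg_spatial_ref_sys):

```r
library(DBI)

# Sketch of the required-table check only; the consistency and
# primary-key checks listed above would build on the same connection.
gpkg_check_required_tables <- function(x) {
  con <- dbConnect(RSQLite::SQLite(), x)
  on.exit(dbDisconnect(con))
  required <- c("gpkg_contents", "gpkg_spatial_ref_sys")
  missing_tables <- setdiff(required, dbListTables(con))
  if (length(missing_tables) > 0)
    message("missing required tables: ",
            paste(missing_tables, collapse = ", "))
  length(missing_tables) == 0
}
```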

`.gpkg_process_sources()` should be more robust to various OGR sources

Currently only a small number of file path extensions for raster and vector data are used to identify paths to possible spatial data sources.

Nobody has complained, and I guess it has worked fine for me, but I should have generalized this a long time ago...

I am now thinking the functionality to support and classify arbitrary file paths in the same call to gpkg_write() might be too general. The main reason file extensions are needed is that either terra::rast(), terra::vect(), or a non-spatial table function must be called depending on whether a file input is determined to be raster, vector, or attribute data.

  • Can I create a lookup table of file extensions from file based sources found in terra::gdal(drivers=TRUE)?

  • Could I pass an argument so that the user needs to specify whether they are writing raster, vector, or attributes? That would mean one type per call, or a vector of equal length to the input list denoting each data type. Both of these alternatives seem rather cumbersome compared to how it works now.
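The current extension-based approach can be sketched in base R; the extension sets below are illustrative, not the package's actual lists:

```r
# Sketch of extension-based source classification as it might be
# generalized; the extension vectors here are illustrative only.
classify_source <- function(paths) {
  raster_ext <- c("tif", "tiff", "img", "vrt", "nc")
  vector_ext <- c("shp", "gpkg", "geojson", "json", "gml")
  ext <- tolower(tools::file_ext(paths))
  ifelse(ext %in% raster_ext, "raster",
         ifelse(ext %in% vector_ext, "vector", "attributes"))
}

classify_source(c("a.tif", "b.shp", "c.csv"))
```

A lookup derived from terra::gdal(drivers = TRUE) could replace the hard-coded vectors without changing this interface.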

connection with DBI

I work with SQL files in RStudio, where you need to specify the type of connection in the first row as a comment, like so:
-- !preview conn=DBI::dbConnect(RSQLite::SQLite(),"some_file_name.gpkg")
Using RSQLite directly doesn't provide the spatial functions that gpkg does.
I can't figure out what the right SQLiteConnection to pass in that row is.
Anyhow, this could be a nice feature as a standalone function.
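For plain (non-spatial) SQL, an ordinary RSQLite connection does work against a GeoPackage file, since a GeoPackage is an SQLite database; GDAL's spatial functions are simply unavailable on such a connection. A minimal sketch (the path and table creation here stand in for an existing file):

```r
library(DBI)

# A GeoPackage is an ordinary SQLite database, so a plain RSQLite
# connection can run non-spatial SQL against it; GDAL's spatial
# functions are just not available through this connection.
path <- tempfile(fileext = ".gpkg")  # stand-in for some_file_name.gpkg
con <- dbConnect(RSQLite::SQLite(), path)
dbExecute(con, "CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
dbGetQuery(con, "SELECT table_name, data_type FROM gpkg_contents")
dbDisconnect(con)
```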

consider use of R6 for geopackage

Early on I decided I didn't want to use the R6 system for geopackage, but my attempt at in-place modification (#9) for connecting existing objects (since reverted) has renewed that consideration.

I might need to make a draft PR to test this out to be sure it is not what I want.

R6 could allow for better maintenance of state within existing objects, enabling proper in-place modification via e.g. the definition of a <geopackage>$connect() method.
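The state maintenance R6 would buy can be illustrated with a bare environment, which is what R6 is built on: environments have reference semantics, so a $connect() method can update fields without the caller reassigning the object. All names below are hypothetical:

```r
# Minimal illustration of why R6 (built on environments) enables
# in-place modification: the $connect() closure mutates the shared
# environment, so no `g <- ...` reassignment is needed.
new_geopackage_env <- function(path) {
  self <- new.env()
  self$path <- path
  self$con <- NULL
  self$connect <- function() {
    self$con <- paste("connection to", self$path)  # stand-in for dbConnect()
    invisible(self)
  }
  self
}

g <- new_geopackage_env("example.gpkg")
g$connect()  # modifies g in place
```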

beautify `print()` method

  1. Simple way of listing registered gpkg_contents tables
  2. Identify presence of standard/core tables (gpkg_contents, gpkg_spatial_ref_sys, gpkg_geometry_columns, etc.) as a separate list
  3. Add symbols or some other notation for optional associated features (e.g. presence absence of rtrees associated with specific features, metadata tables, table relationships (?))

Item 3 is intended as markup enhancing item 1, rather than listing all of the (long, possibly numerous) names.

replace core GDAL functionality with {gdalraster}

I recently added some updates to use {vapour} for things {terra} could not directly provide.

It may be worth going even closer to the source with {gdalraster}. This change would be pending the addition of the OGR vector-related bindings: https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html. Since I have several things to address between now and the next release, there is some time to see how this develops, and perhaps to suggest anything missing from {gdalraster}.

refactor `dplyr.frame()` and `lazy.frame()` methods

I would like to avoid the "lazy.frame" terminology, and "dplyr.frame" is not useful shorthand for tbl(). I was toying with these ideas, which now work fine, and would like an API more consistent with the rest of the package.

  • lazy.frame() will no longer be exported and will be renamed gpkg_table_pragma() -- this is useful information but not a substitute for table contents.
  • dplyr.frame() will no longer be exported, it will be renamed internally and will be used inside gpkg_table() and gpkg_tables() unless new argument pragma is TRUE.
  • gpkg_table() and gpkg_tables(): the new argument pragma=FALSE will require the suggested package {dbplyr}; pragma=TRUE will use gpkg_table_pragma() instead of gpkg_table().
    • This requires the user to opt in to avoid the {dbplyr} dependency, rather than relying on which namespaces happen to be loaded to determine behavior (yuck).
    • gpkg_table()'s current behavior is to use dbGetQuery() to materialize the full table in memory, which was not the behavior of gpkg_tables(). "table" and "tables" will now be consistent, and the old gpkg_table() code that materializes a table in memory will be available as gpkg_get_table() (or gpkg_collect_table() (?), or a collect=TRUE argument to gpkg_table()).

The lazy.frame/dplyr.frame functions will be removed from the namespace as I prepare for an initial release of gpkg v0.1.0.
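For reference, the lightweight metadata a gpkg_table_pragma()-style function would surface is just SQLite's PRAGMA table_info output, which describes columns without reading any table contents. A sketch with {DBI} (the wrapper name is hypothetical):

```r
library(DBI)

# Sketch: column metadata via SQLite's PRAGMA table_info -- useful
# information, but not a substitute for table contents.
table_pragma <- function(con, name) {
  dbGetQuery(con, sprintf("PRAGMA table_info(%s)", name))
}
```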

`gpkg_create_dummy_features()` upgrades and deprecation

  • Use new gpkg_create_spatial_ref_sys() function internally
  • Abstract out gpkg_geometry_columns table create/insert code as new functions
  • Deprecate gpkg_create_dummy_features() function name and replace with e.g. gpkg_create_empty_features()
  • Deprecate gpkg_contents(template=) argument, provide documented arguments for each data element (srs_id, bounding box)
  • Add a gridded analog e.g. gpkg_create_empty_grid()
  • Add an option to add empty tables to gpkg_contents (default to TRUE, changing existing behavior)

refactor `gpkg_write()`

  • vector write not working properly anymore
  • better handling of list input for naming feature / tile sets / data_null
  • basic post-processing/validating of result
  • tests

in-place modification for `gpkg_connect()`

Often there isn't a need to manually connect a geopackage object because connections are created and destroyed on the fly. However, when performing many database operations it may be beneficial to make use of a persistent connection to avoid the extra overhead of repeatedly closing and re-opening.

Currently, you need to overwrite the object to connect to the database and have that connection "stick".

library(gpkg)

g <- geopackage()
g
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------

# doesn't work
gpkg_connect(g)
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------
#> <SQLiteConnection>
#>   Path: /tmp/RtmpWPANIS/Rgpkg277b54d212bf9.gpkg
#>   Extensions: TRUE
g
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------

# works
g <- gpkg_connect(g)
g
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------
#> <SQLiteConnection>
#>   Path: /tmp/RtmpWPANIS/Rgpkg277b54d212bf9.gpkg
#>   Extensions: TRUE

# works
gpkg_disconnect(g)
g
#> <geopackage>
#> Error: Invalid or closed connection

Connecting to a database without saving the result prevents you from referencing the pointer later to disconnect it (which causes warnings: Warning message: call dbDisconnect() when finished working with a connection).

It would be nice to come up with a way that in-place modification of a geopackage object could be used to update the connection field.
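One way to get "sticky" connections without reassignment is to store the connection in an environment kept inside the S3 object, since environments are not copied on modification. A sketch under that assumption (all names here are hypothetical, not the current gpkg API):

```r
library(DBI)

# Sketch: keep the connection in an environment field of the S3 list,
# so connect/disconnect helpers can update it in place.
new_gpkg <- function(path) {
  structure(list(path = path, state = new.env()), class = "gpkg_sketch")
}

connect_in_place <- function(g) {
  g$state$con <- dbConnect(RSQLite::SQLite(), g$path)
  invisible(g)
}

disconnect_in_place <- function(g) {
  if (!is.null(g$state$con)) dbDisconnect(g$state$con)
  g$state$con <- NULL
  invisible(g)
}
```

Because the environment is shared by reference, the caller's copy of the object sees the updated connection field even though the function never returns a modified list.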
