hypertidy / silicate Goto Github PK

View Code? Open in Web Editor NEW

53.0 4.0 4.0 37.69 MB

A general form for complex data

Home Page: https://hypertidy.github.io/silicate/

R 4.57% HTML 95.30% Makefile 0.02% CSS 0.11% Rebol 0.01%

rstats triangulation topology hierarchical-data structural-primitives simplicial-complex spatial-data

silicate's Introduction

Overview

The goal of silicate is to bridge planar geospatial data types with flexible mesh data structures and visualization.

We aim to provide

a common-form for representing hierarchical data structures
a universal converter between complex data types
topological primitives for analysis and exploration.

The core of silicate are worker functions that are generic and work with any kind of data that has hierarchical structure. These functions work on models, that include various formats (sf, sp, trip) and also on silicate models themselves.

Functions

We have the following worker verbs, designed to work with many spatial data formats and with silicate’s own structures.

sc_object() - highest level properties, the “features”
sc_coord() - all instances of coordinates, labelled by vertex if the source model includes them
sc_vertex() - only unique coordinates (in some geometric space)
sc_path() - individual paths, sequential traces
sc_edge() - unique binary relations, unordered segments (segments and edges are currently under review, and may change)
sc_segment() - all instances of edges
sc_arc() - unique topological paths, arcs either meet two other arcs at a node, or include no nodes
sc_node() - unique nodes

The idea is that each function can return the underlying entities of a data object, no matter its underlying format. This interoperability design contrasts with major spatial packages that require format peculiarities in order to work, even when these details are not relevant.

Models

Silicate defines a number of key models to represent various interpretations of hierarchical (usually spatial) data. These are SC, PATH, ARC and TRI. Most models have a counterpart structurally-optimized version with a similar name: SC0, PATH0, TRI0. Other models are possible, and include DEL in the in-development anglr package that extends TRI.

Each model is composed of a type of primitive, and each provides a normalization (de-duplication) of geometry for efficiency and or topology. Silicate quite deliberately separates the concepts of geometry and topology completely. Primitives define the topology (edges, paths, arcs, or triangles) and the vertex table defines the geometry. We reserve the names x_, y_, t_ (time) and z_ (elevation) for the usual geometric dimensions, and these are treated specially by default. No limit is put geometric dimension however, it’s possible to store anything at all on the vertex table. Some models include an object table, and this represents a higher level grouping of primitives (and corresponds to features in SF).

The most general model is SC, composed of three tables vertex, edge and object and all entities are explicitly labelled. Indexes between tables are unique and persistent and arbitrary, they can be arbitrarily accessed. This is closely related to the more bare-bones SC0 model, composed of only two tables vertices, and objects. These are related structurally by nesting the relations within the object table. Here the relations are not persistent, so we can subset the objects but we cannot change the vertex table with updating these indexes.

SC0 can deal with 0-dimensional topology types (points) as well as 1-dimensional types (edges), but SC is strictly for edges.

Further models PATH, ARC, and TRI cover a broad range of complex types, and each is fundamental and distinct from the others. SC can be used to represent any model, but other models provide a better match to specific use-cases, intermediate forms and serve to expand the relationships between the model types.

SC is the universal model, composed of binary relationships, edges defined by pairs of vertices (a structural primitive model)
TRI also a structural primitive model, for triangulations
PATH a sequential model, for the standard spatial vector types, shapes defined by paths
ARC a sequential model, for arc-node topology a shared-boundary decomposition of path models
SC0 is a stripped down structural model analogous to SC, there are only implicit relations of object to vertices, with a nested list of edge indexes

The models PATH0 and ARC0 are in-development. By analogy to SC0 they will be composed of two tables, object and vertex with nested structural-index tables on object holding the path and arc indexes that are row numbers of vertex. It’s not clear yet if this vertex table should be de-duplicated.

Earlier versions included a mix of these models, and the definitions have changed many times. Still a work-in-progress.

An extension of the TRI model DEL is provided in anglr which builds high-quality triangulations, but the structural representation is the same.

Each model is created by using a set of generic verbs that extract the underlying elements of a given model. This design means that the models themselves are completely generic, and methods for worker verbs can be defined as needed for a given context. Our ideal situation would be for external packages to publish methods for these verbs, keeping package-specific code in the original package. We think this provides a very powerful and general mechanism for a family of consistent packages.

There is another important function unjoin() use to normalize tables that have redundant information. The unjoin() isthe opposite of the database join, and has a nearly identical counterpart in the dm package with its decompose_table(). Unjoin is the same as tidyr::nest() but returns two tables rather than splitting one into the rows of other.

The unjoin is a bit out of place here, but it’s a key step when building these models, used to remove duplication at various levels. It’s the primary mechanism for defining and building-in topology, which is precisely the relationships between entities in a model. This function is published in the CRAN package unjoin.

Arbitrarily re-composable hierarchies

The common “well-known” formats of encoding geometry (WKB/WKT for binary/text) represent (pre-)aggregated data, yet the input levels of aggregation are often not directly relevant to desired or desirable levels of aggregation for analysis. A key stage in many GIS analyses is thus an initial disaggregation to some kind of atomic form followed by re-aggregation.

We propose a common form for spatial data that is inherently disaggregated, that allows for maximally-efficient on-demand re-aggregation (arbitrarily re-composable hierarchies), and that covers the complexity of geometric and topological types widely used in data science and modelling. We provide tools in R for more general representations of spatial primitives and the intermediate forms required for translation and analytical tasks. These forms are conceptually independent of R itself and are readily implemented with standard tabular data structures.

There is not one single normal form that should always be used. There is one universal form that every other model may be expressed in, but also other forms that are better suited or more efficient for certain domains. We show that conversion between these forms is more straightforward and extensible than from SF or related types, but is also readily translated to and from standard types. The most important forms we have identified are “universal” (edges and nodes), “2D primitives” (triangles), “arcs” (shared boundaries), and “paths” (normalized forms of SF types).

Installation

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("hypertidy/silicate")

Usage

Convert a known external model to a silicate model.

library(silicate)
#> 
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#> 
#>     filter
x <- SC(minimal_mesh) ## convert simple features to universal form

y <- ARC(minimal_mesh) ## convert simple features to "arc-node" form

Obtain the elements of a known model type.

sc_vertex(x)
#> # A tibble: 14 × 3
#>       x_    y_ vertex_
#>    <dbl> <dbl> <chr>  
#>  1  0     0    j93IuM 
#>  2  0     1    pxzwso 
#>  3  0.2   0.2  ebba8X 
#>  4  0.2   0.4  DAVqdd 
#>  5  0.3   0.6  SgNLdS 
#>  6  0.5   0.2  dkC63w 
#>  7  0.5   0.4  MOQyty 
#>  8  0.5   0.7  A8y5AJ 
#>  9  0.69  0    10knzw 
#> 10  0.75  1    4jcQa9 
#> 11  0.8   0.6  TtwoyP 
#> 12  1     0.8  Avp6eS 
#> 13  1.1   0.63 Hvtd7u 
#> 14  1.23  0.3  AjMxRC

sc_edge(x)
#> # A tibble: 15 × 4
#>    .vx0   .vx1   path_ edge_ 
#>    <chr>  <chr>  <int> <chr> 
#>  1 j93IuM pxzwso     1 8PUDnw
#>  2 pxzwso 4jcQa9     1 lzQAhR
#>  3 4jcQa9 Avp6eS     1 xa0Dzz
#>  4 A8y5AJ Avp6eS     1 FXq8du
#>  5 A8y5AJ TtwoyP     1 qN0CQJ
#>  6 10knzw TtwoyP     1 puOMxs
#>  7 j93IuM 10knzw     1 hbTBNZ
#>  8 ebba8X dkC63w     2 hnk6TQ
#>  9 dkC63w MOQyty     2 fhL5Yn
#> 10 SgNLdS MOQyty     2 eNQjZr
#> 11 DAVqdd SgNLdS     2 BfnrBy
#> 12 ebba8X DAVqdd     2 vwpscd
#> 13 TtwoyP Hvtd7u     3 xRaI6h
#> 14 Hvtd7u AjMxRC     3 xUTlye
#> 15 10knzw AjMxRC     3 8Yx3Wj

sc_node(y)
#> # A tibble: 2 × 1
#>   vertex_
#>   <chr>  
#> 1 5aVdqW 
#> 2 YVlBOr

sc_arc(y)
#> # A tibble: 4 × 2
#>   arc_   ncoords_
#>   <chr>     <int>
#> 1 D2HjBn        7
#> 2 qRhFbW        2
#> 3 SiNxtc        5
#> 4 SLxSpo        4

silicate models

There are two kinds of models, primitive and sequential.

Primitive-based models are composed of atomic elements that may be worked with arbitrarily, by identity and grouping alone.

Sequential-based models are bound to ordering and contextual assumptions. We provide the PATH and ARC models as generic, relational forms that provide a convenient intermediate between external forms and primitives models. Further intermediate models exist, including monotone and convex decompositions of polygons.

There is one universal primitives-based model, an edge-only model with two tables at its core. Higher level structures are described by grouping tables, with as many levels as required. Any other model can be expressed in this form.

We also differentiate structural primitives, which are specializations that are more convenient or more efficient in certain cases. These include triangulations (2D primitives), and segment structures (1D primitives), and could provide higher dimensional forms (3D primitives, etc. ).

Currently, we provide support for the universal model SC, the sequential models PATH (simple features belongs here, amongst many others) and ARC (arc-node topology, TopoJSON-like, OpenStreetMap), and structural primitives TRI.

In practice a segment model is trivial to generate, “SEG” but we haven’t done that. This would be analogous to the format used by rgl::rgl.lines or spatstat::psp.

We take care to allow for labelling (identity) of component elements, without taking full responsibility for maintaining them. Random IDs are created as needed, but any operation that works with existing IDs should be stable with them.

Context, and some related projects

The polymer (arbitrary multi-layer polygonal overlays) and sphier (generic hierarchies from atomic forms) show two different approaches to the problem of hierarchical data and flexible representations.

The key difference between the silicate approach and simple features is the separation of geometry and topology. This allows for normalization (de-duplication) of the entities that are present or that can be identitied. Simple features has no capacity to de-duplicate or otherwise identify vertices, edges, paths or arcs, though tools that work with simple features do construct these schemes routinely in order to perform operations. When these richer, topological structures are built they are usually then discarded and the vertices are again de-normalized and again expressed explicitly without recording any of the relationships. In this sense, simple features can be described as an explicitly-stored PATH analogue, and is no different from the model used by shapefiles, binary blobs in databases, and many other spatial vector formats. There are a number of notable exceptions to this including TopoJSON, Eonfusion, PostGIS, QGIS geometry generators, Fledermaus, Mapbox, WebGL, Threejs, D3, AFrame, Lavavu but unfortunately there’s no overall scheme that can unify these richer structures.

The silicate family is composed of a small number of packages that apply the principles here, either to read from path forms or primitive forms. As work continues some of these will be incorporated into the silicate core, when that is possible without requiring heavy external dependencies.

Looking for a music reference? I always am: Child’s Play, by Carcass.

Please note that the ‘silicate’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

silicate's People

Contributors

Stargazers

Watchers

Forkers

vlulla cuulee milesmcbain mpadge

silicate's Issues

ear clipping

See rgl approach in #29

With lawn

## sf to list form
x <- geojsonio::geojson_list(minimal_mesh)

purrr::map(x$features, lawn::lawn_tesselate)

rgl triangulate nesting problem

the translator vignette needs ear clipping to demonstrate the differences with delaunay-constrained

Why does the nesting in the examples work but not with minimal_mesh?

(is it the shared boundary across features?)

models and verbs

Some notes, maybe something in it

library(silicate)
f <- list(sc_coord = sc_coord, sc_vertex = sc_vertex, 
          sc_node = sc_node, sc_arc = sc_arc, 
          sc_edge = sc_edge, sc_segment = sc_segment, 
          sc_path = sc_path, sc_object = sc_object, 
          sc_uid = sc_uid)

library(purrr)

## any of these entities that aren't clear might be *residual*
## i.e. a record of the objects they came from?
## - can we sensibly defined which entities fade-out? More important
## is when invalidated

## TRUE means it's currently an error, undefined


# sc_node - any vertex linked to by three or more edges
# sc_arc  - we need to traverse between nodes
# sc_segment - default to edges, but not well defined without grouping
# sc_path - without grouping these are the arcs
# sc_uid - should this work at all on SC? (the labels of edges?)
x <- SC(minimal_mesh)
purrr::map_lgl(map(f, function(a) safely(a)(x)), ~!is.null(.x$error))
#sc_coord  sc_vertex    sc_node     sc_arc    sc_edge sc_segment 
#FALSE      FALSE       TRUE       TRUE      FALSE       TRUE 
#sc_path  sc_object     sc_uid 
#TRUE      FALSE       TRUE 

## sc_uid - the object labels?
x <- PATH(minimal_mesh)
purrr::map_lgl(map(f, function(a) safely(a)(x)), ~!is.null(.x$error))
# sc_coord  sc_vertex    sc_node     sc_arc    sc_edge sc_segment 
# FALSE      FALSE      FALSE      FALSE      FALSE      FALSE 
# sc_path  sc_object     sc_uid 
# FALSE      FALSE       TRUE 

# sc_edge - this should be trivial
# sc_segment - implies a grouping record
# sc_path - same as arcs, but may be arc instances, or groups of arcs, or groups of edges
# sc_uid - ??
x <- ARC(minimal_mesh)
# purrr::map_lgl(map(f, function(a) safely(a)(x)), ~!is.null(.x$error))
# sc_coord  sc_vertex    sc_node     sc_arc    sc_edge sc_segment 
# FALSE      FALSE      FALSE      FALSE       TRUE       TRUE 
# sc_path  sc_object     sc_uid 
# TRUE      FALSE       TRUE 

# sc_node - does this mean anything? 
# sc_arc - I think these must be cycles in edges
#   with only one triangle, but really?
# sc_edge - obvious
# sc_segment - ordered cycles around triangles
# sc_path - invalid
# sc_uid - ??
x <- TRI(minimal_mesh)
purrr::map_lgl(map(f, function(a) safely(a)(x)), ~!is.null(.x$error))
sc_coord  sc_vertex    sc_node     sc_arc    sc_edge sc_segment 
# FALSE      FALSE       TRUE       TRUE       TRUE       TRUE 
# sc_path  sc_object     sc_uid 
# TRUE      FALSE       TRUE

duplicated vertex bogey

Thoughts:

duplicated and dplyr::distinct() give different answers
group_indices(x, y) and factor(paste(x, y)) are closely related (are they the same?)

Help wanted: some definitive answer on the difference between distinct and duplicated for numeric inputs. Especially in light of this (which might have changed with later dplyr versions)

hypertidy/anglr#7 (comment)

what in wrld_simpl crashes insert constraint?

Some combination of objects in wrld_simpl will crash

#context("segment-constraint")
library(laridae)
library(silicate)
library(sp)
library(sf)
library(dplyr)
prepare_sf_ct <- function(x) {
  tabs <- silicate::SC(x)
  segment <-  tibble::tibble(vertex_ = c(t(as.matrix(tabs$edge %>% dplyr::select(.vertex0, .vertex1))))) %>%
    inner_join(tabs$vertex) %>% mutate(segment_ = (row_number()-1) %/% 2)
  segs <- split(match(segment$vertex_, tabs$vertex$vertex_) - 1, segment$segment_)

  structure(list(x = tabs$vertex$x_, y = tabs$vertex$y_, segs = segs),
           class = "psat")
}
plot.psat <- function(x, ...) {
  plot(cbind(x$x, x$y))
  ind <- index(x)
  segments(x$x[ind[,1]], x$y[ind[,1]], x$x[ind[,2]], x$y[ind[,2]])
}
index <- function(x) {
  ind <- do.call(rbind, x$segs)
  t(apply(ind, 1, sort)) + 1
}


dat <- minimal_mesh
data("wrld_simpl", package = "maptools")
dat <- st_as_sf(wrld_simpl[3, ])
dat <- st_transform(dat, "+proj=laea")
psat <- prepare_sf_ct(dat)
plot(psat)
anyDuplicated(as.data.frame(index(psat)))
anyDuplicated(as.data.frame(cbind(psat$x, psat$y)))
system.time(segment_constraint(psat$x, psat$y, psat$segs))

fluidSpline

Hooray: https://github.com/yonicd/fluidSpline

simplification by edge collapse

This code is just rough, the key lessons are

remove duplicated arcs (reverse traversals are currently kept)
sc_arc should return unique arcs, another function for the arc_link_vertex (which gives opportunity to summarize the paths similarity and find non-topological neighbours)
nodes are always kept here, which is important, but other "important vertices" could be defined too, which ones are responsible for large corners etc.

library(silicate)
library(ozmaps)
x <- ozmaps::ozmap_data("state")
plot(rmapshaper::ms_simplify(x))

p <- silicate::PATH(x)
## all arcs here
arc <- sc_arc(p)
arc
library(dplyr)

## function to sub-sample an arc-vertex mapping, keeping the first and last (nodes)
## and returning early if we already have too few points
sub_group <- function(x, prop = 0.05, ...) {
  n <- max(c(5, nrow(x) * prop))
  if (n > (nrow(x)-2)) return(x)
  idx <- sort(sample(seq(2, nrow(x) - 1), n))
  dplyr::slice(x, c(1, idx, nrow(x)))
}

## remove duplicated arcs (they are non-unique when reversed, or sorted)
dps <- duplicated(arc %>% split(.$arc_) %>% purrr::map_chr(~paste(sort(.x$vertex_), collapse = "")))
a1 <- bind_rows((arc %>% 
                   split(.$arc_))[!dps])

## split the unique arcs and sub-sample
a1 <- a1 %>%  
  split(.$arc_) %>% 
  purrr::map_df(sub_group, prop = 0.05)

## plot what's left
plot(p$vertex[c("x_", "y_")], pch = "")
a1 %>% split(.$arc_)  %>% purrr::walk(~lines(inner_join(.x, p$vertex, "vertex_") %>% select(x_, y_)))

Constructors

PATH should take x,y DF possibly with groupings/order, so easy to build path lines/polys/points and flip to segment versions. This would be useful for crosstalk testing and straightforward trip structure

the name

sc
prim

generic constructor

I think this is finally a workable general solution for path-based structures. A build_type function needs to exist for all types, but each can take the exact same inputs:

geometry map - a table summarizing the paths, one row each
coordinates - the vertex instances in order as implied by the geometry map

Each specific form will need to be able to pass in other information, such as the crs, and each will need to be able to have an override type argument - so we can force path interpretation to be a particular kind, rather than per path or per object.

A prototype for sf is below, this started life in gibble (when the first attempt was used for mapedit), and was used within scsf a little - but all that now belongs here.

A round trip function below shows the utility of this, by decomposing an input to g-map/coord form and recomposing.

#' a pattern for building sf objects from 
#' - a gibble, the map of the paths
#' - the coordinates
#' the map is an encoding of the structural 
build_sf <- function(gm, coords_in, crs = NULL) {
  glist <- vector("list", length(unique(gm$object)))
  coords_in <- gm %>% dplyr::select(-type, -ncol, -nrow) %>%
    dplyr::slice(rep(seq_len(nrow(gm)), gm$nrow)) %>% dplyr::bind_cols(coords_in)
  ufeature <- unique(gm$object)
  for (ifeature in seq_along(ufeature)) {
    gm0 <- gm %>% dplyr::filter(object == ufeature[ifeature])
    type <- gm0$type[1]
    coord0 <- coords_in %>% dplyr::filter(object == ifeature)
    ## object becomes sub-feature element (not a hole, that is "part")
    coord0$path <- rep(seq_len(nrow(gm0)), gm0$nrow)
    glist[[ifeature]] <- switch(type,
                                POINT = sf::st_point(unlist(coord0 %>% dplyr::select(x_, y_))),
                                MULTIPOINT = sf::st_multipoint(as.matrix(coord0 %>% dplyr::select(x_, y_))),
                                LINESTRING = sf::st_linestring(as.matrix(coord0 %>% dplyr::select(x_, y_))),
                                MULTILINESTRING = sf::st_multilinestring(lapply(split(coord0 %>% dplyr::select(x_, y_), coord0$path), as.matrix)),
                                POLYGON = sf::st_polygon(lapply(split(coord0 %>% dplyr::select(x_, y_), coord0$path), as.matrix)),
                                MULTIPOLYGON = sf::st_multipolygon(lapply(split(coord0 %>% dplyr::select(x_, y_, path), coord0$subobject),
                                                                          function(path) lapply(split(path %>% select(x_, y_), path$path), as.matrix)))
    )
  }
  if (is.null(crs)) crs <- NA_crs_
  out <-   st_sfc(glist, crs = crs)
  out
}


## this version simply decomposes and then recomposes
round_trip <- function(x) {
  gm <- gibble::gibble(x)
  coord <- sc_coord(x)
  ## might be no object identifier, because this is only one
  if (is.null(gm[["object"]])) gm[["object"]] <- 1
  build_sf(gm, coord)
}

# this version also extract unique vertices, allows an operation on them (e.g. jitter) 
# and then restores the vertex identity, then recomposes
round_trip_topology <- function(x, ...) {
  coord <- sc_coord(x)
  gm <- gibble::gibble(x)
  path <- silicate::PATH(x)
  coord_unique <- path$vertex %>% dplyr::select(-vertex_) 
  
  #coord_unique[] <- lapply(coord_unique, jitter, ...)
  
  coord_unique[["vertex_"]] <- path$vertex[["vertex_"]]
  coord_out <- dplyr::inner_join(path$path_link_vertex, coord_unique) %>% dplyr::select(-vertex_)
  st_buffer(build_sf(gm, coord_out,  crs = st_crs(x)), dist = 0)
}

round_trip(sfzoo$multipolygon)
round_trip(sfzoo$polygon)
round_trip(sfzoo$multilinestring)
round_trip(sfzoo$linestring)
round_trip(sfzoo$multipoint)
round_trip(sfzoo$point)


data("holey", package = "spbabel")
plot(round_trip(st_as_sf(spbabel::sp(holey))), col = viridis::viridis(3))

DB prototype

A dplyr-back-end in SQLite as a prototype - with an in-memory data frame for the features, with a list-column of queries that can individually obtain their branch/primitive/vertex tables, inner-joined up to a single tibble as required.

I prototyped this idea in a different form here:

https://gist.github.com/mdsumner/f6d846497aeaf0c0b2fde186b6f81947

That relied on the standard database blob-form, of one table with the simple feature structure serialized into one blob, and a user-driven SQL-like query at the object-level. A list-column could store the right SQL for the actual feature itself, using the object-ID and filters to drill down into the tables.

Also see rough test here:

https://gist.github.com/mdsumner/e10c6e24e4bcd5f123c7a108f50d7048

prop

support silicate BGM

From @mdsumner on September 14, 2016 6:2

BGM needs

mixed quads and triangles (use extrude3d though later use RTriangle)
bgm data structure probably should be rangl from the start

Copied from original issue: hypertidy/anglr#33

need CRS round trip where available

Something like the rbgm "extra" table, what did rangl/gris do here?

mdsumner/scsf#4

good spdep example

https://stat.ethz.ch/pipermail/r-sig-geo/2017-June/025756.html

segment broken for single row

sc_segment(PATH(nc[1, ]))
Error in `[[<-.data.frame`(`*tmp*`, "path_", value = c("cea36531", "cea36531",  : 
  replacement has 26 rows, data has 0

what the shit

various formats obj, etc.

OBJ: https://gis.stackexchange.com/questions/245881/how-to-clip-mesh-files-with-spatial-masks

Somewhere I have a bunch of rgl-ones that it can read ...

sp lines not supported by SC

See partial solution to a SO question here, should not need the st_as_sf:

http://rpubs.com/cyclemumner/359943

OP here https://stackoverflow.com/questions/47598259/add-new-node-to-spatiallinesnetwork-in-stplanr

missing arc links

This should not be zero rows:

ARC(simpleworld)$object_link_arc %>% group_by(arc_) %>% tally() %>% filter(n > 1)
# A tibble: 0 x 2
# ... with 2 variables: arc_ <chr>, n <int>

ARC(minimal_mesh)$object_link_arc %>% group_by(arc_) %>% tally() %>% filter(n > 1)

names of objects

extending from this osmdata discussion, rownames are not a solution for naming objects, and more memory-intensive alternatives are almost certainly the only general way. Where and how to store these in a way that still allows for convenient dplyr join operations is the question? Also note that edges in sc::PRIMITIVEs should also be considered named/nameable objects in their own right - it seems at the moment that they just get dumped with a generic integer index, right?

sc_ funs for sp

rangl can now be consolidated across sp/sf, we can abandon spbabel::map_table and go with silicate

improve accumulation

This is better overall, to not-normalize as we go, but it breaks at a few Gb - is it when generating ids?

library(dplyr)
library(sf)
library(silicate)
f <- raadfiles::thelist_files("shp", "parcel") %>% 
  slice(c(14, 15, 18))

obv <- function(x, ...) {
  ## get the main stuff
  o <- sc_object(x)
  o[["object_"]] <- sc_uid(nrow(o))
  b <- sc_path(x, ids = o[["object_"]])
  
  v <- sc_coord(x)
  list(object = o, path = b, vertex = v)
}
x <- purrr::map(f$fullname, function(file) obv(read_sf(file)))
x <- purrr::map(purrr::transpose(x, ), bind_rows)
x$vertex <- dplyr::mutate(x$vertex, path_ = rep(x$path$path_, x$path$ncoords_))
key_col <- "vertex_"
V_names <- c("x_", "y_")


data <- x$vertex
main <- dplyr::select(data, V_names)
data <- dplyr::select(data, setdiff(names(data), names(main)))
# return(list(out, data))
idx <- dplyr::group_indices(group_by(main, !!!rlang::syms(names(main))))
#system.time(aa <- as.integer(factor(purrr::reduce(main[V_names], paste))))
main[[key_col]] <- idx
main <- distinct(main)
data[[key_col]] <- idx

id <- silicate::sc_uid(n = nrow(main))
x$vertex <- dplyr::mutate(data, vertex_ = id[data[[key_col]]])
x$path_link_vertex <- dplyr::mutate(main, vertex_ = id[main[[key_col]]])

key examples

Key examples

round-tripping
multi level features as custom types
trip data as grouped points with measurements, also indexed as lines
Level-3 bins for ocean colour
BGM doubly-connected edge-list model
continuous variation across polygon surfaces
vector data fusion
wireframe curvilinear rasters

Lossless round-tripping

we can decompose a wide range of types and convert to other forms
sf to TopoJSON to leaflet projected map to topologically-sound sf object

Multi-level custom types

state and county sharing vertices, note rules on internal rings (discard them, leave one level for simple features)
pre-computed topologically-sound resolutions (i.e. district precision for a state boundary vs. country precision)
a voyage track, intervals within it, fine-resolution underway measures, specific stations
a fused surface, a combination of contour lines with a constant Z fused with polygons without elevation

topological fixes for simple features: https://github.com/r-gris/toposhop
loose collection of examples from gris: https://github.com/r-gris/grisexamples

review faster_as_tibble

This is real hack to speed up some cases of building tibbles, and it caused problems here

hypertidy/anglr#55

Code was assuming a list with a valid length(x[[1]]) vector, but it might be a 0-column data frame. dim() works for one, and length works for the other.

Hopefully remove this hack with a fast-enough replacement

Centroids

We need box, inner, and weighted at the very least.

Maybe also:

locally projected laea (or conformal?) centroid box/inner/weighted

Does weighted mean by the coordinates, or the lines - somehow?

Mangrove TDS

https://github.com/davidcanino/mangrovetdslib

crs metadata

silicate must keep the metadata from the beginning, rangl was performing this task

summary functions

Try the skimr approach, a good partnership with the spdep extractions.

ropensci/skimr#88

See ideas here: https://twitter.com/jakub_nowosad/status/872263367599194117

monotone polygons

These will be interesting as a multiple-groupings example.

Start here: https://rpubs.com/cyclemumner/pmonotone

Four coords in triangle

Update rationale

aggregation of PATH/PRIMITIVE objects

Might be jumping ahead of things, but dodgr has made me realise there'll be a world of applications which involve analysing relatinoships between objects. Classic example is graph/network relationships and associated metrics. In these cases, computational efficiency scales directly with numbers of objects. Numbers of PATH/PRIMITIVE (hereon, "PP") objects will often be very large, but many of these may be effectively redundant. Vertices lying on a single line, for example, can all be represented by start and end points alone (along with line length, or whatever metric), and so aggregated into a single object. Relationships between lines can then be analysed much more efficiently.

As always, an OSM example: Lines tend on average to be composed of around 10 nodes (vertices), so aggregating PP objects into higher-level line objects results in network-bsaed analyses able to be run around 10 times faster. That kind of speed up is hugely important. The dodgr issue that dicussed some of this stuff reveals the trade-offs between analytical exactness of the PP approach versus computational efficiency of more conventional sp/sf approach. The latter is inexact yet efficient; the former the oppposite. Making PP analysis more efficient simply requires identifying the appropriate level of aggregation, and constructing higher level objects.

The good news: I've done all the work here for point-to-line aggreagation, via pretty highly optimised C++ code, in dodgr. The not-so-good-news: There's be an unlimited number of other use cases, and other means and forms of aggregation that have yet to be identified, and that we'd have to identify. Aggregation of triangles is an obviuosly necssary extension, but maybe there'd be a neat CGAL solution there? Anywa, I could at least start by porting the dodgr approach to aggregation of PP objects into higher-level lines. These also very neatly correspond to sf::LINESTRING objects, so would be a neat way to convert silicate into sf in a way that is much more compact than current direct conversion.

Lots of words, maybe not much can be done at the mo with this issue, but keen to hear your thoughts.

to_sf - conversion to other formats from generic data

Discussion here: UrbanAnalyst/dodgr#43 (comment)

dodgr has dodgr_sf (creates sfc, not sf)
osmdata has osmdata_sf (creates sf)
spbabel has sp (creates sp from fortity-equivalent)
silicate has :::build_sf (creates sfc, grew out of gibble/mapedit exploration)
ggplot2 now has x %>% group_by(path) %>% ggplot(aes(x, y, id = )) ## no need for group =
trip dev has trip2(df, x, y, time, id) - using tidyeval https://github.com/Trackage/trip/blob/aes/R/gg-aesthetics.R#L20

I wrote that list because I can't keep track of where all this is, and I know there's a lot more.

Converting silicate forms to sf is actually pretty straightforward, but we want a more general language of specifiying coordinates and relations and groupings for richer forms like transport and GPS data.

I will try

write a stripped down trip based on the tidyeval trip2 idea
hack a geom_trip method for it, order is implicit within groups, coordinates are nominated
extracting trip-likes and sf-forms from ggplot_build forms

@mpadge and @Robinlovelace this might be unintelligible but will appreciate any discussion if there's relevant examples of code/idioms this triggers

core silicate (write definitions out)

We've established the worker verbs

sc_coord, sc_path, sc_edge, sc_segment, sc_vertex, sc_node, sc_arc, sc_uid

the universal model

and the exemplary models

PATH, ?

Working in pslg branch

remove PRIMITIVE
create SC, consider removing PATH
fix sc_arc to return the entities, replace with contracted graph
make sc_uid generic
make sc_uid able to retain incoming labels, or create them if needed
incorporate dodgr workers
write about the core

translators with the universal central form
relation to interactive graphics
examples with representing gg objects
relevance to tracking and routing
relevance to rgl, svg,

generic TODO/explore list

This uses very similar idioms: https://github.com/cran/RArcInfo

It's designed for .e00 format which is natively "arc-node" topology, i.e. the same as the underlying structures in the maps package, and pretty much what polygons in D3 uses. There's also code to read .adf coverages (but that might be raster data or TINs, or either).

WIP sc/igraph

library(sf)
library(dplyr)
library(sc)
library(igraph)
b1 <- function(x) {
  op <- par(mfrow = c(3, 1), mar = rep(0.2, 4))
  #on.exit(op)
  x %>% st_sfc() %>% st_sf() %>% plt() %>% 
  sc::PRIMITIVE() %>%   txt() %>% `[[`("segment") %>% 
  select(.vertex0, .vertex1) %>% as.matrix() %>%  
  graph() %>% plot()
}
plt <- function(x) {plot(x); x}
txt <- function(x) {text(x$vertex[, c("x_", "y_"), ], lab = x$vertex$vertex_); x}

st_linestring(cbind(c(0, 0), c(0, 1)))  %>%  b1() 

st_linestring(cbind(c(0, 0, 1), c(0, 1, 1))) %>% b1()

st_linestring(cbind(c(0, 0, 1, 0), c(0, 1, 1, 0))) %>% b1()

st_polygon(list(cbind(c(0, 0, 1, 0), c(0, 1, 1, 0)))) %>% plt() %>%  b1() 

p_self %>% b1()


  viridis::viridis(n)
}
p_self <- st_sfc(st_polygon(list(cbind(c(0, 0, 1, -0.1, -0.1, 1.5, 0), c(0, 1, 1, 0.2, 1.2, 1.2, 0)))))

p_self %>% b1()
plot(sfct::ct_triangulate(st_sf(p_self)), col = scales::alpha("grey", 0.3))

Ggplot2 object - retrieving creation path

https://gist.github.com/hrbrmstr/8bb810fb33646717ff4a1091a8c9fe91

https://twitter.com/JennyBryan/status/755417584359632896

https://twitter.com/hrbrmstr/status/755424521079914496

the UID thing

Some types come in with names of things at every level (OSM) and some things come in with names at no levels (sf), and others with names only at the first object-level (sp, and sf sometimes).

The generic sc_uid should handle these by either preserving existing ones and creating new ones as needed.

It would be nice to be able to override these and nominate UIDs from a column or direct input.

The question of how to maintain a global pool of ever-unique UIDs remains,

1D topology as paths

This is a very general requirement, it sits right in the middle of the limitations of simple features.

Polygons as paths are a very specialist structure, they don't really occur naturally - we tend to draw them. Patches of areas within a plane are very natural however, but they occur in less direct ways such as clusters of finite elements with either implicit boundaries (i.e. rasters with contiguous values, or non-vary measurments within point patters).

Paths though do occur very naturally, any sensor data stream measured over time is a path. It might not be appropriate to consider it as continuous between samples, but it still makes sense in that it's an ordered set of observations.

tidy grammars excel at deriving structure from users in this way

d %>% 
   group_by(ID)  %>% ## grouping is optional, might be only one group
   arrange(datetime)  %>% ## traditionally we'd arrange(group, datetime), treating "group" especially
   ggplot(aes(x = lon, y = lat, colour = depth))   ## etc. etc.

At the end of the pipeline above the only option we have currently is to create visualizations, but better would be to be able to format the user-expressed organization into anything we want, something like osmdata_sf does from its data streams, but we want every other kind of format as well, including rgl, trip, any number of htmlwidgets inputs, and very many modelling frameworks like adehabitatLT and move and so on.

I think ggvis was a better stab at this idea, but its idea of "primitives" is very much about the objects in the browser rather than those intermediate forms that so many other data structures are composed of. The tidyverse really needs this.

This concept pops up in so many places I've got a lot of fragmented pieces, so this list exists to hopefully help me pull them all together eventually. The crux is having facilities for paths as directed segments, i.e. topology independent of any geometry (data frames can store any number of dimensions of coordinates).

See rebuilding polygons from filtered segment-graph here:
https://rpubs.com/cyclemumner/291559
https://rpubs.com/cyclemumner/291560
https://rpubs.com/cyclemumner/291561

This is where I first did a "cycle-detection" to rebuild polygons from primitives:
https://github.com/mdsumner/gris/blob/master/R/trianglePrimitives.R

Segments for paths to help with vector field / trajectory comparison

Trackage/pathos#1

https://github.com/r-gris/rangl/blob/master/R/rangl-trip.r

https://github.com/Trackage/trip/blob/aes/R/gg-aesthetics.R

https://github.com/r-gris/pathos

mdsumner/sctrip#2

Readme notes

The Lens: add no true surfaces as 4) with TIN, multipatch and the "exotic" types as limited exceptions. Regular rasters and texture mapping and conflating dimension with fields, topology with geometry.

triangulation solution

This looks like a good candidate: https://github.com/mapbox/earcut.hpp

That might be relatively simple to bundle into Rcpp, @mpadge any opinion?

rgl's ear-clipping has been split out into https://github.com/mdsumner/earclip.rgl, but it's not fast or robust
lawn has ear-clipping but it's a bit deep in conversions and complicated to get data in and out
RTriangle is pretty fast and robust, but has non-CC licensing
CGAL is fast as hell but super-complicated

tidygraph solutions

tidygraph solves the unique naming problem for vertices:

library(scgraph)
library(scsf)
g <- as_tbl_graph(scgraph::sc_as_igraph( PRIMITIVE(minimal_mesh)))
g %>% mutate(name = 1:14)
g %>% mutate(name = 1:14) %>% plot()

check sp objects

Why does this fail when run on wrld_simpl without convertiing it to sf?

data("wrld_simpl", package = "maptools")
#edges are unique segments (undirected)
#segments are directed instances of edges, belonging to arcs/ways or feature boundaries
library(silicate)
library(sf)
edges <- sc_edge(st_as_sf(wrld_simpl))

sc_edge(wrld_simpl)
# Error in `[[<-.data.frame`(`*tmp*`, "object_", value = c("cf17bd61", "d857331e",  : 
#  replacement has 246 rows, data has 3768

TDAmapper

https://github.com/paultpearson/TDAmapper/

sc_edge fails on single row sf

From @mdsumner on December 18, 2017 12:12

sc_edge(nc[1, ])


## causes anglr to fail
anglr(nc[1, ])
 Error in `[[<-.data.frame`(`*tmp*`, "path_", value = c("3f24a604", "3f24a604",  : 
  replacement has 26 rows, data has 0

Copied from original issue: hypertidy/anglr#58

arc-node topolojson

sc_node returns the nodes

sc:::sc_edge returns the unique edges, from which the nodes are derived

still need sc_arc to rebuild the arcs themselves from the unique-edge paths for each ring

It's not obvious yet what these functions apply to, probably always only to the PRIMITIVE model directly, since the unique IDs need to be kept in synch.

ESRI's terminology

Interesting stuff here, the terminology is terribly confusing but might give some clues about what happened

https://gis.stackexchange.com/questions/747/what-is-arcgis-for-desktop-and-geodatabase-equivalent-to-regions-of-arcinfo-wo

https://gis.stackexchange.com/questions/249581/how-is-the-arc-coverage-regions-data-format-represented-in-shapefiles-and-geod

boundary adventures

This is a good one:

https://gis.stackexchange.com/questions/258184/four-states-each-of-six-pairs-of-which-have-a-common-boundary

Merge logic with this: https://rpubs.com/cyclemumner/neighbours02

normalization refactor

This the key steps to path normalization.

devtools::load_all()
obj <- polymesh
x <- PATH(obj)


## PATH is created by doing these steps

## non-normalized vertex pool
coord <- sc_coord(obj)
## map of instances of vertices in the original structure
gmap <- gibble::gibble(obj)

## normalize the vertices (a coordinate is an instance of a vertex)
coordspace <- c("x_", "y_")
## classify all coordinates by unique instances (not robust, assumes 100% accurate)
pastey <- function(...) paste(..., sep = "-")
coordclass <- factor(do.call(pastey, lapply(coord[coordspace], function(x) as.character(x))))
library(dplyr)
vertex <- coord[!duplicated(coordclass), ] %>% 
  dplyr::mutate(vertex = row_number())
path_link_vertex <- tibble::tibble(path = rep(sc_uid(nrow(gmap)), gmap[["nrow"]]), vertex = match(coordclass, coordclass[!duplicated(coordclass)]))


library(ggplot2)
library(ggpolypath)
path_link_vertex %>% 
  inner_join(vertex) %>%
  ggplot() + 
  geom_polypath(aes(x_, y_, group = path, fill = path))

Performance

Open questions:

situations when fastest to triangulate per feature (ID is given) versus dataset wide (using triangle-centroid to ID), perhaps different depending on shared boundaries
should we remove reverse-direction segments in the PSLG? I don't think Triangle cares, needs benchmarking
should we give the hole centroids to RTriangle?

It takes about 8 seconds to triangulate wrld_simpl to a triangle GC. It's less than a second for nc, after the fix below:

st_polygon is slow, so avoid when possible, just use structure(list(mat), class = c("XY", "POLYGON", "sfg")) when you know what you're doing.
1848ce0

reproj

we really need this! We can't be converting to weird types all the time, in this code

rgl::rgl.clear()
topo <- raster::raster(system.file("extdata", "gebco1.tif", package = "anglr"))
poly <- subset(wrld_simpl, NAME == "India")
prj <- "+proj=laea +lon_0=80 +lat_0=45 +datum=WGS84"
poly_z <- anglr(poly, z = topo * 68, max_area = 0.02)
poly_z$v[c("x_", "y_")] <- rgdal::project(as.matrix(poly_z$v[c("x_", "y_")]), prj)
poly_z$meta$proj <- prj
plot(poly_z)
rgl::rglwidget()

What really should happen is that disparate types get put together and we define out output projection. It's crazy to have to reproject afterwards, just because the x_, y_ need to be in longlat for the raster

Dig up reproj ...