od's Introduction

od

The goal of od is to provide functions and example datasets for working with origin-destination (OD) datasets. OD datasets represent “the volume of travel between zones or locations” (Carey et al. 1981) and are central to modelling city to global scale transport systems (Simini et al. 2012).

Installation

You can install the released version of od from CRAN with:

install.packages("od")

Install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("itsleeds/od", build_vignettes = TRUE)

The examples below provide a gentle introduction to the package. For a more detailed introduction to the package and OD data in general, see the od vignette online at itsleeds.github.io/od or by executing the following command after installing the package:

vignette("od")

You can find descriptions of each of the package’s functions with reproducible examples on the package’s web page: https://itsleeds.github.io/od/reference/index.html

Motivation

The package originated as a set of functions in the package stplanr for working with origin-destination data. The od2line() function, for example, takes a data frame and a spatial object as inputs and outputs geographic lines representing movement between origins and destinations:

library(od) # load example datasets
od_data_df # OD data as data frame
#>   geo_code1 geo_code2  all train bus taxi car_driver car_passenger bicycle foot
#> 1 E02002384 E02006875  966    14 153   14         69            18      13  679
#> 2 E02002404 E02006875 1145     6 174   17         96            38      10  798
#> 3 E02006875 E02006875 1791    21  38    5         69             7       8 1637
#> 4 E02006876 E02006875 1035    11 132    6         97            24      10  749
#> 5 E02006861 E02002392  453     1  51    0         51             6      26  317
#> 6 E02006875 E02002392  286     2  15    5         16             2      10  235
#> 7 E02002392 E02006875  753    10  91   21         33             7      19  571
od_data_centroids[1:2, ]
#>    geo_code             geometry
#> 1 E02002407 -1.609934, 53.790790
#> 2 E02002336   -1.62463, 53.88605
desire_lines_stplanr = stplanr::od2line(od_data_df, od_data_centroids)
desire_lines_stplanr[1:2, 1:9]
#> Simple feature collection with 2 features and 9 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -1.545708 ymin: 53.7923 xmax: -1.518911 ymax: 53.80925
#> Geodetic CRS:  WGS 84
#>   geo_code1 geo_code2  all train bus taxi car_driver car_passenger bicycle
#> 1 E02002384 E02006875  966    14 153   14         69            18      13
#> 2 E02002404 E02006875 1145     6 174   17         96            38      10
#>                         geometry
#> 1 LINESTRING (-1.545094 53.80...
#> 2 LINESTRING (-1.518911 53.79...

It works great, and is plenty fast enough for most applications, but there are some issues with stplanr::od2line() (which also affect the other od_*() functions in stplanr):

  • The function is a commonly needed and low-level function, buried in a large package, reducing ‘findability’
  • To get the function you must install stplanr plus its numerous dependencies
  • The function has not been optimised
  • It has no class definition of ‘od’ data

The od package addresses the first three of these issues (it may at some point define a class for od objects but there are no immediate plans to do so).

The equivalent code in the od package is as follows:

desire_lines_od = od_to_sf(od_data_df, od_data_centroids)
#> 0 origins with no match in zone ids
#> 0 destinations with no match in zone ids
#>  points not in od data removed.

The result is an sf object whose geometry column matches the output from od2line:

desire_lines_od$geometry[1:2]
#> Geometry set for 2 features 
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -1.545708 ymin: 53.7923 xmax: -1.518911 ymax: 53.80925
#> Geodetic CRS:  WGS 84
#> LINESTRING (-1.545094 53.80925, -1.545708 53.79...
#> LINESTRING (-1.518911 53.7923, -1.545708 53.79593)
desire_lines_stplanr$geometry[1:2]
#> Geometry set for 2 features 
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -1.545708 ymin: 53.7923 xmax: -1.518911 ymax: 53.80925
#> Geodetic CRS:  WGS 84
#> LINESTRING (-1.545094 53.80925, -1.545708 53.79...
#> LINESTRING (-1.518911 53.7923, -1.545708 53.79593)

These are ‘desire lines’ representing the shortest (straight line) path between two centroids and can be plotted using geographic data and mapping packages such as sf, mapview, tmap and mapdeck, e.g.:

plot(desire_lines_od)
plot(desire_lines_stplanr$geometry)

By default the package uses the sfheaders package to create sf objects for speed. You can also specify sf outputs as follows:

desire_lines_od_sf1 = od_to_sf(od_data_df, od_data_centroids, package = "sf")
#> 0 origins with no match in zone ids
#> 0 destinations with no match in zone ids
#>  points not in od data removed.

Performance

Benchmark on a small dataset:

nrow(od_data_df)
#> [1] 7
bench::mark(check = FALSE, max_iterations = 100,
  stplanr = stplanr::od2line(od_data_df, od_data_zones),
  od = od_to_sfc(od_data_df, od_data_zones),
  od_sf1 = od_to_sf(od_data_df, od_data_zones),
  od_sf2 = od_to_sf(od_data_df, od_data_zones, package = "sf", crs = 4326)
)
#> # A tibble: 4 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 stplanr      5.12ms   5.84ms      171.   607.1KB     4.16
#> 2 od           2.49ms   2.73ms      358.    78.1KB     3.62
#> 3 od_sf1       3.62ms   4.07ms      241.    77.8KB     4.92
#> 4 od_sf2       3.53ms   4.02ms      246.    90.5KB     5.01

Related open source projects

  • stplanr is an R package designed to support transport planning, with a focus on geographic transport datasets and many functions for working with OD data in the od function family.
  • cartography is an R package with functions for working with OD data, including getLinkLayer()
  • gravity is an R package for developing ‘gravity models’ to estimate flow between zones
  • flowmap.gl, a JavaScript package for visualising OD data
  • Arabesque is another JavaScript project for working with OD data

Code of Conduct

Please note that the od project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

od's People

Contributors

joeytalbot, mem48, robinlovelace

od's Issues

maxdist argument for points_to_od

Happy to contribute this if you think it would be useful. The idea is that you have points and want to make an OD dataset but cap the maximum distance considered, e.g. how we capped the PCT at 30 km.

Would need to add geodist as a dependency.
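A minimal base R sketch of the idea, assuming points arrive as a data frame with id, lon and lat columns. The names points_to_od_maxdist(), haversine_m() and max_dist are hypothetical and are not part of the od API; a haversine formula stands in for the geodist dependency mentioned above.

```r
# Hypothetical helper: great-circle distance in metres (haversine formula)
haversine_m = function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: all point-to-point OD pairs no further apart than max_dist
points_to_od_maxdist = function(p, max_dist = Inf) {
  # Every combination of origin row and destination row
  idx = expand.grid(o = seq_len(nrow(p)), d = seq_len(nrow(p)))
  dist_m = haversine_m(p$lon[idx$o], p$lat[idx$o], p$lon[idx$d], p$lat[idx$d])
  od = data.frame(O = p$id[idx$o], D = p$id[idx$d], distance_m = dist_m)
  od[od$distance_m <= max_dist, ]
}

# Illustrative points: a and b are a few hundred metres apart, c is tens of km away
p = data.frame(
  id = c("a", "b", "c"),
  lon = c(-1.55, -1.54, -1.10),
  lat = c(53.80, 53.80, 53.96)
)
od_near = points_to_od_maxdist(p, max_dist = 5000)
nrow(od_near)
```

Computing every pair before filtering is quadratic in the number of points, so a real implementation would probably filter while computing distances.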

Disaggregation code failing

Heads-up @natesheehan I'm working on this as a fix for the bg traffic issue. Reproducible code:

# Aim: test the od_disaggregate() function

library(tidyverse)
library(tmap)

u = "https://github.com/cyipt/actdev/raw/main/data-small/poundbury/desire-lines-many.geojson"
desire_lines_many = sf::read_sf(u)

zones_msoa_national = pct::get_pct(layer = "z", geography = "msoa", national = TRUE)
centroids_lsoa_national = pct::get_pct(layer = "c", national = TRUE)
zones_lsoa_national = pct::get_pct(layer = "z", national = TRUE, geography = "lsoa")
zones_many = zones_msoa_national %>% filter(geo_code %in% desire_lines_many$geo_code2)
centroids_lsoa_many = centroids_lsoa_national[zones_many, ]
zones_lsoa_many = zones_lsoa_national %>% filter(geo_code %in% centroids_lsoa_many$geo_code)

qtm(zones_many, borders.lwd = 3) +
  qtm(desire_lines_many) +
  tm_shape(zones_lsoa_many) + tm_borders("red", lty = 3)

desire_lines_many = od::od_disaggregate(od = desire_lines_many %>% select(geo_code1:all_base), z = zones_many, subzones = zones_lsoa_many)

Result

image

onewayid to support more than just sum

Issues / note to self

In stplanr, the onewayid family can only sum. I've come across a use case that needs another function.

I have some OD data where the total flow between A and B is reported for both A > B and B > A, but due to counting errors the numbers are slightly different. In this case the max or min value would be needed, not the sum, which would double the overall flow.

Also is na.rm supported?
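A minimal sketch of what such a function could look like in base R. od_oneway_fun() is a hypothetical name, not the stplanr or od implementation; it assumes the first two columns hold origin and destination IDs and passes extra arguments such as na.rm through to FUN.

```r
# Hypothetical helper: collapse A > B and B > A into one row using any summary function
od_oneway_fun = function(x, attrib, FUN = sum, ...) {
  o = as.character(x[[1]])
  d = as.character(x[[2]])
  # Order-independent key: the smaller ID always goes first
  groups = list(o = pmin(o, d), d = pmax(o, d))
  aggregate(x[attrib], by = groups, FUN = FUN, ...)
}

od_ab = data.frame(
  o = c("A", "B", "A"),
  d = c("B", "A", "C"),
  flow = c(100, 104, 20)
)
res_max = od_oneway_fun(od_ab, "flow", FUN = max) # A-B flow is 104
res_sum = od_oneway_fun(od_ab, "flow", FUN = sum) # A-B flow is 204

# Missing values are handled by forwarding na.rm to FUN
od_na = od_ab
od_na$flow[2] = NA
res_na = od_oneway_fun(od_na, "flow", FUN = max, na.rm = TRUE)
```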

CRAN submission issues

See here: https://win-builder.r-project.org/Q79aW1rb3yrs/00check.log

Only minor issue is:

* checking R code for possible problems ... [8s] NOTE
Found if() conditions comparing class() to string:
File 'od/R/oneway.R': if (class(x) == "factor") ...
File 'od/R/oneway.R': if (class(y) == "factor") ...
File 'od/R/oneway.R': if (class(x) == "factor") ...
File 'od/R/oneway.R': if (class(y) == "factor") ...
Use inherits() (or maybe is()) instead.
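The reason behind the NOTE can be illustrated with a short sketch: class() may return more than one string, so comparing it with == can give a condition of length greater than one, which recent versions of R reject inside if(). The helper name to_character_if_factor() is hypothetical and is not taken from od/R/oneway.R.

```r
y = ordered(c("low", "high"))
class(y)             # two classes: "ordered" and "factor"
class(y) == "factor" # a logical vector of length 2, unsafe inside if()
inherits(y, "factor") # a single TRUE

# Hypothetical helper showing the recommended pattern
to_character_if_factor = function(v) {
  if (inherits(v, "factor")) as.character(v) else v
}
x = to_character_if_factor(factor(c("E02002384", "E02006875")))
```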

cartography::getLinkLayer()

FYI,
cartography::getLinkLayer() does something similar to stplanr::od2line() and od::od_to_sf() but is clearly slower...

library(od)
bench::mark(check = FALSE, max_iterations = 100,
            stplanr = stplanr::od2line(od_data_df, od_data_zones),
            cartography = cartography::getLinkLayer(x = od_data_zones,df = od_data_df),
            od = od_to_sfc(od_data_df, od_data_zones),
            od_sf1 = od_to_sf(od_data_df, od_data_zones),
            od_sf2 = od_to_sf(od_data_df, od_data_zones, package = "sf", crs = 4326)
)
#> # A tibble: 5 x 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 stplanr        7.2ms   7.41ms      134.    75.7MB     2.84
#> 2 cartography   7.95ms   8.09ms      121.   490.4KB     2.05
#> 3 od            2.69ms   2.74ms      362.   515.9KB     0   
#> 4 od_sf1        3.16ms   3.21ms      308.    25.5KB     3.11
#> 5 od_sf2        3.35ms   3.42ms      289.   104.9KB     2.92

Created on 2020-02-12 by the reprex package (v0.3.0)

`od_disaggregate()` should avoid repeated use of sample points

Example highlighting the issue:

set.seed(2021)
disag = od::od_disaggregate(od::od_data_df[1:2, ], z = od::od_data_zones_min)
#> Creating randomly sampled origin and destination points.
coords_origin = lwgeom::st_startpoint(disag)
#> Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1
coords_destination = lwgeom::st_endpoint(disag)
summary(duplicated(coords_origin))
#>    Mode   FALSE    TRUE 
#> logical      26      17
summary(duplicated(coords_destination))
#>    Mode   FALSE    TRUE 
#> logical      29      14
plot(disag[0])

Created on 2021-11-28 by the reprex package (v2.0.1)

Add Malcolm to author list

Any objections to being added here @mem48 ?

od/DESCRIPTION

Lines 4 to 8 in 185bb31

Authors@R: c(
person("Robin", "Lovelace", email = "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-5679-6536")),
person("David", "Cooley", role = c("ctb"))
)

Many thanks for the contribution, next step: upstream this functionality to simodels (although of course the steps can be separated and we can just mention max_dist in the documentation).

od_aggregate: make sure aggzones are not intersecting

When using od_aggregate, st_join sometimes joins 2 polygons to an o or d because the aggzones overlap and the o or d lies in both.
The function should throw an error when aggzones are intersecting.

I'll supply a reprex if requested
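A minimal sketch of such a check, assuming the sf package is installed. check_aggzones() and square() are hypothetical names, not part of od. The sketch uses sf::st_overlaps(), which flags polygons whose interiors partly overlap but not zones that only touch along a boundary; it would not catch one zone lying entirely inside another, so a stricter check may be needed in practice.

```r
library(sf)

# Hypothetical helper: axis-aligned square polygon for the toy example
square = function(xmin, ymin, size = 2) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Hypothetical check: stop with a clear message if any aggzones overlap
check_aggzones = function(aggzones) {
  overlaps = st_overlaps(aggzones, aggzones)
  n_overlapping = sum(lengths(overlaps) > 0)
  if (n_overlapping > 0) {
    stop(n_overlapping, " aggzones overlap with at least one other zone", call. = FALSE)
  }
  invisible(TRUE)
}

zones_ok = st_sfc(square(0, 0), square(2, 0))  # share an edge only
zones_bad = st_sfc(square(0, 0), square(1, 0)) # interiors overlap

check_aggzones(zones_ok)
result_bad = try(check_aggzones(zones_bad), silent = TRUE)
```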

Review and merge with `odf`(?)

Hi Robin (cc @agila5 and @rogerbeecham),

Great to see your progress on the od package!

Finally, I have made some progress on 'my' od package, which I called odf. As you will see below, odf doesn't have as many features as od. I have put the emphasis on the data structure, for which I used a graph structure. However, I have a different approach than I had in mind during my visit in Leeds: now I'm following a much more lightweight approach.

Could you take a look and see if it's useful for your package? I've rendered a pdf from the vignette and also uploaded our draft paper. See also the latest version of the 'halfline' doughnut visualization, for which I've partly used odf.

I have the feeling that we can merge our approaches quite well. Let me know what you think.

od_disaggregate fails with hard-to-interpret error messages when passed non-numeric columns

E.g. from project with @joeytalbot 👍

od_intrazonal_disag1 = od::od_disaggregate(od = od_intrazonal_original, z = zones, subpoints = highways_osm_centroids_ed, population_column = "all", population_per_od = 100)
although coordinates are longitude/latitude, st_intersects assumes that they are planar
although coordinates are longitude/latitude, st_intersects assumes that they are planar
33 locations outside zones removed
although coordinates are longitude/latitude, st_intersects assumes that they are planar
although coordinates are longitude/latitude, st_intersects assumes that they are planar
Error in floor(x) : non-numeric argument to mathematical function
In addition: Warning message:
In Ops.factor(x, nrow(od_new)) : ‘/’ not meaningful for factors
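One way to make the failure easier to interpret is to validate the attribute columns up front. The sketch below is hypothetical and is not part of od::od_disaggregate(); it assumes a plain data frame whose first two columns are the origin and destination IDs (for sf objects, the geometry column would need dropping first).

```r
# Hypothetical pre-check: fail early with a readable message
check_od_attributes = function(od, id_cols = 1:2) {
  attrib = as.data.frame(od)[-id_cols]
  is_num = vapply(attrib, is.numeric, logical(1))
  if (!all(is_num)) {
    stop(
      "Non-numeric attribute columns: ",
      paste(names(attrib)[!is_num], collapse = ", "),
      ". Convert them to numeric or drop them before disaggregating.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

od_ok = data.frame(o = "a", d = "b", all = 10)
od_bad = data.frame(o = "a", d = "b", all = factor("10"))

check_od_attributes(od_ok)
msg = tryCatch(check_od_attributes(od_bad), error = function(e) conditionMessage(e))
```

A factor column holding numbers can be converted with as.numeric(as.character(x)); calling as.numeric() directly on a factor returns the internal level codes instead.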

Link to gravity package

@pachamaltese is the maintainer of the really useful gravity package, which would be a perfect partner for this package idea: this package could provide an interface and data-representation system for directly and simply fitting a variety of interaction models to OD data. gravity currently accepts arbitrary input and requires users to specify which columns represent the flow and distance columns, along with possible additional regressors, so the interface could easily be extended to this package.

od_interzone and od_intrazone

Thinking these are pretty fundamental functions for working with od data, for extracting only inter-zone and intra-zone pairs respectively. What do you think of this implementation @mem48 ?:

# todo: export at some point
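# Keep only pairs whose origin (column 1) differs from the destination (column 2)
od_interzone = function(x) {
 x[x[[1]] != x[[2]], ]
}
# Keep only pairs that start and end in the same zone
od_intrazone = function(x) {
 x[x[[1]] == x[[2]], ]
}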
library(od)
od_data = points_to_od(od_data_centroids)
nrow(od_data)
#> [1] 11449
nrow(od_interzone(od_data))
#> [1] 11342
nrow(od_intrazone(od_data))
#> [1] 107

Created on 2020-03-09 by the reprex package (v0.3.0)

Dev version of od causing simodels to fail

Changes to worse in reverse depends:

Package: simodels
Check: examples
New result: ERROR
Running examples in ‘simodels-Ex.R’ failed
The error most likely occurred in:

base::assign(".ptime", proc.time(), pos = "CheckExEnv")

Name: si_calculate

Title: Calculate flow using a pre-existing function

Aliases: si_calculate

** Examples

od = si_to_od(si_zones, si_zones, max_dist = 4000)
Converting p to centroids
1695 OD pairs remaining after removing those with a distance greater than 4000 meters:
15% of all possible OD pairs
Error in dplyr::inner_join():
! Join columns in x must be present in the data.
✖ Problem with O.
Backtrace:

  1. └─simodels::si_to_od(si_zones, si_zones, max_dist = 4000)
  2.   ├─dplyr::inner_join(od_df, origins_to_join, by = "O")
  3.   └─dplyr:::inner_join.data.frame(od_df, origins_to_join, by = "O")
  4.     └─dplyr:::join_mutate(...)
  5.       └─dplyr:::join_cols(...)
  6.         └─dplyr:::check_join_vars(by$x, x_names, by$condition, "x", error_call = error_call)
  7.           └─rlang::abort(bullets, call = error_call)

Execution halted

Package: simodels
Check: re-building of vignette outputs
New result: ERROR
Error(s) in re-building vignettes:
...
--- re-building ‘simodels.Rmd’ using rmarkdown
--- finished re-building ‘simodels.Rmd’

--- re-building ‘sims-first-principles.Rmd’ using rmarkdown
** Processing: /home/hornik/tmp/CRAN/simodels.Rcheck/vign_test/simodels/vignettes/sims-first-principles_files/figure-html/inputs-1.png
288x288 pixels, 3x8 bits/pixel, RGB
Input IDAT size = 42755 bytes
Input file size = 42893 bytes

Trying:
zc = 9 zm = 8 zs = 0 f = 0 IDAT size = 32240
zc = 9 zm = 8 zs = 1 f = 0
zc = 1 zm = 8 zs = 2 f = 0
zc = 9 zm = 8 zs = 3 f = 0
zc = 9 zm = 8 zs = 0 f = 5
zc = 9 zm = 8 zs = 1 f = 5
zc = 1 zm = 8 zs = 2 f = 5
zc = 9 zm = 8 zs = 3 f = 5

Selecting parameters:
zc = 9 zm = 8 zs = 0 f = 0 IDAT size = 32240

Output IDAT size = 32240 bytes (10515 bytes decrease)
Output file size = 32318 bytes (10575 bytes = 24.65% decrease)

Quitting from lines 99-108 [unnamed-chunk-3] (sims-first-principles.Rmd)
Error: processing vignette 'sims-first-principles.Rmd' failed with diagnostics:
ℹ In argument: O.
Caused by error:
! object 'O' not found
--- failed re-building ‘sims-first-principles.Rmd’

Row sums do not add up to totals

library(od)
od = od_data_df[1:2, ]
zones = od::od_data_zones_min
subzones = od_data_zones_small
od_disag = od_disaggregate(od, zones, subzones)
#> Converting subzones to centroids
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
ncol(od_disag) -1 == ncol(od) # same number of columns (except disag data gained geometry)
#> [1] FALSE
sum(od_disag[[3]]) == sum(od[[3]])
#> [1] TRUE
sum(od_disag[[4]]) == sum(od[[4]])
#> [1] TRUE
plot(rowSums(sf::st_drop_geometry(od_disag)[4:10]), od_disag[[3]])

Created on 2021-04-30 by the reprex package (v2.0.0)
