osmextract's Introduction

osmextract

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

The goal of osmextract is to make it easier for people to access OpenStreetMap (OSM) data for reproducible research. OSM data is the premier source of freely available, community created geographic data worldwide. We aim to enable you to extract it for data-driven work in the public interest.

osmextract matches, downloads, converts and imports bulk OSM data hosted by providers such as Geofabrik GmbH and bbbike. For information on alternative providers and how to add them see the providers vignette.

Why osmextract?

The package answers a common question for researchers who use OSM data: how to get it into a statistical environment, in an appropriate format, as part of a computationally efficient and reproducible workflow? Other packages answer parts of this question. osmdata, for example, is an R package that provides an R interface to the Overpass API, which is ideal for downloading small OSM datasets. However, the API is rate limited, making it hard to download large datasets. As a case study, try to download all cycleways in England using osmdata:

library(osmdata)
cycleways_england = opq("England") %>% 
  add_osm_feature(key = "highway", value = "cycleway") %>% 
  osmdata_sf()
# Error in check_for_error(doc) : General overpass server error; returned:
# The data included in this document is from www.openstreetmap.org. The data is made available under ODbL. runtime error: Query timed out in "query" at line 4 after 26 seconds. 

The query stops with an error message after around 30 seconds. The same query can be made with osmextract as follows. It reads in almost 100k linestrings in less than 10 seconds, once the data has been downloaded in the compressed .pbf format and converted to the open standard .gpkg format. Downloading and converting the OSM extract associated with England takes a few minutes, but this operation needs to be executed only once. The following code chunk is not evaluated.

library(osmextract)

cycleways_england = oe_get(
  "England",
  quiet = FALSE,
  query = "SELECT * FROM 'lines' WHERE highway = 'cycleway'"
)
par(mar = rep(0.1, 4))
plot(sf::st_geometry(cycleways_england))

The package is designed to complement osmdata, which has advantages over osmextract for small datasets: osmdata is likely to be quicker for datasets less than a few MB in size, provides up-to-date data and has an intuitive interface. osmdata can provide data in a range of formats, while osmextract only returns sf objects. osmextract’s niche is that it provides a fast way to download large OSM datasets in the highly compressed pbf format and read them in via the fast C library GDAL and the popular R package for working with geographic data sf.

Installation

You can install the released version of osmextract from CRAN with:

install.packages("osmextract")

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("ropensci/osmextract")

Load the package with:

library(osmextract)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
#> Check the package website, https://docs.ropensci.org/osmextract/, for more details.

To use alongside functionality in the sf package, we also recommend attaching this geographic data package as follows:

library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

Warnings:

The functions defined in this package may return a warning message like

st_crs<- : replacing crs does not reproject data; use st_transform for that 

if the user is running an old version of GDAL (<= 3.0.0) or PROJ (<= 6.0.0). See here for more details. Nevertheless, every function should still work correctly. Please raise a new issue if you find any odd behaviour.

Basic usage

Give osmextract a place name and it will try to find it in a list of names in the specified provider (Geofabrik by default). If the name you give it matches a place, it will download and import the associated data into R. The function oe_get() downloads (if not already downloaded) and reads-in data from OSM providers as sf objects. By default oe_get() imports the lines layer, but any layer can be read-in by changing the layer argument:

osm_lines = oe_get("Isle of Wight", stringsAsFactors = FALSE, quiet = TRUE)
osm_points = oe_get("Isle of Wight", layer = "points", stringsAsFactors = FALSE, quiet = TRUE)
nrow(osm_lines)
#> [1] 51226
nrow(osm_points)
#> [1] 67783
par(mar = rep(0, 4))
plot(st_geometry(osm_lines), xlim = c(-1.59, -1.1), ylim = c(50.5, 50.8))
plot(st_geometry(osm_points), xlim = c(-1.59, -1.1), ylim = c(50.5, 50.8))

The figures above give an insight into the volume and richness of data contained in OSM extracts. Even for a small island such as the Isle of Wight, the lines layer alone contains over 50k features, including ferry routes, shops and roads. The column names in the osm_lines object are as follows:

names(osm_lines) # default variable names
#>  [1] "osm_id"     "name"       "highway"    "waterway"   "aerialway" 
#>  [6] "barrier"    "man_made"   "z_order"    "other_tags" "geometry"

Once imported, you can use all functions for data frames in base R and other packages. You can also use functions from the sf package for spatial analysis and visualisation. Let’s plot all the primary, secondary, tertiary and unclassified roads, for example:

ht = c("primary", "secondary", "tertiary", "unclassified") # highway types of interest
osm_major_roads = osm_lines[osm_lines$highway %in% ht, ]
plot(osm_major_roads["highway"], key.pos = 1)

The same steps can be used to get other OSM datasets (examples not run):

malta = oe_get("Malta", quiet = TRUE)
andorra = oe_get("Andorra", extra_tags = "ref")
leeds = oe_get("Leeds")
goa = oe_get("Goa", query = "SELECT highway, geometry FROM 'lines'")

If the input place does not match any of the existing names in the supported providers, then oe_get() will try to geocode it via the Nominatim API and select the smallest OSM extract intersecting the area. For example (not run):

oe_get("Milan") # Warning: It will download more than 400MB of data
#> No exact match found for place = Milan and provider = geofabrik. Best match is Iran.
#> Checking the other providers.
#> No exact match found in any OSM provider data. Searching for the location online.
#> ... (extra messages here)

For further details on using the package, see the Introducing osmextract vignette.

Persistent download directory

The default behaviour of oe_get() is to save all the files in a temporary directory, which is erased every time you restart your R session. If you want to set a directory that will persist, you can add OSMEXT_DOWNLOAD_DIRECTORY=/path/for/osm/data in your .Renviron file, e.g. with:

usethis::edit_r_environ()
# Add a line containing: OSMEXT_DOWNLOAD_DIRECTORY=/path/to/save/files

We strongly advise setting a persistent directory, since downloading and converting .pbf files is expensive. The oe_*() functions skip these steps if they detect that the input .pbf file was already downloaded.

You can always check the default download_directory used by oe_get() with:

oe_download_directory()
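The variable can also be set for the current session only, which is handy for trying the setting before editing .Renviron. The following is a minimal base R sketch: resolve_download_dir() is a hypothetical helper (not part of osmextract) that assumes the environment variable, when set, takes precedence over the temporary directory, and the path used is a placeholder.

```r
# Hypothetical helper (not part of osmextract): report the directory that
# would be used, assuming OSMEXT_DOWNLOAD_DIRECTORY wins over tempdir().
resolve_download_dir = function() {
  Sys.getenv("OSMEXT_DOWNLOAD_DIRECTORY", unset = tempdir())
}

# Placeholder path; replace it with a directory that survives restarts.
persistent_dir = file.path(tempdir(), "osm-data")
dir.create(persistent_dir, showWarnings = FALSE)

# Sys.setenv() only affects the running R session, unlike .Renviron.
Sys.setenv(OSMEXT_DOWNLOAD_DIRECTORY = persistent_dir)
resolve_download_dir()
```

After restarting R the session-level setting is gone, so the .Renviron entry remains the way to make it permanent.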

Next steps

We would love to see more providers added (see the Add new OpenStreetMap providers vignette for details) and to see what people can do with OSM datasets of the type provided by this package in a reproducible and open statistical programming environment for the greater good. Any contributions to support this, or any other improvements to the package, are very welcome via our issue tracker.

Licence

We hope this package will provide easy access to OSM data for reproducible research in the public interest, adhering to the condition of the ODbL licence which states that

Any Derivative Database that You Publicly Use must be only under the terms of:

    1. This License;
    2. A later version of this License similar in spirit to this

See the Introducing osmextract vignette for more details.

Other approaches

  • osmdata is an R package for importing small datasets directly from OSM servers
  • osmapiR is an R interface to the OpenStreetMap API v0.6 for fetching and saving raw data from/to the OpenStreetMap database, including map data as well as map notes, GPS traces, changelogs, and users data
  • geofabrik is an R package to download OSM data from Geofabrik
  • pyrosm is a Python package for reading .pbf files
  • pydriosm is a Python package to download, read and import OSM extracts
  • osmium provides Python bindings for the Libosmium C++ library
  • OpenStreetMapX.jl is a Julia package for reading and analysing .osm files
  • PostGIS is an established spatial database that works well with large OSM datasets
  • Any others? Let us know!

Contribution

We very much look forward to comments, questions and contributions. If you have any questions or want to suggest a new approach, feel free to create a new discussion in the GitHub repository. If you find a bug, or if you want to add a new OSM extracts provider, create a new issue in the issue tracker or open a new pull request. We always try to build the most intuitive user interface and write the most informative error messages, but if you think that something is not clear and could have been explained better, please let us know.

Contributor Code of Conduct

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

osmextract's People

Contributors

agila5, dabreegster, gretatimaite, jmaspons, mem48, robinlovelace

osmextract's Issues

Bug when trying to download data from Central America

Any ideas @agila5 ?

library(geofabric)
place_sf = sf::st_as_sf(data.frame(x = -86, y = 15, n = 1),
                        coords = c("x", "y"), crs = 4326)
osm_data_place = get_geofabric(place_sf)
#> although coordinates are longitude/latitude, st_contains assumes that they are planar
#> The place is within these geofabrik zones: Central America
#> Selecting the smallest:
#> Error in if (interactive() & ask & large_size) {: argument is of length zero
plot(osm_data_place["highway"])
#> Error in plot(osm_data_place["highway"]): object 'osm_data_place' not found

Created on 2019-11-15 by the reprex package (v0.3.0)
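The "argument is of length zero" message is what R raises when if() receives a condition with no elements. One plausible cause here (an assumption, since the code that computes it is not shown) is that large_size ends up empty when no file size is available for the matched zone. A minimal sketch of a guard follows; should_prompt(), size_mb, max_size_mb and is_interactive are hypothetical names, while ask and large_size come from the error message above.

```r
# Reproduce the failure mode: if() cannot decide on a zero-length condition.
msg = tryCatch(
  if (TRUE & TRUE & logical(0)) "prompt",
  error = function(e) conditionMessage(e)
)
msg

# Hypothetical guard: isTRUE() maps logical(0) and NA to FALSE, and &&
# works on single values, so the result is always TRUE or FALSE.
should_prompt = function(size_mb, ask = TRUE, max_size_mb = 100,
                         is_interactive = interactive()) {
  large_size = size_mb > max_size_mb # logical(0) when size_mb is empty
  isTRUE(is_interactive) && isTRUE(ask) && isTRUE(large_size)
}

should_prompt(numeric(0), is_interactive = TRUE) # no size known
should_prompt(500, is_interactive = TRUE)        # large file
```

Whether a zone with an unknown size should skip the prompt or always trigger it is a design choice; the sketch skips it.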

Error with multiple names in get_geofabric

I don't know what the standard procedure is for reading or downloading multiple datasets, but this currently fails:

# packages
library(geofabric)

# downloads
get_geofabric(c("Malta", "Andorra"))
#> Warning in if (!file.exists(download_path)) {: the condition has length >
#> 1 and only the first element will be used
#> Downloading http://download.geofabrik.de/europe/andorra-latest.osm.pbf to 
#> C:\Users\Utente\AppData\Local\Temp\RtmpkhZ2nY/Malta.osm.pbfC:\Users\Utente\AppData\Local\Temp\RtmpkhZ2nY/Andorra.osm.pbf
#> Warning in utils::download.file(url = zone_url, destfile = download_path, :
#> only first element of 'destfile' argument used
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(C:\Users\Utente\AppData\Local\Temp\RtmpkhZ2nY/ini_new.ini)
#> Warning in if (nchar(dsn) < 1) {: the condition has length > 1 and only the
#> first element will be used
#> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : Expecting a single value: [extent=2].

Created on 2019-11-13 by the reprex package (v0.3.0)

The easy solution is to just stop the function if length(name) > 1, but in my opinion we should at least discuss it a bit more first.
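For reference, the "easy solution" could look roughly like the sketch below. It assumes name is a character string (other input types are not handled), and check_single_name() is a hypothetical helper rather than existing package code.

```r
# Hypothetical input check: fail early with a clear message instead of
# letting a vector of names reach file.exists() or download.file().
check_single_name = function(name) {
  if (!is.character(name) || length(name) != 1L || is.na(name)) {
    stop(
      "`name` must be a single non-missing character string, not an object of length ",
      length(name), ".",
      call. = FALSE
    )
  }
  invisible(name)
}

# Users who want several zones can still loop over them explicitly.
places = c("Malta", "Andorra")
checked = lapply(places, check_single_name)

# The vector input from the reprex is now rejected up front.
res = tryCatch(check_single_name(places), error = function(e) conditionMessage(e))
res
```

The alternative, vectorising the function over name, would need a decision on what to return (a list of objects or one combined object), which is the part worth discussing.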

Download based on a geographic input?

Thinking it may be useful for people using the package to download the smallest available file that exists for their region. That way the first argument to get_geofabric() (or similar) could be an sf object.

Improve good practice results

Results:

── GP osmextractr ──────────────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 1% of code lines are
    covered by test cases.

    R/download.R:28:NA
    R/download.R:33:NA
    R/download.R:34:NA
    R/download.R:35:NA
    R/download.R:36:NA
    ... and 274 more lines

  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are
    used it and it is easier to read your code for them if you use '<-'.

    R/download.R:86:25
    R/get.R:43:12
    R/get.R:85:18
    R/match.R:19:14
    R/match.R:56:11
    ... and 13 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows
    that are about 80 characters wide. Try make your lines shorter than 80 characters

    R/data.R:3:1
    R/data.R:5:1
    R/download.R:3:1
    R/download.R:28:1
    R/get.R:34:1
    ... and 8 more lines

  ✖ fix this R CMD check NOTE: Note: found 28 marked UTF-8 strings
  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically
    indicates Rd problems.

Bug with get_geofabric and sfc data

Reprex:

devtools::install_github("ITSLeeds/geofabric")
#> Skipping install of 'geofabric' from a github remote, the SHA1 (19a09498) has not changed since last install.
#>   Use `force = TRUE` to force installation

# packages
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(geofabric)

# example
get_geofabric(
  name = st_sfc(st_point(c(9, 45)), crs = 4326)
)
#> Error in if (nrow(name) > 1) {: argument is of length zero

Created on 2019-11-17 by the reprex package (v0.3.0)

The problem is here
https://github.com/ITSLeeds/geofabric/blob/19a094985f47d6dd8fb031a6df4bfc33e42b54ef/R/get_geofabric.R#L41-L42
since we cannot use nrow with sfc but only length. Should be easy fix.
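A possible shape for the fix is sketched below. n_features() is a hypothetical helper, and the sfc object is mocked with a classed list so that the example runs without sf installed; with sf loaded, st_sfc() output would take the same branch.

```r
# Hypothetical helper: count features for both sf data frames and sfc
# geometry columns. nrow() returns NULL for objects without dimensions.
n_features = function(x) {
  if (inherits(x, "sfc")) length(x) else nrow(x)
}

# Stand-in for st_sfc(st_point(c(9, 45)), crs = 4326); only the class
# attribute matters for this sketch.
mock_sfc = structure(list(c(9, 45)), class = c("sfc_POINT", "sfc"))
mock_sf = data.frame(id = 1:3)

nrow(mock_sfc)       # NULL, which is why `if (nrow(name) > 1)` fails
n_features(mock_sfc)
n_features(mock_sf)
```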

Salvage functions to create/modify config files from geofabrik code

Previous versions contained this function:

make_ini_attributes = function(attributes,
                               layer,
                               defaults = get_ini_layer_defaults(layer),
                               append = TRUE) {
  attributes_default_ini = paste0("attributes=", paste(defaults, collapse = ","))
  if (append) {
    attributes = c(defaults, attributes)
  }
  attributes_default_ini_new = paste0("attributes=", paste(attributes, collapse = ","))
  ini_file = readLines("https://github.com/OSGeo/gdal/raw/master/gdal/data/osmconf.ini")
  sel_attributes = grepl(pattern = attributes_default_ini, x = ini_file)
  message("Old attributes: ", ini_file[sel_attributes])
  message("New attributes: ", attributes_default_ini_new)
  ini_file[sel_attributes] = attributes_default_ini_new
  ini_file
}

and:

get_ini_layer_defaults = function(layer) {
  # generate defaults for layer attributes
  # ini_file = readLines("https://github.com/OSGeo/gdal/raw/master/gdal/data/osmconf.ini")
  # attributes = ini_file[grepl(pattern = "^attributes=", ini_file)]
  # layer_names = ini_file[grepl(pattern = "^\\[", x = ini_file)]
  # layer_names = gsub(pattern = "\\[|\\]", replacement = "", x = layer_names)
  # attributes = gsub(pattern = "attributes=", "", attributes)
  # l = sapply(attributes, function(x) names(read.csv(text = x)))
  # class(l)
  # names(l) = layer_names
  # dput(l)
  l = list(
    points = c(
      "name",
      "barrier",
      "highway",
      "ref",
      "address",
      "is_in",
      "place",
      "man_made"
    ),
    lines = c(
      "name",
      "highway",
      "waterway",
      "aerialway",
      "barrier",
      "man_made"
    ),
    multipolygons = c(
      "name",
      "type",
      "aeroway",
      "amenity",
      "admin_level",
      "barrier",
      "boundary",
      "building",
      "craft",
      "geological",
      "historic",
      "land_area",
      "landuse",
      "leisure",
      "man_made",
      "military",
      "natural",
      "office",
      "place",
      "shop",
      "sport",
      "tourism"
    ),
    multilinestrings = c("name", "type"),
    other_relations = c("name", "type")
  )
  l[[layer]]
}

Is it worth salvaging some of these ideas? I guess so...
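One way the idea could be salvaged is to separate the editing step from the download of osmconf.ini, so that it can be tested on any character vector of ini lines. The sketch below is an assumption-laden illustration, not the old implementation: add_ini_attributes() is a hypothetical name, toy_ini is made-up input, and each layer section is assumed to contain exactly one attributes= line.

```r
# Hypothetical helper: append extra tags to the "attributes=" line of one
# layer section in osmconf.ini-style text, without touching other layers.
add_ini_attributes = function(ini_lines, layer, extra) {
  # positions of all section headers such as "[points]" or "[lines]"
  headers = grep("^\\[.*\\]$", ini_lines)
  start = headers[ini_lines[headers] == paste0("[", layer, "]")]
  if (length(start) != 1L) {
    stop("Expected exactly one section named [", layer, "]", call. = FALSE)
  }
  # the section ends just before the next header, or at the end of the file
  later = headers[headers > start]
  end = if (length(later) > 0L) later[1L] - 1L else length(ini_lines)
  section = seq(start, end)
  idx = section[grepl("^attributes=", ini_lines[section])]
  if (length(idx) != 1L) {
    stop("Expected exactly one attributes= line in [", layer, "]", call. = FALSE)
  }
  current = strsplit(sub("^attributes=", "", ini_lines[idx]), ",", fixed = TRUE)[[1]]
  # union() keeps the existing order and drops tags that are already present
  ini_lines[idx] = paste0("attributes=", paste(union(current, extra), collapse = ","))
  ini_lines
}

# Made-up, heavily shortened ini content for illustration only.
toy_ini = c(
  "[points]",
  "osm_id=yes",
  "attributes=name,barrier,highway",
  "",
  "[lines]",
  "osm_id=yes",
  "attributes=name,highway,waterway"
)
new_ini = add_ini_attributes(toy_ini, layer = "lines", extra = c("maxspeed", "highway"))
new_ini[7]
```

Restricting the search to one section avoids the situation where a pattern built from the defaults of one layer also matches the attributes line of another.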

GDAL Error 1: Too many features have accumulated in lines layer. Use OGR_INTERLEAVED_READING=YES mode

Reprex of the error:

library(geofabric)
get_geofabric(layer = "points")
#> No exact matching geofabric zone. Best match is West Yorkshire (28.7 MB)
#> Downloading http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf to 
#> /tmp/RtmpfjodGC/west-yorkshire.osm.pbf
#> Old attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made
#> New attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made,building,natural,surface,source,power,amenity,shop,operator
#> Using ini file that can can be edited with file.edit(/tmp/RtmpfjodGC/ini_new.ini)
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode

Created on 2019-10-08 by the reprex package (v0.3.0)

Probably I'm missing something, but I'm not sure why it returns Too many features have accumulated in lines layer even if I set layer = "points". Anyway, I tried adding the OGR_INTERLEAVED_READING=YES option to st_read, but with no success:

my_url <- "https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
sf::st_read(my_url, layer = "points", options = "OGR_INTERLEAVED_READING=YES")
#> options:        OGR_INTERLEAVED_READING=YES 
#> Reading layer `points' from data source `https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 72849 features and 10 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -2.326617 ymin: 53.3142 xmax: -1.039338 ymax: 54.03102
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Then I read here some details on that error and here that the appropriate open option is called INTERLEAVED_READING but that doesn't work either:

my_url <- "https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
res <- sf::st_read(my_url, layer = "points", options = "INTERLEAVED_READING=YES")
#> options:        INTERLEAVED_READING=YES 
#> Reading layer `points' from data source `https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 72849 features and 10 fields (with 72849 geometries empty)
#> geometry type:  GEOMETRYCOLLECTION
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
res$geometry
#> Geometry set for 72849 features  (with 72849 geometries empty)
#> geometry type:  GEOMETRYCOLLECTION
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 5 geometries:
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY

Created on 2019-10-08 by the reprex package (v0.3.0)

Support for .shp back?

In the latest version, .shp files are no longer supported. But some computers cannot read .pbf files. Should we re-add .shp support? This may be something you're interested in, @agila5.

Geometry column has funny name when query or key/value arguments are used in read_pbf

res = read_pbf(f, layer = "points")
Old attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made
New attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made,building,natural,surface,source,power,amenity,shop,operator
Using ini file that can can be edited with file.edit(/tmp/RtmpZPQ7RS/ini_new.ini)
> names(res)
 [1] "osm_id"         "name"           "barrier"        "highway"        "ref"           
 [6] "address"        "is_in"          "place"          "man_made"       "building"      
[11] "natural"        "surface"        "source"         "power"          "amenity"       
[16] "shop"           "operator"       "other_tags"     "_ogr_geometry_"

Benchmarks and documentation on when to use geofabrik

It's clear that there are many cases when bulk extract download is not the best way to get OSM data. Would be useful to users to know when osmdata is quicker.

Example: you want cycleways in Louvain-la-Neuve in Belgium. With geofabrik:

# remotes::install_github("itsleeds/geofabrik")
library(geofabrik)
lvn_centroid = tmaptools::geocode_OSM("louvain-la-neuve", as.sf = TRUE)
system.time({ # around 2 seconds
  lvn = get_geofabrik(lvn_centroid)
  })
#> although coordinates are longitude/latitude, st_contains assumes that they are planar
#> The place is within these geofabrik zones: Europe, Belgium
#> Selecting the smallest: Belgium
#> Downloading http://download.geofabrik.de/europe/belgium-latest.osm.pbf to 
#> ~/h/data/osm/Belgium.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmprktWUQ/ini_new.ini)
#>    user  system elapsed 
#>  72.514  18.150 269.507
plot(louvain_highway)

Created on 2020-02-06 by the reprex package (v0.3.0)

Windows - GDAL Error 1: An error occurred during the parsing of data around byte 18

As the title says. I tried to replicate the error we found this morning but I found this. Funny thing is that the same error does not happen using sf::st_read.

Reprex on Windows:

library(geofabric)

get_geofabric(name = "andorra", layer = "lines")
#> No exact matching geofabric zone. Best match is Andorra (1.5 MB)
#> Downloading http://download.geofabrik.de/europe/andorra-latest.osm.pbf to 
#> C:\Users\Utente\AppData\Local\Temp\RtmpIpSE9S/andorra.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(C:\Users\Utente\AppData\Local\Temp\RtmpIpSE9S/ini_new.ini)
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: An error occurred during the parsing of data around byte 18

#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: An error occurred during the parsing of data around byte 18
sf::st_read("http://download.geofabrik.de/europe/andorra-latest.osm.pbf", layer = "lines")
#> Reading layer `lines' from data source `http://download.geofabrik.de/europe/andorra-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 5410 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0.975577 ymin: 42.32422 xmax: 1.824654 ymax: 42.78834
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Created on 2019-09-30 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  Italian_Italy.1252          
#>  ctype    Italian_Italy.1252          
#>  tz       Europe/London               
#>  date     2019-09-30                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version date       lib source                             
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)                     
#>  backports     1.1.4   2019-04-10 [1] CRAN (R 3.6.0)                     
#>  callr         3.3.2   2019-09-22 [1] CRAN (R 3.6.1)                     
#>  class         7.3-15  2019-01-01 [2] CRAN (R 3.6.1)                     
#>  classInt      0.4-1   2019-08-06 [1] CRAN (R 3.6.1)                     
#>  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)                     
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)                     
#>  DBI           1.0.0   2018-05-02 [1] CRAN (R 3.6.0)                     
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)                     
#>  devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)                     
#>  digest        0.6.21  2019-09-20 [1] CRAN (R 3.6.1)                     
#>  e1071         1.7-2   2019-06-05 [1] CRAN (R 3.6.0)                     
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)                     
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)                     
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.1)                     
#>  geofabric   * 0.1.0   2019-09-30 [1] Github (ITSLeeds/geofabric@7acec3a)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)                     
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)                     
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.6.0)                     
#>  KernSmooth    2.23-15 2015-06-29 [2] CRAN (R 3.6.1)                     
#>  knitr         1.25    2019-09-18 [1] CRAN (R 3.6.1)                     
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)                     
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)                     
#>  pillar        1.4.2   2019-06-29 [1] CRAN (R 3.6.0)                     
#>  pkgbuild      1.0.5   2019-08-26 [1] CRAN (R 3.6.1)                     
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)                     
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)                     
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)                     
#>  processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.0)                     
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)                     
#>  R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.0)                     
#>  Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.6.1)                     
#>  remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)                     
#>  rlang         0.4.0   2019-06-25 [1] CRAN (R 3.6.1)                     
#>  rmarkdown     1.15    2019-08-21 [1] CRAN (R 3.6.1)                     
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)                     
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)                     
#>  sf            0.8-1   2019-09-26 [1] Github (r-spatial/sf@590cb67)      
#>  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)                     
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)                     
#>  testthat      2.2.1   2019-07-25 [1] CRAN (R 3.6.1)                     
#>  tibble        2.1.3   2019-06-06 [1] CRAN (R 3.6.1)                     
#>  units         0.6-4   2019-08-22 [1] CRAN (R 3.6.1)                     
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)                     
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)                     
#>  xfun          0.9     2019-08-21 [1] CRAN (R 3.6.1)                     
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)                     
#> 
#> [1] C:/Users/Utente/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

Reprex on Linux:

library(geofabric)
    
get_geofabric(name = "andorra", layer = "lines")
#> No exact matching geofabric zone. Best match is Andorra (1.5 MB)
#> Downloading http://download.geofabrik.de/europe/andorra-latest.osm.pbf to 
#> /tmp/RtmpXRpVUw/andorra.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpXRpVUw/ini_new.ini)
sf::st_read("http://download.geofabrik.de/europe/andorra-latest.osm.pbf", layer = "lines")
#> Reading layer `lines' from data source `http://download.geofabrik.de/europe/andorra-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 5410 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0.975577 ymin: 42.32422 xmax: 1.824654 ymax: 42.78834
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Created on 2019-09-30 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       Debian GNU/Linux 9 (stretch)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Etc/UTC                     
#>  date     2019-09-30                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib
#>  assertthat    0.2.1      2019-03-21 [1]
#>  backports     1.1.4      2019-04-10 [1]
#>  callr         3.3.2      2019-09-22 [1]
#>  class         7.3-15     2019-01-01 [2]
#>  classInt      0.4-1      2019-08-06 [1]
#>  cli           1.1.0      2019-03-19 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  DBI           1.0.0      2018-05-02 [1]
#>  desc          1.2.0      2018-05-01 [1]
#>  devtools      2.2.0.9000 2019-09-23 [1]
#>  digest        0.6.21     2019-09-20 [1]
#>  e1071         1.7-2      2019-06-05 [1]
#>  ellipsis      0.3.0      2019-09-20 [1]
#>  evaluate      0.14       2019-05-28 [1]
#>  fs            1.3.1      2019-05-06 [1]
#>  geofabric   * 0.1.0      2019-09-30 [1]
#>  glue          1.3.1      2019-03-12 [1]
#>  highr         0.8        2019-03-20 [1]
#>  htmltools     0.3.6.9004 2019-09-23 [1]
#>  KernSmooth    2.23-15    2015-06-29 [2]
#>  knitr         1.25       2019-09-18 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0      2017-04-21 [1]
#>  pillar        1.4.2      2019-06-29 [1]
#>  pkgbuild      1.0.5      2019-08-26 [1]
#>  pkgconfig     2.0.3      2019-09-22 [1]
#>  pkgload       1.0.2      2018-10-29 [1]
#>  prettyunits   1.0.2      2015-07-13 [1]
#>  processx      3.4.1      2019-07-18 [1]
#>  ps            1.3.0      2018-12-21 [1]
#>  R6            2.4.0      2019-02-14 [1]
#>  Rcpp          1.0.2      2019-07-25 [1]
#>  remotes       2.1.0      2019-06-24 [1]
#>  rlang         0.4.0      2019-06-25 [1]
#>  rmarkdown     1.15       2019-08-21 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  sf            0.8-0      2019-09-17 [1]
#>  stringi       1.4.3      2019-03-12 [1]
#>  stringr       1.4.0      2019-02-10 [1]
#>  testthat      2.2.1      2019-07-25 [1]
#>  tibble        2.1.3      2019-06-06 [1]
#>  units         0.6-4      2019-08-22 [1]
#>  usethis       1.5.1      2019-07-04 [1]
#>  withr         2.1.2      2018-03-15 [1]
#>  xfun          0.9        2019-08-21 [1]
#>  yaml          2.2.0      2018-07-25 [1]
#>  source                             
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  Github (r-lib/devtools@2765fbe)    
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  Github (ITSLeeds/geofabric@7acec3a)
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  Github (rstudio/htmltools@c49b29c) 
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#>  CRAN (R 3.6.0)                     
#> 
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library

Hope it's clear.

Create local version of osmconf.ini

When we create the new version of the CONFIG_FILE for the OSM GDAL open options, we use the following code:

https://github.com/ITSLeeds/geofabrik/blob/73dd18af8f47284167a32f9190e673603c589a6e/R/read_pbf.R#L135

which clearly doesn't work without an internet connection. Does it make sense to create a "local" version of the gdal/osmconf.ini file and add that local version to the package? That way, as long as the user has already downloaded the .osm.pbf file and saved it in the GF_DOWNLOAD_DIRECTORY, the get_geofabrik function could also be used offline.
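The fallback idea could look roughly like the sketch below. It is only an illustration in base R: `get_osmconf()`, its arguments, and the unreachable `file://` location are all hypothetical, and in a package the local copy would more likely be found with `system.file()`.

```r
# Hypothetical helper (not part of the package): try to fetch a fresh
# osmconf.ini and fall back to a local copy when the download fails.
get_osmconf = function(remote_url, local_copy, dest = tempfile(fileext = ".ini")) {
  downloaded = tryCatch(
    {
      utils::download.file(remote_url, dest, quiet = TRUE)
      file.exists(dest)
    },
    # download.file() signals a warning and/or an error when it cannot fetch
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
  if (downloaded) dest else local_copy
}

# Usage with a stand-in local file and a made-up, unreachable location
local_copy = tempfile(fileext = ".ini")
writeLines("# stand-in for osmconf.ini", local_copy)
ini_path = get_osmconf("file:///no/such/dir/osmconf.ini", local_copy)
```

With this design the online behaviour stays unchanged, and the bundled file is only used when the remote copy cannot be retrieved.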

key and attribute

Questions:

  • Regarding terminology, what is the difference between a key and an attribute? The distinction is confusing.
  • Why are not all available keys/attributes included in ini_new.ini? E.g. waterway is not included as an attribute of the multipolygons layer, although some waterway values are polygons (e.g. riverbank).
  • The attributes argument can be used in get_geofabrik. The default values, from make_additional_attributes(layer), seem arbitrary. Why this selection?

One suggestion would be to include the key in the attributes. For instance, this does not work:

x = get_geofabrik("Ile-de-France", 
   layer = "multipolygons", 
   key = "waterway", value = "riverbank")

while this works:

x = get_geofabrik("Ile-de-France", 
   layer = "multipolygons", 
   key = "waterway", value = "riverbank", 
   attributes = "waterway")
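A minimal sketch of that suggestion, assuming the attributes are held in a plain character vector: the queried key is appended to the requested attributes unless it is already there. `add_key_to_attributes()` is a hypothetical name, not a function of the package.

```r
# Hypothetical helper: make sure the queried key is also requested as an attribute
add_key_to_attributes = function(attributes, key = NULL) {
  # unique() avoids duplicating a key that is already among the attributes
  unique(c(attributes, key))
}

add_key_to_attributes(c("name", "highway"), key = "waterway")
```

Calling something like this inside the function before the ini file is written would make the first call above behave like the second one.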

On a side note, it doesn't seem to collect all riverbanks. I downloaded a shapefile from https://mapcruzin.com/free-france-arcgis-maps-shapefiles.htm which is also extracted from OSM, but which does contain all riverbanks.

Strangely, for the key/attribute natural it is the other way round:

Works:

x = get_geofabrik("Ile-de-France", 
    layer = "multipolygons", 
    key = "natural", value = "water")

Does not work:

x = get_geofabrik("Ile-de-France", 
    layer = "multipolygons", 
    key = "natural", value = "water", 
    attributes = "natural")

A select-all query does not work either. It doesn't throw an error, but it returns an empty sf object:

x  = get_geofabrik("Ile-de-France", 
   layer = "multipolygons", 
   key = "waterway", value = "*", 
   attributes = "waterway")

Weird result with query parameter

I was working on the OGR_INTERLEAVED_READING problem (and I will submit a draft PR shortly) and I think I found something weird with the query parameter. I'm not sure if it's a bug or not and I can't easily reproduce that with geofabrik only but the following reprex should show the problem.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3

my_url <- "https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
sf::st_read(my_url, layer = "points")
#> Reading layer `points' from data source `https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 75666 features and 10 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -2.326617 ymin: 53.3142 xmax: -1.039338 ymax: 54.03102
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
sf::st_read(my_url, layer = "points", query = "select * from points")
#> Reading layer `points' from data source `https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 75666 features and 10 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -2.326617 ymin: 53.3142 xmax: -1.039338 ymax: 54.03102
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Created on 2020-01-23 by the reprex package (v0.3.0)

What's the difference between the two approaches? Why don't I see any warning when I add a query that shouldn't modify anything?

Unwanted warning message in oe_download()

> iow_details = oe_match("Isle of Wight", provider = "test")
> f = oe_download(
+   file_url = iow_details$url,
+   file_size = iow_details$file_size
+ )
Warning message:
In grepl(pattern = oe_available_providers(), x = file_url) :
  argument 'pattern' has length > 1 and only the first element will be used
> f
[1] "/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/geofabrik_geofabrik_isle-of-wight-latest.osm.pbf"
[2] "/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/test_geofabrik_isle-of-wight-latest.osm.pbf"     
[3] "/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/bbbike_geofabrik_isle-of-wight-latest.osm.pbf" 
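The warning says that a vector of length greater than one was passed as `pattern` to `grepl()`, which is presumably also why three file paths came back. A hedged sketch of one way to test each provider separately is shown below. The provider names are assumed from the file name prefixes in the output above, and the URL is only an example.

```r
# Provider names assumed from the file name prefixes in the output above
providers = c("geofabrik", "test", "bbbike")
file_url = "https://download.geofabrik.de/europe/andorra-latest.osm.pbf"

# Test one pattern at a time; fixed = TRUE treats each name as a literal string
is_match = vapply(providers, function(p) grepl(p, file_url, fixed = TRUE), logical(1))
matching_provider = names(is_match)[is_match]
```

Here `matching_provider` holds only the providers whose name occurs in the URL, so it can be used to build a single file name.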

Separate downloading and reading of pbf

The get_geofabric function calls read_pbf at the end, so there is no easy way to download .pbf files without also loading them into memory. This is bad for things like batch downloading.

Instead, the function should just download the file and return its path,
e.g.

pbf <- get_geofabric("isle of man")
lines <- read_pbf(pbf)
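The download half of that split could be sketched as follows. This is an assumption-laden illustration in base R: `download_pbf()` and its arguments are hypothetical, and the demonstration uses a small stand-in file served through a `file://` URL (Unix-style paths) rather than a real extract.

```r
# Hypothetical helper: download a file (if needed) and return its local path
download_pbf = function(url, download_dir = tempdir()) {
  dest = file.path(download_dir, basename(url))
  # Skip the download when the file is already in the download directory
  if (!file.exists(dest)) {
    utils::download.file(url, dest, mode = "wb", quiet = TRUE)
  }
  dest
}

# Self-contained demonstration with a stand-in file
src_dir = tempfile("src"); dir.create(src_dir)
src = file.path(src_dir, "demo-latest.osm.pbf")
writeBin(as.raw(1:10), src)
out_dir = tempfile("out"); dir.create(out_dir)
pbf_path = download_pbf(paste0("file://", src), out_dir)
```

Because only a path is returned, a batch download becomes a loop over URLs, and reading can happen later or not at all.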

oe_get() seems to fail when layer is not lines

Reproducible example below, can you confirm this @agila5 ?

library(osmextractr)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
#> Geofabrik data are taken from https://download.geofabrik.de/
#> For usage details of bbbike data see https://download.bbbike.org/osm/
    iow = oe_get("Isle of Wight", provider = "test", oe_verbose = TRUE)
#> The input place was matched with: Isle of Wight
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations
#> Reading layer `lines' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/test_geofabrik_isle-of-wight-latest.gpkg' using driver `GPKG'
#> Simple feature collection with 44365 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -5.401978 ymin: 43.35489 xmax: -0.175775 ymax: 50.89599
#> geographic CRS: WGS 84
    class(iow)
#> [1] "sf"         "data.frame"
    summary(sf::st_geometry_type(iow))
#>           GEOMETRY              POINT         LINESTRING            POLYGON 
#>                  0                  0              44365                  0 
#>         MULTIPOINT    MULTILINESTRING       MULTIPOLYGON GEOMETRYCOLLECTION 
#>                  0                  0                  0                  0 
#>     CIRCULARSTRING      COMPOUNDCURVE       CURVEPOLYGON         MULTICURVE 
#>                  0                  0                  0                  0 
#>       MULTISURFACE              CURVE            SURFACE  POLYHEDRALSURFACE 
#>                  0                  0                  0                  0 
#>                TIN           TRIANGLE 
#>                  0                  0
    oe_match("Isle of Wight", provider = "test")
#> $url
#> [1] "https://github.com/ITSLeeds/osmextractr/releases/download/0.0.1/geofabrik_isle-of-wight-latest.osm.pbf"
#> 
#> $file_size
#> [1] 6877468
    f = oe_get("Isle of Wight", provider = "test", download_only = TRUE)
    # todo: write function to get the .pbf file path
    f_pbf = gsub(".gpkg", ".osm.pbf", f)
    sf::st_layers(f)
#> Driver: GPKG 
#> Available layers:
#>   layer_name geometry_type features fields
#> 1      lines   Line String    44365      9
    sf::st_layers(f_pbf)
#> Driver: OSM 
#> Available layers:
#>         layer_name       geometry_type features fields
#> 1           points               Point       NA     10
#> 2            lines         Line String       NA      9
#> 3 multilinestrings   Multi Line String       NA      4
#> 4    multipolygons       Multi Polygon       NA     25
#> 5  other_relations Geometry Collection       NA      4
    # \dontrun{
    # fix issue that different layers cannot be read-in
    iow_points = oe_get("Isle of Wight", provider = "test", layer = "points")
#> Cannot open layer points
#> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : Opening layer failed.

Created on 2020-07-12 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.4 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB:en                    
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-07-12                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                             
#>  assertthat    0.2.1      2019-03-21 [2] CRAN (R 3.6.0)                     
#>  backports     1.1.8      2020-06-17 [1] CRAN (R 3.6.3)                     
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 3.6.3)                     
#>  class         7.3-17     2020-04-26 [2] CRAN (R 3.6.3)                     
#>  classInt      0.4-3      2020-04-06 [1] Github (r-spatial/classInt@d024051)
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.2)                     
#>  crayon        1.3.4      2017-09-16 [2] standard (@1.3.4)                  
#>  DBI           1.1.0      2019-12-15 [2] CRAN (R 3.6.2)                     
#>  desc          1.2.0      2018-05-01 [2] standard (@1.2.0)                  
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 3.6.3)                     
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 3.6.2)                     
#>  dplyr         1.0.0.9000 2020-07-08 [1] Github (tidyverse/dplyr@f53e9ce)   
#>  e1071         1.7-3      2019-11-26 [2] CRAN (R 3.6.1)                     
#>  ellipsis      0.3.1      2020-05-15 [3] CRAN (R 3.6.3)                     
#>  evaluate      0.14       2019-05-28 [2] CRAN (R 3.6.0)                     
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.2)                     
#>  fs            1.4.2      2020-06-30 [2] CRAN (R 3.6.3)                     
#>  generics      0.0.2      2018-11-29 [3] CRAN (R 3.5.1)                     
#>  glue          1.4.1      2020-05-13 [2] CRAN (R 3.6.3)                     
#>  highr         0.8        2019-03-20 [3] CRAN (R 3.5.3)                     
#>  htmltools     0.5.0.9000 2020-06-18 [1] Github (rstudio/htmltools@a8025f3) 
#>  KernSmooth    2.23-17    2020-04-26 [4] CRAN (R 3.6.3)                     
#>  knitr         1.29       2020-06-23 [1] CRAN (R 3.6.3)                     
#>  lifecycle     0.2.0.9000 2020-06-30 [1] Github (r-lib/lifecycle@8e0f87b)   
#>  magrittr      1.5        2014-11-22 [2] CRAN (R 3.5.2)                     
#>  memoise       1.1.0      2017-04-21 [3] CRAN (R 3.5.0)                     
#>  osmextractr * 0.1.0      2020-07-12 [1] local                              
#>  pillar        1.4.6      2020-07-10 [1] CRAN (R 3.6.3)                     
#>  pkgbuild      1.0.8      2020-05-07 [1] CRAN (R 3.6.3)                     
#>  pkgconfig     2.0.3      2019-09-22 [2] CRAN (R 3.6.1)                     
#>  pkgload       1.1.0      2020-05-29 [3] CRAN (R 3.6.3)                     
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.2)                     
#>  processx      3.4.3      2020-07-05 [1] CRAN (R 3.6.3)                     
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 3.6.3)                     
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 3.6.3)                     
#>  R6            2.4.1      2019-11-12 [2] CRAN (R 3.6.1)                     
#>  Rcpp          1.0.5      2020-07-06 [1] CRAN (R 3.6.3)                     
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 3.6.2)                     
#>  rlang         0.4.7      2020-07-09 [1] Github (r-lib/rlang@7e97309)       
#>  rmarkdown     2.3        2020-06-18 [1] CRAN (R 3.6.3)                     
#>  rprojroot     1.3-2      2018-01-03 [2] CRAN (R 3.5.3)                     
#>  sessioninfo   1.1.1      2018-11-05 [3] CRAN (R 3.5.1)                     
#>  sf            0.9-4      2020-06-13 [1] CRAN (R 3.6.3)                     
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.2)                     
#>  stringr       1.4.0      2019-02-10 [2] standard (@1.4.0)                  
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 3.6.3)                     
#>  tibble        3.0.3      2020-07-10 [1] CRAN (R 3.6.3)                     
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 3.6.3)                     
#>  units         0.6-7      2020-06-13 [1] CRAN (R 3.6.3)                     
#>  usethis       1.6.1.9001 2020-07-09 [1] Github (r-lib/usethis@4abf7ca)     
#>  vctrs         0.3.1      2020-06-05 [1] CRAN (R 3.6.3)                     
#>  withr         2.2.0      2020-04-20 [2] CRAN (R 3.6.3)                     
#>  xfun          0.15       2020-06-21 [1] CRAN (R 3.6.3)                     
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 3.6.2)                     
#> 
#> [1] /home/robin/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

Bug in get_geofabric with non exact matches

For example, if I run get_geofabric("ABCD") in interactive mode (just a quick test to check some other changes), I don't get any message about the match, only a seemingly unrelated warning that R is going to download a very large file (7.2 GB).

I checked and found that "ABCD" was matched with "Asia", so everything makes sense now. Still, I think we should add a message when there is no exact match between the input name and the names stored in geofabric_zones.rda. Something like:

  • If there is an exact match: no message or warning;
  • If there is a match below the max_dist threshold: message;
  • If there is no match: warning.

Do you agree? Nevertheless, it's easy to work around this problem just by changing the max_dist parameter (for example, get_geofabric("ABCD", max_dist = "1") returns the usual warning), so I'm not sure whether it makes sense to change the function.
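The three cases listed above could be sketched like this. It is a rough illustration, not the package's matching code: `match_zone()` and the short `zones` vector are hypothetical, and the edit distance comes from base R's `utils::adist()`.

```r
# Hypothetical stand-in for the names stored in geofabric_zones
zones = c("Asia", "Africa", "Europe")

match_zone = function(name, zones, max_dist = 1) {
  # Case-insensitive edit distance between the input and every zone name
  d = drop(utils::adist(name, zones, ignore.case = TRUE))
  best = which.min(d)
  if (d[best] == 0) {
    return(zones[best])  # exact match: stay silent
  }
  if (d[best] <= max_dist) {
    message("No exact match for '", name, "'. Best match is ", zones[best], ".")
    return(zones[best])
  }
  warning("No zone found within max_dist = ", max_dist, " of '", name, "'.")
  NULL
}

match_zone("Asia", zones)
```

With these stand-in zones, an input such as "ABCD" is three edits away from "Asia", so it only matches silently wrong when max_dist is 3 or more; the message makes that kind of approximate match visible.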

Bug in oe_get

If the user stops the download before it has finished, the next attempt reports something like this:

No exact matching geofabric zone. Best match is West Yorkshire (28.7 MB)
Data already detected in C:\Users\Utente\AppData\Local\Temp\RtmpANehJ6/west-yorkshire.osm.pbf
Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
Using ini file that can can be edited with file.edit(C:\Users\Utente\AppData\Local\Temp\RtmpANehJ6/ini_new.ini)
Warning messages:
1: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 1: An error occurred during the parsing of data around byte 215
2: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 1: An error occurred during the parsing of data around byte 215
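One possible guard against this, sketched under assumptions rather than taken from the package, is to download into a temporary ".part" file and move it to the final name only after the download has completed, so an interrupted download never looks like a finished one. `safe_download()` is a hypothetical name, and the demonstration uses a stand-in file served through a `file://` URL (Unix-style paths).

```r
# Hypothetical helper: only completed downloads reach the final file name
safe_download = function(url, dest) {
  part = paste0(dest, ".part")
  # Remove any leftover partial file, whether the download fails or succeeds
  on.exit(unlink(part), add = TRUE)
  utils::download.file(url, part, mode = "wb", quiet = TRUE)
  # Reached only when download.file() finished without an error
  if (!file.rename(part, dest)) stop("Could not move ", part, " to ", dest)
  dest
}

# Self-contained demonstration with a stand-in file
demo_dir = tempfile("demo"); dir.create(demo_dir)
demo_src = file.path(demo_dir, "source.osm.pbf")
writeBin(as.raw(1:10), demo_src)
final_path = safe_download(paste0("file://", demo_src), file.path(demo_dir, "west-yorkshire.osm.pbf"))
```

The "Data already detected" check would then only ever see files that were downloaded in full.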

Strange results with size_pbf column of geofabric_zones

library(geofabric)
sort(table(geofabric_zones$size_pbf, useNA = "ifany"), decreasing = TRUE)[1:5]
#> 
#> [.osm.bz2]    (60 MB)    (77 MB)    (84 MB)   (104 MB) 
#>         56          4          4          4          3

Created on 2019-10-08 by the reprex package (v0.3.0)

The strange thing is that all the "problematic" countries are in Africa and, if I rerun the code in data-raw/geofabric_zones.R, I get a different result:

sort(table(t_all$size_pbf, useNA = "ifany"), decreasing = TRUE)[1:5]
#> 
#>  (84 MB)  (213 MB) (46.2 MB)   (69 MB)   (99 MB)
#>        5         4         4         4         4 

oe_update()

I think a small function to download more recent versions of a file could be really useful. Building on #33 I think it could simply be:

library(osmextractr)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
#> Geofabrik data are taken from https://download.geofabrik.de/
#> For usage details of bbbike data see https://download.bbbike.org/osm/
oe_update = function(place, ...) {
  oe_get(place = place, ..., force_download = TRUE, download_only = TRUE)
}
oe_update("Isle of Wight")
#> [1] "/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/geofabrik_isle-of-wight-latest.gpkg"

Created on 2020-07-13 by the reprex package (v0.3.0)
