Hi Danton.
I would like to share a snippet that quickly calculates the total distance of each shape_id. The idea is to use data.table,
which makes the grouped operations very fast.
library(data.table)
library(geosphere)
# read shapes.txt
shapes_df <- fread("shapes.txt")
# convert lat long columns to numeric
shapes_df[, shape_pt_lon := as.numeric(shape_pt_lon) ][, shape_pt_lat := as.numeric(shape_pt_lat) ]
# make sure the points are ordered within each shape (rows in shapes.txt are not guaranteed to be sorted)
setorder(shapes_df, shape_id, shape_pt_sequence)
# Pair subsequent points for each shape
shapes_df[, `:=`(next_shape_pt_sequence = shift(shape_pt_sequence, type = "lead"),
                 next_lat = shift(shape_pt_lat, type = "lead"),
                 next_lon = shift(shape_pt_lon, type = "lead")), by = .(shape_id)]
# Calculate the distance between consecutive points of each shape, in km (distGeo returns metres)
shapes_df[, shape_dist_traveled := distGeo(matrix(c(shape_pt_lon, shape_pt_lat), ncol = 2),
                                           matrix(c(next_lon, next_lat), ncol = 2)) / 1000]
# Sum the total distance of each shape_id (the last point of each shape has no successor, hence na.rm)
shapes_df[, .(dist_shape = sum(shape_dist_traveled, na.rm = TRUE)), by = shape_id]
#> shape_id dist_shape
#> 1: 29647112 39.476308
#> 2: 17391142 7.941542
#> 3: 17614235 28.088435
#> 4: 29949632 14.276429
#> 5: 17576631 7.025251
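For anyone who wants to try the approach without a GTFS feed at hand, here is a minimal sketch on a made-up table. The shape ids, coordinates and the `seg_km` column name are hypothetical, chosen only for illustration. The rows are deliberately out of order, and I am assuming here that sorting by `shape_pt_sequence` first is what you want so that `shift()` pairs each point with its true successor.

```r
library(data.table)
library(geosphere)

# Toy stand-in for shapes.txt: two hypothetical shapes, rows deliberately unsorted
toy <- data.table(
  shape_id          = c("A", "A", "A", "B", "B"),
  shape_pt_sequence = c(2, 1, 3, 1, 2),
  shape_pt_lat      = c(0, 0, 0, 10, 11),
  shape_pt_lon      = c(1, 0, 2, 20, 20)
)

# Sort so that each point is followed by the next point of the same shape
setorder(toy, shape_id, shape_pt_sequence)

# Pair every point with its successor inside each shape
toy[, `:=`(next_lat = shift(shape_pt_lat, type = "lead"),
           next_lon = shift(shape_pt_lon, type = "lead")), by = shape_id]

# Segment length in km; the last point of each shape has no successor,
# so its segment is missing and is dropped by na.rm = TRUE below
toy[, seg_km := distGeo(cbind(shape_pt_lon, shape_pt_lat),
                        cbind(next_lon, next_lat)) / 1000]

# Total length per shape
res <- toy[, .(dist_shape = sum(seg_km, na.rm = TRUE)), by = shape_id]
print(res)
```

Shape "A" runs two degrees of longitude along the equator and shape "B" one degree of latitude, so both totals are easy to sanity-check by hand.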
I still haven't found time to work on my scripts and contribute a new function to the package that produces this summary. In the meantime, I hope this snippet will be useful for other purposes as well.