nesa's Introduction

This is an extract of the Network Rail (NR) National Electronic Sectional Appendix (NESA) data, downloaded as PDF files with embedded TIFF images from the Network Rail NESA page (http://www.networkrail.co.uk/aspx/10563.aspx).

The accompanying data is some years old. It is based on an OCR extract of the PDF files, which was then reviewed and converted into TSV format, by region and line.

The regions are:

Region | URL
--- | ---
Anglia | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/anglia%20sectional%20appendix.pdf
Kent, Sussex and Wessex | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/kent%20sussex%20wessex%20sectional%20appendix.pdf
London North Eastern | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/london%20north%20eastern%20sectional%20appendix.pdf
London North Western (North) | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/london%20north%20western%20north%20sectional%20appendix.pdf
London North Western (South) | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/london%20north%20western%20south%20sectional%20appendix.pdf
Scotland | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/scotland%20sectional%20appendix.pdf
Wessex | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/wessex%20sectional%20appendix.pdf
Western | http://www.networkrail.co.uk/browse%20documents/sectional%20appendix/western%20sectional%20appendix.pdf

The logical process to create the NESA files by line is roughly as follows.

Please note that this approximates, and is not, the actual command line used to generate the final files. This is due to data quality issues with the raw scans and the iterative way in which the text files were manually hacked, along with changes to the process2.py script.

  1. Add the nesa/bin directory to the PATH variable

  2. Download:

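# region-urls.tsv is an assumed tab-separated file: region directory, then URL
$ while IFS="$(printf '\t')" read -r region url
$ do
$   mkdir -p "$region"
$   wget -O "$region/track-and-route.pdf" "$url"
$ done < region-urls.tsv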
  3. Process track-and-route.pdf file to generate raw scanned text files
$ for region in `cat regions.txt`
$ do
$   cd $region
$   ocr2.sh
$   mkdir archive
$   mv pg_*.pdf archive
$   cd ..
$ done
  4. Create line-number directories and move ocr files to archive directory
$ for region in `cat regions.txt`
$ do
$   cd $region
# create line-list.txt file
$   line-list.sh
# create and move ocr files to line-number directories
$   archive-script.sh
$   cd ..
$ done
  5. Create line reports and combined region report
$ for region in `cat regions.txt`
$ do
$   cd $region
$   run-process.sh
$   cd ..
$ done
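
The per-region loops above all share the same shape, so they could be folded into one small helper. The following is only a sketch under stated assumptions, not the command line actually used: for_each_region is a hypothetical function that is not part of nesa/bin, and regions.txt is assumed to hold one region directory name per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of nesa/bin): run one command inside
# every region directory listed in regions.txt (assumed: one name per line).
for_each_region() {
    # Read the list on file descriptor 3 so the command being run
    # cannot consume lines of regions.txt through its standard input.
    while IFS= read -r region <&3; do
        # Skip blank lines in regions.txt
        [ -n "$region" ] || continue
        # The subshell leaves the caller's working directory unchanged,
        # so no matching "cd .." is needed.
        ( cd "$region" && "$@" ) || {
            echo "step failed in $region: $*" >&2
            return 1
        }
    done 3< regions.txt
}
```

With this in place, `for_each_region ocr2.sh` would stand in for the OCR loop and `for_each_region run-process.sh` for the report loop. Stopping at the first failing region is a design choice here, given the data quality issues noted earlier, so that a bad scan is noticed before later steps run.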

nesa's People

Contributors

guidoeco
