
landsat_ingestor's Introduction

landsat_ingestor's People

Contributors

joferkington, kapadia, warmerdam


landsat_ingestor's Issues

reprocess scenes without .ovr and tiling

Circa March 1st (in 91e9f50) we set things up to tile and build overviews for ingested scenes; however, we still haven't gone back and reprocessed existing scenes.

There is now an ingestor/for_each_scene.py that can iterate over scenes, and a reprocess_scene.py that can fix up scenes. Work out a technique to run this for the existing outdated scenes but not newer scenes.
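One possible way to tell outdated scenes from newer ones is to look for the external overview files that gdaladdo writes next to each band (GDAL names them <band>.TIF.ovr). The sketch below is hypothetical: it assumes the list of object keys for each scene is already available (how for_each_scene.py exposes that is not shown here), and it only checks for .ovr files, not whether the TIFFs are internally tiled.

```python
import posixpath
from typing import Dict, Iterable, List


def needs_reprocessing(keys: Iterable[str]) -> bool:
    """Return True if any band GeoTIFF in a scene lacks a sibling .ovr file.

    Hypothetical helper: `keys` is assumed to be the object keys (or file
    names) belonging to one scene. Tiling is NOT checked here.
    """
    names = {posixpath.basename(k) for k in keys}
    tifs = [n for n in names if n.upper().endswith(".TIF")]
    if not tifs:
        # Nothing to build overviews for, so reprocessing would not help.
        return False
    return any(n + ".ovr" not in names for n in tifs)


def outdated_scenes(scene_listing: Dict[str, List[str]]) -> List[str]:
    """Filter a {scene_id: [keys]} mapping down to scenes missing overviews."""
    return sorted(sid for sid, keys in scene_listing.items()
                  if needs_reprocessing(keys))
```

Only the scenes returned by outdated_scenes() would then be passed to reprocess_scene.py, leaving scenes ingested after 91e9f50 untouched.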

index.html rendering thumbnail for scenes without proper bands

Some scenes do not include all of the bands necessary to render a thumbnail, and their index pages look broken as a result.

Here are a few scenes with insufficient bands as an example:
https://s3-us-west-2.amazonaws.com/landsat-pds/L8/116/206/LT81162062015025LGN00/index.html
https://s3-us-west-2.amazonaws.com/landsat-pds/L8/116/202/LT81162022015025LGN00/index.html
https://s3-us-west-2.amazonaws.com/landsat-pds/L8/141/218/LT81412182015024LGN00/index.html

In instances where we do not have enough bands to produce a thumbnail, we should simply not insert a reference to a jpg. In the future, we may consider including messaging explaining that a limited set of bands is available for the scene.
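A minimal sketch of the proposed behavior. It assumes, hypothetically, that the thumbnail is a natural-color composite of bands 4, 3 and 2 and that the index page embeds a file named <scene_id>_thumb_small.jpg; both are placeholders rather than the ingestor's actual band list or file name. The example scenes above are LT8 (TIRS-only) acquisitions, so they would only carry thermal bands.

```python
import html
from typing import Iterable

# Assumption: the thumbnail is built from Landsat 8 bands 4, 3, 2 (natural color).
THUMBNAIL_BANDS = ("B4", "B3", "B2")


def has_thumbnail_bands(scene_id: str, files: Iterable[str]) -> bool:
    """True if every band needed for the thumbnail is present in `files`."""
    present = set(files)
    return all(f"{scene_id}_{band}.TIF" in present for band in THUMBNAIL_BANDS)


def thumbnail_fragment(scene_id: str, files: Iterable[str]) -> str:
    """Return an <img> tag for the scene thumbnail, or '' if bands are missing.

    The '<scene_id>_thumb_small.jpg' name is a hypothetical placeholder.
    """
    if not has_thumbnail_bands(scene_id, list(files)):
        # Omit the jpg reference entirely instead of linking a broken image.
        return ""
    src = html.escape(f"{scene_id}_thumb_small.jpg")
    return f'<img src="{src}" alt="{html.escape(scene_id)} thumbnail">'
```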

scene_info date handling triggering exception

+cc @kapadia

I'm seeing lots of job failures like this:

gdaladdo -ro -r average --config COMPRESS_OVERVIEW DEFLATE --config PREDICTOR_OVERVIEW 2 --config GDAL_TIFF_OVR_BLOCKSIZE 512 LC82200762015113LGN00/LC82200762015113LGN00_B10.TIF 3 9 27 81
Traceback (most recent call last):
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 115, in
    status = main(sys.argv[1:])
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 106, in main
    overwrite = args.overwrite)
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 65, in process
    scene_info.add_mtl_info(scene_dict, scene_root, local_dir)
  File "/opt/planet/programs/landsat_ingestor/ingestor/scene_info.py", line 74, in add_mtl_info
    mtl_dict['PRODUCT_METADATA']['SCENE_CENTER_TIME'])
TypeError: combine() argument 2 must be datetime.time, not str
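The error says datetime.combine() received the raw SCENE_CENTER_TIME string instead of a datetime.time. A minimal sketch of a converter, assuming values shaped like "16:16:07.8378330Z" (possibly still wrapped in quotes). Python's datetime supports at most 6 fractional digits, so any extra digits are dropped rather than rounded.

```python
import datetime
import re


def parse_scene_center_time(value):
    """Convert an MTL SCENE_CENTER_TIME value to datetime.time.

    Assumes strings like '16:16:07.8378330Z', optionally quoted. Values that
    are already datetime.time objects are returned unchanged.
    """
    if isinstance(value, datetime.time):
        return value
    text = str(value).strip().strip('"').rstrip("Z")
    match = re.fullmatch(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized SCENE_CENTER_TIME: {value!r}")
    hh, mm, ss, frac = match.groups()
    # Keep at most 6 fractional digits (microseconds), padding short fractions.
    micro = int((frac or "0")[:6].ljust(6, "0"))
    return datetime.time(int(hh), int(mm), int(ss), micro)
```

The combine call would then become something like datetime.datetime.combine(date_acquired, parse_scene_center_time(raw_value)), where date_acquired and raw_value stand in for whatever scene_info.py actually passes.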

Fix unbound var

An unbound var is lingering around when the tarball is corrupt.

+ l8_process_scene.py --verbose -s s3queue --clean --overwrite --list-file job_33861984.csv LC80750192015105LGN00
LC80750192015105LGN00_B1.TIF
LC80750192015105LGN00_B2.TIF
LC80750192015105LGN00_B3.TIF
LC80750192015105LGN00_B4.TIF
LC80750192015105LGN00_B5.TIF
LC80750192015105LGN00_B6.TIF
LC80750192015105LGN00_B7.TIF

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

LC80750192015105LGN00.tar.gz successfully downloaded (402891096 bytes)
tar xvf LC80750192015105LGN00.tar.gz --directory=LC80750192015105LGN00 
Traceback (most recent call last):
  File "landsat_ingestor/ingestor/l8_process_scene.py", line 115, in <module>
    status = main(sys.argv[1:])
  File "landsat_ingestor/ingestor/l8_process_scene.py", line 106, in main
    overwrite = args.overwrite)
  File "landsat_ingestor/ingestor/l8_process_scene.py", line 65, in process
    scene_info.add_mtl_info(scene_dict, scene_root, local_dir)
UnboundLocalError: local variable 'local_dir' referenced before assignment
Task ended with status 1

/cc @warmerdam
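The traceback suggests that after the tar failure, process() still reaches the add_mtl_info call even though local_dir was never assigned. One way to make that failure explicit is to let extraction raise a dedicated exception and return a nonzero status before local_dir is used. This is a hypothetical skeleton: pull, split and add_mtl_info are injected stand-ins for the real puller, splitter and scene_info steps, and ExtractionError is an invented name.

```python
class ExtractionError(Exception):
    """Raised when a scene tarball cannot be unpacked (hypothetical)."""


def process(scene_root, pull, split, add_mtl_info, log=print):
    """Per-scene pipeline skeleton that never reads an unassigned local_dir.

    Returns a process exit status (0 = success, 1 = failure).
    """
    local_tarfile = pull(scene_root)
    try:
        local_dir = split(scene_root, local_tarfile)
    except ExtractionError as exc:
        # Report the real cause instead of falling through to code that
        # depends on local_dir.
        log(f"{scene_root}: extraction failed: {exc}")
        return 1
    add_mtl_info(scene_root, local_dir)
    return 0
```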

Receiving old scenes on SNS topic

I’ve just subscribed to the Landsat PDS AWS SNS feed, but it looks like I’m getting lots of old scenes through. In the last 12hrs I’ve received hundreds of events, but the most recent acquisition date is 2017-01-12.

Can anyone confirm whether this is expected behaviour? Are new scenes coming through the SNS feed at the moment?
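Pre-collection Landsat 8 scene IDs such as LC80200312015200LGN00 encode the acquisition year and day of year, so the acquisition date of each received scene can be checked directly. A minimal sketch, assuming the notifications identify scenes by this ID format (the SNS message layout is not described here, and Collection 1 product IDs use a different format):

```python
import datetime
import re

# L + sensor (C/O/T) + 8 + path(3) + row(3) + year(4) + day-of-year(3) + station(3) + version(2)
_SCENE_ID = re.compile(r"^L[COT]8(\d{3})(\d{3})(\d{4})(\d{3})[A-Z]{3}\d{2}$")


def acquisition_date(scene_id):
    """Return the acquisition date encoded in a pre-collection scene ID."""
    match = _SCENE_ID.match(scene_id)
    if match is None:
        raise ValueError(f"not a pre-collection Landsat 8 scene ID: {scene_id!r}")
    year, day_of_year = int(match.group(3)), int(match.group(4))
    return datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)


def recent_only(scene_ids, since):
    """Keep only scene IDs acquired on or after `since`."""
    return [s for s in scene_ids if acquisition_date(s) >= since]
```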

Throttled downloads from usgs servers?

A new error is being returned by the USGS servers. It appears that changes have been made on their end and that downloads are now being throttled.

If this is the case, we'll need to re-work certain areas of the landsat-ingestor to request no more than 10 download urls at a time. I'll investigate a little more to find out the extent of this new constraint.

usgs.USGSError: User currently has more than 10 downloads that have not been attempted in the past 10 minutes.

/cc @warmerdam @jedsundwall @camillacaros
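A minimal sketch of one way to respect the limit, assuming the ingestor can acquire a slot before asking USGS for a download URL and release it once that URL's download has been attempted. DownloadSlots is a hypothetical name. It only caps the number of outstanding URLs; it does not model the 10-minute window mentioned in the error message.

```python
import threading


class DownloadSlots:
    """Caps how many requested-but-not-yet-attempted download URLs exist at once.

    Hypothetical helper: call acquire() before requesting a download URL and
    release() after that download has been attempted (success or failure).
    The default of 10 mirrors the limit quoted in the USGSError message.
    """

    def __init__(self, limit=10):
        self._sem = threading.BoundedSemaphore(limit)

    def acquire(self, blocking=True, timeout=None):
        # Returns True if a slot was obtained, False on timeout/non-blocking miss.
        return self._sem.acquire(blocking, timeout)

    def release(self):
        self._sem.release()
```

With several worker threads sharing one DownloadSlots instance, an eleventh URL request simply waits until an earlier download has been attempted.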

usgs request failure - perhaps we need retry?

I get errors like the following at least occasionally. Perhaps something here needs retry logic?

+ l8_process_run.py -v -s auto --start-date=2015-01-15 --end-date=2015-02-06 --queue
logging in...
Traceback (most recent call last):
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_run.py", line 178, in <module>
    status = main(sys.argv[1:])
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_run.py", line 155, in main
    limit=args.limit)
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_run.py", line 21, in query_for_scenes
    os.environ['USGS_PASSWORD'])
  File "/opt/planet/programs/landsat_ingestor/ingestor/usgs/api.py", line 154, in login
    api_key = element.text
AttributeError: 'NoneType' object has no attribute 'text'
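A minimal retry-with-exponential-backoff sketch. It is generic and assumes nothing about usgs.api beyond login being callable; retrying on AttributeError would be a blunt workaround, and a clearer fix would be for login to raise a specific error when the response has no API key element.

```python
import time


def with_retries(func, attempts=4, base_delay=2.0, retry_on=(Exception,),
                 sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n and retry.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

For example: api_key = with_retries(lambda: api.login(user, password), retry_on=(AttributeError,)), where user and password stand in for the values read from the environment.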

Duplicates in scene_list

Have seen a couple of duplicates showing up in scene_list.gz. Doesn't seem to be tied to date. Maybe items are getting queued up twice?

$ grep LC80200312015200LGN00 scene_list              
LC80200312015200LGN00,2015-07-19 16:16:07.837833,65.39,L1T,20,31,40.62882,-85.17706,42.79844,-82.23444,https://s3-us-west-2.amazonaws.com/landsat-pds/L8/020/031/LC80200312015200LGN00/index.html
LC80200312015200LGN00,2015-07-19 16:16:07.837833,65.39,L1T,20,31,40.62882,-85.17706,42.79844,-82.23444,https://s3-us-west-2.amazonaws.com/landsat-pds/L8/020/031/LC80200312015200LGN00/index.html
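Until the root cause (possibly double queuing) is found, the list could be deduplicated when it is written. A minimal sketch, assuming the scene ID is the first CSV column as in the rows above; the first occurrence wins and row order is preserved.

```python
import csv
import io


def dedupe_scene_list(text):
    """Drop rows whose first column (scene ID) has already been seen."""
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0] in seen:
            continue  # skip blank lines and repeated scene IDs
        seen.add(row[0])
        writer.writerow(row)
    return out.getvalue()
```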

Scene list is missing ProductId

If I download the scene list from https://landsat-pds.s3.amazonaws.com/c1/L8/scene_list.gz it contains productId as a column name.

If I download the scene list from s3://landsat-pds/scene_list.gz it does NOT contain productId as a column name.

Which scene list is considered best?
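Once both files are downloaded, their layouts can be compared by reading only the header row. A minimal sketch, assuming both are gzipped CSV files whose first line holds the column names:

```python
import csv
import gzip
import io


def header_columns(gz_bytes):
    """Return the column names from the first line of a gzipped CSV."""
    with gzip.open(io.BytesIO(gz_bytes), "rt", newline="") as fh:
        return next(csv.reader(fh))
```

Checking "productId" in header_columns(data) for each downloaded file shows which one carries the column.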

Scenes missing

Hi

On Landsat on AWS you say that "all Landsat-8 scenes from 2015 are available along with a selection of cloud-free scenes from 2013 and 2014". However, I have come across several scenes acquired after 2015 that I can't find on Landsat on AWS.

Here are two examples acquired 2016-07-08:

  • LC08_L1TP_194001_20160708_20170323_01_T1
  • LC08_L1TP_178006_20160708_20170323_01_T1

Am I doing something wrong/looking at the wrong place, or is it correct that the scenes are not there?

Regards,
Vebjørn
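The two IDs above are Collection 1 product IDs, whose fields encode sensor, processing level, path/row, acquisition and processing dates, collection number and tier. Splitting them out makes it easier to check which path/row and dates to look under. A minimal parsing sketch (the regular expression covers only the field shapes seen here):

```python
import datetime
import re
from typing import NamedTuple

_PRODUCT_ID = re.compile(
    r"^(L[COTEM]\d{2})_(L1[A-Z]{2})_(\d{3})(\d{3})_(\d{8})_(\d{8})_(\d{2})_(RT|T1|T2)$"
)


class ProductId(NamedTuple):
    sensor: str
    level: str
    path: int
    row: int
    acquired: datetime.date
    processed: datetime.date
    collection: str
    tier: str


def _ymd(text):
    return datetime.datetime.strptime(text, "%Y%m%d").date()


def parse_product_id(pid):
    """Split a Collection 1 product ID into its named fields."""
    match = _PRODUCT_ID.match(pid)
    if match is None:
        raise ValueError(f"not a Collection 1 product ID: {pid!r}")
    sensor, level, path, row, acquired, processed, collection, tier = match.groups()
    return ProductId(sensor, level, int(path), int(row),
                     _ymd(acquired), _ymd(processed), collection, tier)
```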

reset connections

I am getting errors like this from time to time. Possibly it would help to do a retry or something?

+ l8_process_scene.py --verbose -s s3queue --clean --list-file job_20719258.csv LC80011152015035LGN00
.....Traceback (most recent call last):
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 110, in <module>
    status = main(sys.argv[1:])
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 101, in main
    overwrite = args.overwrite)
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 56, in process
    verbose=verbose)
  File "/opt/planet/programs/landsat_ingestor/ingestor/puller.py", line 18, in pull
    return puller_s3queue.pull(scene_root, scene_dict, verbose=verbose)
  File "/opt/planet/programs/landsat_ingestor/ingestor/puller_s3queue.py", line 34, in pull
    for d in rv.iter_content(chunk_size=1024 * 1024 * 10):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 616, in generate
    decode_content=True):
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/response.py", line 236, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/response.py", line 183, in read
    data = self._fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/ssl.py", line 241, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 160, in read
    return self._sslobj.read(len)
socket.error: [Errno 104] Connection reset by peer
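Rather than restarting the whole tarball after a reset, the download could resume from the bytes already written; S3 supports HTTP Range requests, so a fetcher could send a "Range: bytes=<offset>-" header. A minimal sketch in which fetch(offset) is a hypothetical callable yielding chunks from that offset. It catches the built-in ConnectionError (which includes ConnectionResetError, the Python 3 form of Errno 104); a requests-based fetcher would likely need to catch requests' own exception types as well.

```python
import time


def download_with_resume(fetch, out, max_attempts=5, sleep=time.sleep):
    """Stream chunks from fetch(offset) into `out`, resuming after resets.

    Returns the total number of bytes written.
    """
    written = 0
    for attempt in range(max_attempts):
        try:
            for chunk in fetch(written):
                out.write(chunk)
                written += len(chunk)
            return written
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(2 ** attempt)  # back off before reconnecting
```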

Fewer requests to the USGS download-url endpoint

Currently each scene is processed individually. Part of that processing is requesting an authenticated download URL from USGS, which is currently done with:

usgs.api.download('LANDSAT_8', 'EE', [scene_root], 'STANDARD')

Rather than submitting 1 request per scene, we can group multiple scenes together in a single request.

usgs.api.download('LANDSAT_8', 'EE', [scene_root_1, scene_root_2, ..., scene_root_n], 'STANDARD')

/cc @warmerdam
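A minimal batching sketch. Here download is an injected stand-in for usgs.api.download, and it is assumed (not confirmed) to return one URL per scene in the same order as the input list. A batch size of 10 would also stay within the 10-download limit USGS reports when throttling.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def download_urls(download, scene_roots, batch_size=10):
    """Request download URLs for many scenes using one call per batch."""
    urls = {}
    for batch in batched(scene_roots, batch_size):
        result = download('LANDSAT_8', 'EE', batch, 'STANDARD')
        urls.update(zip(batch, result))
    return urls
```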

Fewer requests to the USGS auth endpoint

Right now a new API token is generated each time a download url is requested. The ingestor doesn't need to hit the auth endpoint each time:

def get_download_url(scene_root, verbose):
    if 'USGS_PASSWORD' in os.environ:
        if verbose:
            print 'logging in...'
        api_key = api.login(os.environ['USGS_USERID'],
                            os.environ['USGS_PASSWORD'])
        if verbose:
            print ' api_key = %s' % api_key
    urls = api.download('LANDSAT_8', 'EE', [scene_root], 'STANDARD')
    return urls[0]

API tokens are valid for 1 hour after the initial request, and that hour resets each time a request is made. Let's be kind to the USGS servers and request only a single token.

/cc @warmerdam
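A minimal sketch of reusing one key, based on the 1-hour sliding lifetime described above. ApiKeyCache is a hypothetical name, login is any zero-argument callable returning a key (for example a wrapper around api.login), and the 60-second margin is an arbitrary safety buffer so a key is not used right at the edge of expiry.

```python
import time


class ApiKeyCache:
    """Reuse one API key until it has likely expired, then log in again."""

    def __init__(self, login, ttl=3600.0, margin=60.0, clock=time.monotonic):
        self._login = login
        self._ttl = ttl
        self._margin = margin
        self._clock = clock
        self._key = None
        self._last_used = None

    def get(self):
        now = self._clock()
        idle = None if self._last_used is None else now - self._last_used
        if self._key is None or idle > self._ttl - self._margin:
            self._key = self._login()
        # Each request made with the key resets its server-side expiry.
        self._last_used = now
        return self._key
```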

corrupt tar files reprocessed indefinitely

Every two hours, as we try to reprocess the tarq contents, this corrupt scene is retried and fails. We need better logic to migrate such corrupt files to the tarq_corrupt area, as we already do for the quick test in puller_s3queue.py. This case is a little trickier because the corruption is only detected significantly later.

l8_process_scene.py --verbose -s s3queue --clean --overwrite --list-file job_33049805.csv LC81790442014335LGN00
Scene LC81790442014335LGN00 already exists on destination bucket.
Processing scene: LC81790442014335LGN00
Fetching: http://s3-us-west-2.amazonaws.com/landsat-pds/tarq/LC81790442014335LGN00.tar.gz
...
.....
....
.....
.....
....
.....
.....
.....
.....
.....
.....
....
....
......
.....
....
.....
.....
.LC81790442014335LGN00_B1.TIF
LC81790442014335LGN00_B2.TIF
LC81790442014335LGN00_B3.TIF
LC81790442014335LGN00_B4.TIF
LC81790442014335LGN00_B5.TIF
LC81790442014335LGN00_B6.TIF
LC81790442014335LGN00_B7.TIF
LC81790442014335LGN00_B8.TIF
LC81790442014335LGN00_B9.TIF
LC81790442014335LGN00_B10.TIF
LC81790442014335LGN00_B11.TIF
LC81790442014335LGN00_BQA.TIF

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
LC81790442014335LGN00.tar.gz successfully downloaded (942628864 bytes)
tar xvf LC81790442014335LGN00.tar.gz --directory=LC81790442014335LGN00 
Traceback (most recent call last):
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 110, in <module>
    status = main(sys.argv[1:])
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 101, in main
    overwrite = args.overwrite)
  File "/opt/planet/programs/landsat_ingestor/ingestor/l8_process_scene.py", line 58, in process
    local_dir = splitter.split(scene_root, local_tarfile, verbose=verbose)
  File "/opt/planet/programs/landsat_ingestor/ingestor/splitter.py", line 63, in split
    verbose=verbose)
  File "/opt/planet/programs/landsat_ingestor/ingestor/splitter.py", line 13, in run_command
    raise Exception('command "%s" failed with code %d.' % (cmd, result))
Exception: command "tar xvf LC81790442014335LGN00.tar.gz --directory=LC81790442014335LGN00 " failed with code 512.
Task ended with status 1
+ STATUS=1
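One way to catch this case is to read the whole downloaded archive once before splitting: a truncated gzip stream raises while decompressing, as in the "unexpected end of file" above. A minimal sketch using Python's tarfile module on the local copy. The corrupt_dir argument is a local stand-in for tarq_corrupt; since the real tarq and tarq_corrupt areas live on S3, the move step there would be different.

```python
import shutil
import tarfile
import zlib
from pathlib import Path


def tarball_is_intact(path):
    """Read every member of a .tar.gz to the end; False if truncated or corrupt."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                fh = tar.extractfile(member)
                if fh is None:
                    continue  # directories and links carry no data
                while fh.read(1 << 20):
                    pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False


def quarantine_if_corrupt(path, corrupt_dir):
    """Move a corrupt tarball into corrupt_dir; return True if it was moved."""
    path = Path(path)
    if tarball_is_intact(path):
        return False
    Path(corrupt_dir).mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(Path(corrupt_dir) / path.name))
    return True
```

A scene moved this way would no longer be picked up by the two-hourly reprocessing pass.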
