sen1floods11's Issues
Preprocessing steps for permanent water
Hi, first off, thank you for this resource!
I was testing both flood and permanent-water segmentation, and I noticed that the Sentinel-1 images come in two slightly different formats: I believe Flood and Weakly share the same logarithmic (dB) scale, while the JRC perm_water subset has been normalized between 0 and 1:
flood range: [-60.44136, 36.832024]
water range: [0.0, 1.0]
weak range: [-70.71592, 39.25037]
Plotting them, the flood images look very "low contrast". I'm getting close results by applying np.clip(10 ** (img_flood / 50), 0, 1) to the flood images, but I can't find the exact transformation that you applied there.
Where can I find more info about that? Apologies if it's already documented somewhere, I couldn't find anything on this regard.
Thanks!
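The transform the question experiments with can be sketched as below. This is only a sketch of the asker's guess: the constant 50 and the clipping range come from the question above, not from the dataset's documented preprocessing.

```python
import numpy as np

# Hypothetical Sentinel-1 backscatter values in decibels (dB); real chips
# span roughly [-70, 40] dB according to the ranges reported above.
img_flood_db = np.array([[-60.0, -20.0], [0.0, 36.8]])

# The transform being tried: undo the log scale and clip to [0, 1].
# 10 ** (x / 50) maps -50 dB -> 0.1 and 0 dB -> 1.0; anything above 0 dB
# saturates to 1 after clipping.
img_linear = np.clip(10 ** (img_flood_db / 50), 0, 1)
```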
About file names: more details needed
In HandLabeled (v1.1), there are 5 files:
- JRCWaterHand (might this be the label for S1?)
- LabelHand (might this be the cloud label?)
- S1Hand (the S1 data?)
- S1OtsuLabelHand (?)
- S2Hand (the S2 data?)
but there are no further details about these file names.
If possible, could you add them to the README?
Thank you!
Sorry, I found it in the docs.
How were the hand-labeled training dataset normalization values ([0.6851, 0.5235], [0.0820, 0.1102]) computed? These are the mean and std used in
norm = transforms.Normalize([0.6851, 0.5235], [0.0820, 0.1102])
in the code. How do we compute these two values? The paper does not explain. How do we calculate the mean and standard deviation of S2?
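One common way such statistics are obtained (a sketch only; the chip shapes and the two-channel layout here are assumptions, not the authors' actual procedure) is to average over every pixel of every training chip, per channel:

```python
import numpy as np

# Stand-in for the training set: (num_chips, channels, height, width).
# Values like ([0.6851, 0.5235], [0.0820, 0.1102]) would come from the
# real chips after whatever scaling the pipeline applies.
chips = np.random.rand(8, 2, 64, 64)

# Per-channel mean and std over all chips and all pixels.
mean = chips.mean(axis=(0, 2, 3))
std = chips.std(axis=(0, 2, 3))
```

For a dataset too large to hold in memory, the same statistics can be accumulated chip by chip with running sums.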
Label TIF optimizations
While attempting to use the labels described in this repo, it became apparent that a couple of optimizations are advisable:
- Because the data has 3 possible values (-1, 0, 1), int16 TIFFs are significant overkill. A byte TIFF (int8) would save considerable space and transfer time.
- At the moment, these TIFFs have a NoData value of -32768. A NoData value of -1 would likely be more appropriate, since it tracks the advertised semantics more closely, and experience teaches that incorrectly set NoData values can be problematic for downstream processes.
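The suggested remapping can be sketched as below, assuming labels arrive as an int16 array with NoData = -32768 (the TIFF read/write itself is omitted; only the array logic is shown):

```python
import numpy as np

# Example int16 label tile with the current NoData sentinel of -32768.
label_int16 = np.array([[-32768, 0], [1, -32768]], dtype=np.int16)

# Remap NoData to -1 and store as a signed byte: every pixel then fits
# the advertised {-1, 0, 1} semantics at one byte each.
label_int8 = np.where(label_int16 == -32768, -1, label_int16).astype(np.int8)
```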
Describe data in each folder on Google Cloud Storage bucket
Thanks for documenting this research here!
I'm reading through the paper and this GitHub repo, and looking at the Google Cloud Storage bucket. When I list the GCS bucket, I get:
gs://cnn_chips/CNN_Chips_FTC.geojson
gs://cnn_chips/Sen1Floods11_labeled.tgz
gs://cnn_chips/flood_bolivia_data.csv
gs://cnn_chips/flood_test_data.csv
gs://cnn_chips/flood_train_data.csv
gs://cnn_chips/flood_valid_data.csv
gs://cnn_chips/permanent_water_data.csv
gs://cnn_chips/permanent_water_test_data.csv
gs://cnn_chips/permanent_water_train_data.csv
gs://cnn_chips/permanent_water_validation_data.csv
gs://cnn_chips/NoQC/
gs://cnn_chips/Perm/
gs://cnn_chips/PermJRC/
gs://cnn_chips/QC_v2/
gs://cnn_chips/S1/
gs://cnn_chips/S1Flood/
gs://cnn_chips/S1Flood_NoQC/
gs://cnn_chips/S1Perm/
gs://cnn_chips/S1_NoQC/
gs://cnn_chips/S2/
gs://cnn_chips/S2Flood/
gs://cnn_chips/S2_NoQC/
gs://cnn_chips/cnn_checkpoints/
As part of the documentation in this repo, it would be helpful to have a brief (under two sentences) human-readable description of what each of the directories and files in the bucket is for. For example, the only reference I can find to S1Flood is in the getTradFName function in Test_Models.ipynb. Does that mean these are the labels for the Otsu Threshold-VH dataset, or something else?
Download Issue
I downloaded the Sen1Floods11 dataset and managed to get all subdirectories. However, when I open some of the flood_events data, the majority of it is black and white. I'm new to this, so is this what I am supposed to be seeing? I thought I would see the original Sentinel imagery from which the flooding pixels were derived. Could someone explain what I should be seeing? Thank you!
Necessary Data to Distinguish Permanent Water from Flood Water
Thank you for this fine work.
In trying to reproduce the experiments found in your paper, it seemed to me that the data on Google Storage are not sufficient to reproduce all of the results in the paper.
For example, there are results given for the various models on permanent water, flood water, and all water, but I was not able to find the labels necessary to distinguish permanent water from flood water for the weakly-labeled Sentinel-1 case or the weakly-labeled Sentinel-2 case. (I think I could accomplish this by augmenting the dataset with additional labels from JRC, but I am curious whether you already have these labels prepared, and/or whether I have overlooked something.)
Similarly, in the case of the model trained on permanent water labels, the methodology to use to separately evaluate flood water and permanent water was not clear to me.
Thanks again.
Error with the train.ipynb example
Hello everyone,
I tried to run the train.ipynb example in Colab to understand how to use your dataset.
When I launch the training phase, I get this error at the end of Epoch 0:
Current Epoch: 0
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:61: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
model saved at checkpoints/Sen1Floods11_0_0.3114752471446991.cp
Training Loss: tensor(0.5943, device='cuda:0', grad_fn=<DivBackward0>)
Training IOU: tensor(0.2001, device='cuda:0')
Training Accuracy: tensor(0.8267, device='cuda:0')
Validation Loss: tensor(0.4088, device='cuda:0')
Validation IOU: tensor(0.3115, device='cuda:0')
Validation Accuracy: tensor(0.8518, device='cuda:0')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[<ipython-input-17-8365ceef919c>](https://localhost:8080/#) in <module>()
18 epochs.append(i)
19 x = epochs
---> 20 plt.plot(x, training_losses, label='training losses')
21 plt.plot(x, training_accuracies, 'tab:orange', label='training accuracy')
22 plt.plot(x, training_ious, 'tab:purple', label='training iou')
6 frames
<__array_function__ internals> in atleast_1d(*args, **kwargs)
[/usr/local/lib/python3.7/dist-packages/torch/_tensor.py](https://localhost:8080/#) in __array__(self, dtype)
676 return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
677 if dtype is None:
--> 678 return self.numpy()
679 else:
680 return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
I tried to identify the problem, unfortunately without success. Could you give me some hints?
Moreover, could you add some more explanation to the example? For instance, the training loop iterates over 1000 epochs, but I don't understand the meaning of the number 10 in:
train_validation_loop(net, optimizer, scheduler, train_loader, valid_loader, 10, i)
Thanks in advance for your help.
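A likely fix for the traceback above (a sketch only, assuming the logged metrics are CUDA tensors with grad history, as the error message suggests) is to detach each value and copy it to host memory before handing it to matplotlib:

```python
import torch

# Stand-ins for the metric lists the notebook accumulates; on a GPU run
# these would be CUDA tensors, which numpy/matplotlib cannot consume.
training_losses = [torch.tensor(0.5943), torch.tensor(0.4088)]

# Detach from the autograd graph, move to CPU, and extract a plain float.
losses_cpu = [t.detach().cpu().item() for t in training_losses]

# plt.plot(epochs, losses_cpu, ...) would then work without the TypeError.
```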
Why does download_flood_water_data_from_list(l) assign arr_y twice? And doesn't if np.sum((arr_y != arr_y)) == 0: always return True?
arr_x = np.nan_to_num(getArrFlood(os.path.join("files/", im_fname)))
arr_y = getArrFlood(os.path.join("files/", mask_fname))
ignore = (arr_y == -1)
ignore = ((np.uint8(ignore) * -1) * 256) + 1
arr_y *= ignore
arr_y = np.uint8(getArrFlood(os.path.join("files/", mask_fname)))
if np.sum((arr_y != arr_y)) == 0:
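The observation in the title can be demonstrated directly (a sketch using a made-up label array): `x != x` is only True for NaN, and after the `np.uint8(...)` cast on the preceding line, arr_y is an integer array that cannot hold NaN, so the sum is always 0 and the branch is always taken.

```python
import numpy as np

# Integer arrays cannot contain NaN, so the self-inequality check is vacuous.
arr_y = np.uint8(np.array([[0, 1], [255, 1]]))
int_check = np.sum(arr_y != arr_y)  # always 0 for integer dtypes

# For float arrays, the same idiom does detect NaN.
arr_f = np.array([1.0, np.nan])
float_check = np.sum(arr_f != arr_f)  # counts the NaN entries
```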
Sentinel-2 weak label data
Does the dataset not include Sentinel-2 weak label data?
how to preprocess hlss,hlsl,s2 l1c and s1 image for the model?
I want to use the demo, but I don't know how to preprocess these data (HLSS, HLSL, S2 L1C, and S1). Are there any standard steps or data requirements?
RuntimeError: "round" "_vml_cpu" not implemented for 'Int'
I tried to duplicate the notebook in the Google Colab, and I am getting some errors in my final step: Train model and assess metrics over epochs
Current Epoch: 0
RuntimeError Traceback (most recent call last)
in ()
17
18 for i in range(start, 1000):
---> 19 train_validation_loop(net, optimizer, scheduler, train_loader, valid_loader, 10, i)
20 epochs.append(i)
21 x = epochs
7 frames
in processTestIm(data)
82 if torch.sum(labels.gt(.003) * labels.lt(.004)):
83 labels *= 255
---> 84 labels = labels.round()
85
86 return ims, labels
RuntimeError: "round" "_vml_cpu" not implemented for 'Int'
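A plausible workaround (a sketch, assuming `labels` is being loaded as an integer tensor, which is what the error message indicates) is to cast to float before the scaling-and-rounding step in processTestIm:

```python
import torch

# Stand-in for a label tile that arrives with an integer dtype.
labels = torch.tensor([[0, 1], [1, 0]], dtype=torch.int32)

# round() is not implemented for Int tensors on CPU, so cast first.
labels = labels.float()
labels = (labels * 255).round()
```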
Split image stack into separate bands
In order to improve cloud storage access and ML training, we should split the Sentinel-2 and Sentinel-1 imagery into single-band TIFFs. Labels can stay the same:
- Sentinel-2 13-band TIFF --> 13 single-band TIFFs, one per band
- Sentinel-1 2-band TIFF --> 2 single-band TIFFs, one per band
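The proposed split amounts to slicing the band axis (a sketch, assuming chips load as (bands, height, width) arrays; writing each slice back out as its own TIFF is omitted):

```python
import numpy as np

# Stand-in for a Sentinel-2 chip: 13 bands stacked in one array.
s2_chip = np.zeros((13, 512, 512), dtype=np.uint16)

# One single-band array per band; each would become its own TIFF,
# so a consumer can fetch only the bands it needs.
single_band_chips = [s2_chip[b] for b in range(s2_chip.shape[0])]
```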