
c5's People

Contributors

mahmoudnafifi, manipopopo


c5's Issues

what dataset is the pre-trained model trained on?

What dataset is the pre-trained model trained on? If I want to run a test, would datasets like Cheng and Gehler be good test samples? I am just concerned that these two sets might already be in the training set for the pre-trained models.

Some questions about the Training and Testing details

Hi Dr. Afifi, thanks for your great work!
I have read many of your inspiring papers on color constancy. In my view, most of them are elegant and easy to follow, except 'C5', which has really confused me.

I am not sure whether I am misunderstanding the related learning paradigms, such as transfer learning or transductive learning, but I just cannot follow the core training and testing procedure of your method.

I have three questions below:

  1. From your abstract, it seems that the additional unlabeled images are only used at test time:

C5 approaches this problem through the lens of transductive inference: additional unlabeled images are provided as input to the model at test time

However, you also highlight that even during training, the model is trained on both labeled and unlabeled images:

Our system is trained using labeled (and unlabeled) images from multiple cameras, but at test time our model is able to look at a set of the (unlabeled) test set images from a new camera.

  2. I am not sure that I understand the meaning of the query image and the additional images in your training and testing procedures.

For instance, suppose you train on sensor1's images and test on sensor2:

  • Training

The query image is a labeled image from sensor1, and the additional images are selected from sensor1 without labels. Am I right?

  • Testing

The query image is a labeled image from sensor2, and the additional images are selected from sensor2 without labels; effectively the same as in training except for the dataset. Am I right?

If so, then the paper's statement that no ground-truth labels are used would seem inaccurate:

In contrast, our technique requires no ground-truth labels for the unseen camera, and is essentially calibration-free for this new sensor.

If not, I am confused about how the model's parameters could be updated without labels during testing (see the sketch at the end of this issue).

  3. The leave-one-out evaluation approach

For the camera-specific setting, it is clear that leave-one-out means using n-1 images for training and the remaining image for testing, looping n times. However, I do not understand what the paper, which focuses on the cross-sensor setting, says here:

we adopt a leave-one-out cross-validation evaluation approach: for each dataset, we exclude all scenes and cameras used by the test set from our training images. For a fair comparison with FFCC [13], we trained FFCC using the same leave-one-out cross-validation evaluation approach.

Can you describe the details of the leave-one-out method here? For instance, how did you re-train FFCC using this approach?

The three questions above may be connected with each other.
I would be very grateful for your reply.
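For readers puzzling over the same point, here is a minimal sketch of how I read the test-time procedure: the query image from the unseen camera is batched together with m-1 unlabeled images from that same camera, and the network produces an estimate in a single forward pass with its weights frozen. The class `model`, the function name, and the data handling below are placeholders of mine, not the repository's actual API.

```python
# Hedged sketch of C5-style transductive inference at test time (placeholder
# names; this is NOT the repository's actual interface).
import random
import torch

def infer_illuminant(model, query_img, unlabeled_pool, m=7, device="cpu"):
    """Estimate the query image's illuminant using m-1 extra unlabeled images.

    query_img:      (3, H, W) tensor from the unseen test camera
    unlabeled_pool: list of (3, H, W) tensors from the SAME camera, no labels
    """
    model.eval()                                          # no fine-tuning, no calibration
    extras = random.sample(unlabeled_pool, m - 1)         # additional unlabeled images
    batch = torch.stack([query_img] + extras).to(device)  # (m, 3, H, W)
    with torch.no_grad():                                 # parameters are never updated
        illuminant = model(batch)                         # single forward pass over the set
    return illuminant                                     # e.g. a normalized RGB vector
```

As I read the paper, training draws the same kind of m-image set from a labeled training camera, and only the query image's ground-truth illuminant enters the loss; at test time no loss is computed at all, which is why no labels from the new sensor are needed.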

some question about the input image type

Hello, I am very interested in your C5 work, but I have some doubts about the format of the input image.
When I use my own dataset, should its data be 16-bit? Is it OK if I directly feed in an .hdr file?
Once the output is generated, what operations (e.g., CCM, AE, and so on) should be applied to produce an image that looks right to the human eye?
Many thanks
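As a rough, generic sketch of the post-processing usually applied to view a white-balanced linear raw image (nothing here is specific to this repository: the identity CCM, the 2.2 gamma, and the 16-bit assumption are placeholders to be replaced with your camera's actual values):

```python
# Generic visualization sketch: white-balance a linear 16-bit raw PNG with the
# estimated illuminant, then apply a placeholder CCM and gamma for display.
import cv2
import numpy as np

def visualize(path_16bit_png, illuminant_rgb, ccm=np.eye(3), black_level=0.0):
    raw = cv2.imread(path_16bit_png, cv2.IMREAD_UNCHANGED).astype(np.float32)
    raw = cv2.cvtColor(raw, cv2.COLOR_BGR2RGB)
    raw = np.clip(raw - black_level, 0, None) / (65535.0 - black_level)  # -> [0, 1] linear

    wb = raw / np.asarray(illuminant_rgb, dtype=np.float32)  # per-channel white balance
    wb = np.clip(wb / wb.max(), 0.0, 1.0)

    srgb_linear = np.clip(wb @ ccm.T, 0.0, 1.0)  # camera RGB -> sRGB (placeholder CCM)
    srgb = np.power(srgb_linear, 1.0 / 2.2)      # simple gamma instead of the exact sRGB curve
    return (srgb * 255.0).astype(np.uint8)
```

As far as I understand, color-constancy models of this kind expect linear, black-level-subtracted raw-RGB input, so tone-mapped or display-referred data would need to be linearized first.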

Testing results

Hi Mahmoud,
For my work, one of my tasks is to reproduce the results reported in your paper. To do so, I am testing the provided pre-trained model on the INTEL-TAU dataset (7022 images) with m=7. Since the images are already black-level subtracted, as mentioned on the dataset website, I am directly passing resized PNG images (384×256) to the model along with the corresponding .json files (illuminant information). I am also using cross-validation, but without the G multiplier for testing. My results are: Mean: 2.61, Median: 1.77, Best25: 0.57, Worst25: 1.44, Worst05: 2.16, Tri: 1.95, and Max: 28.39.

There is only a slight variation from the paper's results, except for Worst25, which differs a lot. As I understand it, one reason could be the random sample selection in cross-validation. Is that so, or is there some other important step I am missing?

One more thing: during testing I did not mask out the color checker present in the scenes, which you mention doing in the paper. Could you please provide details on how you did that? I assume the coordinates of the color checker in each scene would need to be known for the masking.
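For anyone comparing numbers, the summary statistics above are normally computed from the recovery angular error using the standard definitions below; a mismatch in how Best25/Worst25 are defined (or in which images end up in each fold) can easily explain a gap in the tail statistics. This is a generic sketch, not the repository's evaluation code.

```python
# Standard recovery angular error and the usual color-constancy summary statistics.
import numpy as np

def angular_error_deg(est, gt):
    """Angle in degrees between estimated and ground-truth illuminant vectors."""
    est = est / np.linalg.norm(est, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(est * gt, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def summarize(errors):
    e = np.sort(np.asarray(errors, dtype=np.float64))
    q1, q2, q3 = np.percentile(e, [25, 50, 75])
    k = max(1, len(e) // 4)
    return {
        "mean": e.mean(),
        "median": q2,
        "trimean": (q1 + 2 * q2 + q3) / 4.0,  # Tukey's trimean ("Tri")
        "best25": e[:k].mean(),               # mean of the lowest 25% of errors
        "worst25": e[-k:].mean(),             # mean of the highest 25% of errors
        "max": e.max(),
    }
```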

Many detailed questions

Thanks for your great work! It has indeed sparked a lot of inspiration for me. However, there are several aspects I would like to discuss further.

The paper mentioned: "To allow the network to reason about the set of additional input images in a way that is insensitive to their ordering, we adopt the permutation invariant pooling approach of Aittala et al."

1. Could you elaborate on why insensitivity to ordering is crucial? Specifically, I am curious whether a sufficiently large training dataset would inherently cover all potential orderings. (A small pooling sketch appears at the end of this issue.)

Regarding the number of additional unlabeled images (m), it appears that they were used in both the training and testing stages. From the ablation study, it seems that various values of m were only tried on the test camera, as illustrated in Table 4. I have a question about this:

2. During training, did you experiment with varying values of m, or was a fixed number (for example, 8) used throughout?

When m equals 1, I understand that this means only the query image is used during testing. If so, my question is:

3. Could you clarify whether m=1 simply signifies the zero-shot condition, i.e., plain inference, or whether it means the single query image is used for self-calibration, followed by fixing the parameters and then running inference?

4. From the results shown in Table 4, it does not seem that the results keep improving as m increases (i.e., error(m=13) > error(m=7)). Could you provide some insight into this?

5. Have you considered using additional labeled images for fine-tuning? If so, would this lead to better results than the current method?

Thank you for taking the time to answer these questions. Your responses will be greatly beneficial to my understanding.
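On question 1, the practical reason order-insensitivity is built in rather than learned is that the m additional images form an unordered set: with m! possible orderings, asking the network to learn the symmetry from data would waste capacity and still not guarantee it. Below is a minimal sketch in the spirit of the max-pool-and-concatenate set pooling of Aittala et al.; shapes and names are illustrative only, not the repository's code.

```python
# Permutation-invariant set pooling sketch: each image's features are
# concatenated with a max-pool over the whole set, so shuffling the m inputs
# cannot change the result.
import torch

def set_pool(features):                               # features: (m, C, H, W)
    pooled, _ = features.max(dim=0, keepdim=True)     # (1, C, H, W), order-free
    pooled = pooled.expand_as(features)               # broadcast back to every image
    return torch.cat([features, pooled], dim=1)       # (m, 2C, H, W)

# Quick check: permuting the set leaves the (permuted) output unchanged.
x = torch.randn(7, 8, 16, 16)
perm = torch.randperm(7)
assert torch.allclose(set_pool(x)[perm], set_pool(x[perm]))
```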

about preprocessing of the input image

Hi Mahmoud,

Thanks for sharing the great work!
May I know whether any preprocessing is applied to the images before they are fed to the network? Here is what I observed: when the input comes from FFCC's dataset (https://github.com/google/ffcc/tree/master/data), the output looks good, but with other data such as NUS (http://cvil.eecs.yorku.ca/projects/public_html/illuminant/illuminant.html), the output is bad. Is this because I missed some preprocessing steps?

Thanks
Simon
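A likely culprit, offered only as a guess: the FFCC release ships already-prepared images, while the NUS PNGs still contain the sensor's black level and use the camera's native bit depth, so they need black-level subtraction and normalization before they resemble the expected input. A generic sketch follows; the default saturation value and resize target are placeholders, and the per-camera black level comes from the NUS dataset's own metadata.

```python
# Generic raw preprocessing sketch for NUS-style PNGs: subtract the camera's
# black level, normalize by the saturation point, and resize. Values here are
# placeholders, not this repository's code.
import cv2
import numpy as np

def preprocess_raw_png(path, black_level, saturation=2**14 - 1, size=(384, 256)):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = np.clip(img - black_level, 0, None) / float(saturation - black_level)
    img = np.clip(img, 0.0, 1.0)                 # clip (near-)saturated pixels
    return cv2.resize(img, size, interpolation=cv2.INTER_AREA)
```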

Question about your data augmentation method and CIE XYZ color space

Hi, @mahmoudnafifi , I have a question about your data augmentation method.

I think I am a little confused about the color space transform (CST) step in a general ISP, which converts a white-balanced raw image into the CIE XYZ color space.
(I am referencing the 2019 ICCV tutorial by your supervisor, Professor Michael Brown.)

As far as I know, the CST only changes the axes (or basis) used to represent a color; it does not change the underlying color itself.
So, as I understand it, the CIE XYZ images (with WB applied) of the same scene from two different devices are different, because they represent the colors observed by different sensors, which differ from each other, expressed in the canonical CIE XYZ space.

However, the data augmentation method presented in the paper seems to contradict what I said above:
since images in CIE XYZ space are treated as device-independent, raw-space augmentation for each device is possible by converting to CIE XYZ and then applying the inverse transformation.

I would appreciate it if you could let me know which of the two views is correct and where I am mistaken.
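For what it's worth, the two views are closer than they appear: a 3x3 CST maps white-balanced camera RGB only approximately into CIE XYZ, so the XYZ image is treated as approximately device-independent rather than being exactly identical across sensors. As I read the augmentation idea, it leans on that approximation to move an image from one camera's raw space to another's. A toy sketch with made-up matrices, not the paper's implementation:

```python
# Sketch of the mapping the augmentation relies on:
# raw_A --CST_A--> (approx.) device-independent XYZ --inverse CST_B--> raw-like B.
# A 3x3 CST cannot perfectly invert each sensor's spectral sensitivities, so the
# result is an approximation of camera B's raw response, which is the point of
# using it as augmentation rather than as ground truth.
import numpy as np

def raw_to_raw(raw_a, cst_a, cst_b):
    """raw_a: (H, W, 3) white-balanced raw from camera A;
    cst_a, cst_b: 3x3 camera-to-XYZ matrices for cameras A and B (placeholders)."""
    xyz = raw_a @ cst_a.T                    # camera A raw -> CIE XYZ
    raw_b = xyz @ np.linalg.inv(cst_b).T     # CIE XYZ -> camera B raw space
    return np.clip(raw_b, 0.0, None)
```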

json files

Is there a sample json file so I can generate json files for new data?

Thanks,
Simon
