
CFAT: Unleashing Triangular Windows for Image Super-resolution

Abhisek Ray1, Gaurav Kumar1, Maheshkumar H. Kolekar1

1Indian Institute of Technology Patna, India


🧗 Updates

  • ✅ 2024-03-24: Released the first version of the paper on arXiv.
  • ✅ 2024-03-24: Released the supplementary material of the paper on arXiv.
  • ✅ 2024-03-24: Released the code, models, and results of CFAT.
  • ✅ 2024-06-11: Updated the repo with the CVPR version.
  • (To do) Release the small (CFAT-S) and large (CFAT-L) versions of our model.
  • (To do) Add the pre-trained model of CFAT for SRx4.
  • (To do) Add a Replicate demo for the CFAT model.
  • (To do) Release extended CFAT code for multiple image restoration tasks.

main figure

Abstract: Transformer-based models have revolutionized the field of image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted window technique is now common practice in super-resolution models to improve the quality and robustness of image upscaling. However, it suffers from distortion at the boundaries and has limited unique shifting modes. To overcome these weaknesses, we propose a non-overlapping triangular window technique that works synchronously with the rectangular one to mitigate boundary-level distortion and gives the model access to more unique shifting modes. In this paper, we propose a Composite Fusion Attention Transformer (CFAT) that combines triangular- and rectangular-window-based local attention with a channel-based global attention technique for image super-resolution. As a result, CFAT activates attention on more image pixels and captures long-range, multi-scale features, improving SR performance. Extensive experimental results and an ablation study demonstrate the effectiveness of CFAT in the SR domain. Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.


Highlight

The triangular window mechanism we propose is beneficial not only for super-resolution but also for other computer vision applications that use the rectangular window technique at their core.
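For context, here is a minimal NumPy sketch of the standard rectangular (Swin-style) window partition that the triangular windows complement. The shapes and the function name are illustrative only and are not taken from the CFAT codebase:

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws, ws, C).
    H and W are assumed to be divisible by ws.
    """
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # Reorder axes so each window's pixels are contiguous.
    windows = x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)
    return windows

feat = np.arange(8 * 8 * 1).reshape(8, 8, 1)
wins = window_partition(feat, 4)
print(wins.shape)  # (4, 4, 4, 1)
```

Attention is then computed independently inside each window; the triangular scheme described in the paper partitions the same feature map along diagonal boundaries instead.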


Results

  • Quantitative Results


Fig. Comparing performance (PSNR in dB) of various state-of-the-art models with CFAT.

  • Qualitative Results


Fig. Visual Comparison of CFAT with other state-of-the-art methods.

  • LAM Results


Fig. LAM results and corresponding Diffusion Index for CFAT and various SOTA methods.


Training Settings

  • Requirements

    • Platforms: Ubuntu 20.04.2, cuda-11.2.0
    • Python 3.8.18, PyTorch == 2.1.0
    • Requirements: see requirements.txt
  • Installation

# download code
git clone https://github.com/rayabhisek123/CFAT
cd CFAT
pip install -r requirements.txt
pip install basicsr
python setup.py develop

Training

Command to train CFAT after placing the datasets in their respective directories:

CUDA_VISIBLE_DEVICES=0,1,2 python3 train.py

Checkpoints (Pre-trained Models)

The inference results on benchmark datasets will be available soon.


Testing

Run the following command after placing the pre-trained models in the given directory:

CUDA_VISIBLE_DEVICES=0,1 python3 test.py

Note: For different configurations, change the argument values. We will update the corresponding configuration files (.yml) soon.


Testing Results



Fig. Quantitative comparison of the CFAT with various state-of-the-art SR methods. Red: Best & Green: Second Best.


Citations

BibTeX

@article{ray2024cfat,
  title={CFAT: Unleashing Triangular Windows for Image Super-resolution},
  author={Ray, Abhisek and Kumar, Gaurav and Kolekar, Maheshkumar H},
  journal={arXiv preprint arXiv:2403.16143},
  year={2024}
}


Acknowledgement

Some parts of this code are adapted from:

We thank the authors for sharing the code of their great works.


License

This project is licensed under the MIT License and was originally developed by @abhisek-ray.


Contact

If you have any questions, please email [email protected] to discuss with the authors.


Contributors

rayabhisek123


Issues

GPU

How can I reproduce your code on a machine with two GPUs?
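The launch command in the Training section (`CUDA_VISIBLE_DEVICES=0,1,2 python3 train.py`) suggests a single-process, multi-device setup, so restricting training to two GPUs should only require changing the visible-device list. A minimal Python sketch of the same restriction, illustrative and not from the CFAT codebase:

```python
import os

# Restrict this process to GPUs 0 and 1. This must be set before any
# CUDA initialization (i.e., before importing torch); the framework then
# sees the two cards re-indexed as devices 0 and 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(len(visible))  # 2
```

Equivalently, launch with `CUDA_VISIBLE_DEVICES=0,1 python3 train.py` and leave the script unchanged.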

How does the shifting work?

Do the rectangular and triangular window shifts move left-to-right and top-to-bottom, and is this different from the shifted window in Swin Transformer?
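For reference, Swin-style shifting is a cyclic shift of the whole feature map before window partition (implemented with `torch.roll` in the official Swin code), so that the next round of non-overlapping windows straddles the previous window boundaries. A minimal NumPy sketch of that operation, not taken from the CFAT codebase:

```python
import numpy as np

ws = 4                 # window size (illustrative)
shift = ws // 2        # Swin shifts by half a window
feat = np.arange(8 * 8).reshape(8, 8)

# Cyclic shift up-left, then the inverse shift to restore the map.
shifted = np.roll(feat, shift=(-shift, -shift), axis=(0, 1))
restored = np.roll(shifted, shift=(shift, shift), axis=(0, 1))

print(np.array_equal(restored, feat))  # True
```

Whether CFAT's rectangular and triangular shifts follow this exact cyclic pattern is a question for the authors; the sketch only shows the standard Swin mechanism being asked about.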

custom dataset issues

ValueError: At least one stride in the given numpy array is negative, and tensors with negative strides are not currently supported. (You can probably work around this by making a copy of your array with array.copy().)
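A hedged workaround sketch, following the suggestion in the error message itself: NumPy slices such as `arr[::-1]` (a common flip augmentation) produce negative-stride views, and calling `.copy()` (or `np.ascontiguousarray`) materializes a positively-strided buffer that tensor libraries accept. Illustrative NumPy-only code:

```python
import numpy as np

img = np.arange(12, dtype=np.float32).reshape(3, 4)
flipped = img[::-1]            # vertical flip as a negative-stride view

print(flipped.strides[0] < 0)  # True: this view cannot be wrapped as a tensor
fixed = flipped.copy()         # contiguous copy with positive strides
print(fixed.strides[0] > 0)    # True: safe to hand to e.g. torch.from_numpy
print(np.array_equal(fixed, flipped))  # True: same values
```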

setup.py

Hello,

Is the setup.py file missing?

Depth of CFAT

Thank you for sharing an amazing work.

I have a small question about the depth of CFAT. In Section 5.2 of the main paper, you write, "While configuring the proposed CFAT, we put 3 DWAB and 3 SWAB in an alternative order." I understand this as 6 Window Attention Blocks in total. But in Section 1 of the supplementary file, you write, "We take (8, 8, 8, 8, 8, 8, 8, 8) WAB units for CFAT-l, (8, 8, 8, 8, 8) for CFAT, and (8, 8, 8) for CFAT-r.", which means CFAT has 5 Window Attention Blocks.
This is quite confusing to me; can you confirm this point?

Issue about python setup.py develop

Traceback (most recent call last):
  File "setup.py", line 81, in <module>
    write_version_py()
  File "setup.py", line 58, in write_version_py
    with open('VERSION', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'VERSION'

Issue for input

If the input is (1, 1, 40, 28), how should the network parameters be modified to make it work?

RuntimeError: shape '[1, 5, 8, 3, 8, 1]' is invalid for input of size 1120
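A hedged sketch of the usual fix: window attention needs H and W divisible by the window size (here 40 is divisible by 8 but 28 is not, which is exactly what the reshape error reflects), so pad the input up to the next multiple before the forward pass and crop the output afterwards. The helper name and window size below are illustrative, not from the CFAT codebase:

```python
import numpy as np

def pad_to_multiple(x, ws):
    """Reflect-pad the last two dims of an (N, C, H, W) array to multiples of ws."""
    h, w = x.shape[-2:]
    pad_h = (ws - h % ws) % ws
    pad_w = (ws - w % ws) % ws
    return np.pad(x, ((0, 0), (0, 0), (0, pad_h), (0, pad_w)), mode="reflect")

x = np.zeros((1, 1, 40, 28), dtype=np.float32)
padded = pad_to_multiple(x, ws=8)
print(padded.shape)  # (1, 1, 40, 32)
```

After super-resolving the padded input, crop the result back to `(scale * 40, scale * 28)` so the padding never appears in the output.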

Something seems wrong in Table 3.

As described in the paper, "We can observe from Tab. 3 that the model gives the best performances for interval sizes 0 and 2."

But in Table 3, the best performance is for interval size 4.
