
TTST (IEEE TIP 2024)

📖Paper | 🖼️PDF

PyTorch codes for "TTST: A Top-k Token Selective Transformer for Remote Sensing Image Super-Resolution", IEEE Transactions on Image Processing (TIP), 2024.

🎉🎉 News 🎉🎉

Abstract

Transformer-based methods have demonstrated promising performance in image super-resolution tasks, due to their long-range and global aggregation capability. However, existing Transformers bring two critical challenges when applied to large-area earth observation scenes: (1) redundant token representation, since most tokens are irrelevant; (2) single-scale representation, which ignores scale correlation modeling of similar ground observation targets. To this end, this paper proposes to adaptively eliminate the interference of irrelevant tokens for a more compact self-attention calculation. Specifically, we devise a Residual Token Selective Group (RTSG) to grasp the most crucial tokens by dynamically selecting the top-k keys in terms of score ranking for each query. For better feature aggregation, a Multi-scale Feed-forward Layer (MFL) is developed to generate an enriched representation of multi-scale feature mixtures during the feed-forward process. Moreover, we also propose a Global Context Attention (GCA) to fully explore the most informative components, thus introducing more inductive bias to the RTSG for an accurate reconstruction. In particular, multiple cascaded RTSGs form our final Top-k Token Selective Transformer (TTST) to achieve progressive representation. Extensive experiments on simulated and real-world remote sensing datasets demonstrate that our TTST performs favorably against state-of-the-art CNN-based and Transformer-based methods, both qualitatively and quantitatively. In brief, TTST outperforms the state-of-the-art approach (HAT-L) in terms of PSNR by 0.14 dB on average, while accounting for only 47.26% and 46.97% of its computational cost and parameters, respectively.
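
To make the top-k key selection from the abstract concrete, here is a minimal, dependency-free sketch of single-head attention in which each query keeps only its k highest-scoring keys. It is an illustration only, not the repository's implementation: the function name `topk_attention`, the plain-list inputs, and the single-head layout are all assumptions, and the actual model is written in PyTorch.

```python
import math


def topk_attention(queries, keys, values, k):
    """Single-head attention where each query attends only to its k best keys.

    queries, keys, values are lists of equal-dimension float vectors
    (hypothetical layout chosen for readability, not taken from the repo).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Indices of the k highest-scoring keys; all other keys are dropped.
        top = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax over the retained scores only (shifted by the max for stability).
        peak = max(scores[j] for j in top)
        weights = {j: math.exp(scores[j] - peak) for j in top}
        total = sum(weights.values())
        # Weighted sum of the values that belong to the selected keys.
        dim = len(values[0])
        outputs.append(
            [sum(weights[j] / total * values[j][c] for j in top) for c in range(dim)]
        )
    return outputs
```

With k equal to the number of keys this reduces to ordinary softmax attention. Smaller k discards low-scoring keys before normalization, which is the compaction effect the abstract describes.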

Network

image

🧩 Install

git clone https://github.com/XY-boy/TTST.git

Environment

  • CUDA 11.1
  • Python 3.9.13
  • PyTorch 1.9.1
  • Torchvision 0.10.1
  • basicsr 1.4.2

🎁 Dataset

Please download the following remote sensing benchmarks:

Data Type   AID        DOTA-v1.0   DIOR       NWPU-RESISC45
Training    Download   None        None       None
Testing     Download   Download    Download   Download

🧩 Usage

Quick Test

Download Pre-trained Model

  • Step I. Arrange your dataset in the structure below.

/xxxx/xxx/ (your data path)
   /GT/
      /000.png
      /···.png
      /099.png
   /LR/
      /000.png
      /···.png
      /099.png

  • Step II. Set --data_dir to your data path.
  • Step III. Run eval_4x.py:
python eval_4x.py
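
Before running the evaluation, it can help to confirm that the data path matches the GT/LR layout from Step I. The helper below is a hypothetical sketch, not part of the repository: `check_pairs` is an invented name, and it assumes that each LR image shares its file name with its GT counterpart, as the listing above suggests.

```python
from pathlib import Path


def check_pairs(data_dir):
    """Return the sorted PNG names found in both GT/ and LR/.

    Raises if GT/ is empty or if any file lacks a counterpart in the other
    folder. Assumes GT and LR images share file names (an assumption based
    on the directory listing, not confirmed by the code).
    """
    root = Path(data_dir)
    gt = {p.name for p in (root / "GT").glob("*.png")}
    lr = {p.name for p in (root / "LR").glob("*.png")}
    if not gt:
        raise FileNotFoundError(f"no PNG files found in {root / 'GT'}")
    # Symmetric difference: names present in exactly one of the two folders.
    unmatched = sorted(gt ^ lr)
    if unmatched:
        raise ValueError(f"files without a GT/LR counterpart: {unmatched}")
    return sorted(gt)
```

Calling `check_pairs("/xxxx/xxx/")` with your own data path returns the list of paired file names, or raises an error naming the files that break the pairing.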

Train

python train_4x.py

🖼️ Results

Quantitative

image

Visual

image

Acknowledgments

Our TTST mainly borrows from DRSFormer (https://github.com/cschenxiang/DRSformer) and SKNet.
Thanks for these excellent open-source works!

Contact

If you have any questions or suggestions, feel free to contact me.
Email: [email protected]; [email protected]

Citation

If you find our work helpful in your research, please consider citing it. We appreciate your support!😊

@ARTICLE{xiao2024ttst,
  author={Xiao, Yi and Yuan, Qiangqiang and Jiang, Kui and He, Jiang and Lin, Chia-Wen and Zhang, Liangpei},
  journal={IEEE Transactions on Image Processing}, 
  title={TTST: A Top-k Token Selective Transformer for Remote Sensing Image Super-Resolution}, 
  year={2024},
  volume={33},
  number={},
  pages={738-752},
  doi={10.1109/TIP.2023.3349004}
}

ttst's Issues

Significance of rgb_mean

Hi,

Thank you for your code! I am particularly interested in one aspect. Many papers' code implementations (including SwinIR, HAT, DAT, and more recently, SPAN) make use of the rgb_mean code that is commented out here.
I have noticed that doing the same in the SPAN implementation greatly increases training stability, but breaks compatibility due to its interaction with the network features.

I was wondering if you could explain why you've chosen not to include it in your implementation, and what pros and cons you know of when using it?

Kind regards,
terrainer

AID Datasets Preparation

Hi, there:
Can I ask how you processed the AID dataset to change the images from 600px to 512px, and how you generated the LR data?

Metrics

Thank you very much for your excellent work. I have one question: I could not find the metric calculation for the inference stage in the existing code. Which code do you use for it? If you have time, please reply. Thank you!
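
For reference, PSNR (the metric quoted in the abstract) can be computed from the mean squared error between the super-resolved and ground-truth images. The sketch below is a generic, dependency-free definition, not the evaluation code used for the paper. The function name `psnr`, the flat pixel sequences, and the 8-bit peak value of 255 are assumptions, and the thread does not say whether the reported numbers use RGB or luminance, or whether borders are cropped.

```python
import math


def psnr(sr, hr, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel sequences.

    max_value is the peak pixel value (255 for 8-bit images; an assumption here).
    """
    if len(sr) != len(hr) or len(sr) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(sr, hr)) / len(sr)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Averaging this value over all test images gives a per-dataset PSNR; differences in color space or border handling can shift the result, so those settings should match whatever protocol is being compared against.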

Error when running with different sizes of image

Hi,

Thank you so much for your work!

I have just tested your code with an image of 510x339 pixels, but I got an error:

x = x.view(b, h // window_size, window_size, w // window_size, window_size, c)
RuntimeError: shape '[1, 42, 8, 63, 8, 1]' is invalid for input of size 172890

Therefore, I cropped that image to 128x128 pixels, and it works as normal. I chose 128x128 because that is the size of your example image "airport_285.png" in the source code.

Could you please check and fix this problem?

Thank you for your help!
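
The traceback shows the window partition reshaping with a window size of 8, which only works when both the height and the width are multiples of 8; 339 and 510 are not. A common workaround in window-based models is to pad the input up to the next multiple and crop the output afterwards. The sketch below only computes the sizes involved. The helper names are hypothetical, the window size of 8 is read off the error message, and the scale of 4 is inferred from the eval_4x.py script name; whether the repository supports this padding directly is not confirmed here.

```python
def pad_amounts(height, width, window_size=8):
    """Extra rows and columns needed so both sides are multiples of window_size."""
    pad_h = (window_size - height % window_size) % window_size
    pad_w = (window_size - width % window_size) % window_size
    return pad_h, pad_w


def output_crop(height, width, scale=4):
    """Size of the valid region in the upscaled output once padding is removed."""
    return height * scale, width * scale


# The image from the report is 339 rows by 510 columns.
pad_h, pad_w = pad_amounts(339, 510)   # (5, 2): padded input is 344 x 512
out_h, out_w = output_crop(339, 510)   # (1356, 2040): region to keep after 4x SR
```

In PyTorch, the padding itself could be applied to an NCHW tensor with `torch.nn.functional.pad(x, (0, pad_w, 0, pad_h), mode="reflect")`, and the model output then sliced to `[..., :out_h, :out_w]`.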
