
model_desc's Introduction

Data source

  • scene data

    • CTW[1]
    • DIV2K[2]
    • Total-text[3]
  • text data

    • Synthetic data by program
    • Data collected by taking pictures with mobile phones

The simulator

  • parameter configuration
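import random

import numpy as np  # needed for np.arange below

# Randomly sample the simulator parameters for each synthesized sample.
# Presumably L is the path length, D the aperture diameter and Cn2 the
# refractive-index structure constant (standard turbulence notation).
kwargs = dict(
    L=random.randint(200, 400),
    D=random.uniform(0.06143, 0.091254),
    Cn2=random.uniform(5.7386e-14, 9.7386e-14),
    # corr is drawn from the grid -1.00, -0.99, ..., -0.01
    corr=round(float(random.choice(np.arange(-1, 0, 0.01))), 3),
)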
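
For reproducible synthesis, the sampling above could be wrapped in a small helper. The following is only a minimal sketch, not the project's actual code: the function name `sample_turbulence_params` and its `seed` argument are hypothetical, and the ranges are copied from the configuration above. It uses only the standard library and draws `corr` from the same 0.01-spaced grid.

```python
import random


def sample_turbulence_params(seed=None):
    """Draw one random parameter set for the turbulence simulator.

    Hypothetical helper: the ranges are taken from the configuration above.
    """
    rng = random.Random(seed)  # private generator, so a seed gives repeatable draws
    return dict(
        L=rng.randint(200, 400),
        D=rng.uniform(0.06143, 0.091254),
        Cn2=rng.uniform(5.7386e-14, 9.7386e-14),
        # Grid -1.00, -0.99, ..., -0.01 (upper bound 0 is excluded)
        corr=rng.randrange(-100, 0) / 100,
    )


params = sample_turbulence_params(seed=0)
print(params)
```

Passing the same seed returns the same parameter set, which may help when a particular synthetic sample needs to be regenerated.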

Model

Our method combines a transformer with image quality assessment (IQA):

  • overall pipeline:
  • Restormer[4]: An image restoration paper from CVPR 2022 that achieves state-of-the-art results on four tasks: deraining, motion deblurring, defocus deblurring, and denoising. As described in the paper, it uses a progressive, transformer-based model structure. We adopted the Restormer architecture and fine-tuned our model, modifying the input so that the model accepts multiple images. We also analyzed the atmospheric turbulence mitigation task: the frames are only loosely related in time, but they are highly related in space. We therefore randomly select 20 of the 100 frames and feed them into the network to extract the spatial relationships between frames. This approach proved effective across a large number of experiments.
  • NIMA[5]: Image quality assessment (IQA) plays an important role in image restoration. Model quality can be evaluated with both full-reference and no-reference assessment methods. We used NIMA, a no-reference IQA method proposed by Google in 2018, to score the input information and weight it with the reconstructed image. Compared with manually set weights, recognition accuracy improves from 89.6 to 94 and PSNR improves from 22.3 to 24.8.
  • Training process: We followed Restormer's training strategy, using the Adam optimizer and annealing the learning rate from 1e-4 to 1e-10 with PyTorch's CosineAnnealingWarmRestarts scheduler.
  • Training data: We designed two sets of training data. The first contains a large number of text images and a small number of scene images; the second contains text images and scene images in equal proportion. We first train on the first set for about a week, then continue training the resulting model on the second set. This balances the performance of our method on text and scene data.
  • Loss function: We designed and tried many loss functions, such as Fourier loss and perceptual loss, but most of them did not guide the model toward better restoration. In the end, following Restormer, we used L1 loss, with SSIM loss as an additional auxiliary loss.
  • On the noise: Drawing on the turbulence simulator papers[6, 7, 8] and CycleISP[9], we considered the overall imaging process and found that it involves more than atmospheric turbulence. If other degradations such as Gaussian noise and Gaussian blur are ignored, training directly on synthesized data rarely gives good results. We therefore added further degradations, such as Gaussian noise, Gaussian blur, and lighting changes, to the synthesis process, which gave better results than using atmospheric turbulence alone.
  • Sliding window strategy[10]: We build on and improve the windowing strategy: we run inference over the information from all 100 frames and then integrate the results, so that both the spatial information and all of the frame information are used. This yields higher recognition accuracy and PSNR.
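
The two frame-selection strategies described above (random sampling of 20 out of 100 frames, and windowed inference over all 100 frames) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names, the window size of 20, and the stride are hypothetical, and the step that integrates the per-window outputs is omitted because the text does not specify it.

```python
import random


def sample_frames(num_frames=100, num_selected=20, seed=None):
    """Pick `num_selected` distinct frame indices uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(num_frames), num_selected)


def sliding_windows(num_frames=100, window=20, stride=20):
    """Yield lists of frame indices so that every frame is covered at least once."""
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    # Add a final window flush with the end if the stride left a tail uncovered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    for start in starts:
        yield list(range(start, min(start + window, num_frames)))


train_indices = sample_frames(seed=0)    # 20 random frames for one training step
inference_windows = list(sliding_windows())  # windows that together span all frames
print(len(train_indices), len(inference_windows))
```

A stride smaller than the window gives overlapping windows, which is one way every frame could contribute to more than one prediction before the results are combined.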
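
To make the learning-rate schedule from the training process concrete, the sketch below evaluates the cosine annealing formula with warm restarts in plain Python. It is a simplified illustration: the restart period of 1000 steps is a hypothetical value (the text does not state it), and the function mirrors the case where PyTorch's scheduler is used with a fixed cycle length (`T_mult=1`), `T_0` equal to `period`, and `eta_min` equal to `lr_min`.

```python
import math


def cosine_warm_restart_lr(step, period, lr_max=1e-4, lr_min=1e-10):
    """Learning rate at `step` for cosine annealing that restarts every `period` steps."""
    t_cur = step % period  # position inside the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / period))


# Hypothetical period of 1000 steps: print the start, middle and end of one cycle.
for step in (0, 500, 999, 1000):
    print(step, cosine_warm_restart_lr(step, period=1000))
```

Within each cycle the rate falls smoothly from `lr_max` toward `lr_min`, then jumps back to `lr_max` at the restart.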
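
The combination of L1 loss with an auxiliary SSIM loss can be sketched as follows. This is a deliberately simplified, pure-Python illustration: it computes SSIM from global image statistics over a flat list of pixel values in [0, 1] rather than with the usual local windows, and the weight `ssim_weight=0.1` is a hypothetical value, since the text does not give the weighting. Actual training would use a differentiable, windowed SSIM on tensors.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def global_ssim(pred, target, data_range=1.0):
    """SSIM computed from global means, variances and covariance (no local windows)."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator


def restoration_loss(pred, target, ssim_weight=0.1):
    """L1 loss plus a weighted (1 - SSIM) auxiliary term; the weight is an assumption."""
    return l1_loss(pred, target) + ssim_weight * (1.0 - global_ssim(pred, target))


restored = [0.2, 0.4, 0.7, 0.6]
reference = [0.1, 0.5, 0.9, 0.3]
print(restoration_loss(restored, reference))
```

Because SSIM is at most 1, the auxiliary term is never negative, so it only adds a penalty when the structure of the restored image departs from the reference.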

Download our data

  • We are preparing to upload the download link. Please contact us if you need the data before then.

Reference

[1] A Large Chinese Text Dataset in the Wild

[2] NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study

[3] Total-Text: Towards Orientation Robustness in Scene Text Detection

[4] Restormer: Efficient Transformer for High-Resolution Image Restoration

[5] NIMA: Neural Image Assessment

[6] Turbulence Simulator v2: Phase-to-space transform

[7] Turbulence Simulator v1: Multi-aperture simulator

[8] Turbulence Reconstruction

[9] CycleISP: Real Image Restoration via Improved Data Synthesis

[10] Revisiting Global Statistics Aggregation for Improving Image Restoration

[11] Simple Baselines for Image Restoration
