
cvpdl-ntucsie-2023

Computer Vision Practice with Deep Learning

2023 Spring NTU CSIE

Homework 1: Transformer-Based Object Detection

  • Objective: Implemented object detection using the DINO framework.
  • Techniques and Implementation:
    • Applied DINO (DETR with Improved deNoising anchOr boxes), a transformer-based end-to-end object detector.
    • Fine-tuned the model on the provided dataset for the assignment's detection task.
    • Initialized from the pretrained checkpoint checkpoint0033_4scale.pth (the 4-scale DINO model trained on COCO 2017).
    • Used a ResNet-50 (R50) backbone.
  • Performance Metrics:
    • Achieved a mean Average Precision (mAP) of 0.5233.
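The README does not state which evaluation protocol produced the mAP above. Assuming it is the COCO-style metric, a detection counts as correct only when its box overlaps a ground-truth box above an IoU threshold, and AP is averaged over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that matching criterion; box_iou, IOU_THRESHOLDS, and thresholds_passed are hypothetical helper names, and boxes are assumed to be in (x1, y1, x2, y2) format.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# COCO-style thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift)
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(iou):
    """How many of the COCO IoU thresholds a matched detection satisfies."""
    return sum(iou >= t for t in IOU_THRESHOLDS)
```

A full AP computation would additionally sort detections by confidence, greedily match them to ground truth at each threshold, and integrate the per-class precision-recall curve; that part is omitted here.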

Homework 2: Generic Object Detection and Practical Issue Survey

  • Objective: Analyzed the convergence issues of DETR-style detectors and surveyed related models and techniques.
  • Key Highlights:
    • Analyzed why DETR (DEtection TRansformer, from "End-to-End Object Detection with Transformers") converges slowly.
    • Compared DAB-DETR and DN-DETR, two methods proposed to speed up DETR's convergence.
    • Explored knowledge distillation from both output logits and intermediate-layer features.
    • Discussed CLIP's training process and its use for zero-shot classification.
    • Investigated challenges in open-vocabulary object detection with RegionCLIP.
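For the logit-based distillation mentioned above, a common formulation is Hinton et al.'s temperature-scaled soft targets: a KL term between the softened teacher and student distributions, mixed with the usual cross-entropy on ground-truth labels. The sketch below assumes that formulation; the temperature T, the weight alpha, and the function names are illustrative choices, not the homework's actual settings. Intermediate-layer (feature) distillation would instead match hidden activations, for example with an MSE loss, and is not shown.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """alpha * T^2 * KL(p_teacher || p_student) + (1 - alpha) * cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # its gradient magnitude comparable to the hard-label term as T changes
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Standard cross-entropy with the ground-truth label at T = 1
    ce = -math.log(softmax(student_logits)[label])
    return alpha * T ** 2 * kl + (1 - alpha) * ce
```

When the student already matches the teacher, the KL term vanishes and only the hard-label cross-entropy remains.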
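CLIP is trained contrastively so that matching image and text embeddings are close. Zero-shot classification then embeds one text prompt per class (for example "a photo of a {class}") and picks the class whose prompt embedding is most similar to the image embedding. The sketch below shows only that scoring step on toy vectors: the embeddings are made-up stand-ins for real CLIP encoder outputs, the function names are hypothetical, and the fixed logit_scale of 100 is an assumption (CLIP learns this temperature during training).

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        txt = l2_normalize(t)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-D embeddings standing in for encoder outputs of
# "a photo of a cat" and "a photo of a dog" (hypothetical values)
class_names = ["cat", "dog"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]
image_emb = [0.8, 0.2, 0.1]
probs = zero_shot_probs(image_emb, text_embs)
prediction = class_names[probs.index(max(probs))]
```

No classifier is trained for the target classes; adding a class only requires embedding one more prompt.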

Homework 3: Object Detection and Data Augmentation

  • Objective: Addressed data imbalance in object detection using foundation models and data augmentation.
  • Key Features:
    • Used BLIP-2 for image captioning and GLIGEN for grounded text-to-image generation to augment the data.
    • Mitigated class imbalance in the HW1 dataset by generating new images from prompts.
    • Applied text-to-image generation to augment the object detection dataset.
    • Evaluated the quality of generated images with Fréchet Inception Distance (FID).
    • Improved detection performance after training on the augmented dataset.
    • Demonstrated the pipeline with examples of image captioning and text-to-image generation.
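The README does not say how many images were generated per class. One simple way to decide, sketched here under that assumption, is to top up each class to the size of the largest one and emit one prompt per missing instance. The function names, the prompt template, and the class labels are hypothetical; in the homework the prompts were presumably derived from BLIP-2 captions rather than a fixed template.

```python
from collections import Counter

def generation_quota(instance_labels, target=None):
    """Extra instances each class needs to reach `target` (default: size of the largest class)."""
    counts = Counter(instance_labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}

def build_prompts(quota, template="a photo of a {}"):
    """One text-to-image prompt per missing instance, classes in sorted order."""
    return [template.format(cls) for cls, k in sorted(quota.items()) for _ in range(k)]

labels = ["car"] * 5 + ["bus"] * 2 + ["bike"]
quota = generation_quota(labels)
prompts = build_prompts(quota)
```

Counting instances rather than images is a simplification: a generated image may contain several objects, so the real deficit can close faster than this quota suggests.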
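FID, used above to score the generated images, models the Inception-v3 features of the real and generated sets as two Gaussians and measures the distance between them; lower is better:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here (μ_r, Σ_r) and (μ_g, Σ_g) are the feature mean and covariance of the real and generated images, respectively. The first term penalizes a shift in average appearance; the trace term penalizes differences in spread and correlation.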

Contributors

jessiechin7