
hrnet-for-fashion-landmark-estimation.pytorch's Introduction

HRNet for Fashion Landmark Estimation

(Modified from deep-high-resolution-net.pytorch)

Introduction

This code applies HRNet (Deep High-Resolution Representation Learning for Human Pose Estimation) to the fashion landmark estimation task using the DeepFashion2 dataset. HRNet maintains high-resolution representations throughout the forward pass. As a result, the predicted keypoint heatmaps are potentially more accurate and spatially more precise.

Illustrating the architecture of the proposed HRNet

Please note that an image in DeepFashion2 may contain multiple fashion items, while our model assumes that each input image contains exactly one item. Therefore, we do not feed the original image into HRNet but the crops given by bounding boxes. In experiments, you can either use the ground truth bounding box annotations to generate the input data or use the output of a detector (you can try this clothing detector).
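The cropping step can be sketched as follows. This is only an illustration, not the repo's actual preprocessing: the function names are hypothetical, the image is a plain list of pixel rows, and the box is assumed to be given as [x1, y1, x2, y2] in pixels.

```python
from typing import List, Sequence, Tuple


def clamp_bbox(bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Clamp an [x1, y1, x2, y2] box (assumed format) to the image bounds."""
    x1, y1, x2, y2 = bbox
    x1 = max(0, min(int(x1), width))
    x2 = max(0, min(int(x2), width))
    y1 = max(0, min(int(y1), height))
    y2 = max(0, min(int(y2), height))
    return x1, y1, x2, y2


def crop_item(image: List[list], bbox: Sequence[float]) -> List[list]:
    """Return the region of `image` (a list of pixel rows) covered by `bbox`."""
    height = len(image)
    width = len(image[0]) if height else 0
    x1, y1, x2, y2 = clamp_bbox(bbox, width, height)
    # With a NumPy array the equivalent slice would be image[y1:y2, x1:x2].
    return [list(row[x1:x2]) for row in image[y1:y2]]
```

Each item in an image would get its own crop, so an image with three annotated items yields three inputs to the network.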

Main Results

Landmark Estimation Performance on DeepFashion2 Test set

We won third place in the "DeepFashion2 Challenge 2020 - Track 1 Clothes Landmark Estimation" competition.

Landmark Estimation Performance on DeepFashion2 Validation Set

Arch        BBox Source  AP     AP .5  AP .75  AP (M)  AP (L)  AR     AR .5  AR .75  AR (M)  AR (L)
pose_hrnet  Detector     0.579  0.793  0.658   0.460   0.581   0.706  0.939  0.784   0.548   0.708
pose_hrnet  GT           0.702  0.956  0.801   0.579   0.703   0.740  0.965  0.827   0.592   0.741

Quick start

Installation

  1. Install PyTorch >= v1.2 following the official instructions. Note that if you use a PyTorch version < v1.0.0, you should follow the instructions at https://github.com/Microsoft/human-pose-estimation.pytorch to disable cuDNN's implementation of the BatchNorm layer. We encourage you to use a newer PyTorch version (>= v1.0.0).

  2. Clone this repo. We'll refer to the cloned directory as ${POSE_ROOT}.

  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  5. Create the output directory (for trained model output) and the log directory (for TensorBoard logs):

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    |-- lib
    |-- tools 
    |-- experiments
    |-- models
    |-- data
    |-- log
    |-- output
    |-- README.md
    `-- requirements.txt
    
  6. Download pretrained models from our OneDrive Cloud Storage.

Data preparation

Our experiments were conducted on DeepFashion2. Clone the DeepFashion2 repo; we'll refer to the cloned directory as ${DF2_ROOT}.

1) Download the dataset

Extract the dataset under ${POSE_ROOT}/data.

2) Convert annotations into COCO style

The DeepFashion2 repo provides a script to convert the annotations into COCO style.

We uploaded our converted annotation files to OneDrive as train-coco_style.json and val-coco_style.json. To save loading time during development, we also made truncated JSON files such as train-coco_style-32.json, which contains only the first 32 samples of the dataset.
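If you want a truncated file of a different size, a minimal sketch is shown below. It is not the script we used: the function names are hypothetical, "samples" is taken to mean images, and the file is assumed to follow the usual COCO layout with top-level "images" and "annotations" lists linked by "image_id".

```python
import json


def truncate_coco(coco: dict, num_images: int) -> dict:
    """Keep the first `num_images` images and only the annotations that refer to them."""
    images = coco.get("images", [])[:num_images]
    kept_ids = {img["id"] for img in images}
    annotations = [
        ann for ann in coco.get("annotations", []) if ann["image_id"] in kept_ids
    ]
    # Copy every other top-level key (e.g. "categories") unchanged.
    truncated = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
    truncated["images"] = images
    truncated["annotations"] = annotations
    return truncated


def truncate_file(src: str, dst: str, num_images: int) -> None:
    """Read a COCO-style JSON file and write a truncated copy."""
    with open(src) as f:
        coco = json.load(f)
    with open(dst, "w") as f:
        json.dump(truncate_coco(coco, num_images), f)
```

For example, `truncate_file("train-coco_style.json", "train-coco_style-32.json", 32)` would write a file containing the first 32 images.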

3) Install the deepfashion2_api

Enter ${DF2_ROOT}/deepfashion2_api/PythonAPI and run

python setup.py install

Note that the deepfashion2_api is modified from the cocoapi without changing the package name. Therefore, a conflict occurs if you install this package while the original cocoapi is already installed on your computer. We suggest two solutions: 1) run our code in a virtualenv, or 2) use the deepfashion2_api as a local package. Also note that the deepfashion2_api differs from the cocoapi mainly in the number of classes and the standard deviation values for keypoints.

In the end, the directory should look like this:
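One way to use a local copy of the package is to put its directory first on Python's import path, so it shadows any installed package of the same name. The helper below is a hypothetical sketch; it assumes the package has been built in place inside the given directory and is not part of this repo.

```python
import sys
from pathlib import Path


def prefer_local_package(package_dir: str) -> str:
    """Put `package_dir` first on sys.path so packages inside it are imported first."""
    path = str(Path(package_dir).expanduser().resolve())
    if path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)
    return path


# Example (assumed location; call this before the first `import pycocotools`):
# prefer_local_package("/path/to/DF2_ROOT/deepfashion2_api/PythonAPI")
```

This only works if it runs before the package is imported for the first time in the process.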

${POSE_ROOT}
|-- data
`-- |-- deepfashion2
    `-- |-- train
        |   |-- image
        |   |-- annos                           (raw annotation)
        |   |-- train-coco_style.json           (converted annotation file)
        |   `-- train-coco_style-32.json      (truncated for fast debugging)
        |-- validation
        |   |-- image
        |   |-- annos                           (raw annotation)
        |   |-- val-coco_style.json             (converted annotation file)
        |   `-- val-coco_style-64.json        (truncated for fast debugging)
        `-- json_for_test
            `-- keypoints_test_information.json
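Before training, it may help to check that the layout above is in place. The following is a small sketch; the helper name and the choice of required entries (taken from the tree above, without the truncated debug files) are our assumptions.

```python
from pathlib import Path

# Entries expected under ${POSE_ROOT}/data/deepfashion2, taken from the tree above.
EXPECTED = [
    "train/image",
    "train/annos",
    "train/train-coco_style.json",
    "validation/image",
    "validation/annos",
    "validation/val-coco_style.json",
    "json_for_test/keypoints_test_information.json",
]


def missing_entries(dataset_root: str) -> list:
    """Return the expected files and directories that do not exist under `dataset_root`."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries("data/deepfashion2")` from ${POSE_ROOT} returns an empty list when everything is in place.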

Training and Testing

Note that the GPUS parameter in the yaml config file is deprecated. To select GPUs, use the environment variable:

 export CUDA_VISIBLE_DEVICES=1
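The same variable can also be set from Python, as long as this happens before CUDA is initialized in the process (in practice, before the first use of CUDA through PyTorch). The helper name below is hypothetical.

```python
import os


def select_gpus(device_ids) -> str:
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value."""
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Equivalent to `export CUDA_VISIBLE_DEVICES=1` in the shell.
select_gpus([1])
```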

Testing on DeepFashion2 dataset with BBox from ground truth using trained models:

python tools/test.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth \
    TEST.USE_GT_BBOX True

Testing on DeepFashion2 dataset with BBox from a detector using trained models:

python tools/test.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth \
    TEST.DEEPFASHION2_BBOX_FILE data/bbox_result_val.pkl

Training on DeepFashion2 dataset using pretrained models:

python tools/train.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
     MODEL.PRETRAINED models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth

Other options

# DATASET.MINI_DATASET: use a subset of the annotations to save loading time
# TAG: this text will appear in the output directory name
# WORKERS: number of workers for the dataloader
python tools/test.py \
    ... \
    DATASET.MINI_DATASET True \
    TAG 'experiment description' \
    WORKERS 4 \
    TEST.BATCH_SIZE_PER_GPU 8 \
    TRAIN.BATCH_SIZE_PER_GPU 8

OneDrive Cloud Storage

OneDrive

We provide the following files:

  • Model checkpoint files
  • Converted annotation files in COCO style
  • Bounding box results from our self-implemented detector in a pickle file

hrnet-for-fashion-landmark-estimation.pytorch
|-- models
|   `-- pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth
|
|-- data
|   |-- bbox_result_val.pkl
|   |
`-- |-- deepfashion2
    `---|-- train
        |   |-- train-coco_style.json           (converted annotation file)
        |   `-- train-coco_style-32.json      (truncated for fast debugging)
        `-- validation
            |-- val-coco_style.json             (converted annotation file)
            `-- val-coco_style-64.json        (truncated for fast debugging)
        

Discussion

Experiment Configuration

  • For the regression target of keypoint heatmaps, we tuned the standard deviation sigma and finally set it to 2.
  • During training, we found that the data augmentation in the original code was too strong, which made the training process unstable. We weakened the augmentation parameters and observed a performance gain.
  • Because the classes in the DeepFashion2 dataset are imbalanced, the model's performance varies considerably across classes. Therefore, we adopted a weighted sampling strategy instead of naive random shuffling and observed a performance gain.
  • We experimented with the weight decay value and found that both 1e-4 and 1e-5 harm performance. Therefore, we simply set weight decay to 0.
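To illustrate the weighted sampling idea, here is a minimal sketch using inverse class frequency as the weight. The exact weighting used in this repo is not described above, so the scheme and the function names are assumptions.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_indices(labels, num_samples, seed=None):
    """Draw sample indices with replacement so that every class is equally likely."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)
```

With these weights, the total weight of each class is the same, so rare classes are drawn as often as common ones. In PyTorch, the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler` instead of setting `shuffle=True`.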

hrnet-for-fashion-landmark-estimation.pytorch's People

Contributors

dependabot[bot], shenhanqian

hrnet-for-fashion-landmark-estimation.pytorch's Issues

for catId in catIds} File "/home/sa/anaconda3/envs/torch1.7/lib/python3.6/site-packages/pycocotools/cocoeval.py", line 229, in computeOks e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2 ValueError: operands could not be broadcast together with shapes (294,) (17,)

The following error occurs during training:
for catId in catIds}
File "/home/sa/anaconda3/envs/torch1.7/lib/python3.6/site-packages/pycocotools/cocoeval.py", line 229, in computeOks
e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2
ValueError: operands could not be broadcast together with shapes (294,) (17,)
@ShenhanQian

Visualization problem

First, thanks for sharing this great work! Here is an issue that I ran into.

I tried to visualize the results by running the script:

python tools/test.py --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml TEST.MODEL_FILE models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth TEST.USE_GT_BBOX True DATASET.MINI_DATASET True TAG 'experiment description' WORKERS 4 TEST.BATCH_SIZE_PER_GPU 8 TRAIN.BATCH_SIZE_PER_GPU 8

the config file is

AUTO_RESUME: false #
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
DATA_DIR: ''
GPUS: (1,)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 8
PRINT_FREQ: 100
PIN_MEMORY: true

DATASET:
  COLOR_RGB: false
  DATASET: 'deepfashion2'
  DATA_FORMAT: jpg
  FLIP: true
  NUM_JOINTS_HALF_BODY: 8
  PROB_HALF_BODY: 0.3
  ROOT: 'data/deepfashion2/'
  ROT_FACTOR: 15 #45
  SCALE_FACTOR: 0.1 #0.35
  TEST_SET: 'validation'
  TRAIN_SET: 'train'
  MINI_DATASET: True
  SELECT_CAT: [1,2,3,4,5,6,7,8,9,10,11,12,13]
MODEL:
  INIT_WEIGHTS: true
  NAME: pose_hrnet
  NUM_JOINTS: 294
  PRETRAINED: ''
  TARGET_TYPE: gaussian
  IMAGE_SIZE:
  - 288
  - 384
  HEATMAP_SIZE:
  - 72
  - 96
  SIGMA: 2 # 3
  EXTRA:
    PRETRAINED_LAYERS:
    - 'conv1'
    - 'bn1'
    - 'conv2'
    - 'bn2'
    - 'layer1'
    - 'transition1'
    - 'stage2'
    - 'transition2'
    - 'stage3'
    - 'transition3'
    - 'stage4'
    FINAL_CONV_KERNEL: 1
    STAGE2:
      NUM_MODULES: 1
      NUM_BRANCHES: 2
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      NUM_CHANNELS:
      - 48
      - 96
      FUSE_METHOD: SUM
    STAGE3:
      NUM_MODULES: 4
      NUM_BRANCHES: 3
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      - 4
      NUM_CHANNELS:
      - 48
      - 96
      - 192
      FUSE_METHOD: SUM
    STAGE4:
      NUM_MODULES: 3
      NUM_BRANCHES: 4
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      - 4
      - 4
      NUM_CHANNELS:
      - 48
      - 96
      - 192
      - 384
      FUSE_METHOD: SUM
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE_PER_GPU: 8
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 210
  OPTIMIZER: adam
  LR: 0.001 #0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 170
  - 200
  WD: 0.
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE_PER_GPU: 8
  COCO_BBOX_FILE: ''
  DEEPFASHION2_BBOX_FILE: ''
  BBOX_THRE: 1.0
  IMAGE_THRE: 0.0 # threshold for detected bbox to be feed into HRNet
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9 # the lower threshold for a peak point in a heatmap to be kept
  USE_GT_BBOX: true
  FLIP_TEST: true
  POST_PROCESS: true
  SHIFT_HEATMAP: true
DEBUG:
  DEBUG: True
  SAVE_BATCH_IMAGES_GT: false
  SAVE_BATCH_IMAGES_PRED: false
  SAVE_BATCH_IMAGES_GT_PRED: True
  SAVE_HEATMAPS_GT: false
  SAVE_HEATMAPS_PRED: false

I changed the CONFIG parameter to True, but it still does not save any images. Image saving only works when I change BATCH_SIZE_PER_GPU to 1. However, the image-saving function is based on a torch grid, which results in a very weird visualization because the scales of the keypoints and the output differ. Could you please try to solve the problem? I am using a single RTX 3080 Ti GPU with Ubuntu 18.04.

infer on test image

How can we get the clothes bbox, clothes class, scale, and other parameters if we want to run this repo on a test image?

Question about testing

val_400_gt_pred

I tested the checkpoint using the command in the README:
python tools/test.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth \
    TEST.USE_GT_BBOX True

I turned on the debug switch in the config file, and the saved image results look very bad. Is there a problem with how the results are displayed?

Question about the test result

Thanks for the great job! I have a question about the output of the provided pretrained model.
When I test the pretrained model, it returns a tensor whose shape is [batchsize, 294, height, width]. Could you explain the number "294"?

How to predict landmark for new images?

First, I would like to thank the authors for sharing the code. The work inspired me a lot, and I'm trying to follow your method to find keypoints in my own input images.
However, I found it hard to modify test.py directly for the case where the input is not the validation set, which really confused me. Is there a straightforward way to generate landmarks for input images that are not in the DeepFashion2 dataset?
