Publaynet

Company Articles DataSet

Overview

Category        Training Set   Validation Set   Testing Set
Num of Images   20365          500              499
Percentage      95.3%          2.3%             2.3%
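The split shares follow directly from the image counts. A minimal sketch of the arithmetic, using only the counts listed in the overview (the function name is illustrative, not part of the repository):

```python
# Image counts per split, taken from the overview table.
split_sizes = {"train": 20365, "val": 500, "test": 499}


def split_percentages(sizes):
    """Return each split's share of all images, in percent, rounded to one decimal."""
    total = sum(sizes.values())
    return {name: round(100 * count / total, 1) for name, count in sizes.items()}


percentages = split_percentages(split_sizes)
print(percentages)
```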

Training Set:

category   #instances
chapter    11312
section    17471
clause     106931
total      135714

Validation Set:

category   #instances
chapter    151
section    246
clause     3096
total      3493

Testing Set:

category   #instances
chapter    151
section    249
clause     2947
total      3347
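Per-category instance counts like the ones above can be recomputed from an annotation file. The sketch below assumes the annotations are COCO-style JSON (a `categories` list and an `annotations` list with `category_id`), as in the original PubLayNet release; this README does not state the format, so treat that as an assumption. The toy dictionary is hand-made, not real data.

```python
from collections import Counter


def count_instances(coco):
    """Count annotations per category name in a COCO-style dict (assumed format)."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    counts["total"] = sum(counts.values())
    return dict(counts)


# Tiny hand-made example; in practice the dict would come from json.load().
toy = {
    "categories": [
        {"id": 1, "name": "chapter"},
        {"id": 2, "name": "section"},
        {"id": 3, "name": "clause"},
    ],
    "annotations": [
        {"category_id": 1},
        {"category_id": 2},
        {"category_id": 3},
        {"category_id": 3},
    ],
}
toy_counts = count_instances(toy)
print(toy_counts)
```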

Download

All Files:

Images

Annotation

Dataset:

Model:

Pretrained on Publaynet Dataset

Trained on Company Articles Dataset

Python Files:

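  • faster_rcnn_resnet101_coco_2018_01_28: pretrained backbone model, used for training on the publaynet dataset
  • visualizeSet.py: visualizes the dataset
  • build.py: builds the optimizer and the learning-rate schedule
  • utils.py: utility functions for working with the publaynet dataset
  • train.py: training script for the publaynet dataset
  • test_per_img.py: visualizes the predictions on the test set
  • predict.py: prediction script for the publaynet dataset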

Requirements

Detectron2

Run on Google Colab:

Install Requirements and Clone Publaynet

!pip install pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
!git clone https://github.com/noba1anc3/Publaynet.git
cd Publaynet

Build Detectron2 from Source

With the dependencies above installed and gcc & g++ ≥ 5 available, run:

!git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
!python -m pip install -e .
cd ..

# Or if you are on macOS
# CC=clang CXX=clang++ python -m pip install -e .

Train

Data Preparation

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive/')

Copy Training and Validation Data to Publaynet's Path

mkdir data

cp -rf ../drive/'My Drive'/train.zip ./data/
cp -rf ../drive/'My Drive'/val.zip ./data/

cd data
!unzip train.zip
!unzip val.zip
cd ..
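The same extraction step can be done from Python, which is convenient when the notebook cell should skip archives that were not copied. This is a sketch using the standard `zipfile` module; the function name and its defaults are illustrative and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir, names=("train.zip", "val.zip")):
    """Extract each named archive found in data_dir into data_dir.

    Returns the list of archive names that were actually extracted.
    """
    data_dir = Path(data_dir)
    extracted = []
    for name in names:
        archive = data_dir / name
        if not archive.is_file():
            continue  # archive was not copied; skip it instead of failing
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(name)
    return extracted
```

Calling `extract_archives("data")` from the Publaynet directory would mirror the two `unzip` commands above.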

Fine-tune from Faster_RCNN_X_101_32x8d_FPN_3x

!python train.py -f False

Fine-tune from Publaynet's Pretrained Model

mkdir output
cp -rf ../drive/'My Drive'/model_final.pth ./output/
!python train.py -f True
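The `-f` flag takes the literal strings `True` and `False`. How `train.py` parses it is not shown here, so the following is only a hypothetical sketch of one way such a flag could be handled. It matters because `argparse` with `type=bool` would treat the string `"False"` as true. The long option `--finetune`, the helper names, and the model-zoo config string are assumptions; `output/model_final.pth` is the path used in the commands above.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into real booleans."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "1"):
        return True
    if text in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical CLI mirroring `python train.py -f True|False`."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py-style CLI")
    parser.add_argument(
        "-f", "--finetune", type=str2bool, default=False,
        help="start from the Publaynet-pretrained checkpoint in ./output",
    )
    return parser


def initial_weights(finetune):
    """Pick the starting point for training (both values are assumptions)."""
    if finetune:
        return "output/model_final.pth"  # checkpoint copied in the step above
    # Assumed Detectron2 model-zoo config for Faster_RCNN_X_101_32x8d_FPN_3x.
    return "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"


args = build_parser().parse_args(["-f", "False"])
print(args.finetune, initial_weights(args.finetune))
```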

Training Log

Training From Scratch

Training on Faster-RCNN Pretrained Model

Training on Pretrained Model Finetuned on Publaynet Dataset

Comparison

Training From Scratch & Training on Faster RCNN Pretrained Model

[Figure: training log comparison, from scratch vs. Faster RCNN pretrained model]

Faster RCNN Pretrained Model & Publaynet Pretrained Model

[Figure: training log comparison, Faster RCNN pretrained model vs. Publaynet pretrained model]

Evaluation Result on Testing Set

Per-class AP

chapter AP   section AP   clause AP   mAP
85.180       86.641       93.367      88.396
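The mAP column is the unweighted mean of the three per-class AP values, which can be checked in a couple of lines:

```python
# Per-class AP values as reported for the testing set.
per_class_ap = {"chapter": 85.180, "section": 86.641, "clause": 93.367}

# mAP is the plain average over classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"{mean_ap:.3f}")  # → 88.396
```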

Average Precision

AP       AP50     AP75     APs   APm      APl
88.396   99.037   98.956   NaN   80.382   88.964

Average Recall

AR1    AR10   AR100   ARs   ARm    ARl
57.0   91.4   92.0    NaN   84.8   92.1
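The `s`, `m`, and `l` suffixes are the COCO object-size ranges: small (area below 32² pixels), medium (between 32² and 96²), and large (above 96²). A NaN in the small column usually means no ground-truth instance fell into that range, although this README does not confirm that this is the cause here. The helper below is an illustrative sketch of the bucketing, not code from the repository.

```python
SMALL_MAX = 32 ** 2   # 1024 px^2, upper bound of the COCO "small" range
MEDIUM_MAX = 96 ** 2  # 9216 px^2, upper bound of the COCO "medium" range


def coco_size_bucket(area):
    """Map an annotation area in pixels^2 to 'small', 'medium', or 'large'.

    Areas exactly on a boundary are assigned to the larger bucket here;
    the official evaluator treats its range bounds as inclusive on both sides.
    """
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


print(coco_size_bucket(20 * 20))    # small
print(coco_size_bucket(50 * 50))    # medium
print(coco_size_bucket(200 * 100))  # large
```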

Contributors

noba1anc3
