publaynet's People

Contributors

ajjimeno, bdwyer2, edwardleardi, kmh4321, zhxgj

publaynet's Issues

Add additional categories

Is it possible to train with additional categories, such as heading2, heading3, ..., image description, ...?

Model cannot be downloaded automatically

I'm new to deep learning, so I may be doing many things wrong. I'm just trying to reproduce the Deep Layout Parsing example to learn, and I'm having trouble with this part of the code:

import layoutparser as lp

# Load the deep layout model from the layoutparser API.
# For all supported models, check the Model Zoo page:
# https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

It gives me FileNotFoundError: [Errno 2] No such file or directory: '/root/.torch/iopath_cache/s/dgy9c10wykk4lq4/model_final.pth' and does not automatically download the model. Why is that, and how can I solve it?
PS: I tried to download it manually, but then I get corrupted-file errors.
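
A workaround worth trying, sketched under two assumptions: that the weights URL can be inferred from the cache path in the error (the Dropbox-style link below is a guess from that path, so confirm it against the Model Zoo page), and that your layoutparser version accepts the model_path argument. Download the checkpoint yourself, then point Detectron2LayoutModel at the local file so the automatic iopath download is skipped entirely:

import requests
import layoutparser as lp

weights_url = "https://www.dropbox.com/s/dgy9c10wykk4lq4/model_final.pth?dl=1"  # assumed from the cache path above
local_weights = "model_final.pth"

r = requests.get(weights_url)
r.raise_for_status()                    # fail loudly instead of caching an HTML error page
with open(local_weights, "wb") as f:
    f.write(r.content)

model = lp.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    model_path=local_weights,           # use the manually downloaded weights
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

A truncated or HTML-error download would also explain the corrupted-file errors, which is why the raise_for_status() check matters.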

config for pdf2image

Good work. May I know what config you used with pdf2image, e.g., dpi, size, etc.?
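
For reference, a minimal pdf2image call; these are not the authors' confirmed settings, and the dpi value is only an illustration:

from pdf2image import convert_from_path

# Render each page of a PDF to a PIL image. dpi directly controls the output
# pixel size: at dpi=72 a US-Letter page (612 x 792 pt) becomes 612 x 792 px.
pages = convert_from_path("paper.pdf", dpi=72, fmt="jpeg")
pages[0].save("page_0.jpg")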

Batch Inference Time Calculation

Hi Team,
I have a question about PubLayNet's runtime at inference/test time.
Right now the arguments take one image at a time. I tried passing multiple images and processing them one at a time, and that works.

But I am curious how to process a batch of images at a time, say a batch of 128 or 64, etc.

Can anybody estimate the time for 100 images if processed as a batch?

Currently, detecting sections takes 14 minutes for 1,000 images when processed one by one in evaluation mode on a Tesla K80 GPU or on CPU.

Or does it follow Detectron2, which can process 8 images per second during training? I am not sure about test time.
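
Detectron2 models accept a list of per-image dicts in a single forward pass, so batching is possible if you bypass DefaultPredictor. A minimal sketch, with placeholder config and file names, that skips the resize transform DefaultPredictor normally applies (so results can differ slightly):

import torch
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.data import detection_utils

cfg = get_cfg()
cfg.merge_from_file("faster_rcnn.yaml")      # placeholder: your detectron2 config
cfg.MODEL.WEIGHTS = "model_final.pth"        # placeholder: your weights file

model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()

# One dict per image; the whole list goes through the network as a batch.
inputs = []
for path in ["page_0.jpg", "page_1.jpg"]:    # extend to 64 or 128 paths per call
    img = detection_utils.read_image(path, format=cfg.INPUT.FORMAT)  # HWC array
    tensor = torch.as_tensor(img.transpose(2, 0, 1).copy())          # HWC -> CHW
    inputs.append({"image": tensor, "height": img.shape[0], "width": img.shape[1]})

with torch.no_grad():
    outputs = model(inputs)                  # one Instances result per input image

Memory is the main limit: the batch size is bounded by GPU RAM, so 128 full pages may need to be split into smaller chunks.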

Problem with examples.tar.gz

The link used in the notebook downloads a broken/corrupted file.

import requests

fname = 'examples.tar.gz'
url = 'http://s3.us-south.cloud-object-storage.appdomain.cloud/dax-assets-dev/dax-publaynet/1.0.0/' + fname
r = requests.get(url)
r.raise_for_status()  # surface HTTP errors instead of silently saving an error page
open(fname, 'wb').write(r.content)

Mistakes in Table Examples

After separating out the tables, I checked part of the dataset, and it appears that some table coordinates contain mistakes:

  1. A bare line labelled as a table (just a line, not a table).
  2. An image labelled as a table.
  3. Coordinates covering only part of a table rather than the whole table.

Ambiguity in label mapping

Hi,
I am currently fine-tuning layout parser on my custom dataset. I am using PubLayNet/faster_rcnn_R_50_FPN_3x as my base model, whose output label set is {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}. But for my own PDFs I want to use "Title", "Section", "Paragraph", "ListItem", "PageNumber", "Table". My question is: what should the order of the label mapping be? Also, the pre-trained model already detects tables in my custom data quite well, and I don't want to ruin that. Can you please suggest how I should proceed?
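
As far as label order goes, the ids in the label map only have to be consistent with the 0-based class ids of your own training annotations; the order itself is arbitrary. A sketch in raw detectron2 terms, with hypothetical file and label names:

from detectron2.config import get_cfg

# Hypothetical custom label set; the keys must match (category_id - 1) in your
# registered COCO-style training data.
label_map = {0: "Title", 1: "Section", 2: "Paragraph",
             3: "ListItem", 4: "PageNumber", 5: "Table"}

cfg = get_cfg()
cfg.merge_from_file("faster_rcnn_R_50_FPN_3x.yaml")  # assumed local copy of the base config
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(label_map)     # 6 here, replacing PubLayNet's 5

When NUM_CLASSES changes, detectron2 skips the incompatible classification-head weights and re-initializes them, so the backbone keeps what it learned about tables while only the head is retrained.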

Loading the pre-trained model throws an error

Loading the pre-trained model throws an error:

from detectron2.config import get_cfg

cfg = get_cfg()
# Use the final weights generated after successful training for inference
cfg.MODEL.WEIGHTS = "publaynet.pkl"
cfg.merge_from_file("faster_rcnn.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7

The error is: Non-existent config key: MODEL.TYPE

How can I map PubTabNet to PubLayNet

We want to start research on end-to-end table extraction and recognition from full-page images, so I wonder how I can map a table image in PubTabNet back to its original page image in PubLayNet.
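
If it helps, both datasets appear to carry the PMC article identifier in their file names, so a join on that prefix may be possible. A sketch under that assumption; the directory layout and exact naming patterns below are guesses, so check them against your local copies:

import os
import re
from collections import defaultdict

# Assumed naming: PubLayNet pages like "PMC1234567_00003.jpg" and PubTabNet
# table crops like "PMC1234567_003_00.png". Verify before relying on this.
pmc_re = re.compile(r"^(PMC\d+)")

pages_by_pmc = defaultdict(list)
for name in os.listdir("publaynet/val"):
    m = pmc_re.match(name)
    if m:
        pages_by_pmc[m.group(1)].append(name)

for name in os.listdir("pubtabnet/val"):
    m = pmc_re.match(name)
    if m:
        # PubLayNet page images from the same article as this table crop
        candidates = pages_by_pmc.get(m.group(1), [])

Locating the exact page and region would still need a second step, e.g. matching the crop against each candidate page.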

yaml.scanner.ScannerError: mapping values are not allowed here

Trying to run Detectron2's inference demo (demo.py) with the PubLayNet models and .yaml configs.

My exact command is:

python demo.py --config-file ../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml \
--input ../images/sample_image.png \
--opts MODEL.WEIGHTS ../dir2/models/Faster-RCNN.pkl

The full traceback is this:

[04/08 20:23:44 detectron2]: Arguments: Namespace(config_file='../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml', webcam=False, video_input=None, input=['../images/Page 0.png'], output=None, confidence_threshold=0.5, opts=['MODEL.WEIGHTS', '../models/Faster-RCNN.pkl'])
Traceback (most recent call last):
  File "../detectron2/demo/demo.py", line 80, in <module>
    cfg = setup_cfg(args)
  File "../Documents/detectron2/demo/demo.py", line 26, in setup_cfg
    cfg.merge_from_file(args.config_file)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/detectron2/config/config.py", line 31, in merge_from_file
    loaded_cfg = self.load_yaml_with_base(cfg_filename, allow_unsafe=allow_unsafe)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/fvcore/common/config.py", line 61, in load_yaml_with_base
    cfg = yaml.safe_load(f)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/composer.py", line 58, in compose_document
    self.get_event()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/parser.py", line 118, in get_event
    self.current_event = self.state()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/parser.py", line 193, in parse_document_end
    token = self.peek_token()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 129, in peek_token
    self.fetch_more_tokens()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
    return self.fetch_value()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 577, in fetch_value
    raise ScannerError(None, None,
yaml.scanner.ScannerError: mapping values are not allowed here

My brief research didn't turn up much beyond this being a syntax error in the YAML.

I did a brief search through the .yaml for anything like that, but I didn't see it. Maybe I missed something, or my command is somehow wrong. Let's discuss.

I'm using Python 3.9.2 and detectron2 v0.4.
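
One common cause of this error is that the downloaded file is not actually YAML, e.g. an HTML page saved from a browser link instead of the raw file. A small check that prints the exact line and column the scanner chokes on (the path is the one from the command above):

import yaml

try:
    with open("../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml") as f:
        yaml.safe_load(f)
except yaml.YAMLError as e:
    # ScannerError carries a problem_mark with "line X, column Y"
    print(e)

Opening the file and looking at its first few lines (is it <html>...?) is usually enough to confirm.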

Wrong labels

Hi,

I have been trying to find labels just for tabular data. First of all, there are too many None values in the image column. I tried filtering on category_id=4, i.e., Table. After filtering, I took the bbox, created a separate column, and extracted only the filtered images. When I looked at the filtered images, too many of them contained no table or table-like structure at all. Then I tried drawing the bbox over them and got incorrect bounding boxes that cover only part of the table, or in many cases no table at all. I am reading the bbox as (xmin, ymin, w, h) and have tried other variations as well. Am I missing something here? Please help.
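
For cross-checking, a minimal COCO-style sketch that joins annotations to images by image_id before drawing; a wrong join is a common reason boxes land on pages with no table. The paths are assumptions about where val.json and the images live:

import json
from PIL import Image, ImageDraw

with open("publaynet/val.json") as f:
    coco = json.load(f)

images = {img["id"]: img["file_name"] for img in coco["images"]}
# PubLayNet category_ids are 1-based: 1 text, 2 title, 3 list, 4 table, 5 figure.
tables = [a for a in coco["annotations"] if a["category_id"] == 4]

ann = tables[0]
img = Image.open("publaynet/val/" + images[ann["image_id"]])
x, y, w, h = ann["bbox"]  # COCO format: x_min, y_min, width, height
ImageDraw.Draw(img).rectangle([x, y, x + w, y + h], outline="red", width=3)
img.save("table_check.png")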

Dataset format on IBM Developer

Hi,

First, thanks for moving your dataset off of box.com. Their interface was not very user-friendly.

The dataset format listed on the IBM Developer page does not match the contents of the downloaded archive. The images appear to all be JPEGs, not PNGs.

Could you please request that the dataset format be updated, or (even better due to JPEG compression artifacts) upload PNGs?

Download the dataset on the command line

Nice paper and thanks for releasing the PubLayNet dataset!
Is there a way to download the dataset from the command line (instead of using a web browser)?

Cheers.
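
The DAX CDN that hosts the pre-trained models elsewhere in this repo serves plain tarballs over HTTPS, so standard command-line tools should work. The archive name below is an assumption based on that hosting pattern; adjust it to whatever link the download page actually shows:

wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz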

Test.json?

Hi, first of all, thank you for this great dataset :-) Just one minor question: do you also plan to upload test.json? I couldn't find it in the Box folder.

Image IDs of Test set images

Hi PubLayNet Team,

Thank you for sharing the dataset. I am wondering: is there a JSON file (or a list of image IDs) for the test set?

Mismatch between image and PDF

Firstly, thank you for your useful dataset.
I have downloaded PubLayNet in both image and PDF form, but I noticed that the image and the PDF of the same page are NOT the same size. For example, the PDF page size is 600.05 × 792 while the JPG image's size is 602 × 792, so the annotations should be slightly different for the two file types.
How can I solve this problem? Thank you again!

Download speed of this dataset is very slow from India

Thank you for sharing this large-scale document layout analysis dataset. The download speed is very slow from India, and downloads often disconnect. Is there any other way to download the dataset?

mean and std

What are the mean and std of the pretrained model?
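
If the released weights follow the Detectron(2) conventions, the normalization constants live in the config rather than being documented separately. A sketch for reading them out; the config file name is a placeholder, and the values in the comments are detectron2's defaults, not confirmed for these weights:

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("faster_rcnn.yaml")  # placeholder: a detectron2-compatible config
print(cfg.MODEL.PIXEL_MEAN)  # detectron2 default: [103.53, 116.28, 123.675] (BGR order)
print(cfg.MODEL.PIXEL_STD)   # detectron2 default: [1.0, 1.0, 1.0]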

bounding box for each list item

Could you advise how to get a bounding box for each list item? Currently, one bounding box covers all the list items; I would like a separate bounding box for each item. Thanks.

Download questions

Great job, and thank you for sharing such large-scale document data. However, my download speed for these datasets is very slow, and downloads often disconnect. Is there any other way to get these datasets?

instruction about training on my own dataset

Hi,

Thanks for sharing the useful dataset and pre-trained model.

Is the model trained with Detectron or Detectron2? I wonder if you could provide detailed instructions for fine-tuning on my own dataset starting from the pre-trained model.

Thank you.

Cannot access the competition website anymore.

The dataset does not contain test.json, and the competition website is not working. Is there any way to use the test data, so that I can compare performance with the teams that joined the competition?

raw pdf files

Nice work, and thank you for the effort of making the dataset publicly available.

I am working on document object detection and would like to use the content of the documents to help boost detection performance, so I need the raw PDF files (from which you generated the images) to extract some content information. Could you please release them as well?

I think one of the major differences between object detection in document images and in natural images is that documents contain auxiliary text information absent from natural images. Incorporating this auxiliary information should lead to better detection results, and it should also benefit some NLP+CV tasks. Thanks.

How to generate bounding box from pdf and text?

I am very interested in simulating the layout of papers. I can prepare plenty of text and images, but I don't know how to generate a document that looks real and obtain its bounding boxes automatically. Any suggestions?
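
One route, sketched here with PyMuPDF as an assumption (this is not how PubLayNet itself was built): render your generated document to PDF, then read the text-block bounding boxes straight out of the PDF's layout.

import fitz  # PyMuPDF

doc = fitz.open("generated.pdf")  # hypothetical document you produced
page = doc[0]
for x0, y0, x1, y1, text, block_no, block_type in page.get_text("blocks"):
    if block_type == 0:           # 0 = text block, 1 = image block
        print(block_no, (x0, y0, x1, y1), text[:40])

Templating a document (e.g. with LaTeX or HTML-to-PDF) and extracting blocks this way gives boxes for free, though they are paragraph-level rather than PubLayNet's semantic categories.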

License terms of the PubLayNet dataset

Hi, the document images are from the commercial-use collection of PMC OA.
What, then, are the license terms of the PubLayNet dataset?
Is the whole dataset also allowed for commercial purposes?

Thanks a lot!

Unsupported type found in checkpoint! weight_order: <class 'bytes'>

Hi,
I am trying to fine-tune on my custom dataset using the pre-trained models (Mask R-CNN) published by PubLayNet, on Detectron2.

Here is my cell; please have a look.

import os
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

config_file = "/content/config/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml"  # (wget https://raw.githubusercontent.com/ibm-aur-nlp/PubLayNet/master/pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml)
model_file = "/content/models/model_final.pkl"  # (wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/pre-trained-models/Mask-RCNN/model_final.pkl)
cfg = get_cfg()
cfg.merge_from_file(config_file)
cfg.DATASETS.TRAIN = ("layout_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_file   # initialize training from the PubLayNet pre-trained weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 3000    # train longer for a practical dataset
cfg.SOLVER.STEPS = []         # do not decay the learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for a small dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4  # the number of classes in my dataset (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this is the number of classes; some unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

Gives output:

[05/13 08:55:08 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): ResNet(
    (stem): BasicStem(
      (conv1): Conv2d(
        3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
    )
    (res2): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv1): Conv2d(
          64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
    )
    (res3): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv1): Conv2d(
          256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (3): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
    )
    (res4): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
        (conv1): Conv2d(
          512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (3): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (4): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (5): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (objectness_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): Res5ROIHeads(
    (pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
      )
    )
    (res5): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
        (conv1): Conv2d(
          1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=2048, out_features=2, bias=True)
      (bbox_pred): Linear(in_features=2048, out_features=4, bias=True)
    )
  )
)
[05/13 08:55:26 d2.data.build]: Removed 0 images with no usable annotations. 1691 images left.
[05/13 08:55:26 d2.data.build]: Distribution of instances among all 4 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|    Text    | 43760        |    Math    | 2864         |   Table    | 78           |
|   Image    | 312          |            |              |            |              |
|   total    | 47014        |            |              |            |              |
[05/13 08:55:26 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]
[05/13 08:55:26 d2.data.build]: Using training sampler TrainingSampler
[05/13 08:55:26 d2.data.common]: Serializing 1691 elements to byte tensors and concatenating them all ...
[05/13 08:55:26 d2.data.common]: Serialized dataset takes 5.30 MiB

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-25-fc8bb5a8baf3> in <module>()
     21 os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
     22 trainer = DefaultTrainer(cfg)
---> 23 trainer.resume_or_load(resume=False)
     24 trainer.train()

4 frames

/usr/local/lib/python3.7/dist-packages/fvcore/common/checkpoint.py in _convert_ndarray_to_tensor(self, state_dict)
    357             if not isinstance(v, np.ndarray) and not isinstance(v, torch.Tensor):
    358                 raise ValueError(
--> 359                     "Unsupported type found in checkpoint! {}: {}".format(k, type(v))
    360                 )
    361             if not isinstance(v, torch.Tensor):

ValueError: Unsupported type found in checkpoint! weight_order: <class 'bytes'>

When I try to open the pretrained model with
torch.load(model_file)
I get this error:

RuntimeError                              Traceback (most recent call last)

<ipython-input-26-857e01592fb9> in <module>()
----> 1 torch.load(model_file)

1 frames

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    764     magic_number = pickle_module.load(f, **pickle_load_args)
    765     if magic_number != MAGIC_NUMBER:
--> 766         raise RuntimeError("Invalid magic number; corrupt file?")
    767     protocol_version = pickle_module.load(f, **pickle_load_args)
    768     if protocol_version != PROTOCOL_VERSION:

RuntimeError: Invalid magic number; corrupt file?

Is there any other way to use these pretrained models?
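
The published .pkl appears to be a Detectron (v1) pickle of numpy blobs rather than a torch checkpoint, which would explain both errors: torch.load finds no torch magic number, and fvcore trips over metadata entries such as weight_order. A debugging sketch for inspecting the file, plus a guess at a workaround (filtering out the non-array entries and re-saving); this is not an official fix:

import pickle
import numpy as np

with open("model_final.pkl", "rb") as f:
    data = pickle.load(f, encoding="latin1")  # Detectron v1 pickles need latin1

blobs = data.get("blobs", data) if isinstance(data, dict) else data
print(type(blobs), list(blobs)[:10])          # see what the checkpoint really holds

# Drop metadata entries (e.g. "weight_order") that fvcore cannot convert,
# then re-save in the same {"blobs": ...} layout.
clean = {k: v for k, v in blobs.items() if isinstance(v, np.ndarray)}
with open("model_final_clean.pkl", "wb") as f:
    pickle.dump({"blobs": clean}, f)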

Jupyter Notebook is down

Hi, I am quite new to coding and would like to use the pre-trained model, but it would be nice to have some sample code to orient myself with. I saw that you have a Jupyter notebook with an example, but apparently the link is broken?

Best wishes,
T

A dirty bbox in val.json?

Hi! I found a strange bbox in:
val.json['annotations'][914]['bbox'] = [50.73, 89.65, 498.14, 0.0]


Does the bbox mean (x_left, y_top, w, h)? And why is there a 0.0 height?
Thanks!

Checksum for PubLayNet_PDF.tar.gz

Hello,
I tried downloading the PDF dataset, but I only got about 10% unzipped before running into a data-corruption issue. Are checksums or data splits available for PubLayNet_PDF.tar.gz?
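
Until an official digest is published, a chunked SHA-256 (standard-library hashlib) at least lets you compare two downloads of the archive against each other without loading multiple gigabytes into memory:

import hashlib

h = hashlib.sha256()
with open("PubLayNet_PDF.tar.gz", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
        h.update(chunk)
print(h.hexdigest())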
