ibm-aur-nlp / publaynet

License: Other

Jupyter Notebook 99.91% Python 0.09%

publaynet's Issues

submit to https://aieval.draco.res.ibm.com/challenge/41/overview

Did anyone join this competition? I don't know where to submit my result.

Add additional categories

Is it possible to start training with additional categories such as: heading2, heading3, ..., image description, ...?

model cannot be downloaded automatically.

I'm new to deep learning, so I may be doing many things wrong. I'm just trying to reproduce the Deep Layout Parsing example to learn, and I'm having trouble with this part of the code:
import layoutparser as lp

# Load the deep layout model from the layoutparser API.
# For all the supported models, please check the Model
# Zoo page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

It gives me the error FileNotFoundError: [Errno 2] No such file or directory: '/root/.torch/iopath_cache/s/dgy9c10wykk4lq4/model_final.pth' and does not automatically download the model. Why is that, and how can I solve it?
PS: I tried to download it manually, but then I got corrupted-file errors.
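
One cause that has been reported for this kind of error is that the download did succeed but was cached under a name that still carries the URL query string (for example `model_final.pth?dl=1`), so the path without that suffix does not exist. Whether that applies here is an assumption. The sketch below only looks for such files under a cache directory and renames them; `strip_query_suffix` is a hypothetical helper, not part of layoutparser.

```python
from pathlib import Path


def strip_query_suffix(cache_dir):
    """Rename cached files such as 'model_final.pth?dl=1' to 'model_final.pth'.

    Returns the list of new paths. A file is skipped when the cleaned name
    would be empty or already exists, so nothing is overwritten.
    """
    renamed = []
    for path in sorted(Path(cache_dir).rglob("*")):
        if not path.is_file() or "?" not in path.name:
            continue
        clean_name = path.name.split("?", 1)[0]
        if not clean_name:
            continue
        target = path.with_name(clean_name)
        if not target.exists():
            path.rename(target)
            renamed.append(target)
    return renamed
```

For the path in the error message, the directory to pass would be `/root/.torch/iopath_cache`. If nothing is renamed, the cause is probably something else.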

config for pdf2image

Good work. May I know what configuration you used with pdf2image, e.g., dpi, size, etc.?

Batch Inference Time Calculation

Hi Team,
I have a question about PubLayNet's run time at inference (test) time.
Right now the arguments take one image at a time. I tried passing multiple images as arguments and processing them one at a time, and it works.

But I am curious how to process a batch of images at once, for example a batch of 128 or 64.

Can anybody estimate the time for 100 images if they are processed as a batch?

Currently, detecting sections with the PubLayNet model takes 14 minutes for 1000 images when they are processed one by one in evaluation mode on a Tesla K80 GPU or on CPU.

Or does it follow Detectron2, which can process 8 images per second during training? I am not sure about test time.
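
As a rough reference point, the figures quoted above work out to 14 × 60 / 1000 = 0.84 seconds per image, or about 1.2 images per second. A minimal sketch of grouping image paths into fixed-size batches follows. `chunked` and `seconds_per_image` are hypothetical helpers, and whether batching actually speeds up inference on a given GPU is something to measure rather than assume.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def seconds_per_image(total_minutes, num_images):
    """Average wall-clock seconds spent per image."""
    return total_minutes * 60.0 / num_images
```

Each batch could then be handed to whatever batched predictor is available, for example `for batch in chunked(paths, 64): results.extend(predict_batch(batch))`, where `predict_batch` stands in for a model call that accepts several images at once.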

Problem with example.tar.gz.

The link mentioned in the notebook downloads a broken/corrupted file.

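import requests

fname = 'examples.tar.gz'
url = 'http://s3.us-south.cloud-object-storage.appdomain.cloud/dax-assets-dev/dax-publaynet/1.0.0/' + fname
r = requests.get(url)
r.raise_for_status()  # fail loudly on an HTTP error instead of saving the error page
with open(fname, 'wb') as f:
    f.write(r.content)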
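
To tell a broken download apart from a problem in later code, the saved file can be checked with the standard `tarfile` module before it is used. This is a minimal sketch; `list_archive` is a hypothetical helper, and a `None` result only says that the file on disk is not a readable tar archive, not why.

```python
import tarfile
import zlib


def list_archive(path):
    """Return the member names of a tar archive, or None if it cannot be read."""
    try:
        if not tarfile.is_tarfile(path):
            return None
        with tarfile.open(path) as archive:  # default mode "r" auto-detects gzip
            return archive.getnames()
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # Truncated or otherwise damaged archives can fail while members are read.
        return None
```

Calling `list_archive('examples.tar.gz')` right after the download shows whether the file is usable at all.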

Mistakes in Table Examples

After separating out the tables, I checked part of the dataset, and there appear to be some mistakes in the table coordinates:

A line that looks like a table but is just a line, not a table
An image labelled as a table
Coordinates that cover only part of a table instead of the full table

ambiguous with labels mapping

Hi,
I am currently fine-tuning Layout Parser on my custom dataset. I am using PubLayNet/faster_rcnn_R_50_FPN_3x as my base model, and according to this model the output label set is {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}. In my own PDFs, however, I just want to use "Title", "Section", "Paragraph", "ListItem", "PageNumber", and "Table". My question is: what should the order of the label mapping be? Also, the pre-trained model already detects tables in my custom data quite well, and I don't want to ruin that. Can you please suggest how I should proceed?
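
A minimal sketch of one way to set up such a mapping, under the assumption that the fine-tuning code expects contiguous class ids starting at 0 and that the same order is used both when writing the annotations and when passing `label_map` at inference. The order chosen here is arbitrary; nothing in this thread says that a particular order is required. `build_label_maps` is a hypothetical helper.

```python
def build_label_maps(class_names):
    """Assign contiguous ids (0..N-1) to class names in the given order."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    id_to_name = dict(enumerate(class_names))
    name_to_id = {name: idx for idx, name in id_to_name.items()}
    return id_to_name, name_to_id


# The six labels named in the question, in an arbitrary but fixed order.
CUSTOM_CLASSES = ["Title", "Section", "Paragraph", "ListItem", "PageNumber", "Table"]
id_to_name, name_to_id = build_label_maps(CUSTOM_CLASSES)
```

Keeping the list in a single place makes it harder for the training annotations and the inference-time label map to drift apart.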

loading the pre-trained model throwing an error

from detectron2.config import get_cfg

cfg = get_cfg()

# Use the final weights generated after successful training for inference
cfg.MODEL.WEIGHTS = "publaynet.pkl"
cfg.merge_from_file("faster_rcnn.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7

This fails with: Non-existent config key: MODEL.TYPE
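
The message means that the loaded YAML file contains a key (`MODEL.TYPE`) that the default config it is merged into does not define. The sketch below imitates that check on plain dictionaries to show how such a key is detected. The two dictionaries are made-up stand-ins, not the real detectron2 defaults or the real PubLayNet config, and `find_unknown_keys` is a hypothetical helper.

```python
def find_unknown_keys(loaded, defaults, prefix=""):
    """Return dotted names of keys present in `loaded` but missing from `defaults`."""
    unknown = []
    for key, value in loaded.items():
        full_name = f"{prefix}{key}"
        if key not in defaults:
            unknown.append(full_name)
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            unknown.extend(find_unknown_keys(value, defaults[key], full_name + "."))
    return unknown


# Illustrative values only.
defaults = {"MODEL": {"WEIGHTS": "", "ROI_HEADS": {"SCORE_THRESH_TEST": 0.05}}}
loaded = {"MODEL": {"TYPE": "generalized_rcnn", "WEIGHTS": "publaynet.pkl"}}
print(find_unknown_keys(loaded, defaults))  # ['MODEL.TYPE']
```

A key that the target framework does not know usually points to a config written for a different framework or version than the one loading it.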

Is there a plan to open source the trained weights of MaskRCNN?

Thank you for this large dataset. Is there a plan to open source the trained weights and code in Keras or PyTorch?

How can I map PubTabNet with PubLayNet

We want to start research on end-to-end table extraction and recognition from a large image, so I wonder how I can map a table image in PubTabNet to the original image in PubLayNet.

yaml.scanner.ScannerError: mapping values are not allowed here

Trying to run Detectron's inference demo with the PubLayNet models and .yaml configs.

My exact command is:

python demo.py --config-file ../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml \
--input ../images/sample_image.png \
--opts MODEL.WEIGHTS ../dir2/models/Faster-RCNN.pkl

The full traceback is this:

[04/08 20:23:44 detectron2]: Arguments: Namespace(config_file='../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml', webcam=False, video_input=None, input=['../images/Page 0.png'], output=None, confidence_threshold=0.5, opts=['MODEL.WEIGHTS', '../models/Faster-RCNN.pkl'])
Traceback (most recent call last):
  File "../detectron2/demo/demo.py", line 80, in <module>
    cfg = setup_cfg(args)
  File "../Documents/detectron2/demo/demo.py", line 26, in setup_cfg
    cfg.merge_from_file(args.config_file)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/detectron2/config/config.py", line 31, in merge_from_file
    loaded_cfg = self.load_yaml_with_base(cfg_filename, allow_unsafe=allow_unsafe)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/fvcore/common/config.py", line 61, in load_yaml_with_base
    cfg = yaml.safe_load(f)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/composer.py", line 58, in compose_document
    self.get_event()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/parser.py", line 118, in get_event
    self.current_event = self.state()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/parser.py", line 193, in parse_document_end
    token = self.peek_token()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 129, in peek_token
    self.fetch_more_tokens()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
    return self.fetch_value()
  File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 577, in fetch_value
    raise ScannerError(None, None,
yaml.scanner.ScannerError: mapping values are not allowed here

My brief bits of research didn't turn up much more than this syntactical mistake.

I did a brief search through the .yaml to see if this was occurring, but I didn't see anything. Maybe I missed something, or my prompt is somehow wrong. Let's discuss.

I'm using Python 3.9.2 and detectron2 v0.4.
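
One thing worth ruling out, as an assumption rather than a diagnosis, is that the `.yaml` file on disk is not the raw config at all but a saved web page, which a YAML parser may reject with an error like the one above. The sketch checks the first non-blank line of the file; `first_line_looks_like_markup` is a hypothetical helper.

```python
def first_line_looks_like_markup(path):
    """Return True if the first non-blank line starts like HTML or XML."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            stripped = line.strip().lower()
            if stripped:
                return stripped.startswith(("<!doctype", "<html", "<?xml"))
    return False  # empty file
```

If this returns `False`, the file at least starts like plain text, and the problem is more likely inside the config itself.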

Wrong labels

Hi,

I have been trying to find the labels for tabular data only. First of all, there are too many None values in the image column. I filtered on category_id=4, i.e., Table. After filtering, I took the bbox, created a separate column for it, and tried to separate out only the filtered images. When I looked at the filtered images, many of them contained no table or table-like structure at all. Then I tried drawing the bbox over them and got incorrect bounding boxes that cover only part of a table or, in many cases, no table at all. I am reading the bbox as (xmin, ymin, w, h), and I have tried other variations as well. Am I missing something here? Please help.
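
A minimal sketch of selecting boxes for one category, assuming the annotation file follows the COCO layout (`annotations` and `categories` lists, with `bbox` stored as `[x_min, y_min, width, height]`). It looks the category id up by name instead of hard-coding it, and groups boxes by `image_id` so that each box stays attached to its own image. `boxes_for_category` is a hypothetical helper.

```python
from collections import defaultdict


def boxes_for_category(coco, category_name):
    """Map image_id -> list of (x_min, y_min, x_max, y_max) for one category."""
    wanted_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in wanted_ids:
            x, y, w, h = ann["bbox"]  # assumed [x_min, y_min, width, height]
            boxes[ann["image_id"]].append((x, y, x + w, y + h))
    return dict(boxes)
```

If boxes drawn this way still land in the wrong place, a mismatch between `image_id` and the image file being opened is worth checking before suspecting the coordinates.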

Dataset format on IBM Developer

Hi,

First, thanks for moving your dataset off of box.com. Their interface was not very user-friendly.

The dataset format listed on the IBM Developer page does not match the contents of the downloaded archive. The images appear to all be JPEGs, not PNGs.

Could you please request that the dataset format be updated, or (even better due to JPEG compression artifacts) upload PNGs?

Download the dataset on command line

Nice paper and thanks for releasing the PubLayNet dataset!
Is there a way to download the dataset from the command line (instead of using a web browser)?

Cheers.

Thanks!

Thanks so much for this repo; it has been amazingly useful. We used it to build ICLR 2020 virtual addition and added all the pictures this way!

https://twitter.com/srush_nlp/status/1253788694739386371

Test.json?

Hi, first of all, thank you for this great dataset :-) Just one minor question: do you plan to also upload the test.json? I couldn't find it in the box.

Do you supply the text-line and OCR information for PubLayNet?

If the annotations are extracted from the PDF and XML formats, I am wondering whether you extracted text-line and OCR information too. If not, could it be done in the future?

Downloading Pretrained Models

The links for downloading the pre-trained model weights are not working. Have they been moved to a new location?

Image IDs of Test set images

Hi PubLayNet Team,

Thank you for sharing the dataset. I am wondering whether a JSON file (or a list of image IDs) is provided for the test set.

Is it possible to get the annotation for each textline?

Thank you for sharing the dataset! It would be convenient if we could use annotations for each text line, including the corresponding bbox and text, especially for logical layout analysis tasks.

Mismatch between image and PDF

Firstly, thank you for your useful dataset.
I have downloaded PubLayNet in both image and PDF form, but I noticed that the image and the PDF of the same page are NOT the same size. For example, the size of the PDF page is 600.05 × 792, but the size of the JPG image is 602 × 792. So the annotations should be slightly different for these two types of file.
How can I solve this problem? Thank you again!
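
A minimal sketch of rescaling a box from one page size to another, assuming both sizes are (width, height) pairs and that the image is a uniform rendering of the whole PDF page. `scale_bbox` is a hypothetical helper. PDF tools often place the origin at the bottom-left while images use the top-left, so a vertical flip may also be needed depending on the library; that step is not shown.

```python
def scale_bbox(bbox, from_size, to_size):
    """Rescale an [x, y, width, height] box between two (width, height) page sizes."""
    scale_x = to_size[0] / from_size[0]
    scale_y = to_size[1] / from_size[1]
    x, y, w, h = bbox
    return [x * scale_x, y * scale_y, w * scale_x, h * scale_y]
```

When the two sizes differ by only a pixel or two, the correction is small, but applying it keeps the boxes consistent across both file types.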

Download speed of this dataset is very slow from India

Thank you for sharing this large-scale document layout analysis dataset. The download speed for this dataset is very slow from India, and the download often gets disconnected. Is there any other way to download the dataset?
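
For connections that drop often, a download can sometimes be resumed instead of restarted, provided the server supports HTTP range requests; whether the dataset host does is an assumption. The sketch below uses only the standard library. `build_resume_request` and `resume_download` are hypothetical helpers, and any URL passed to them is the caller's own.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request that asks the server to continue after the bytes on disk."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def resume_download(url, partial_path, chunk_size=1 << 20):
    """Append the missing bytes to partial_path, or start over if ranges are ignored."""
    request, offset = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # Status 206 means the Range header was honoured; anything else is a full body.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download` again after each disconnect continues from the current file size when the server cooperates.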

How can we use mask rcnn weights in keras?

Thank you for sharing the dataset! I want to use the maskrcnn with keras. How can we use the config and weights for the same?

mean and std

What are the mean and std of the pretrained model?

Any non-English document images in the dataset?

This is a nice dataset for research on NLP and CV. Thank you for making it publicly available.
I am wondering whether any foreign-language document images are included in the PubLayNet dataset.

cannot download examples.tar.gz

https://dax.cdn.appdomain.cloud/dax-publaynet/1.0.0/examples.tar.gz no longer works...is there a new location for this file?

The scripts/code used to match the PDF miner outputs on documents to the XML representations

Do you provide the scripts/code that you developed to match the PDFMiner outputs on the documents to the XML representation of the PDF page itself? Thanks

bounding box for each list item

Could you advise how to get a bounding box for each list item? Currently, a single bounding box covers all list items; I would like to have a separate bounding box for each item. Thanks.

Download questions

Great job, and thank you for sharing such large-scale document data. However, the speed at which I can download these datasets is very slow, and the download often gets disconnected. Is there any other way to get these datasets?

instruction about training on my own dataset

Hi,

Thanks for sharing the useful dataset and pre-trained model.

Is the model trained on Detectron or Detectron2? I wonder if you could provide any detailed instruction about the fine-tuning on my dataset based on the pre-trained model.

Thank you.

Cannot access the competition website anymore.

The dataset does not contain test.json, and the competition website is not working. Is there any way to use the test data, so that I can compare my performance with the teams that joined the competition?

raw pdf files

Nice work, and I appreciate the effort of making the dataset publicly available.

I am working on document object detection now and would like to use the content of the documents to help boost detection performance. So I need the raw PDF files (from which you generated the images) to extract some content information. Could you please release them as well?

I think one of the major differences between object detection in document images and in natural images is that documents contain auxiliary text information that is absent from natural images. Incorporating this auxiliary information should help reach better detection results. It should also benefit some NLP+CV tasks. Thanks.

How to generate bounding box from pdf and text?

I am very interested in simulating the layout of papers. I can prepare a lot of text and images, but I don't know how to generate a document that looks real and get the bounding boxes automatically. Any suggestions?

License terms of the PubLayNet dataset

Hi, the document images are from the commercial use collection of PMC OA.
Then what are the license terms of the PubLayNet dataset?
Is the whole dataset also allowed for commercial purposes?

Thanks a lot!

Unsupported type found in checkpoint! weight_order: <class 'bytes'>

Hi,
I am trying to fine-tune on my custom datasets using the pre-trained models (Mask R-CNN) published by PubLayNet, on Detectron2.

Here is my cell. Please have a look.

import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

config_file = "/content/config/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml" #(wget https://raw.githubusercontent.com/ibm-aur-nlp/PubLayNet/master/pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml)
model_file = "/content/models/model_final.pkl" #(wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/pre-trained-models/Mask-RCNN/model_final.pkl)
cfg = get_cfg()
# Note: _open_cfg only opens the file and returns a handle; it does not merge
# the YAML into cfg (merge_from_file is the call that does).
cfg._open_cfg(config_file)
cfg.DATASETS.TRAIN = ("layout_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_file   # initialize training from the PubLayNet pre-trained weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 3000    # number of training iterations; a practical dataset may need more
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # smaller than the default of 512, so faster
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4  # four classes in this dataset (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

Gives output:

[05/13 08:55:08 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): ResNet(
    (stem): BasicStem(
      (conv1): Conv2d(
        3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
    )
    (res2): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv1): Conv2d(
          64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
    )
    (res3): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv1): Conv2d(
          256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (3): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
    )
    (res4): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
        (conv1): Conv2d(
          512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (3): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (4): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (5): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (objectness_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): Res5ROIHeads(
    (pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
      )
    )
    (res5): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
        (conv1): Conv2d(
          1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=2048, out_features=2, bias=True)
      (bbox_pred): Linear(in_features=2048, out_features=4, bias=True)
    )
  )
)
[05/13 08:55:26 d2.data.build]: Removed 0 images with no usable annotations. 1691 images left.
[05/13 08:55:26 d2.data.build]: Distribution of instances among all 4 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|    Text    | 43760        |    Math    | 2864         |   Table    | 78           |
|   Image    | 312          |            |              |            |              |
|   total    | 47014        |            |              |            |              |
[05/13 08:55:26 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]
[05/13 08:55:26 d2.data.build]: Using training sampler TrainingSampler
[05/13 08:55:26 d2.data.common]: Serializing 1691 elements to byte tensors and concatenating them all ...
[05/13 08:55:26 d2.data.common]: Serialized dataset takes 5.30 MiB

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-25-fc8bb5a8baf3> in <module>()
     21 os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
     22 trainer = DefaultTrainer(cfg)
---> 23 trainer.resume_or_load(resume=False)
     24 trainer.train()

4 frames

/usr/local/lib/python3.7/dist-packages/fvcore/common/checkpoint.py in _convert_ndarray_to_tensor(self, state_dict)
    357             if not isinstance(v, np.ndarray) and not isinstance(v, torch.Tensor):
    358                 raise ValueError(
--> 359                     "Unsupported type found in checkpoint! {}: {}".format(k, type(v))
    360                 )
    361             if not isinstance(v, torch.Tensor):

ValueError: Unsupported type found in checkpoint! weight_order: <class 'bytes'>

When I tried to open the pre-trained model with
torch.load(model_file)
it raised this error:

RuntimeError                              Traceback (most recent call last)

<ipython-input-26-857e01592fb9> in <module>()
----> 1 torch.load(model_file)

1 frames

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    764     magic_number = pickle_module.load(f, **pickle_load_args)
    765     if magic_number != MAGIC_NUMBER:
--> 766         raise RuntimeError("Invalid magic number; corrupt file?")
    767     protocol_version = pickle_module.load(f, **pickle_load_args)
    768     if protocol_version != PROTOCOL_VERSION:

RuntimeError: Invalid magic number; corrupt file?

Is there any other way to use these pretrained models?
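
The second error is what `torch.load` reports when the file is a plain Python pickle rather than a file written by `torch.save`. The first error says that the checkpoint holds an entry, `weight_order`, whose value is `bytes` instead of an array. A minimal sketch for inspecting such a file with the standard `pickle` module follows. It assumes the file is a pickled dictionary, possibly with the weights nested under a `blobs` key; both function names are hypothetical, and whether the remaining arrays then map cleanly onto a Detectron2 model is not something this thread confirms. Only unpickle files from a source you trust.

```python
import pickle


def load_legacy_pickle(path):
    """Load a pickle; encoding='latin1' lets Python 3 read Python 2 pickles."""
    with open(path, "rb") as fh:
        return pickle.load(fh, encoding="latin1")


def split_array_entries(state):
    """Separate array-like entries (with .shape and .dtype) from everything else."""
    entries = state.get("blobs", state)  # assumed nesting; falls back to the top level
    arrays, others = {}, {}
    for name, value in entries.items():
        looks_like_array = hasattr(value, "shape") and hasattr(value, "dtype")
        (arrays if looks_like_array else others)[name] = value
    return arrays, others
```

Printing the keys of `others` shows which entries a loader that accepts only arrays would reject, `weight_order` being the one named in the traceback.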