ibm-aur-nlp / publaynet
License: Other
Did anyone join this competition? I don't know where to submit my result.
Is it possible to start training with additional categories such as: heading2, heading3, ..., image description, ...?
I'm new to deep learning, so I may be doing many things wrong. I'm just trying to reproduce the Deep Layout Parsing example to learn, and I'm having trouble with this part of the code:
import layoutparser as lp

# Load the deep layout model from the layoutparser API.
# For all the supported models, please check the Model Zoo page:
# https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})
It gives me the error FileNotFoundError: [Errno 2] No such file or directory: '/root/.torch/iopath_cache/s/dgy9c10wykk4lq4/model_final.pth' and does not automatically download the model. Why is that, and how can I solve it?
PS: I tried to download it manually, but then I got corrupted-file errors.
Good work. May I know what your configuration was when using pdf2image, e.g., dpi, size, etc.?
Hi Team,
I have a question about the run time of the PubLayNet models at inference (test) time.
Right now I see that the arguments take one image at a time. I tried passing multiple images as arguments and processing them one at a time, and it works.
But I would like to know how to process a batch of images at once, for example a batch of 64 or 128.
Can anybody estimate the time for 100 images if they are processed as a batch?
Currently, detecting the sections in 1000 images takes 14 minutes when they are processed one by one on a Tesla K80 GPU (or CPU), as the model is in evaluation mode.
Or does it follow Detectron2, which can process 8 images per second during training? I am not sure about test time.
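A minimal sketch of the batching step itself, plus the throughput implied by the numbers in the question. The file names are hypothetical, and how each batch is handed to the model is deliberately left out; whether batching is actually faster depends on the model and the available GPU memory and is not measured here.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Throughput implied by the figures in the question: 1000 images in 14 minutes.
seconds_per_image = 14 * 60 / 1000
images_per_second = 1 / seconds_per_image

# Hypothetical file names, only to show the batching itself.
image_paths = [f"page_{i:04d}.png" for i in range(1000)]
batches = list(batched(image_paths, 64))

print(len(batches), len(batches[-1]))
print(f"{seconds_per_image:.2f} s/image, {images_per_second:.2f} images/s")
```

With a batch size of 64, the 1000 paths split into 15 full batches and one final batch of 40.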
The link mentioned in the notebook downloads a broken/corrupted file.
import requests

fname = 'examples.tar.gz'
url = 'http://s3.us-south.cloud-object-storage.appdomain.cloud/dax-assets-dev/dax-publaynet/1.0.0/' + fname
r = requests.get(url)
with open(fname, 'wb') as f:
    f.write(r.content)
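One way to narrow down a "corrupted" download is to check whether the saved file is a gzip archive at all; an error page saved under the archive name fails this check immediately. The sketch below uses only the standard library, and the helper names are made up for illustration.

```python
import tarfile
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def archive_is_readable(path: str) -> bool:
    """Walk through the whole tar.gz; return False on any archive error."""
    if not looks_like_gzip(path):
        return False
    try:
        with tarfile.open(path, "r:gz") as tar:
            for _ in tar:  # iterating decompresses up to each member header
                pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
```

If `looks_like_gzip` is already False, opening the file in a text editor usually shows what the server actually returned.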
After separating out the tables, I checked part of the dataset. There appear to be some mistakes in the table coordinates. The mistakes are:
Hi,
I am currently fine-tuning Layout Parser on my custom dataset. I am using PubLayNet/faster_rcnn_R_50_FPN_3x as my base model, and according to this model the output label set is {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}.
In my own PDFs, however, I just want to use "Title", "Section", "Paragraph", "ListItem", "PageNumber", "Table".
My question is: what should the order of the label mapping be? Also, the pre-trained model already detects tables in my custom data quite well, and I just don't want to ruin that. Can you please suggest how I should move forward?
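A minimal sketch of one way to set up the mapping, assuming (as is usual for Detectron2-style training) that category ids are contiguous from 0 and that the order is arbitrary as long as the training annotations, the label map used at inference, and the configured number of classes all agree. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List

# Class names taken from the question; their order here is an arbitrary choice.
CUSTOM_CLASSES: List[str] = [
    "Title", "Section", "Paragraph", "ListItem", "PageNumber", "Table",
]


def build_label_map(class_names: List[str]) -> Dict[int, str]:
    """Assign contiguous ids 0..N-1 in the order the names are listed."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    return dict(enumerate(class_names))


def check_annotation_ids(category_ids: Iterable[int], label_map: Dict[int, str]) -> None:
    """Raise if any annotation uses an id that the label map does not define."""
    unknown = sorted(set(category_ids) - set(label_map))
    if unknown:
        raise ValueError(f"annotation ids without a label: {unknown}")


label_map = build_label_map(CUSTOM_CLASSES)
num_classes = len(label_map)
print(label_map)
```

Since this class set differs from the five PubLayNet classes, the classification head would most likely have to be trained anew for it, so table detection quality should be re-checked on held-out pages after fine-tuning.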
from detectron2.config import get_cfg

cfg = get_cfg()
# Use the final weights generated after successful training for inference
cfg.MODEL.WEIGHTS = "publaynet.pkl"
cfg.merge_from_file("faster_rcnn.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
This raises the error: Non-existent config key: MODEL.TYPE
Thank you for this large dataset. Is there a plan to open source the trained weights and code in Keras or PyTorch?
We want to start research on end-to-end table extraction and recognition from a large image, so I wonder how I can map a table image in PubTabNet to its original page image in PubLayNet.
I'm trying to run the Detectron2 inference demo with the PubLayNet models and .yaml configs.
My exact command is:
python demo.py --config-file ../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml \
--input ../images/sample_image.png \
--opts MODEL.WEIGHTS ../dir2/models/Faster-RCNN.pkl
The full traceback is this:
[04/08 20:23:44 detectron2]: Arguments: Namespace(config_file='../models/e2e_faster_rcnn_X-101-64x4d-FPN_1x.yaml', webcam=False, video_input=None, input=['../images/Page 0.png'], output=None, confidence_threshold=0.5, opts=['MODEL.WEIGHTS', '../models/Faster-RCNN.pkl'])
Traceback (most recent call last):
File "../detectron2/demo/demo.py", line 80, in <module>
cfg = setup_cfg(args)
File "../Documents/detectron2/demo/demo.py", line 26, in setup_cfg
cfg.merge_from_file(args.config_file)
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/detectron2/config/config.py", line 31, in merge_from_file
loaded_cfg = self.load_yaml_with_base(cfg_filename, allow_unsafe=allow_unsafe)
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/fvcore/common/config.py", line 61, in load_yaml_with_base
cfg = yaml.safe_load(f)
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/__init__.py", line 162, in safe_load
return load(stream, SafeLoader)
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/__init__.py", line 114, in load
return loader.get_single_data()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/composer.py", line 58, in compose_document
self.get_event()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/parser.py", line 118, in get_event
self.current_event = self.state()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/parser.py", line 193, in parse_document_end
token = self.peek_token()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 129, in peek_token
self.fetch_more_tokens()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
return self.fetch_value()
File "../anaconda3/envs/pytorch_env/lib/python3.9/site-packages/yaml/scanner.py", line 577, in fetch_value
raise ScannerError(None, None,
yaml.scanner.ScannerError: mapping values are not allowed here
My brief research turned up little beyond the suggestion that this is a syntax mistake in the YAML.
I did a brief search through the .yaml to see if this was occurring, but I didn't see anything. Maybe I missed something, or my command is somehow wrong. Let's discuss.
I'm using Python 3.9.2 and detectron2 v0.4.
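The message "mapping values are not allowed here" typically appears when the YAML scanner meets a second `key: value` separator where it expects plain text. A frequently reported cause is that the saved .yaml is really an HTML page rather than the raw file. The heuristic below is only a sketch for spotting such lines; it is not a YAML parser, and the function name is made up for illustration.

```python
from typing import List, Tuple


def diagnose_config_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that commonly break YAML parsing."""
    findings: List[Tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = line[: len(line) - len(line.lstrip())]
        if stripped.startswith("<"):
            findings.append((number, "looks like HTML/XML, not YAML"))
        elif "\t" in indent:
            findings.append((number, "tab used for indentation"))
        elif (
            stripped.count(": ") > 1
            and "{" not in stripped
            and not any(quote in stripped for quote in "\"'")
        ):
            findings.append((number, "more than one ': ' on an unquoted line"))
    return findings
```

Running it on the contents of the config file, e.g. `diagnose_config_text(open(path).read())`, prints nothing suspicious for a clean file and points at the first offending line otherwise.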
Hi,
I have been trying to extract the labels for tabular data only. First of all, there are too many None values in the image column. I tried filtering on category_id=4, i.e., Table. After filtering, I took the bbox values, put them in a separate column, and tried to separate out only the filtered images. When I looked at the filtered images, many of them contained no table or table-like structure. I then tried drawing the bbox over each image and got incorrect bounding boxes that cover only part of the table or, in many cases, no table at all. I am reading the bbox as (xmin, ymin, w, h), and I have tried other variations as well. Am I missing something here? Please help.
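A minimal sketch of the filtering step, assuming the annotation file follows the COCO layout (`images` and `annotations` lists, with `bbox` stored as [x, y, width, height]). The key point is that each annotation is paired with its image through `image_id` rather than by row position, which is one possible source of boxes landing on the wrong page. The example data is made up and is not real PubLayNet content.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TABLE_CATEGORY_ID = 4  # value used in the question for "Table"


def tables_by_image(coco: dict) -> Dict[str, List[Tuple[float, float, float, float]]]:
    """Map each image file name to its table boxes as (x_min, y_min, x_max, y_max)."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes: Dict[str, list] = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] != TABLE_CATEGORY_ID:
            continue
        x, y, w, h = ann["bbox"]  # COCO convention: top-left corner, width, height
        boxes[file_names[ann["image_id"]]].append((x, y, x + w, y + h))
    return dict(boxes)


# Tiny made-up example in COCO layout (not real PubLayNet data).
example = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"image_id": 1, "category_id": 4, "bbox": [10, 20, 100, 50]},
        {"image_id": 2, "category_id": 1, "bbox": [5, 5, 30, 10]},
    ],
}
print(tables_by_image(example))  # {'a.jpg': [(10, 20, 110, 70)]}
```

Images without any table annotation simply do not appear in the result, so no None placeholders are needed.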
Hi,
First, thanks for moving your dataset off of box.com. Their interface was not very user-friendly.
The dataset format listed on the IBM Developer page does not match the contents of the downloaded archive. The images appear to all be JPEGs, not PNGs.
Could you please request that the dataset format be updated, or (even better due to JPEG compression artifacts) upload PNGs?
Nice paper and thanks for releasing the PubLayNet dataset!
Is there a way to download the dataset from the command line (instead of using a web browser)?
Cheers.
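For any direct download URL, a small standard-library script can stand in for the browser. This is only a sketch: the dataset URL is not given in this thread and has to be supplied by the user, and the function name is hypothetical.

```python
import urllib.request
from pathlib import Path


def download(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` in chunks and return the number of bytes written."""
    tmp = Path(dest + ".part")
    written = 0
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    tmp.replace(dest)  # only expose the file once the transfer has finished
    return written
```

Writing to a `.part` file first means an interrupted transfer never leaves a truncated archive under the final name.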
Thanks so much for this repo; it has been amazingly useful. We used it to build the ICLR 2020 virtual edition and added all the pictures this way!
Hi, first of all, thank you for this great dataset :-) Just one minor question: do you plan to also upload the test.json? I couldn't find it in the box.
If the annotations are extracted from the PDF and XML formats, did you also extract text-line and OCR information? If not, could this be done in the future?
The links for downloading the pre-trained model weights are not working. Have they been moved to a new location?
Hi PubLayNet Team,
Thank you for sharing the dataset. Is a JSON file (or a list of image IDs) for the test set provided?
Thank you for sharing the dataset! It would be convenient if we could use annotations for each text line, including the corresponding bbox and text, especially for logical layout analysis tasks.
Firstly, thank you for your useful dataset.
I have downloaded PubLayNet in both image and PDF form. But I noticed that the image and the PDF of the same page are NOT the same size. For example, the size of a PDF page is 600.05 × 792, but the size of the JPG image is 602 × 792. So the annotations should be slightly different for these two types of file.
How can I solve this problem? Thank you again!
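If the two renderings differ only by a scale factor, boxes can be rescaled per axis. The sketch below assumes the sizes quoted in the question are width × height (600.05 × 792 for the PDF page and 602 × 792 for the JPG) and that boxes are stored as [x, y, width, height]; it only rescales and does not flip the y-axis, which some PDF tools require because they place the origin at the bottom-left corner.

```python
from typing import Sequence, Tuple


def scale_bbox(
    bbox: Sequence[float],
    from_size: Tuple[float, float],
    to_size: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Rescale an [x, y, width, height] box from one page size to another."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    x, y, w, h = bbox
    return (x * sx, y * sy, w * sx, h * sy)


# Sizes quoted in the question, read as width x height (an assumption).
image_size = (602, 792)
pdf_size = (600.05, 792)
print(scale_bbox([301, 100, 60.2, 50], image_size, pdf_size))
```

With these sizes the vertical coordinates are unchanged and the horizontal ones shrink by a factor of 600.05 / 602.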
Thank you for sharing this large-scale document layout analysis dataset. The download speed for this dataset is very slow from India, and the download often gets disconnected. Is there any other way to download the dataset?
Thank you for sharing the dataset! I want to use Mask R-CNN with Keras. How can we use the config and weights with it?
What are the mean and std of the pre-trained model?
This is a nice dataset for research on NLP and CV. Thank you for making it publicly available.
I am wondering whether any foreign-language document images are included in the PubLayNet dataset.
https://dax.cdn.appdomain.cloud/dax-publaynet/1.0.0/examples.tar.gz no longer works. Is there a new location for this file?
Do you provide the scripts/code that you developed to match the PDFMiner outputs on the documents to the XML representation of the PDF page itself? Thanks
Could you advise how to get a bounding box for each list item? Currently, one bounding box covers all list items; I would like to have a separate bounding box for each item. Thanks.
Great job, and thank you for sharing such large-scale document data. However, my download speed for these datasets is very slow, and the download often gets disconnected. Is there any other way to get these datasets?
Hi,
Thanks for sharing the useful dataset and pre-trained model.
Is the model trained on Detectron or Detectron2? I wonder if you could provide detailed instructions for fine-tuning the pre-trained model on my own dataset.
Thank you.
The dataset does not contain test.json, and the competition website is not working. Is there any way to use the test data, so that I can compare my performance with the teams that joined the competition?
Nice work, and I appreciate the effort you put into making the dataset publicly available.
I am working on document object detection and would like to use the content of the documents to help boost detection performance. So I need the raw PDF files (from which you generated the images) to extract some content information. Could you please release them as well?
I think one of the major differences between object detection in document images and in natural images is that documents contain auxiliary text information that is absent from natural images. Incorporating this auxiliary information should help achieve better detection results. It should also benefit some NLP+CV tasks. Thanks.
I am very interested in simulating the layout of papers. I can prepare a lot of text and images, but I don't know how to generate a document that looks real and get the bounding boxes automatically. Any suggestions?
Hi, the document images are from the commercial use collection of PMC OA.
Then what are the license terms of the PubLayNet dataset?
Is the whole dataset also allowed for commercial purposes?
Thanks a lot!
Hi,
I am trying to fine-tune on my custom dataset using the pre-trained models (Mask R-CNN) published by PubLayNet, with Detectron2.
Here is my cell; please have a look.
import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

config_file = "/content/config/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml" #(wget https://raw.githubusercontent.com/ibm-aur-nlp/PubLayNet/master/pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml)
model_file = "/content/models/model_final.pkl" #(wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/pre-trained-models/Mask-RCNN/model_final.pkl)
cfg = get_cfg()
cfg._open_cfg(config_file) # note: this only opens the file; it does not merge it into cfg
cfg.DATASETS.TRAIN = ("layout_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_file # initialize training from the PubLayNet pre-trained weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR
cfg.SOLVER.MAX_ITER = 3000 # number of training iterations
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 # RoIs sampled per image (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4 # four classes in this dataset: Text, Math, Table, Image (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
This gives the output:
[05/13 08:55:08 d2.engine.defaults]: Model:
GeneralizedRCNN(
(backbone): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
)
(proposal_generator): RPN(
(rpn_head): StandardRPNHead(
(conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(objectness_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))
(anchor_deltas): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
)
(roi_heads): Res5ROIHeads(
(pooler): ROIPooler(
(level_poolers): ModuleList(
(0): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
(box_predictor): FastRCNNOutputLayers(
(cls_score): Linear(in_features=2048, out_features=2, bias=True)
(bbox_pred): Linear(in_features=2048, out_features=4, bias=True)
)
)
)
[05/13 08:55:26 d2.data.build]: Removed 0 images with no usable annotations. 1691 images left.
[05/13 08:55:26 d2.data.build]: Distribution of instances among all 4 categories:
| category | #instances | category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
| Text | 43760 | Math | 2864 | Table | 78 |
| Image | 312 | | | | |
| total | 47014 | | | | |
[05/13 08:55:26 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]
[05/13 08:55:26 d2.data.build]: Using training sampler TrainingSampler
[05/13 08:55:26 d2.data.common]: Serializing 1691 elements to byte tensors and concatenating them all ...
[05/13 08:55:26 d2.data.common]: Serialized dataset takes 5.30 MiB
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-fc8bb5a8baf3> in <module>()
21 os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
22 trainer = DefaultTrainer(cfg)
---> 23 trainer.resume_or_load(resume=False)
24 trainer.train()
4 frames
/usr/local/lib/python3.7/dist-packages/fvcore/common/checkpoint.py in _convert_ndarray_to_tensor(self, state_dict)
357 if not isinstance(v, np.ndarray) and not isinstance(v, torch.Tensor):
358 raise ValueError(
--> 359 "Unsupported type found in checkpoint! {}: {}".format(k, type(v))
360 )
361 if not isinstance(v, torch.Tensor):
ValueError: Unsupported type found in checkpoint! weight_order: <class 'bytes'>
When I tried to open the pre-trained model with
torch.load(model_file)
it raised the error:
RuntimeError Traceback (most recent call last)
<ipython-input-26-857e01592fb9> in <module>()
----> 1 torch.load(model_file)
1 frames
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
764 magic_number = pickle_module.load(f, **pickle_load_args)
765 if magic_number != MAGIC_NUMBER:
--> 766 raise RuntimeError("Invalid magic number; corrupt file?")
767 protocol_version = pickle_module.load(f, **pickle_load_args)
768 if protocol_version != PROTOCOL_VERSION:
RuntimeError: Invalid magic number; corrupt file?
Is there any other way to use these pretrained models?
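The "Invalid magic number" error suggests the file is not in `torch.save` format, and the `weight_order: <class 'bytes'>` entry suggests a plain pickle that mixes weights with metadata. The sketch below assumes exactly that: a standard pickle, possibly written by Python 2, with the weights either at the top level or nested under a `blobs` key. Both function names are hypothetical, and `pickle.load` should only be used on files from a trusted source.

```python
import pickle
from typing import Any, Dict


def load_legacy_pickle(path: str) -> Dict[str, Any]:
    """Load a plain pickle file, tolerating Python 2 string encoding."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def weights_only(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Keep array-like entries; drop metadata stored as str or bytes.

    Assumes the weights may be nested under a "blobs" key.
    """
    blobs = checkpoint.get("blobs", checkpoint)
    return {
        name: value
        for name, value in blobs.items()
        if not isinstance(value, (str, bytes)) and not name.endswith("_momentum")
    }
```

Printing `sorted(weights_only(load_legacy_pickle(model_file)))` would list the stored parameter names, which helps in deciding whether and how they can be mapped onto another framework's layer names.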
Hi, I am quite new to coding and would like to use the pre-trained model, but it would be nice to have some sample code to orient myself. I saw that you have a Jupyter notebook with an example, but apparently the link is broken?
Best wishes,
T
How can I contact the simo team from Task A? I am interested in their method.
Hi All,
Thanks a lot for this awesome dataset and the pre-trained weights. I wanted to know how I can use them to predict bounding boxes for a given page image.
Hello,
I tried downloading the pdf dataset, but I only unzipped around 10% before I ran into a data corruption issue. Are checksums or data splits available for the PubLayNet_PDF.tar.gz?
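No checksum is quoted in this thread, but a locally computed digest can still be compared between two downloads, or against a value the maintainers may provide. A minimal sketch using the standard library (the function name is made up):

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use constant even for an archive as large as PubLayNet_PDF.tar.gz.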