adithya-s-k / marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

License: GNU General Public License v3.0

Shell 2.48% Python 96.77% TeX 0.01% Dockerfile 0.74%

fastapi marker pdf-converter pdf-files pdf-parser pdf-parsing api rest-api

marker-api's Introduction

Marker API

Important

Marker API provides a simple endpoint for converting PDF documents to Markdown quickly and accurately. With just one click, you can deploy the Marker API endpoint and start converting PDFs seamlessly.

Features

Converts PDF to Markdown.
Can convert multiple PDFs at the same time.
Supports a wide range of documents, including books and scientific papers.
Supports all languages.
Removes headers, footers, and other artifacts.
Formats tables and code blocks.
Extracts and saves images along with the Markdown.
Converts most equations to LaTeX.
Works on GPU, CPU, or MPS.

Comparison

Original PDF	Marker-API	PyPDF

Installation and Setup

🐍 Python

To install Marker API in a Python environment, follow these steps:

Clone the Marker API repository from GitHub:

git clone https://github.com/adithya-s-k/marker-api

Navigate to the cloned repository directory:

cd marker-api

Install the dependencies using the following commands:

poetry install or pip install -e .

After installation, start the server with the marker_api command:

marker_api

Alternatively, run server.py directly:

python server.py

🛳️ Docker

To use Marker API with Docker, execute the following commands:

Pull the Marker API Docker image from Docker Hub, then run the container, exposing port 8000:

docker pull savatar101/marker-api:0.3
# if you are running on a gpu 
docker run --gpus all -p 8000:8000 savatar101/marker-api:0.3
# else
docker run -p 8000:8000 savatar101/marker-api:0.3

Alternatively, if you prefer to build the Docker image locally, build it and then run the container as follows:

docker build -t marker-api .
# if you are running on a gpu
docker run --gpus all -p 8000:8000 marker-api
# else
docker run -p 8000:8000 marker-api

✈️ Skypilot

SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution. To deploy Marker API using SkyPilot on any cloud provider, execute the following commands:

pip install "skypilot-nightly[all]"

# set up SkyPilot with the cloud provider of your choice

sky launch skypilot.yaml

Please refer to the SkyPilot documentation for more information.

Usage

API Client Code:

Endpoint

URL: /convert
Method: POST

Request

Body Parameters:
- pdf_file: The PDF file to be converted. (Type: File)
- extract_images (Optional): Specify whether to extract images from the PDF. Default is true. (Type: Boolean)

Response

Success Response:
- Code: 200 OK
- Content: JSON containing the converted Markdown text, metadata, and optionally extracted image data.
```
{
    "markdown": "Converted Markdown text...",
    "metadata": {...},
    "images": {
        "image_1": "data:image/png;base64,<base64_encoded_image_data>",
        "image_2": "data:image/png;base64,<base64_encoded_image_data>",
        ...
    }
}
```
If images are included in the response, they are provided in base64-encoded format. You can use this data to display the images in your application. You can also use the Python script invoke.py to call the endpoint with a local PDF file and save the images locally.

Error Response:
- Code: 415 Unsupported Media Type
- Content: JSON containing error details.
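Because a successful response carries the Markdown as a string and the images as base64 data URIs, saving a conversion to disk takes only a few lines. The sketch below is one possible approach and is not the repository's invoke.py; the function name `save_conversion`, the `.png` fallback extension, and the output layout are assumptions made for illustration.

```python
import base64
from pathlib import Path


def save_conversion(result: dict, out_dir: str, stem: str = "output") -> Path:
    """Write the Markdown and any images from a /convert response to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # The "markdown" key holds the converted text.
    md_path = out / f"{stem}.md"
    md_path.write_text(result.get("markdown", ""), encoding="utf-8")

    # "images" maps a name to a data URI such as "data:image/png;base64,<payload>".
    for name, data in (result.get("images") or {}).items():
        # Keep only the base64 payload after the comma when a data URI prefix is present.
        payload = data.split(",", 1)[1] if data.startswith("data:") else data
        # Assumption: images are PNG; an extension already in the name is kept.
        safe_name = Path(name).name
        filename = safe_name if Path(safe_name).suffix else f"{safe_name}.png"
        (out / filename).write_bytes(base64.b64decode(payload))

    return md_path
```

Given a parsed response (for example `response.json()` from a requests call), `save_conversion(data, "converted")` would write `converted/output.md` plus one file per returned image.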

Invoke Endpoint

CURL

curl -X POST \
  -F "pdf_file=@example.pdf;type=application/pdf" \
  http://localhost:8000/convert

Python

Please refer to the examples (a notebook and a script) for how to invoke the API and save the result as Markdown.

import requests
import os

url = "http://localhost:8000/convert"
pdf_file_path = "example.pdf"
with open(pdf_file_path, 'rb') as pdf_file:
    pdf_content = pdf_file.read()
files = {'pdf_file': (os.path.basename(pdf_file_path), pdf_content, 'application/pdf')}
response = requests.post(url, files=files)

print(response.json())

JavaScript

// Requires Node.js 18+, where fetch, FormData, and Blob are available as globals.
const fs = require('fs');

const url = "http://localhost:8000/convert";
const pdfFilePath = "example.pdf";

fs.readFile(pdfFilePath, (err, pdfContent) => {
    if (err) {
        console.error(err);
        return;
    }

    const formData = new FormData();
    formData.append('pdf_file', new Blob([pdfContent], { type: 'application/pdf' }), pdfFilePath);

    fetch(url, {
        method: 'POST',
        body: formData
    })
    .then(response => response.json())
    .then(data => console.log(data))
    .catch(error => console.error('Error:', error));
});

Marker Readme

Marker converts PDF to markdown quickly and accurately.

Supports a wide range of documents (optimized for books and scientific papers)
Supports all languages
Removes headers/footers/other artifacts
Formats tables and code blocks
Extracts and saves images along with the markdown
Converts most equations to latex
Works on GPU, CPU, or MPS

How it works

Marker is a pipeline of deep learning models:

Extract text, OCR if necessary (heuristics, surya, tesseract)
Detect page layout and find reading order (surya)
Clean and format each block (heuristics, texify)
Combine blocks and postprocess complete text (heuristics, pdf_postprocessor)

It only uses models where necessary, which improves speed and accuracy.

Examples

PDF	Type	Marker	Nougat
Think Python	Textbook	View	View
Think OS	Textbook	View	View
Switch Transformers	arXiv paper	View	View
Multi-column CNN	arXiv paper	View	View

Performance

The above results are with marker and nougat set up so that each takes ~4GB of VRAM on an A6000.

See below for detailed speed and accuracy benchmarks, and instructions on how to run your own benchmarks.

Commercial usage

I want marker to be as widely accessible as possible, while still funding my development/training costs. Research and personal usage is always okay, but there are some restrictions on commercial usage.

The weights for the models are licensed cc-by-nc-sa-4.0, but I will waive that for any organization under $5M USD in gross revenue in the most recent 12-month period AND under $5M in lifetime VC/angel funding raised. If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options here.

Community

Discord is where we discuss future development.

Limitations

PDF is a tricky format, so marker will not always work perfectly. Here are some known limitations that are on the roadmap to address:

Marker will not convert 100% of equations to LaTeX. This is because it has to detect then convert.
Tables are not always formatted 100% correctly - text can be in the wrong column.
Whitespace and indentations are not always respected.
Not all lines/spans will be joined properly.
This works best on digital PDFs that won't require a lot of OCR. It's optimized for speed, and limited OCR is used to fix errors.

Installation

You'll need python 3.9+ and PyTorch. You may need to install the CPU version of torch first if you're not using a Mac or a GPU machine. See here for more details.

Install with:

pip install marker-pdf

Optional: OCRMyPDF

Only needed if you want to use the optional ocrmypdf as the ocr backend. Note that ocrmypdf includes Ghostscript, an AGPL dependency, but calls it via CLI, so it does not trigger the license provisions.

See the instructions here

Usage

First, some configuration:

Inspect the settings in marker/settings.py. You can override any settings with environment variables.
Your torch device will be automatically detected, but you can override this. For example, TORCH_DEVICE=cuda.
- If using GPU, set INFERENCE_RAM to your GPU VRAM (per GPU). For example, if you have 16 GB of VRAM, set INFERENCE_RAM=16.
- Depending on your document types, marker's average memory usage per task can vary slightly. You can configure VRAM_PER_TASK to adjust this if you notice tasks failing with GPU out of memory errors.
By default, marker will use surya for OCR. Surya is slower on CPU, but more accurate than tesseract. If you want faster OCR, set OCR_ENGINE to ocrmypdf. This also requires external dependencies (see above). If you don't want OCR at all, set OCR_ENGINE to None.
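Since every setting can be overridden through environment variables, one way to keep the overrides in a single place is a small helper that builds them as a dictionary. This is a minimal sketch: the helper name `settings_overrides` and its parameters are hypothetical, and only the variable names `TORCH_DEVICE`, `INFERENCE_RAM`, and `OCR_ENGINE` come from the text above.

```python
import os
from typing import Optional


def settings_overrides(
    torch_device: Optional[str] = None,
    inference_ram_gb: Optional[int] = None,
    ocr_engine: Optional[str] = None,
) -> dict:
    """Return environment-variable overrides for the settings described above."""
    overrides = {}
    if torch_device is not None:
        overrides["TORCH_DEVICE"] = torch_device  # e.g. "cuda"
    if inference_ram_gb is not None:
        overrides["INFERENCE_RAM"] = str(inference_ram_gb)  # GPU VRAM in GB, per GPU
    if ocr_engine is not None:
        overrides["OCR_ENGINE"] = ocr_engine  # "surya", "ocrmypdf", or "None"
    return overrides


# Merge the overrides into a copy of the current environment.
env = {**os.environ, **settings_overrides(torch_device="cuda", inference_ram_gb=16)}
```

The resulting `env` can be passed to `subprocess.run(..., env=env)` when launching one of the marker commands, so the overrides apply to that process only.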

Convert a single file

marker_single /path/to/file.pdf /path/to/output/folder --batch_multiplier 2 --max_pages 10 --langs English

--batch_multiplier is how much to multiply default batch sizes by if you have extra VRAM. Higher numbers will take more VRAM, but process faster. Set to 2 by default. The default batch sizes will take ~3GB of VRAM.
--max_pages is the maximum number of pages to process. Omit this to convert the entire document.
--langs is a comma separated list of the languages in the document, for OCR

Make sure the DEFAULT_LANG setting is set appropriately for your document. The list of supported languages for OCR is here. If you need more languages, you can use any language supported by Tesseract if you set OCR_ENGINE to ocrmypdf. If you don't need OCR, marker can work with any language.

Convert multiple files

marker /path/to/input/folder /path/to/output/folder --workers 10 --max 10 --metadata_file /path/to/metadata.json --min_length 10000

--workers is the number of pdfs to convert at once. This is set to 1 by default, but you can increase it to increase throughput, at the cost of more CPU/GPU usage. Parallelism will not increase beyond INFERENCE_RAM / VRAM_PER_TASK if you're using GPU.
--max is the maximum number of pdfs to convert. Omit this to convert all pdfs in the folder.
--min_length is the minimum number of characters that need to be extracted from a pdf before it will be considered for processing. If you're processing a lot of pdfs, I recommend setting this to avoid OCRing pdfs that are mostly images. (slows everything down)
--metadata_file is an optional path to a json file with metadata about the pdfs. If you provide it, it will be used to set the language for each pdf. If not, DEFAULT_LANG will be used. The format is:

{
  "pdf1.pdf": {"languages": ["English"]},
  "pdf2.pdf": {"languages": ["Spanish", "Russian"]},
  ...
}

You can use language names or codes. The exact codes depend on the OCR engine. See here for a full list for surya codes, and here for tesseract.
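For a large folder, the metadata file is easier to generate than to write by hand. The following sketch builds the structure shown above from the PDFs in a directory; the function names `build_metadata` and `write_metadata` and the per-file `overrides` argument are illustrative assumptions, not part of marker.

```python
import json
from pathlib import Path
from typing import Optional


def build_metadata(pdf_dir: str, default_languages: list, overrides: Optional[dict] = None) -> dict:
    """Map each PDF file name in pdf_dir to its list of languages."""
    overrides = overrides or {}
    metadata = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # Use a per-file override when given, otherwise the default languages.
        languages = overrides.get(pdf.name, default_languages)
        metadata[pdf.name] = {"languages": list(languages)}
    return metadata


def write_metadata(metadata: dict, path: str) -> None:
    """Save the metadata as JSON in the format expected by --metadata_file."""
    Path(path).write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
```

For example, `write_metadata(build_metadata("pdfs", ["English"], {"pdf2.pdf": ["Spanish", "Russian"]}), "metadata.json")` would produce a file like the one shown above for a folder containing pdf1.pdf and pdf2.pdf.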

Convert multiple files on multiple GPUs

MIN_LENGTH=10000 METADATA_FILE=../pdf_meta.json NUM_DEVICES=4 NUM_WORKERS=15 marker_chunk_convert ../pdf_in ../md_out

METADATA_FILE is an optional path to a json file with metadata about the pdfs. See above for the format.
NUM_DEVICES is the number of GPUs to use. Should be 2 or greater.
NUM_WORKERS is the number of parallel processes to run on each GPU. Per-GPU parallelism will not increase beyond INFERENCE_RAM / VRAM_PER_TASK.
MIN_LENGTH is the minimum number of characters that need to be extracted from a pdf before it will be considered for processing. If you're processing a lot of pdfs, I recommend setting this to avoid OCRing pdfs that are mostly images. (slows everything down)

Note that the env variables above are specific to this script, and cannot be set in local.env.

Troubleshooting

There are some settings that you may find useful if things aren't working the way you expect:

OCR_ALL_PAGES - set this to true to force OCR all pages. This can be very useful if the table layouts aren't recognized properly by default, or if there is garbled text.
TORCH_DEVICE - set this to force marker to use a given torch device for inference.
OCR_ENGINE - can set this to surya or ocrmypdf.
DEBUG - setting this to True shows ray logs when converting multiple pdfs
Verify that you set the languages correctly, or passed in a metadata file.
If you're getting out of memory errors, decrease the worker count (or increase the VRAM_PER_TASK setting). You can also try splitting up long PDFs into multiple files.

In general, if output is not what you expect, trying to OCR the PDF is a good first step. Not all PDFs have good text/bboxes embedded in them.

Benchmarks

Benchmarking PDF extraction quality is hard. I've created a test set by finding books and scientific papers that have a pdf version and a latex source. I convert the latex to text, and compare the reference to the output of text extraction methods. It's noisy, but at least directionally correct.

Benchmarks show that marker is 4x faster than nougat, and more accurate outside arXiv (nougat was trained on arXiv data). We show naive text extraction (pulling text out of the pdf with no processing) for comparison.

Speed

Method	Average Score	Time per page (s)	Time per document (s)
marker	0.613721	0.631991	58.1432
nougat	0.406603	2.59702	238.926
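The "4x faster" claim can be checked directly from the table. The short calculation below uses only the numbers listed there; treating the times as seconds and deriving an average page count per document are assumptions of this sketch.

```python
# Values copied from the speed table above.
marker_per_page, marker_per_doc = 0.631991, 58.1432
nougat_per_page, nougat_per_doc = 2.59702, 238.926

# Ratio of per-page times: how many times faster marker is than nougat.
speedup = nougat_per_page / marker_per_page

# Per-document time divided by per-page time gives the average page count.
pages_per_doc = marker_per_doc / marker_per_page

print(f"speedup: {speedup:.2f}x")                           # speedup: 4.11x
print(f"average pages per document: {pages_per_doc:.0f}")   # average pages per document: 92
```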

Accuracy

thinkpython.pdf, thinkos.pdf, and thinkdsp.pdf are non-arXiv books; multicolcnn.pdf, switch_trans.pdf, and crowd.pdf are arXiv papers.

Method	multicolcnn.pdf	switch_trans.pdf	thinkpython.pdf	thinkos.pdf	thinkdsp.pdf	crowd.pdf
marker	0.536176	0.516833	0.70515	0.710657	0.690042	0.523467
nougat	0.44009	0.588973	0.322706	0.401342	0.160842	0.525663

Peak GPU memory usage during the benchmark is 4.2GB for nougat, and 4.1GB for marker. Benchmarks were run on an A6000 Ada.

Throughput

Marker takes about 4.5GB of VRAM on average per task, so you can convert 10 documents in parallel on an A6000.
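The figure of 10 documents follows from dividing the available VRAM by the per-task usage and rounding down, the same INFERENCE_RAM / VRAM_PER_TASK bound mentioned for the worker settings. A minimal sketch of that arithmetic is shown here; the function name is hypothetical, the 48 GB value is the A6000's VRAM (not stated in the text), and marker's internal scheduling may differ in detail.

```python
import math


def max_parallel_tasks(inference_ram_gb: float, vram_per_task_gb: float) -> int:
    """Upper bound on concurrent conversions for a given amount of VRAM."""
    if vram_per_task_gb <= 0:
        raise ValueError("vram_per_task_gb must be positive")
    return math.floor(inference_ram_gb / vram_per_task_gb)


print(max_parallel_tasks(48, 4.5))  # 10 (A6000 with 48 GB)
print(max_parallel_tasks(16, 4.5))  # 3  (a 16 GB GPU)
```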

Running your own benchmarks

You can benchmark the performance of marker on your machine. Install marker manually with:

git clone https://github.com/VikParuchuri/marker.git
cd marker
poetry install

Download the benchmark data here and unzip. Then run benchmark.py like this:

python benchmark.py data/pdfs data/references report.json --nougat

This will benchmark marker against other text extraction methods. It sets up batch sizes for nougat and marker to use a similar amount of GPU RAM for each.

Omit --nougat to exclude nougat from the benchmark. I don't recommend running nougat on CPU, since it is very slow.

Thanks

This work would not have been possible without amazing open source models and datasets, including (but not limited to):

Surya
Texify
Pypdfium2/pdfium
DocLayNet from IBM
ByT5 from Google

Thank you to the authors of these models and datasets for making them available to the community!

To Do

Create server
Add support for single PDF upload
Add support for multi PDF upload
Docker support and Skypilot support
Implement handling for multiple PDF uploads simultaneously.
Introduce a toggle mode to generate Markdown without including images in the output.
Enhance GPU utilization and optimize performance for efficient processing.
Implement dynamic adjustment of batch size based on available VRAM.
Live update API on progress of conversion

Throughput Benchmarks

Updates on throughput benchmarks will be available soon.

Acknowledgements

This project is built on top of the remarkable marker project created by VikParuchuri. We express our gratitude for the inspiration and foundation provided by this project.


marker-api's Issues

Cloudflare tunnel timeout issue

Hi,

I use a Cloudflare tunnel to make a server on my local network accessible from the public internet. Cloudflare closes the connection if no response arrives within 100 seconds, and most of my PDF files take longer than 100 seconds to convert. Could you add a feature that responds to the client and holds the connection open until the file has been converted?

Best regards

Wouldn't OpenAI API compatibility be nice?

Docker execution problem

I run it with docker but there is a problem

what should I do
Environment: macOS 14.6 Intel

Specify marker-pdf version behind the marker-api

The git log shows this branch separated from marker at commit 6f8b239c4a4811cb24cbaf8bb2e452ace9d99222 (tag: v0.2.5), but I saw some code from marker v0.2.12 committed by @adithya-s-k. Please state the marker version of each savatar101/marker-api image explicitly.
By the way, declaring marker as a dependency and importing it would be better than manually copying code from it again and again.

Tried to convert multiple pdf files, reported error: itmalloc(): unsorted double linked list corrupted

Tried to convert multiple pdf files:

reported error:

Want to provide a Docker version

Use Docker to configure the environment and provide API services to facilitate confirmation and service. Thanks for contribution

Detecting bboxes: 0%| | 0/105 [00:00<?, ?it/s][W NNPACK.cpp:61] Could not initialize NNPACK! Reason: Unsupported hardware.

How to convert the response result into Markdown?

I'm using Python to request the API and have obtained response.json(). How can I transform it into a .md file?

ImportError: cannot import name 'segformer' from 'surya.model.detection'

Hello, I followed the instructions from your readme file, yet ended up in running into error. And the detailed error info attached hereof:

yutang@bogon marker-api % marker_api
/Users/yutang/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: urllib3/urllib3#3020
warnings.warn(
Traceback (most recent call last):
File "/Users/yutang/Library/Python/3.9/bin/marker_api", line 5, in <module>
from server import main
File "/Users/yutang/marker-api/server.py", line 10, in <module>
from marker.parse import parse_single_pdf # Import function to parse PDF
File "/Users/yutang/marker-api/marker/parse.py", line 14, in <module>
from marker.ocr.recognition import run_ocr
File "/Users/yutang/marker-api/marker/ocr/recognition.py", line 11, in <module>
from marker.models import setup_recognition_model
File "/Users/yutang/marker-api/marker/models.py", line 2, in <module>
from surya.model.detection import segformer
ImportError: cannot import name 'segformer' from 'surya.model.detection' (unknown location)

What does response code 422 mean?

Following the tutorial, sending a convert request returns 422
<Response [422]>
{'detail': [{'type': 'missing', 'loc': ['body', 'pdf_files'], 'msg': 'Field required', 'input': None}]}

Unable to load the texify files

Loaded texify model to cuda with torch.float16 dtype
Traceback (most recent call last):
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connection.py", line 616, in connect
self.sock = sock = self._new_conn()
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connection.py", line 213, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f79e98f8b50>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/requests/adapters.py", line 589, in send
resp = conn.urlopen(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /vikp/texify/resolve/main/preprocessor_config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f79e98f8b50>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 66, in send
return super().send(request, *args, **kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/requests/adapters.py", line 622, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /vikp/texify/resolve/main/preprocessor_config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f79e98f8b50>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 23834538-08c7-4914-bc99-b3e7222b0f50)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/transformers/utils/hub.py", line 399, in cached_file
resolved_file = hf_hub_download(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1826, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/gr/marker-api/server.py", line 14, in <module>
model_list = load_all_models()
File "/home/gr/marker-api/marker/models.py", line 58, in load_all_models
texify = setup_texify_model()
File "/home/gr/marker-api/marker/models.py", line 28, in setup_texify_model
texify_processor = load_texify_processor()
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/texify/model/processor.py", line 24, in load_processor
processor = VariableDonutProcessor.from_pretrained(settings.MODEL_CHECKPOINT)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/transformers/processing_utils.py", line 465, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/transformers/processing_utils.py", line 511, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/transformers/models/auto/image_processing_auto.py", line 363, in from_pretrained
config_dict, _ = ImageProcessingMixin.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/transformers/image_processing_utils.py", line 334, in get_image_processor_dict
resolved_image_processor_file = cached_file(
File "/home/dell/anaconda3/envs/py39_maker_grr/lib/python3.9/site-packages/transformers/utils/hub.py", line 442, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like vikp/texify is not the path to a directory containing a file named preprocessor_config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
