Comments (11)
Try adding this to your Next.js config:
const nextConfig = {
  experimental: {
    serverComponentsExternalPackages: ['pdf2json'],
  },
};
from pdf2json.
Hey @dprothero, I'm glad my solution helped you.
I did use this in the main process in Electron, and it worked fine.
Here is the exact code:
import * as fs from "fs";
import PDFParser from "pdf2json";

// Parses the PDF at pdfPath and resolves with the extracted content.
const extractFromPdf = (pdfPath: string): Promise<string[]> => {
  return new Promise((resolve, reject) => {
    const pdfParser = new PDFParser();
    // Reject on parser errors so callers are not left waiting.
    pdfParser.on('pdfParser_dataError', errData => {
      console.error(errData);
      reject(errData);
    });
    pdfParser.on('pdfParser_dataReady', pdfData => {
      const processedData: string[] = [];
      // ... code to extract text and other content from pdfData ...
      resolve(processedData);
    });
    const dataBuffer = fs.readFileSync(pdfPath);
    pdfParser.parseBuffer(dataBuffer);
  });
};
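For anyone wondering what the elided extraction step could look like, here is a minimal sketch. It assumes the parsed output has the shape Pages → Texts → R → T, with each T value URI-encoded; that matches how pdf2json's JSON output is usually described, but it may differ between versions, so check it against the one you have installed. The interface names and the extractPageTexts helper are made up for illustration and are not part of the library.

```typescript
// Hypothetical minimal types covering only the fields this sketch reads.
interface PdfTextRun { T: string }
interface PdfTextBlock { R: PdfTextRun[] }
interface PdfPage { Texts: PdfTextBlock[] }
interface PdfOutput { Pages: PdfPage[] }

// Decode one run; fall back to the raw value if it is not valid URI encoding.
function decodeRun(raw: string): string {
  try {
    return decodeURIComponent(raw);
  } catch {
    return raw;
  }
}

// Returns one string per page, with text blocks joined by single spaces.
function extractPageTexts(pdfData: PdfOutput): string[] {
  return pdfData.Pages.map((page) =>
    page.Texts
      .map((block) => block.R.map((run) => decodeRun(run.T)).join(""))
      .join(" ")
  );
}
```

Inside the pdfParser_dataReady handler this could then be used as resolve(extractPageTexts(pdfData)), provided the real output matches the assumed shape.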
However, I still had some errors in production, so I ended up using the library as a standalone module, outside package.json. You can find the modified version here:
https://github.com/OferElfassi/pdf2json_standalone.git
You will also need to install the "@xmldom/xmldom" package to make it work.
I copied the folder into my project and used it like this:
import PDFParser from "./pdf2json_standalone/pdfparser";
The rest of the code is the same as above.
Hope this helps ☺
from pdf2json.
@niemal I ran into the same issue... it would work with some PDFs and hang with others.
I switched to pdf-parse-fork and it's been working great.
from pdf2json.
It's not the ideal solution, but it works for me.
I had the same issue when using this library in an Electron app.
I ended up changing each call to nodeUtil.p2j into the corresponding console call (warn, error, info, log).
As a result, the library functions as expected, but the console output is not as nice as it was before.
I disabled console output in production anyway, so it's not a problem for me.
The modified version can be found here: https://github.com/OferElfassi/pdf2jsonForElectron.git
or you can simply install it using:
yarn add https://github.com/OferElfassi/pdf2jsonForElectron.git
from pdf2json.
@OferElfassi I came here with the same issue with Electron! Thanks for sharing your module.
from pdf2json.
@OferElfassi your solution builds in Electron, but have you actually successfully run pdf2json from within Electron? I'm trying to run it from the main process and it just hangs, never firing pdfParser_dataReady nor pdfParser_dataError.
const pdfData: PdfData = await new Promise<PdfData>((resolve, reject) => {
  const pdfParser = new PDFParser();
  pdfParser.on("pdfParser_dataError", (errData: unknown) => {
    reject(errData);
  });
  pdfParser.on("pdfParser_dataReady", async (pdfData: PdfData) => {
    resolve(pdfData);
  });
  pdfParser.loadPDF(localPdf);
});
The console output shows as follows:
Load OK: C:\Users\david\Dropbox\DnD\Campaigns\Empire of the Chromatic Conclave\NPCs\zombie-minion.pdf
Warning: Setting up fake worker.
PDF loaded. pagesCount = 1
start to parse page:1
Skipped: tiny fill: 0 x 0
I can run the very same code in a stand-alone node process and it works just fine on the same PDF file with the following output:
Load OK: C:\Users\david\Dropbox\DnD\Campaigns\Empire of the Chromatic Conclave\NPCs\zombie-minion.pdf
Warning: Setting up fake worker.
PDF loaded. pagesCount = 1
start to parse page:1
Skipped: tiny fill: 0 x 0
Success: Page 1
complete parsing page:1
Conspicuously absent from the Electron debug output are the last two lines showing successful parsing of the page.
So, running it from Electron, it's hanging somewhere in parsing the PDF. 😢
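Since neither event fires, the awaited promise never settles. A generic guard for that situation, sketched below, is to race the parse against a timer so the caller gets an error instead of waiting forever. The withTimeout helper is a hypothetical name, not part of pdf2json, and it does not stop the underlying parse or explain the hang; it only makes the failure visible.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// The underlying work is not cancelled; only the wait is bounded.
function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
    work.then(
      (value) => {
        clearTimeout(timer); // settled in time, so cancel the timer
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}
```

Wrapping the promise from the snippet above, for example as await withTimeout(parsePromise, 30000, "pdf2json parse"), would turn the silent hang into a rejected promise that can be logged or retried. Here parsePromise stands for the new Promise<PdfData>(...) expression, and the 30 second limit is an arbitrary choice.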
from pdf2json.
Try adding this to your nextjs config:
const nextConfig = { experimental: { serverComponentsExternalPackages: ['pdf2json'], }, };
Now it throws the following error:
⨯ src\server\generation\index.ts (127:17) @ eval
⨯ TypeError: pdf2json__WEBPACK_IMPORTED_MODULE_1__.default is not a constructor
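An error like this usually points to a CommonJS/ES module interop mismatch, where the bundler wraps the module so that default is not the class itself. One defensive workaround, sketched here, is to look for the constructor on the imported value, then on .default, then on .default.default. The resolveConstructor helper is a hypothetical name and this is not something pdf2json documents; whether it helps depends on how your bundler wraps the module.

```typescript
type Ctor<T> = new (...args: any[]) => T;

// Walks up to two levels of `.default` looking for something callable.
function resolveConstructor<T>(mod: unknown): Ctor<T> {
  let candidate: any = mod;
  for (let depth = 0; depth < 3; depth++) {
    if (typeof candidate === "function") return candidate as Ctor<T>;
    if (candidate == null || typeof candidate !== "object") break;
    candidate = candidate.default;
  }
  throw new TypeError("No constructor found on module or its default export");
}
```

With a namespace import (import * as pdf2jsonModule from "pdf2json"), the class could then be obtained with const PDFParser = resolveConstructor(pdf2jsonModule) before calling new PDFParser().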
from pdf2json.
I am getting the same behavior on specific PDF files. @dprothero, did you fix it? I am also using the approach @OferElfassi suggested; it works, but only rarely without hanging. Perhaps pdf2json has been updated since and the standalone version needs updating too? I would appreciate it if you took a look!
from pdf2json.
@niemal I ran into the same issue... it would work with some PDFs and hang with others.
I switched to pdf-parse-fork and it's been working great.
Yeah turns out that's a memory leak and the whole thing is unusable in production. pdf-parse seems to work much more resiliently.
from pdf2json.
@niemal how do you know it's a memory leak?
from pdf2json.
@darklight9811 were you able to fix this issue?
from pdf2json.