Comments (3)
Thanks for reporting the issue.
It looks like this is a tabula-java issue that occurs with this specific PDF. I found a similar issue in their repo:
tabulapdf/tabula-java#218
Would you mind providing the PDF and reporting it on tabula-java?
from tabula-py.
Okay, I confirmed that the issue happens when tabula-java is run with the --lattice option on this file. It doesn't raise an error without the --lattice option.
$ java -Dfile.encoding=UTF8 -jar tabula/tabula-1.0.5-jar-with-dependencies.jar --pages 1 --lattice ~/Downloads/test_pdf_output.pdf
Exception in thread "main" java.lang.IllegalArgumentException: lines must be orthogonal, vertical and horizontal
at technology.tabula.Ruling.intersectionPoint(Ruling.java:214)
at technology.tabula.Ruling.findIntersections(Ruling.java:378)
at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.findCells(SpreadsheetExtractionAlgorithm.java:134)
at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.extract(SpreadsheetExtractionAlgorithm.java:63)
at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.extract(SpreadsheetExtractionAlgorithm.java:41)
at technology.tabula.CommandLineApp$TableExtractor.extractTablesSpreadsheet(CommandLineApp.java:452)
at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:410)
at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:180)
at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:124)
at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:106)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:76)
$ java -Dfile.encoding=UTF8 -jar tabula/tabula-1.0.5-jar-with-dependencies.jar --pages 1 ~/Downloads/test_pdf_output.pdf
"","Utah Medicaid Preferred Drug List - Effective April 1, 2023"
"",Quinolones
"",Last Brand
Preferred Drugs,Status Type Limits Mandatory 3-Month Additional Note
"",Update Required
Cipro suspension,Preferred Brand 02/01/10 Cipro susp
"ciprofloxacin 250, 500, 750mg Preferred",Generic 02/01/10
levofloxacin,Preferred Generic 02/01/16
moxifloxacin,Preferred Generic 01/01/21
"",Last Required Prior Brand
Non Preferred Drugs,Status Type Limits Additional Note
"",Update Authorization Form Required
Baxdela,Non Preferred Brand 10/01/17 Medication Coverage Exception
Cipro tablet,Non Preferred Brand 02/01/10 Medication Coverage Exception
ciprofloxacin 100mg tablet,Non Preferred Generic 01/01/22 Medication Coverage Exception
ciprofloxacin suspension,Non Preferred Generic 01/01/20 Medication Coverage Exception Cipro susp
ofloxacin tablet,Non Preferred Generic 02/01/10 Medication Coverage Exception
"",Tetracyclines
"",Last Brand
Preferred Drugs,Status Type Limits Mandatory 3-Month Additional Note
"",Update Required
doxycycline monohydrate,
"",Preferred Generic 01/01/20
"50, 100mg capsule",
doxycycline hyclate,
"",Preferred Generic 01/01/20
"50, 100mg",
minocycline,
"",Preferred Generic 01/01/20
"50, 75, 100mg capsule",
"",Last Required Prior Brand
Non Preferred Drugs,Status Type Limits Additional Note
"",Update Authorization Form Required
demeclocycline,Non Preferred Generic 01/01/20 Medication Coverage Exception
Doryx,Non Preferred Brand 01/01/20 Medication Coverage Exception
doxycycline (unless listed preferred),Non Preferred Generic 01/01/20 Medication Coverage Exception
Minocin,Non Preferred Brand 01/01/20 Medication Coverage Exception
minocycline ER capsule,Non Preferred Generic 12/01/22 Medication Coverage Exception
minocycline tablet,Non Preferred Generic 01/01/20 Medication Coverage Exception
Minolira,Non Preferred Brand 01/01/20 Medication Coverage Exception
Nuzyra,Non Preferred Brand 01/01/20 Medication Coverage Exception
Solodyn,Non Preferred Brand 01/01/20 Medication Coverage Exception
tetracycline,Non Preferred Generic 01/01/20 Medication Coverage Exception
Vibramycin,Non Preferred Brand 01/01/20 Medication Coverage Exception
Ximino,Non Preferred Brand 01/01/20 Medication Coverage Exception
"",Page 11 of 111
This looks like a problem on the tabula-java side.
Closing, as tabula-py doesn't have any workaround.
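For anyone hitting the same error in the meantime: since the run above succeeds without --lattice, a caller can retry without lattice mode when the first attempt fails. This is only a caller-side sketch, not something tabula-py provides. The helper name `read_with_lattice_fallback` is hypothetical, and it catches a broad `Exception` on the assumption that the exact exception type depends on how tabula-py invokes tabula-java. Note that output without lattice mode can be laid out differently, as the CSV above shows.

```python
from typing import Any, Callable


def read_with_lattice_fallback(
    reader: Callable[..., Any], path: str, **kwargs: Any
) -> Any:
    """Try lattice mode first; retry without it if extraction fails.

    `reader` is expected to behave like tabula.read_pdf, i.e. accept a
    path plus a `lattice` keyword argument. This helper is hypothetical.
    """
    # Drop any caller-supplied `lattice` so it is not passed twice.
    kwargs.pop("lattice", None)
    try:
        return reader(path, lattice=True, **kwargs)
    except Exception as exc:
        # Assumption: the exception type varies by backend, so catch broadly.
        print(f"lattice mode failed ({exc!r}); retrying without lattice")
        return reader(path, lattice=False, **kwargs)


# Example usage (requires tabula-py and a Java runtime):
# import tabula
# tables = read_with_lattice_fallback(
#     tabula.read_pdf, "test_pdf_output.pdf", pages=1
# )
```

Passing the reader in as an argument keeps the helper independent of tabula-py, so it can be exercised with a stub reader.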
Hey @chezou,
Thanks for the quick reply. I have created an issue, tabulapdf/tabula-java#529, as suggested.