Comments (3)
Thanks for your report, but could you show me the specific error message? Otherwise I can't say anything.
from tabula-py.
#Code 1:
from tabula import read_pdf
dfs=read_pdf('test.pdf', encoding='cp1254', output_format='csv')
print(dfs)
#output 1:
C:\Users\amal\AppData\Local\Programs\Python\Python36-32\python.exe D:/pycharm/cdsco/tabula_csv.py
Oct 03, 2017 7:12:21 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:12:26 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:12:27 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
S.No. Drug Name \
0 NaN NaN
1 1.0 Sofosbuvir 400 mg film
2 NaN coated Tablet
3 NaN NaN
4 2.0 Hydralazine tablets BP
5 NaN 25 mg (Additional
6 NaN Strength)
7 3.0 Hydralazine tablets BP
8 NaN 50 mg
9 NaN NaN
10 NaN (Additional Strength)
11 4.0 Bendamustine
12 NaN hydrochloride injection
13 NaN 90 mg/mL (Fill volume
14 NaN NaN
15 NaN 0.5 mL in 2 ml capacity
16 NaN NaN
17 NaN vial & 2 mL filled in 2
18 NaN NaN
19 NaN mL capacity vial)
20 5.0 Eltrombopagolamine
21 NaN 25 mg/50 mg tablets
22 NaN NaN
23 NaN (Additional indication)
24 NaN NaN
25 NaN NaN
26 NaN NaN
27 NaN NaN
28 NaN NaN
29 NaN NaN
30 NaN NaN
31 6.0 Azacitidine for
32 NaN injection 100 mg
33 NaN NaN
34 NaN NaN
35 7.0 Methylcobalamin 1500
36 NaN mcg orally
37 NaN disintegrating strips
38 8.0 Cefditoren Pivoxil dry
39 NaN powder for suspension
40 NaN 100 mg/5 mL
41 NaN NaN
42 NaN NaN
43 NaN NaN
44 NaN NaN
45 NaN NaN
46 NaN NaN
Indication Date of
0 NaN Approval
1 In combination with other medicinal products 16.02.2017
2 for the treatment of Chronic Hepatitis C (CHC) NaN
3 in adults NaN
4 For moderate to severe hypertension (in 23.02.2017
5 conjunction with a ?-adrenoceptor blocking NaN
6 agent or diuretic) and hypertensive crisis. NaN
7 For moderate to severe hypertension (in 23.02.2017
8 conjunction with a ?-adrenoceptor blocking NaN
9 agent or diuretic) and hypertensive crisis. NaN
10 NaN NaN
11 1. For the treatment of patients with 02.03.2017
12 chronic lymphocytic leukemia NaN
13 2. For the use in Indolent B-cell Non- NaN
14 Hodgkin’s Lymphoma (NHL) that has NaN
15 NaN NaN
16 Progressed During or Within six months NaN
17 NaN NaN
18 of treatment with Rituximab or a NaN
19 Rituximab containing Regimen. NaN
20 For the treatment of thrombocytopenia in 02.03.2017
21 paediatric patients 1 year and older with chronic NaN
22 immune(idiopathic) thrombocytopenia (ITP) NaN
23 NaN NaN
24 who have had an insufficient response to NaN
25 corticosteroids, immunoglobulins or NaN
26 splenectomy. (It should be used only in patients NaN
27 with ITP whose degree of Thrombocytopenia NaN
28 and clinical condition increase the risk for NaN
29 bleeding. It should not be used in an attempt to NaN
30 normalize platelet counts). NaN
31 For the treatment of adult patients with all sub- 07.03.2017
32 types of Myelodysplastic Syndrome NaN
33 With the condition: to be sold by retail on the NaN
34 prescription of Oncologist only NaN
35 For the treatment of Diabetic Neuropathy and 10.03.2017
36 peripheral Neuropathy NaN
37 NaN NaN
38 For the treatment of mild to moderate infections 30.03.2017
39 in adults and adolescents (12 years of age or NaN
40 older) which are caused by susceptible strains of NaN
41 the designated microorganisms in the condition NaN
42 listed below NaN
43 ? Acute Bacterial Exacerbation of NaN
44 Chronic Bronchitis NaN
45 ? Community-Acquired pneumonia NaN
46 ? Pharyngitis/Tonsillitis NaN
Process finished with exit code 0
#description:
It only prints the table from the first page.
#Code 2:
from tabula import read_pdf
dfs=read_pdf('test.pdf', encoding='cp1254', output_format='csv', pages='all')
print(dfs)
#output 2:
Oct 03, 2017 7:17:20 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:21 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:22 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:24 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:24 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Traceback (most recent call last):
File "D:/pycharm/cdsco/tabula_csv.py", line 2, in <module>
dfs=read_pdf('test.pdf', encoding='cp1254', output_format='csv', pages='all')
File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\tabula\wrapper.py", line 97, in read_pdf
return pd.read_csv(io.BytesIO(output), **pandas_options)
File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 411, in _read
data = parser.read(nrows)
File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1005, in read
ret = self._engine.read(nrows)
File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1748, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
File "pandas\_libs\parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)
File "pandas\_libs\parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)
File "pandas\_libs\parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 49, saw 5
Process finished with exit code 1
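The ParserError above is raised when pandas meets a row with more fields than it expects (here 5 instead of 4 at line 49 of the CSV that tabula produced). One way to see which rows are affected is to count fields per row with the standard library. This is only a minimal sketch: the helper name `find_ragged_rows` and the sample CSV text are hypothetical and are not part of tabula-py or of the reporter's file.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the first row."""
    reader = csv.reader(io.StringIO(csv_text))
    try:
        header = next(reader)
    except StopIteration:
        # Empty input: nothing to check
        return []
    expected = len(header)
    ragged = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the 1-based physical line on which the row ended
            ragged.append((reader.line_num, len(row)))
    return ragged


# Hypothetical sample: a 4-column header followed by one row with 5 fields
sample = (
    "S.No.,Drug Name,Indication,Date of Approval\n"
    "1,Sofosbuvir 400 mg,indication text,16.02.2017\n"
    "9,Thiotepa,,indication text,06.04.2017\n"
)
print(find_ragged_rows(sample))
```

If the reported rows all come from later pages, the tables on those pages were probably detected with a different number of columns than the table on the first page.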
Use the guess=False, lattice=True options.
The result seems to include some junk columns, but that is a tabula-java problem...
In [33]: df = read_pdf('test.pdf', pages='all', encoding='shift-jis', guess=False, lattice=True)
Oct 03, 2017 11:04:42 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
In [34]: df
Out[34]:
Unnamed: 0 S.No. \
0 NaN NaN
1 1.0 Sofosbuvir 400 mg film\rcoated Tablet
2 2.0 Hydralazine tablets BP\r25 mg (Additional\rStr...
3 3.0 Hydralazine tablets BP\r50 mg\r(Additional Str...
4 4.0 Bendamustine\rhydrochloride injection\r90 mg/m...
5 5.0 Eltrombopagolamine\r25 mg/50 mg tablets\r(Addi...
6 6.0 Azacitidine for\rinjection 100 mg
7 7.0 Methylcobalamin 1500\rmcg orally\rdisintegrati...
8 8.0 Cefditoren Pivoxil dry\rpowder for suspension\...
9 NaN NaN
10 9.0 Thiotepa Powder for\rconcentrate for solution\...
11 10.0 Thiotepa Powder for\rconcentrate for solution\...
12 11.0 Bortezomib for\rInjection (i.v) 1 mg/vial\r(Ad...
13 12.0 Bortezomib 3.5 mg/vial\rPowder for solution fo...
14 13.0 Mesalamine Delayed\rRelease Tablets 800 mg\r(A...
15 14.0 Teneligliptin Film\rcoated Tablet 20 mg
16 15.0 Ticagrelor 60 mg\rtablets (Additional\rstrengt...
17 NaN NaN
18 16.0 Daclatasvir Tablet 30\rmg
19 17.0 Daclatasvir Tablet 60\rmg
20 18.0 Rosuvastatin Tablets 15\rmg
21 19.0 Rosuvastatin film\rcoated Tablets 30 mg
22 NaN NaN
23 20.0 Abiraterone Acetate\rTablet 500 mg
24 21.0 Enzalutamide hard\rGelatin Capsules 40 mg
Unnamed: 2 Unnamed: 3 Drug Name \
0 NaN Approval NaN
1 In combination with other medicinal products\r... 16.02.2017 NaN
2 For moderate to severe hypertension (in\rconju... 23.02.2017 NaN
3 For moderate to severe hypertension (in\rconju... 23.02.2017 NaN
4 1.\rFor the treatment of patients with\rchroni... 02.03.2017 NaN
5 Forthetreatmentofthrombocytopeniain\rpaediatri... 02.03.2017 NaN
6 For the treatment of adult patients with all s... 07.03.2017 NaN
7 For the treatment of Diabetic Neuropathy and\r... 10.03.2017 NaN
8 For the treatment of mild to moderate infectio... 30.03.2017 NaN
9 ?\rAcute sinusitis\r?\rUncomplicated skin and ... NaN NaN
10 1.\rWithorwithouttotalbody\rirradiation(TBI),a... 06.04.2017 NaN
11 1.\rWithorwithouttotalbody\rirradiation(TBI),a... 06.04.2017 NaN
12 For the treatment of patients with mantle cell... 11.04.2017 NaN
13 For the treatment of patients with mantle cell... 11.04.2017 NaN
14 Forthetreatmentofmildtomoderateacute\rexacerba... 11.04.2017 NaN
15 For the treatment of Type 2 Diabetes Mellitus ... 28.04.2017 NaN
16 Indicatedforthepreventionofthrombotic\revents(... 02.05.2017 NaN
17 prescriptionofCardiologist/Internal\rMedicine ... NaN NaN
18 “For use with Sofosbuvir for the treatment of\... 24.05.2017 NaN
19 “For use with Sofosbuvir for the treatment of\... 24.05.2017 NaN
20 a.\rTreatmentofpatientswithprimary\rhyperchole... 30.05.2017 NaN
21 a.\rTreatmentofpatientswithprimary\rhyperchole... 30.05.2017 NaN
22 treatmentofadultpatientswith\rhypertriglycerde... NaN NaN
23 1.\rIn combination with prednisone for\rthetre... 12.07.2017 NaN
24 Forthetreatmentofadultswithmetastatic\rcastrat... 12.07.2017 NaN
Unnamed: 5 Unnamed: 6 Indication Unnamed: 8 Unnamed: 9 Date of \
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN
10 NaN NaN NaN NaN NaN NaN
11 NaN NaN NaN NaN NaN NaN
12 NaN NaN NaN NaN NaN NaN
13 NaN NaN NaN NaN NaN NaN
14 NaN NaN NaN NaN NaN NaN
15 NaN NaN NaN NaN NaN NaN
16 NaN NaN NaN NaN NaN NaN
17 NaN NaN NaN NaN NaN NaN
18 NaN NaN NaN NaN NaN NaN
19 NaN NaN NaN NaN NaN NaN
20 NaN NaN NaN NaN NaN NaN
21 NaN NaN NaN NaN NaN NaN
22 NaN NaN NaN NaN NaN NaN
23 NaN NaN NaN NaN NaN NaN
24 NaN NaN NaN NaN NaN NaN
Unnamed: 11
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
20 NaN
21 NaN
22 NaN
23 NaN
24 NaN
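The output above has two things a reader may want to tidy up: columns that are NaN in every row, and `\r` line breaks inside text cells. A minimal sketch of that cleanup in plain Python follows, assuming the table has been converted to a list of dicts (for example with `df.to_dict("records")`). The helper names `is_missing` and `clean_table` are hypothetical, and the two sample rows are abbreviated from the output above.

```python
import math


def is_missing(value):
    """Treat None, empty strings and float NaN as missing."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def clean_table(rows):
    """Drop columns that are missing in every row and flatten '\\r' breaks in text cells."""
    if not rows:
        return []
    # Keep a column only if at least one row has a real value in it
    keep = [key for key in rows[0]
            if any(not is_missing(row.get(key)) for row in rows)]
    cleaned = []
    for row in rows:
        new_row = {}
        for key in keep:
            value = row.get(key)
            if isinstance(value, str):
                # Cells wrapped across lines in the PDF come back joined by '\r'
                value = value.replace("\r", " ")
            new_row[key] = value
        cleaned.append(new_row)
    return cleaned


nan = float("nan")
rows = [
    {"Unnamed: 0": 1.0, "S.No.": "Sofosbuvir 400 mg film\rcoated Tablet",
     "Unnamed: 3": "16.02.2017", "Drug Name": nan, "Unnamed: 5": nan},
    {"Unnamed: 0": 6.0, "S.No.": "Azacitidine for\rinjection 100 mg",
     "Unnamed: 3": "07.03.2017", "Drug Name": nan, "Unnamed: 5": nan},
]
for row in clean_table(rows):
    print(row)
```

On the DataFrame itself, `df.dropna(axis=1, how="all")` removes the all-NaN columns in one call. Note that the header labels are shifted relative to the data in this output, so the surviving columns would still need renaming.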