rnelsonchem / simpledbf

A simple DBF file converter for Python3

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

simpledbf's People

sarasafavi isaac1989 oztalha miker985 marcusj niejn jakeb xoristzatziki alexanderluiscampino cindy-silva ap-codkelden geertijewski ba45 adam-trevo jdelucca

simpledbf's Issues

Dbf5('file.dbf') AssertionError: assert terminator == b'\r'

Whenever I try to load a .dbf file I am getting an error.

>>> Dbf5('myfile.dbf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\TARGET\WinPython-64bit-3.3.5.9\python-3.3.5.amd64\lib\site-packages\simpledbf\simpledbf.py", line 557, in __init__
    assert terminator == b'\r'
AssertionError
>>>

I am using

simpledbf-0.2.6.tar.gz

Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32

Thanks,
Andre Mikulec

Python 3.4 is a dependency?

Hi! I tried to install with conda install -c rnelsonchem simpledbf, and got the message:

UnsatisfiableError: The following specifications were found to be in conflict:
  - python 3.6*
  - simpledbf -> python 3.4*
Use "conda info <package>" to see the dependencies for each package.

Does it really require 3.4? Or can that dependency be omitted from the package installation recipe?

Thanks!

Dbf3 file format support

I have a lot of DBF files in Dbf3 format.
If I try to read with Dbf5 class, the records are shifted by 1 byte and the reading of records fails, so I wrote a derived class:

class Dbf3(Dbf5):
    def __init__(self, dbf, codec='utf-8'):
        super().__init__(dbf, codec)
        self.f.read(1)

Then I modified the _get_recs(self, chunk=None) method to recognize the "M" memo type, simply to avoid an exception:

# memo fields not supported
elif typ == 'M':
    value = self._na

I'm working with the National Highway Traffic Safety Administration's Fatality Analysis Reporting System (FARS), which publishes its data in DBF format for 1975-2016. The data starting in 2010 ( ftp://ftp.nhtsa.dot.gov/fars/2010/DBF/FARS2010.zip ) is read without error and exposes plausible-looking Dbf5 attributes, but none of the conversion methods produce any data.

import pandas as pd
from simpledbf import Dbf5
dbf = Dbf5('accident.dbf', codec='latin1')
print("Number of records (raw): {0:,}".format(dbf.numrec))
# 29,867
print("Number of records (DataFrame): {0:,}".format(len(dbf.to_dataframe())))
# 0

Using other methods like to_csv and other codec encodings produces the same behavior: no data produced. I've reproduced this issue on both Mac and PC on Anaconda 4 running Python 3.6 in Jupyter Notebook.

Fails on date records with 0/0/0

Have a look at my suggested patch:

marcusj#1

Strange assertion in simpledbf.py line 564

Could you explain the meaning of this portion of code? I got this AssertionError and don't know why... Is the dbf file corrupt?

terminator = self.f.read(1)
assert terminator == b'\r'
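For anyone hitting this assertion: in the common dBASE layout the header is a 32-byte block, followed by one 32-byte descriptor per field, followed by a single 0x0D (b'\r') terminator byte. The check above expects that terminator right after the last field descriptor. A minimal sketch for inspecting a header by hand is below. It uses only the standard library; inspect_dbf_header is a hypothetical helper and not part of simpledbf. One plausible (unconfirmed) cause of the failure is a file variant whose header length is not exactly 32 + 32*n + 1, in which case bytes_after_terminator would be non-zero.

```python
import struct


def inspect_dbf_header(data: bytes) -> dict:
    """Summarise a DBF header and locate the 0x0D field terminator.

    Hypothetical helper for debugging; assumes the common dBASE III layout.
    """
    # Bytes 4-7: record count, 8-9: header length, 10-11: record length,
    # all little-endian. The first four bytes (version, date) are skipped.
    numrec, lenheader, lenrec = struct.unpack("<xxxxLHH", data[:12])

    # Walk the 32-byte field descriptors until the terminator byte is found
    # or no complete descriptor fits inside the declared header length.
    pos = 32
    fields = []
    while pos + 32 <= lenheader and data[pos:pos + 1] != b"\r":
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii", "replace")
        ftype = chr(data[pos + 11])   # field type code, e.g. C, N, D, M
        length = data[pos + 16]       # field length in bytes
        fields.append((name, ftype, length))
        pos += 32

    found = data[pos:pos + 1] == b"\r"
    return {
        "version": data[0],
        "numrec": numrec,
        "lenheader": lenheader,
        "lenrec": lenrec,
        "fields": fields,
        "terminator_offset": pos if found else None,
        # Non-zero here means extra header bytes follow the terminator.
        "bytes_after_terminator": lenheader - pos - 1 if found else None,
    }
```

To try it on a real file, read the first few kilobytes with open(path, 'rb') and pass the bytes in. If terminator_offset is None, or bytes_after_terminator is not 0, the file probably does not follow the plain layout that the assertion assumes.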

Please modify the 'to_pandassql' method to accept sqlalchemy 'create_engine' optional parameters like 'encoding', 'convert_unicode' and others

Hi! Thank you for great project.

utf-8 exception

Getting this error from the following code running on anaconda jupyter lab

dbf = Dbf5('partido_colorado.dbf', codec='utf-8')

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-56ffe8ba2867> in <module>
      1 dbf = Dbf5('partido_colorado.dbf', codec='utf-8')
----> 2 df = dbf.to_dataframe()
      3 df

C:\ProgramData\Anaconda3\lib\site-packages\simpledbf\simpledbf.py in to_dataframe(self, chunksize, na)
    314         if not chunksize:
    315             # _get_recs is a generator, convert to list for DataFrame
--> 316             results = list(self._get_recs())
    317             df = pd.DataFrame(results, columns=self.columns)
    318             del(results) # Free up the memory? If GC works properly

C:\ProgramData\Anaconda3\lib\site-packages\simpledbf\simpledbf.py in _get_recs(self, chunk)
    599                         value = self._na
    600                     else:
--> 601                         value = value.decode(self._enc)
    602                         # Escape quoted characters
    603                         if self._esc:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1: invalid start byte
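The traceback says byte 0xba is not a valid UTF-8 start byte, which usually means the text in the file was written with a single-byte code page rather than UTF-8. In cp1252 and latin-1, 0xba is the ordinal indicator 'º'. A minimal sketch for checking which codec can decode a raw value is below. decode_with_fallback and its candidate list are assumptions for illustration, not part of simpledbf, and the first codec that decodes without error is only a guess at the right one.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (codec, text) for the first candidate codec that decodes raw.

    Hypothetical helper; the candidate order is an assumption.
    """
    for codec in candidates:
        try:
            return codec, raw.decode(codec)
        except UnicodeDecodeError:
            # This codec cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate codecs could decode the value")


# A value with 0xba in position 1, like the one in the traceback above.
codec, text = decode_with_fallback(b"N\xba 12")
print(codec, text)
```

If a single-byte codec turns out to be the right one, passing it as the codec argument, for example Dbf5('partido_colorado.dbf', codec='latin-1'), may avoid the error. Whether the decoded text is actually correct still has to be checked by eye.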

Empty strings added to Dataframe as '' not NaN

Not sure what to do here. Seems that empty strings are added as '' to DataFrames and not as NaN. I think that datetime objects will have the same problem.

Not sure what the best option is here... It would be pretty easy to add the conversion. Pandas has some nice utilities for working with NaN values, and it would make them easier to find.

Unknown column type M

Please add to _get_recs typ M, which is string.
I have found this in files generated by a SIEMENS software.

Thank you.

assert terminator == b'\r'

I am using the versions:
Python 3.8.5
simpledbf 0.2.6

When I run:
from simpledbf import Dbf5
import pandas as pd

dbf = Dbf5(b"C:\Clascon6_Tesomatic\data\Companies.dbf")
db = dbf.to_dataframe()
I get the error:
assert terminator == b'\r'
I have read in forums that other people have hit this problem, but I have not found any post that solves it.

Is there any way to convert all columns in dbf files to string data type?

I have a dbf file where the default date comes through as '00000000'. Is there any way to convert all columns to strings before the values are parsed? My current code is:

import simpledbf as sdbf 
import pandas as pd
import sys
import os
path = "path\\of\\directory\\"
for file in os.listdir(path):
    if file.endswith(".DBF"):
        print(file)
        dbf1 = sdbf.Dbf5(path+file)
        df = dbf1.to_dataframe()
        df.to_csv(path+file[:-4]+'.TXT', header='0', index=None, sep='|', mode='a')

With the above code I get the error: "ValueError: year 0 is out of range"

I am looking for one of the following solutions:

Either convert the data types of all columns to string before changing the format
OR
Replace the invalid date value with some default date like 01/01/1900.

@rnelsonchem any ideas on how I can handle this issue?
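The second option could look roughly like the sketch below. DBF date fields are stored as 8 ASCII characters in YYYYMMDD form, and '00000000' cannot be turned into a Python date because year 0 is out of range. parse_dbf_date is a hypothetical helper, not something simpledbf provides. It shows tolerant parsing with a caller-chosen default, and it would have to be wired into the reader or applied to the raw values separately.

```python
from datetime import date, datetime


def parse_dbf_date(raw, default=None):
    """Parse an 8-byte YYYYMMDD DBF date value.

    Hypothetical helper: returns `default` for blank or invalid values
    such as b'00000000' instead of raising ValueError.
    """
    text = raw.decode("ascii", "replace").strip()
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Covers empty strings, year 0, and other out-of-range values.
        return default


print(parse_dbf_date(b"20160131"))
print(parse_dbf_date(b"00000000", default=date(1900, 1, 1)))
```

Returning None (or another missing-value marker) by default keeps invalid dates distinguishable from real ones. A fixed date like 1900-01-01 is easier to export but can be mistaken for real data later.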