rnelsonchem / simpledbf
A simple DBF file converter for Python3
License: BSD 3-Clause "New" or "Revised" License
Whenever I try to load a .dbf file I am getting an error.
>>> Dbf5('myfile.dbf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\TARGET\WinPython-64bit-3.3.5.9\python-3.3.5.amd64\lib\site-packages\simpledbf\simpledbf.py", line 557, in __init__
    assert terminator == b'\r'
AssertionError
>>>
I am using simpledbf-0.2.6.tar.gz on Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32.
Thanks,
Andre Mikulec
Hi! I tried to install with conda install -c rnelsonchem simpledbf and got the message:
UnsatisfiableError: The following specifications were found to be in conflict:
- python 3.6*
- simpledbf -> python 3.4*
Use "conda info <package>" to see the dependencies for each package.
Does it really require 3.4? Or can that dependency be omitted from the package installation recipe?
Thanks!
I have a lot of DBF files in Dbf3 format.
If I try to read them with the Dbf5 class, the records are shifted by 1 byte and reading the records fails, so I wrote a derived class:
class Dbf3(Dbf5):
    def __init__(self, dbf, codec='utf-8'):
        super().__init__(dbf, codec)
        # Skip one extra byte so the records line up
        self.f.read(1)
Then I modified the _get_recs(self, chunk=None) method to recognize the "M" memo type, simply to avoid an exception:
    # memo fields not supported
    elif typ == 'M':
        value = self._na
I'm working with the National Highway Traffic Safety Administration's Fatality Analysis Reporting System (FARS), which publishes its data in DBF format for 1975-2016. The data starting in 2010 ( ftp://ftp.nhtsa.dot.gov/fars/2010/DBF/FARS2010.zip ) is read without error and has appropriate-looking Dbf5 attributes, but none of the conversion methods produce data.
import pandas as pd
from simpledbf import Dbf5
dbf = Dbf5('accident.dbf', codec='latin1')
print("Number of records (raw): {0:,}".format(dbf.numrec))
# 29,867
print("Number of records (DataFrame): {0:,}".format(len(dbf.to_dataframe())))
# 0
Using other methods like to_csv and other codec encodings produces the same behavior: no data is produced. I've reproduced this issue on both Mac and PC on Anaconda 4 running Python 3.6 in Jupyter Notebook.
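One way to narrow this down, independent of simpledbf, is to read the record area directly and tally the first byte of each record. In the DBF layout that byte is the deletion flag: a space marks an active record and an asterisk a deleted one. The sketch below assumes the standard 32-byte file header (record count at offset 4, header length at offset 8, record length at offset 10). count_deletion_flags is a hypothetical helper, not part of simpledbf, and whether misread flags explain the empty DataFrame in this report is only a guess.

```python
import struct
from collections import Counter


def count_deletion_flags(path):
    """Tally the first byte (deletion flag) of every record in a DBF file.

    Hypothetical diagnostic helper; not part of simpledbf.
    """
    with open(path, 'rb') as f:
        header = f.read(32)
        if len(header) < 32:
            raise ValueError('file too short to contain a DBF header')
        # Record count (uint32), header length (uint16), record length (uint16)
        numrec, lenheader, lenrecord = struct.unpack('<LHH', header[4:12])
        # Jump to where the header itself says the records begin
        f.seek(lenheader)
        flags = Counter()
        for _ in range(numrec):
            record = f.read(lenrecord)
            if len(record) < lenrecord:
                break  # file ends before the declared record count
            flags[record[:1]] += 1
    return numrec, flags
```

Calling count_deletion_flags('accident.dbf') returns the declared record count and a Counter keyed by flag byte. If nearly every record shows b' ' here while the conversion methods still return nothing, the reader may be starting at a different offset than the header length declares.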
Have a look at my suggested patch:
Could you explain the meaning of this portion of code? I got this AssertionError and don't know why... Is the dbf file corrupt?
terminator = self.f.read(1)
assert terminator == b'\r'
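For context: in the DBF layout, a 32-byte file header is followed by one 32-byte descriptor per field, and the descriptor list ends with a single 0x0D byte, which is b'\r' in Python. The assertion checks that the byte read after the last expected descriptor is that terminator, so it may fail when the header is laid out differently from what the reader expects, not only when the file is corrupt. The sketch below is a hypothetical inspect_dbf_header helper, not part of simpledbf. It walks the descriptors itself and reports where the terminator actually sits relative to the header length stored in the file.

```python
import struct


def inspect_dbf_header(path):
    """Report basic header facts for a DBF file without parsing any records.

    Hypothetical diagnostic helper; not part of simpledbf.
    """
    with open(path, 'rb') as f:
        header = f.read(32)
        if len(header) < 32:
            raise ValueError('file too short to contain a DBF header')
        version = header[0]
        numrec, lenheader, lenrecord = struct.unpack('<LHH', header[4:12])

        # Walk the 32-byte field descriptors until the 0x0D terminator
        numfields = 0
        terminator_offset = None
        while True:
            first = f.read(1)
            if first == b'\r':
                terminator_offset = f.tell() - 1
                break
            if first == b'' or len(f.read(31)) < 31:
                break  # ran out of file before finding a terminator
            numfields += 1

    extra = None
    if terminator_offset is not None:
        # Bytes between the terminator and the declared start of the records
        extra = lenheader - (terminator_offset + 1)
    return {
        'version': version,
        'numrec': numrec,
        'lenheader': lenheader,
        'lenrecord': lenrecord,
        'numfields': numfields,
        'terminator_offset': terminator_offset,
        'extra_header_bytes': extra,
    }
```

A non-zero extra_header_bytes, or a terminator_offset of None, suggests the file's header does not match the plain layout the assertion assumes, which would point to a format variant rather than corruption.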
Hi! Thank you for this great project.
I get this error from the following code, running in JupyterLab on Anaconda:
dbf = Dbf5('partido_colorado.dbf', codec='utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-56ffe8ba2867> in <module>
1 dbf = Dbf5('partido_colorado.dbf', codec='utf-8')
----> 2 df = dbf.to_dataframe()
3 df
C:\ProgramData\Anaconda3\lib\site-packages\simpledbf\simpledbf.py in to_dataframe(self, chunksize, na)
314 if not chunksize:
315 # _get_recs is a generator, convert to list for DataFrame
--> 316 results = list(self._get_recs())
317 df = pd.DataFrame(results, columns=self.columns)
318 del(results) # Free up the memory? If GC works properly
C:\ProgramData\Anaconda3\lib\site-packages\simpledbf\simpledbf.py in _get_recs(self, chunk)
599 value = self._na
600 else:
--> 601 value = value.decode(self._enc)
602 # Escape quoted characters
603 if self._esc:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1: invalid start byte
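The byte 0xba is not valid at that position in UTF-8, but in Latin-1 and Windows-1252 it is the ordinal indicator 'º', so the file was probably written in a single-byte encoding. A minimal sketch for trying a few candidates on a sample of raw bytes follows. The helper name and the candidate list are assumptions, and a successful decode does not prove the codec is the right one (latin-1 accepts every byte), so the decoded text still needs a visual check.

```python
def first_working_codec(sample, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return (codec, text) for the first candidate that decodes `sample`.

    Hypothetical helper; the candidate list is only a starting guess.
    """
    for codec in candidates:
        try:
            return codec, sample.decode(codec)
        except UnicodeDecodeError:
            continue
    raise ValueError('no candidate codec could decode the sample')


# 0xba fails under utf-8 but decodes under cp1252
codec, text = first_working_codec(b'N\xba 12')
print(codec, text)
```

The chosen name can then be passed as the codec argument, for example Dbf5('partido_colorado.dbf', codec='latin-1'). Older DBF files sometimes use DOS code pages such as cp850 instead, so it may be worth adding those to the candidates if the text looks wrong.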
Not sure what to do here. Seems that empty strings are added as '' to DataFrames and not as NaN. I think that datetime objects will have the same problem.
Not sure what the best option is here... It would be pretty easy to add the conversion. Pandas has some nice utilities for working with NaN values, and it would make them easier to find.
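A minimal sketch of what that conversion could look like, assuming records arrive as lists of already decoded values. That shape is an assumption, and blanks_to_nan is a hypothetical name, not an existing simpledbf function.

```python
import math


def blanks_to_nan(records):
    """Yield each record with empty or whitespace-only strings replaced by NaN."""
    for record in records:
        yield [
            float('nan') if isinstance(value, str) and not value.strip() else value
            for value in record
        ]


rows = list(blanks_to_nan([['Ann', '', 3], ['  ', 'x', 4]]))
print(math.isnan(rows[0][1]))  # True
```

On a DataFrame that has already been built, pandas' DataFrame.replace can do the same after the fact, for example df.replace('', float('nan')).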
Please add field type M, which holds a string, to _get_recs.
I have found this in files generated by a SIEMENS software.
Thank you.
I am using the versions:
Python 3.8.5
simpledbf 0.2.6
When I run:
from simpledbf import Dbf5
import pandas as pd
dbf = Dbf5(b"C:\Clascon6_Tesomatic\data\Companies.dbf")
db = dbf.to_dataframe()
I get the error:
assert terminator == b'\r'
I have read in forums that other people have run into this problem, but I have not found a solution.
I have a dbf file where the default date comes through as '00000000'. Is there any way to convert all columns to strings before changing the format? My current code is:
import simpledbf as sdbf
import pandas as pd
import sys
import os
path = "path\\of\\directory\\"
for file in os.listdir(path):
    if file.endswith(".DBF"):
        print(file)
        dbf1 = sdbf.Dbf5(path+file)
        df = dbf1.to_dataframe()
        df.to_csv(path+file[:-4]+'.TXT', header='0', index=None, sep='|', mode='a')
With the above code I get the error: "ValueError: year 0 is out of range"
I am looking for the following solution:
@rnelsonchem, any ideas how I can handle this issue?
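The error comes from building a date with year 0, which Python's datetime.date rejects (its minimum year is 1). A minimal sketch of a tolerant parser for the 8-character YYYYMMDD values that DBF date fields hold is shown below. parse_dbf_date is a hypothetical helper, not simpledbf's own code, and wiring it into to_dataframe would mean patching the library, which is not shown here.

```python
import datetime


def parse_dbf_date(raw, na=None):
    """Parse a YYYYMMDD DBF date value; return `na` if it is not a real date.

    Hypothetical helper; not part of simpledbf.
    """
    if isinstance(raw, bytes):
        raw = raw.decode('ascii', errors='replace')
    raw = raw.strip()
    if len(raw) != 8:
        return na  # blank or malformed field
    try:
        return datetime.date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    except ValueError:
        # Covers non-numeric text and impossible dates such as 00000000
        return na


print(parse_dbf_date(b'00000000'))  # None
print(parse_dbf_date(b'20100315'))  # 2010-03-15
```

Returning a placeholder instead of raising keeps the rest of the row usable. Pass a different na value (for example an empty string) if the output file should not contain None.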