
simpledbf's People

Contributors

rnelsonchem

simpledbf's Issues

Dbf5('file.dbf') AssertionError: assert terminator == b'\r'

Whenever I try to load a .dbf file, I get an error.

>>> Dbf5('myfile.dbf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\TARGET\WinPython-64bit-3.3.5.9\python-3.3.5.amd64\lib\site-packages\simpledbf\simpledbf.py", line 557, in __init__
    assert terminator == b'\r'
AssertionError
>>>

I am using

simpledbf-0.2.6.tar.gz

on

Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32

Thanks,
Andre Mikulec

Python 3.4 is a dependency?

Hi! I tried to install with conda install -c rnelsonchem simpledbf and got this message:

UnsatisfiableError: The following specifications were found to be in conflict:
  - python 3.6*
  - simpledbf -> python 3.4*
Use "conda info <package>" to see the dependencies for each package.

Does it really require 3.4? Or can that dependency be omitted from the package installation recipe?

Thanks!

Dbf3 file format support

I have a lot of DBF files in Dbf3 format.
If I try to read them with the Dbf5 class, the records are shifted by 1 byte and reading them fails, so I wrote a derived class:

class Dbf3(Dbf5):
    def __init__(self, dbf, codec='utf-8'):
        super().__init__(dbf, codec)
        self.f.read(1)

Then I modified the _get_recs(self, chunk=None) method to recognize the "M" memo type, simply to avoid an exception:

# memo fields not supported
elif typ == 'M':
    value = self._na

No records on conversion

I'm working with the National Highway Traffic Safety Administration's Fatality Analysis Reporting System (FARS), which publishes its data in DBF format for 1975-2016. The data starting in 2010 ( ftp://ftp.nhtsa.dot.gov/fars/2010/DBF/FARS2010.zip ) is read without error and the Dbf5 attributes look reasonable, but none of the conversion methods produce data.

import pandas as pd
from simpledbf import Dbf5
dbf = Dbf5('accident.dbf', codec='latin1')
print("Number of records (raw): {0:,}".format(dbf.numrec))
# 29,867
print("Number of records (DataFrame): {0:,}".format(len(dbf.to_dataframe())))
# 0

Other methods such as to_csv, and other codec values, behave the same way: no data is produced. I've reproduced this issue on both Mac and PC on Anaconda 4 running Python 3.6 in Jupyter Notebook.

Strange assertion in simpledbf.py line 564

Could you explain the meaning of this portion of code? I got this AssertionError and don't know why. Is the dbf file corrupt?

terminator = self.f.read(1)
assert terminator == b'\r'
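
For context: in the dBASE file layout, the header starts with a 32-byte block, followed by one 32-byte descriptor per field, and the descriptor list ends with a single 0x0D byte (b'\r'). The two quoted lines read one byte and require it to be that terminator. The following is a minimal diagnostic sketch, independent of simpledbf, that reports where the terminator actually sits in a file. The helper name inspect_dbf_header and the demo header are hypothetical, and the reading of "extra" header bytes (some variants, such as Visual FoxPro, are known to store additional data after the terminator) is an assumption to check against your own file.

```python
import io
import struct


def inspect_dbf_header(stream):
    """Summarize the header of a DBF file opened in binary mode."""
    head = stream.read(32)
    if len(head) < 32:
        raise ValueError("too short to be a DBF file")
    version = head[0]
    # Bytes 4-7: record count, 8-9: header length, 10-11: record length
    num_records, header_len, record_len = struct.unpack("<LHH", head[4:12])

    num_fields = 0
    terminator_offset = None
    offset = 32
    # Walk the 32-byte field descriptors until the 0x0D terminator appears
    while offset < header_len:
        first = stream.read(1)
        if first == b"\r":
            terminator_offset = offset
            break
        rest = stream.read(31)
        if len(first) + len(rest) < 32:
            break  # ran out of file before finding a terminator
        num_fields += 1
        offset += 32

    extra = None
    if terminator_offset is not None:
        # Header bytes that follow the terminator, if any
        extra = header_len - (terminator_offset + 1)
    return {
        "version": version,
        "num_records": num_records,
        "header_len": header_len,
        "record_len": record_len,
        "num_fields": num_fields,
        "terminator_offset": terminator_offset,
        "extra_header_bytes": extra,
    }


def _field(name, ftype, length):
    """Build one hypothetical 32-byte field descriptor for the demo."""
    return name.ljust(11, b"\x00") + ftype + b"\x00" * 4 + bytes([length, 0]) + b"\x00" * 14


# Synthetic two-field header, so the sketch runs without a real file
fields = _field(b"NAME", b"C", 10) + _field(b"QTY", b"N", 5)
header_len = 32 + len(fields) + 1
demo = struct.pack("<B3xLHH20x", 0x03, 0, header_len, 16) + fields + b"\r"
info = inspect_dbf_header(io.BytesIO(demo))
print(info)
```

With a real file, pass open(path, "rb") instead of the in-memory demo. A terminator_offset of None, or a non-zero extra_header_bytes, would suggest the header is laid out differently from what the parser assumes, rather than that the file is necessarily corrupt.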

utf-8 exception

I get this error from the following code, running in JupyterLab on Anaconda:

dbf = Dbf5('partido_colorado.dbf', codec='utf-8')

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-56ffe8ba2867> in <module>
      1 dbf = Dbf5('partido_colorado.dbf', codec='utf-8')
----> 2 df = dbf.to_dataframe()
      3 df

C:\ProgramData\Anaconda3\lib\site-packages\simpledbf\simpledbf.py in to_dataframe(self, chunksize, na)
    314         if not chunksize:
    315             # _get_recs is a generator, convert to list for DataFrame
--> 316             results = list(self._get_recs())
    317             df = pd.DataFrame(results, columns=self.columns)
    318             del(results) # Free up the memory? If GC works properly

C:\ProgramData\Anaconda3\lib\site-packages\simpledbf\simpledbf.py in _get_recs(self, chunk)
    599                         value = self._na
    600                     else:
--> 601                         value = value.decode(self._enc)
    602                         # Escape quoted characters
    603                         if self._esc:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1: invalid start byte
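
The message says the field contains the byte 0xba, which is not valid at that position in UTF-8, so the file was most likely written in a legacy single-byte encoding. The codec argument to Dbf5 shown above is where a different encoding would go. A minimal sketch for testing candidate encodings on a raw value follows; the sample bytes, the candidate list, and the helper name try_codecs are all assumptions for illustration, and the right codec depends on the software that produced the file.

```python
def try_codecs(raw, codecs=("utf-8", "cp1252", "latin-1")):
    """Return (codec, text) for the first candidate that decodes raw cleanly."""
    for name in codecs:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate codecs could decode the sample")


# Hypothetical field value with 0xba at position 1, as in the traceback
sample = b"N\xba 12"
codec, text = try_codecs(sample)
print(codec, text)
```

Note that latin-1 maps every byte to a character, so it never raises; a successful decode shows only that the bytes are accepted, not that the resulting text is what the author of the file intended.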

Empty strings added to Dataframe as '' not NaN

Empty strings seem to be added to DataFrames as '' rather than as NaN. I think that datetime objects will have the same problem.

I'm not sure what the best option is here. It would be pretty easy to add the conversion. Pandas has some nice utilities for working with NaN values, and converting would make the missing entries easier to find.
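
A minimal sketch of the conversion mentioned above, done on the pandas side after the frame is built rather than inside simpledbf. The small frame here is a hypothetical stand-in for the output of to_dataframe().

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the result of Dbf5.to_dataframe()
df = pd.DataFrame({"name": ["Alice", "", "Carol"], "qty": [1, 2, 3]})

# Replace exact empty strings with NaN so pandas treats them as missing
cleaned = df.replace("", np.nan)
print(cleaned["name"].isna().sum())
```

After the replacement, utilities such as isna(), dropna(), and fillna() pick up the formerly empty entries.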

Unknown column type M

Please add handling for typ 'M', which is a string type, to _get_recs.
I have found this type in files generated by SIEMENS software.

Thank you.

assert terminator == b'\r'

I am using these versions:
Python 3.8.5
simpledbf 0.2.6

When I run:
from simpledbf import Dbf5
import pandas as pd

dbf = Dbf5(b"C:\Clascon6_Tesomatic\data\Companies.dbf")
db = dbf.to_dataframe()
I get the error:
assert terminator == b'\r'
I have read in forums that other people have hit this problem, but I have not found a solution.

Is there any way to convert all columns in dbf files to the string data type?

I have a dbf file where the default date comes through as '00000000'. Is there any way to convert all columns to strings before changing the format? My current code is:

import os

import pandas as pd
import simpledbf as sdbf

path = "path\\of\\directory\\"
for file in os.listdir(path):
    if file.endswith(".DBF"):
        print(file)
        dbf1 = sdbf.Dbf5(path + file)
        df = dbf1.to_dataframe()
        # Write a pipe-delimited .TXT file next to the source, appending if it exists
        df.to_csv(path + file[:-4] + '.TXT', header='0', index=None, sep='|', mode='a')

With the above code I get this error: "ValueError: year 0 is out of range"

I am looking for one of the following solutions:

  1. Either convert the data types of all columns to string before changing the format
    OR
  2. Replace the invalid date value with some default date like 01/01/1900.

@rnelsonchem any ideas on how I can handle this issue?
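
A minimal sketch of option 2, assuming the raw date values are available as 8-character 'YYYYMMDD' strings. The helper name parse_dbf_date is hypothetical, and it does not hook into simpledbf; whether simpledbf offers a way to skip its own date conversion is not established in this thread.

```python
from datetime import date, datetime


def parse_dbf_date(raw, default=None):
    """Parse a 'YYYYMMDD' string; return default for blank or invalid values."""
    text = raw.strip()
    if not text:
        return default
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Covers placeholders such as '00000000' that are not real dates
        return default


fallback = date(1900, 1, 1)
print(parse_dbf_date("20160229", default=fallback))
print(parse_dbf_date("00000000", default=fallback))
```

Passing default=None instead of a sentinel date keeps invalid values distinguishable from real ones, which may be preferable if the data is analyzed later.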
