Comments (5)
Thanks again @kaskawu for reporting this issue. I've updated CleverCSV using the unix_path regex you suggested above (diving into it, that regex seemed to be the problem). I'm preparing an updated release of the package now. Thanks also @lmmentel for confirming!
from clevercsv.
Hi @kaskawu! Thanks for your interest in the package and for reporting this issue. Strangely, I have a hard time replicating your results:
$ python3 -m timeit -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
500 loops, best of 5: 721 usec per loop
and with the change you propose:
$ python3 -m timeit -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
1 loop, best of 5: 638 usec per loop
What version of the regex package are you using?
That said, it does seem to make a massive difference on your system, so I'm certainly open to making this change. I do however want to make sure I fully understand the cause before implementing any changes. Thanks!
> pip3 freeze | grep regex
regex==2020.5.7
>
That said, I tested across multiple Python versions. I tried Python 3.7 and 3.8, and the slowdown only happens on 3.8:
Python 3.7:
> python3 --version
Python 3.7.7
> python3 -m timeit -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
1 loop, best of 5: 5.75 msec per loop
Python 3.8:
> python3 --version
Python 3.8.2
> python3 -m timeit -n 1 -r 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
1 loop, best of 1: 19.7 sec per loop
Wow, that's very interesting! Thanks for doing some more digging. I'll take a more detailed look at this soon. Hopefully I can reproduce it in some way and figure out a good solution. Thanks again for reporting it!
Same here: performance drops with Python 3.8.
python --version
Python 3.8.1
python -m timeit -n 1 -r 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
1 loop, best of 1: 8.34 sec per loop
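For anyone trying to reproduce the timings above, note that the one-line shell commands embed double quotes and `$PROJECT_DIR` inside a double-quoted argument, so the shell may alter the string before Python ever sees it. That could be one reason the results differ between machines. Below is a minimal sketch of a standalone script that passes the exact sample string to `Detector().detect` and reports the Python version alongside the elapsed time. The helper name `time_once` and the constant `SAMPLE` are made up for this illustration and are not part of CleverCSV.

```python
import sys
import time

# The sample string from this thread, written as a Python literal so that
# no shell quoting or variable expansion can change it.
SAMPLE = 'fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$'


def time_once(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds).

    Hypothetical helper for this sketch only.
    """
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def main():
    print("Python", sys.version.split()[0])
    try:
        # Same import and call as in the timeit commands above.
        from clevercsv import Detector
    except ImportError:
        print("clevercsv is not installed; nothing to time")
        return
    dialect, elapsed = time_once(Detector().detect, SAMPLE)
    print(f"detected {dialect!r} in {elapsed:.3f} s")


if __name__ == "__main__":
    main()
```

Running the same file under Python 3.7 and 3.8 should give directly comparable numbers, since both interpreters receive an identical input string.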