Comments (6)
Hi @akash-goel - It could probably be done. Could you describe your use case in a little more detail?
from tablite.
Hi,
We have a use case in which multiple users of a web app work on different files.
On the server side, we want to read each file, process it, and share the results with the users concurrently.
When I looked at tablite, I could not see where it stores the data. Does it store the data in a separate file or in the same file, and can we change the storage location?
Please let me know if this use case works with tablite.
Regards,
Akash
from tablite.
Data is stored in tmp/tablite.hdf5. This file - just like sqlite3 - can contain an infinite number of datasets (as long as you have disk space).
from tablite.
Thanks for your response.
I tried it on a small dataset and the functionality works fine, but with a big dataset I get the error below.
Code Block
Table.reset_storage()
t3 = Table.import_file('Data_test.csv')
t3.show()
Error
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "\lib\runpy.py", line 87, in _run_code
importing: processing 'Data...est.csv': 66.99%|██████████████████████████████▏ | [00:04<00:01]
exec(code, run_globals)
File "c\\API\tablite_test.py", line 11, in <module>
t3 = Table.import_file('Data_test.csv')
File "\lib\site-packages\tablite\core.py", line 1756, in import_file
t = reader(**config, **additional_configs)
File "\lib\site-packages\tablite\core.py", line 481, in text_reader
with TaskManager(cpu_count - 1) as tm:
File "\lib\site-packages\mplite\__init__.py", line 79, in __enter__
self.start()
File "\lib\site-packages\mplite\__init__.py", line 89, in start
worker.start()
File "\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
importing: saving 'Data...est.csv' to disk: 70.00%|████████████████████████████▋ | [00:04<00:01]
Could you please advise what configuration we need in order to load the file?
from tablite.
As noted in the config in line 21, the single-processing limit is 1_000_000 rows. When the data exceeds this number of rows, tablite switches to multiprocessing.
As you are using Windows, this means your main module must be safely importable by the subprocesses that Windows spawns.
The easiest way to do this is to wrap your code block in a function and call it from a main guard, like this:
from tablite import Table

def main():
    Table.reset_storage()
    t3 = Table.import_file('Data_test.csv')
    t3.show()

if __name__ == "__main__":
    main()
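To see why the guard matters without a large CSV file, here is a minimal sketch that uses only the standard library. It is not tablite's internal code: the names count_fields and process_lines and the sample data are made up for illustration. It follows the same pattern as the RuntimeError message in the traceback above, where worker processes are only started from code that runs under the main guard.

```python
import multiprocessing as mp


def count_fields(line: str) -> int:
    """Worker task (hypothetical): count comma-separated fields in one line."""
    return len(line.split(","))


def process_lines(lines, workers=2):
    """Distribute the lines over a pool of worker processes."""
    # The pool is created only when this function is called,
    # never as a side effect of importing the module.
    with mp.Pool(processes=workers) as pool:
        return pool.map(count_fields, lines)


def main():
    sample = ["a,b,c", "1,2", "x"]  # made-up stand-in for rows of a file
    print(process_lines(sample))


if __name__ == "__main__":
    # With the spawn start method (the only one on Windows), every worker
    # re-imports this module. Without the guard, that import would call
    # main() again and try to start new processes during bootstrapping.
    main()
```

Moving the call to main() out of the guard and to module level should reproduce the same kind of RuntimeError on Windows, which is a quick way to confirm that the guard is what fixes the import.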
from tablite.
Closing this issue as there has been no news since April 21st.
from tablite.