Comments (12)
That's great to hear :) Let me know if you need any help; feel free to reach out to me on the DuckDB Discord or [email protected]
from duckdb_azure.
Could you try the abfs:// URLs instead? The az:// ones are triggering our auto-installation, routing the requests through the azure extension. Alternatively, disable autoloading with SET autoinstall_known_extensions=false;
I have managed to make it work. The issue at my end was this part of the path - {storage_account_name}.blob.core.windows.net - which even the adlfs library does not like for some reason. Not including that in the path works fine. Thanks a lot again for all the guidance.
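For illustration, the difference between the path adlfs rejected and the one that worked (the account and container names are hypothetical placeholders from this thread):

```python
storage_account_name = "our_account"   # hypothetical
container_name = "our_container"       # hypothetical

# adlfs treats everything after the protocol as container/blob, so the
# account host ends up inside the blob name and Azure rejects it with
# InvalidResourceName
broken = f"abfs://{storage_account_name}.blob.core.windows.net/{container_name}/test.parquet"

# The account is identified by the credentials instead, so the URL only
# needs the container and blob name
working = f"abfs://{container_name}/test.parquet"
print(working)
```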
Hello,
For Kotlin I'm not aware of a workaround :(
Nevertheless, I have started to work on the issue, but I think I will not be able to make it work until the following PR is merged
Hello DuckDB team, nice to see that this has already been posted as an issue. My team and I would also love to have this as a feature. Just to add some context: we are working with ETL pipelines in my company that mostly use pandas; however, for performance-related reasons we have started migrating to DuckDB. We have an Azure-native infra, and while we can already enjoy the Parquet import feature, we would like to have a Parquet export capability like the one for S3.
Just a small example,
CREATE TABLE weather (
city VARCHAR,
temp_lo INTEGER, -- minimum temperature on a day
temp_hi INTEGER, -- maximum temperature on a day
prcp REAL,
date DATE
);
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
COPY weather TO 'az://⟨my_container⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
Additionally, I would love to receive advice on any temporary workarounds that would enable us to write from DuckDB directly to Azure Blob (or Data Lake) Storage. Thanks 😃
@csubhodeep you can try using fsspec if you're on Python; it should have Azure support
OK, thanks a lot. I will try it.
Thanks again! I tried to use the fsspec library in conjunction with adlfs. TL;DR - no success.
Here is what I tried:
>>> import duckdb
>>> from fsspec import filesystem
>>> storage_account_name = "our_account"
>>> container_name = "our_container"
>>> account_creds = "<our_key>"
>>> duckdb.register_filesystem(filesystem('abfs', connection_string=account_creds))
>>> duckdb.sql("CREATE OR REPLACE TABLE test_table (a INTEGER, b VARCHAR(100))")
>>> duckdb.sql("INSERT INTO test_table VALUES (1, 'a'), (2, 'b'), (3, 'c')")
>>> duckdb.sql("SELECT * FROM test_table")
┌───────┬─────────┐
│ a │ b │
│ int32 │ varchar │
├───────┼─────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
└───────┴─────────┘
>>> write_query = f"COPY test_table TO 'https://{storage_account_name}.blob.core.windows.net/{container_name}/test.parquet' (FORMAT 'parquet')"
>>> duckdb.sql(write_query)
---------------------------------------------------------------------------
IOException                               Traceback (most recent call last)
Cell In[41], line 2
      1 # dump it as parquet
----> 2 duckdb.sql(write_query)
IOException: IO Error: Cannot open file "https://<storage_account_name>.blob.core.windows.net/<container_name>/test.parquet": No such file or directory
>>> write_query = f"COPY test_table TO 'az://{storage_account_name}.blob.core.windows.net/{container_name}/test.parquet' (FORMAT 'parquet')"
>>> duckdb.sql(write_query)
---------------------------------------------------------------------------
NotImplementedException                   Traceback (most recent call last)
Cell In[43], line 2
      1 # dump it as parquet
----> 2 duckdb.sql(write_query)
NotImplementedException: Not implemented Error: Writing to Azure containers is currently not supported
Please let me know if I am doing something wrong.
After trying the suggestion above, here are the results:
Exception ignored in: <function AzureBlobFile.__del__ at 0x7feb3d5a4280>
Traceback (most recent call last):
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/adlfs/spec.py", line 2166, in __del__
self.close()
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/adlfs/spec.py", line 1983, in close
super().close()
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/fsspec/spec.py", line 1932, in close
self.flush(force=True)
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/fsspec/spec.py", line 1803, in flush
if self._upload_chunk(final=force) is not False:
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
File "/workspaces/rev_man_sys/venv/lib/python3.8/site-packages/adlfs/spec.py", line 2147, in _async_upload_chunk
raise RuntimeError(f"Failed to upload block: {e}!") from e
RuntimeError: Failed to upload block: The specifed resource name contains invalid characters.
RequestId:3545381a-d01e-0083-2346-6efd88000000
Time:2024-03-04T15:11:28.7881722Z
ErrorCode:InvalidResourceName
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidResourceName</Code><Message>The specifed resource name contains invalid characters.
RequestId:3545381a-d01e-0083-2346-6efd88000000
Time:2024-03-04T15:11:28.7881722Z</Message></Error>!
Is it more of an adlfs issue?
I think you're not setting the connection string correctly; you appear to be setting it to your key.
Let's move this discussion elsewhere though, as this is no longer about this issue. Please check that you're actually using fsspec correctly. If things are still wrong and it appears to be on DuckDB's side, feel free to open an issue in duckdb/duckdb
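To illustrate the mix-up being pointed at (all values hypothetical): an Azure connection string is a semicolon-separated list of key=value pairs as copied from the Azure portal, whereas the earlier snippet passed the bare account key:

```python
# A full connection string contains several named fields
connection_string = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=our_account;"
    "AccountKey=<base64_key>;"
    "EndpointSuffix=core.windows.net"
)

# adlfs also accepts the pieces separately instead of a connection
# string, e.g.:
#   filesystem("abfs", account_name="our_account", account_key="<base64_key>")

# A bare key has none of these fields, which is the mismatch above
bare_key = "<base64_key>"
print("AccountName=" in connection_string, "AccountName=" in bare_key)
```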
Hi, I guess I am facing the same issue. I am using Kotlin. Is there any workaround for this?
Thanks
+1 for this feature