Comments (9)
FWIW I think the mechanism referred to above is used by systems running on AWS machines that have an IAM role attached, which rely on that role rather than on AWS secret keys in environment variables. Such systems know to read a locally available endpoint to get access tokens.
from delta-rs.
As far as I know, deltalake expects all the AWS parameters to be defined in the environment, exactly as you noted @Shershebnev.
~/.aws/* files on Linux/Mac are managed by the AWS CLI, and boto3, being an AWS SDK, piggybacks on it to get the profile configs. If we implement the same or similar in deltalake, we'd have to either take a dependency on the AWS CLI (or some sub-component of it), which, if possible at all, would just be another dependency to maintain, or rely on "knowledge" of the possible locations of these files. The latter has to be OS file system specific and also creates a dependency on the corresponding file format.
To summarize, I don't see this as something that should be prioritized, but if there is strong support for implementing this I can have a stab at it.
from delta-rs.
@r3stl355 Polars for example parses these files to grab the credentials, could likely take inspiration from that implementation
from delta-rs.
Yes @ion-elgreco, it looks like Polars is using the second approach I mentioned: looking into the specific config and credentials files it "knows" may exist. However, it uses hard-coded paths like "~/.aws/credentials" which, I believe, will break on Windows, hence the need to handle the OS-specific file system as I mentioned.
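For illustration, the "known locations" approach can be sketched without hard-coding a Unix-style path. This is only a minimal sketch in Python, not how deltalake or Polars do it: the helper names default_credentials_path and load_profile are hypothetical, and passing AWS_SESSION_TOKEN through storage_options is an assumption. It relies on the AWS_SHARED_CREDENTIALS_FILE and AWS_PROFILE environment variables and on the INI layout of the shared credentials file.

```python
import configparser
import os
from pathlib import Path


def default_credentials_path() -> Path:
    """Locate the shared credentials file without hard-coding "~/.aws/credentials"."""
    override = os.environ.get("AWS_SHARED_CREDENTIALS_FILE")
    if override:
        return Path(override).expanduser()
    # Path.home() resolves the user's home directory on Windows, Linux and macOS.
    return Path.home() / ".aws" / "credentials"


def load_profile(path=None, profile=None) -> dict:
    """Read one profile and return it as a storage_options-style dict (hypothetical helper)."""
    path = Path(path) if path is not None else default_credentials_path()
    profile = profile or os.environ.get("AWS_PROFILE", "default")

    # Disable interpolation so a "%" inside a secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"no readable credentials file at {path}")
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")

    section = parser[profile]
    options = {
        "AWS_ACCESS_KEY_ID": section["aws_access_key_id"],
        "AWS_SECRET_ACCESS_KEY": section["aws_secret_access_key"],
    }
    # Assumption: temporary credentials are passed along as AWS_SESSION_TOKEN.
    if "aws_session_token" in section:
        options["AWS_SESSION_TOKEN"] = section["aws_session_token"]
    return options
```

This only covers static keys stored in the credentials file. Profiles that use role assumption, SSO or credential_process would need considerably more logic, which is part of the maintenance cost discussed above.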
from delta-rs.
Some of this will go away with #1601 fwiw, right now there's kind of a hodge-podge of configuration possibilities between object_store and some of the rusoto crates we depend on
from delta-rs.
I don't see this as something that should be prioritized, but if there is strong support for implementing this I can have a stab at it.
Speaking from a Dask perspective I'd certainly like to throw weight behind this. We certainly find that people commonly use .aws directories in their home directories. The ROI on looking for and parsing those files is hopefully fairly high. It's a common practice. With regards to Windows machines, I'm not sure what the convention there is, but I suspect that there is a fairly similar one. Hopefully addressing that convention as well is an easy switch.
In the meantime, can I ask what mechanisms are available to specify AWS credentials? Is it just environment variables? Is there something people can do to specify these programmatically in the meantime?
from delta-rs.
In the meantime, can I ask what mechanisms are available to specify AWS credentials?
Environment variables, or passing them via the storage_options parameter:
>>> from deltalake import DeltaTable
>>> storage_options = {"AWS_ACCESS_KEY_ID": "THE_AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY": "THE_AWS_SECRET_ACCESS_KEY"}
>>> dt = DeltaTable("../rust/tests/data/delta-0.2.0", storage_options=storage_options)
https://delta-io.github.io/delta-rs/usage/loading-table/
from delta-rs.
Thanks for the example showing AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY being used @wjones127. That was useful.
The system I'm running on uses AWS_CONTAINER_CREDENTIALS_FULL_URI for managing AWS credentials (https://docs.aws.amazon.com/sdkref/latest/guide/feature-container-credentials.html). I tried passing AWS_CONTAINER_CREDENTIALS_FULL_URI via storage_options, but unfortunately it didn't work (I got access denied errors when trying to write a Delta table). It'd be great if other authentication options like AWS_CONTAINER_CREDENTIALS_FULL_URI were supported.
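Until such options are supported natively, one possible workaround is to let boto3 resolve the credentials (it handles container credentials, IAM roles and ~/.aws profiles) and pass the resulting keys on explicitly. The following is only a sketch under assumptions: boto3 is installed, storage_options accepts AWS_SESSION_TOKEN for temporary credentials, and the helper name storage_options_from_session is hypothetical rather than part of deltalake.

```python
def storage_options_from_session(session=None) -> dict:
    """Build a storage_options-style dict from credentials resolved by boto3.

    Hypothetical helper, not part of deltalake. `session` is expected to
    behave like a boto3.Session; if omitted, a default session is created.
    """
    if session is None:
        import boto3  # assumption: boto3 is installed in this environment

        session = boto3.Session()

    credentials = session.get_credentials()
    if credentials is None:
        raise RuntimeError("boto3 could not resolve any AWS credentials")

    # Take a consistent snapshot of access key, secret key and token.
    frozen = credentials.get_frozen_credentials()
    options = {
        "AWS_ACCESS_KEY_ID": frozen.access_key,
        "AWS_SECRET_ACCESS_KEY": frozen.secret_key,
    }
    # Role-based and container credentials are temporary and carry a token.
    # Assumption: the storage backend accepts it under AWS_SESSION_TOKEN.
    if frozen.token:
        options["AWS_SESSION_TOKEN"] = frozen.token
    return options
```

The result could then be passed as DeltaTable(..., storage_options=storage_options_from_session()). Note that temporary credentials expire, so a long-running process would have to rebuild the options periodically.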
from delta-rs.
Great to see this is a recent thread. I've gone down a rabbit hole trying to determine whether IAM roles can be used in delta-rs (by looking at arrow-rs issues apache/arrow-rs#4556 and apache/arrow-rs#4238).
I'm trying to use delta-rs with an IAM role attached to an ECS task, and I'm finding it very hard to believe you can't (and that you have to use AWS keys).
Can you confirm that you cannot use IAM roles to write delta lake tables to S3?
+1 to the points above
from delta-rs.