
versitygw's Issues

UploadPartCopy currently not implemented

The aws cli seems to make use of UploadPartCopy and fails with some basic tests. We need to get this one implemented.
Marking this as a bug since it fails some basic cli tests, but really this is just a "needs to be implemented".

In backend interface, we have both of these:

	CopyPart(srcBucket, srcObject, DstBucket, uploadID, rangeHeader string, part int) (*types.CopyPartResult, error)
	UploadPartCopy(*s3.UploadPartCopyInput) (*s3.UploadPartCopyOutput, error)

I think these are the same thing, so we can just go with UploadPartCopy() for now, and remove CopyPart.
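
For reference, a minimal sketch of what the consolidated interface could look like once CopyPart is dropped, keeping only the SDK-typed method (the method set shown is illustrative, not the full backend interface):

package backend

import "github.com/aws/aws-sdk-go-v2/service/s3"

// Backend shows only the multipart-copy method; the real interface has many more.
type Backend interface {
	// UploadPartCopy copies a range of an existing object into a part of an
	// in-progress multipart upload. Source bucket/object, destination
	// bucket/key, upload ID, part number, and CopySourceRange all arrive in
	// the input struct, which is what makes CopyPart redundant.
	UploadPartCopy(*s3.UploadPartCopyInput) (*s3.UploadPartCopyOutput, error)
}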

GetObject invalid range error

Describe the bug
When the GetObject action is called with invalid range values, it should return an InvalidRange error.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api get-object --bucket text-content --key dir/my_data --range bytes=8888-9999 my_data_range

Expected behavior
It should return an InvalidRange error (HTTP status code 416).
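
A minimal sketch of the expected validation, assuming the object size is known before the read; the helper and error names are illustrative, not the gateway's actual identifiers:

package handlers

import (
	"errors"
	"strconv"
	"strings"
)

// errInvalidRange should be mapped to an InvalidRange response with HTTP 416.
var errInvalidRange = errors.New("InvalidRange")

// parseRange validates a "bytes=start-end" header against the object size.
// Suffix ranges ("bytes=-N") are not handled in this sketch.
func parseRange(header string, size int64) (start, end int64, err error) {
	spec, ok := strings.CutPrefix(header, "bytes=")
	if !ok {
		return 0, 0, errInvalidRange
	}
	first, last, found := strings.Cut(spec, "-")
	if !found {
		return 0, 0, errInvalidRange
	}
	start, err = strconv.ParseInt(first, 10, 64)
	if err != nil || start >= size {
		// e.g. bytes=8888-9999 against a smaller object must fail with 416
		return 0, 0, errInvalidRange
	}
	end = size - 1
	if last != "" {
		end, err = strconv.ParseInt(last, 10, 64)
		if err != nil || end < start {
			return 0, 0, errInvalidRange
		}
		if end >= size {
			end = size - 1
		}
	}
	return start, end, nil
}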

aws sdk put object signature mismatch

SDK 2023/06/06 11:50:22 DEBUG Request Signature:
---[ CANONICAL STRING  ]-----------------------------
PUT
/testbucket1

accept-encoding:identity
amz-sdk-invocation-id:42d3bdd9-b354-4295-9308-c7afdb277af0
amz-sdk-request:attempt=1; max=3
host:127.0.0.1:7070
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20230606T185022Z

accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
---[ STRING TO SIGN ]--------------------------------
AWS4-HMAC-SHA256
20230606T185022Z
20230606/us-east-1/s3/aws4_request
7f4a998554899ac9a010bdf73d5afa89659d374f25d711c7758f71d8f43130c3
-----------------------------------------------------
SDK 2023/06/06 11:50:22 DEBUG Request
PUT /testbucket1 HTTP/1.1
Host: 127.0.0.1:7070
User-Agent: aws-sdk-go-v2/1.17.1 os/macos lang/go/1.20.4 md/GOOS/darwin md/GOARCH/amd64 api/s3/1.29.4
Content-Length: 0
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 42d3bdd9-b354-4295-9308-c7afdb277af0
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=myaccess/20230606/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=c7d29992e5fdc027174bc2ce723fe7e48baf35102d59c79b62718ccff8fb4612
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20230606T185022Z

SDK 2023/06/06 11:50:22 DEBUG Response
HTTP/1.1 200 OK
Content-Length: 0
Date: Tue, 06 Jun 2023 18:50:22 GMT
Server: VERSITYGW

SDK 2023/06/06 11:50:22 DEBUG Request Signature:
---[ CANONICAL STRING  ]-----------------------------
PUT
/testbucket1/myobject
x-id=PutObject
accept-encoding:identity
amz-sdk-invocation-id:bdc4044b-fbec-4d45-bcea-b079961764e5
amz-sdk-request:attempt=1; max=3
content-length:1234567
content-type:application/octet-stream
host:127.0.0.1:7070
x-amz-content-sha256:55f0788794756e3c48c17af4cd4bcfab00d981c1fb8575995cbc060a001ead73
x-amz-date:20230606T185022Z

accept-encoding;amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;x-amz-content-sha256;x-amz-date
55f0788794756e3c48c17af4cd4bcfab00d981c1fb8575995cbc060a001ead73
---[ STRING TO SIGN ]--------------------------------
AWS4-HMAC-SHA256
20230606T185022Z
20230606/us-east-1/s3/aws4_request
b0ea3d004e61f8d721a66736d6c3b7f84c8608720abdd01b51654958b4ad26cc
-----------------------------------------------------
SDK 2023/06/06 11:50:22 DEBUG Request
PUT /testbucket1/myobject?x-id=PutObject HTTP/1.1
Host: 127.0.0.1:7070
User-Agent: aws-sdk-go-v2/1.17.1 os/macos lang/go/1.20.4 md/GOOS/darwin md/GOARCH/amd64 api/s3/1.29.4
Content-Length: 1234567
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: bdc4044b-fbec-4d45-bcea-b079961764e5
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=myaccess/20230606/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;x-amz-content-sha256;x-amz-date, Signature=688e213f08f4d6fe1b6024ae8ac79f3c259b132d58886bec9fb4278d0e3a1467
Content-Type: application/octet-stream
X-Amz-Content-Sha256: 55f0788794756e3c48c17af4cd4bcfab00d981c1fb8575995cbc060a001ead73
X-Amz-Date: 20230606T185022Z

SDK 2023/06/06 11:50:22 DEBUG Response
HTTP/1.1 403 Forbidden
Content-Length: 281
Content-Type: text/plain; charset=utf-8
Date: Tue, 06 Jun 2023 18:50:22 GMT
Server: VERSITYGW

SDK 2023/06/06 11:50:22 DEBUG request failed with unretryable error https response error StatusCode: 403, RequestID: , HostID: , api error SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.

developer documentation

Let's add a section to the wiki for developer focused docs such as how to go about adding a new backend type.

S3 Directory and file objects with the same key.

Describe the bug
The put-object action does not allow a directory object and a file object to be created with the same key.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api put-object --bucket my-bucket-1 --key foo/bar

aws --endpoint-url http://localhost:7070 s3api put-object --bucket my-bucket-1 --key foo/bar/xyzzy
Result: An error occurred (ExistingObjectIsDirectory) when calling the PutObject operation: Existing Object is a directory.

Expected behavior
It should allow creating both directory and file objects.

IAM service compatible with AWS

Describe the solution you'd like
Change the implementation of the IAM service to make it more compatible with AWS.

  1. Remove Admin CLI and use aws iam CLI as client side admin CLI.
  2. Register proper api handlers for user access creation and deletion.

Additional context
Helpful resources:
AWS IAM

GET request results in "wrong api call"

server side:

08:59:04 | 500 |      0s |       127.0.0.1 | GET     | /mybucket/test/file  | wrong api call

client request:

2023-06-06 08:59:02,726 - ThreadPoolExecutor-0_0 - botocore.auth - DEBUG - CanonicalRequest:
GET
/mybucket/test/file

host:127.0.0.1:7070
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20230606T155902Z

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-06-06 08:59:02,726 - ThreadPoolExecutor-0_0 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20230606T155902Z
20230606/us-east-1/s3/aws4_request
0caccb402e885b43a57fea1e83cf49ab0a7b237e4db9c3828ad261cac5611f0c
2023-06-06 08:59:02,726 - ThreadPoolExecutor-0_0 - botocore.auth - DEBUG - Signature:
c40598048b490c7d04185e5b9d705adc5fe91b9d4b523b6baa046cad4b285089

list buckets not serializing response

[{CreationDate:2023-05-30 21:21:18.936973823 -0700 PDT Name:0xc00032efb0 noSmithyDocumentSerde:{}}]
21:33:50 | 200 |      0s |       127.0.0.1 | GET     | /               

but client gets

$ aws --endpoint-url http://127.0.0.1:7070 s3api list-buckets
{
    "Buckets": [
        {},
        {}
    ]
}
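
The Name:0xc00032efb0 in the server log is the SDK's *string pointer, so whatever serializes the response is emitting empty bucket entries. One plausible fix sketch (the response struct names are guesses, not the gateway's actual types) is to convert into a local XML-tagged struct and dereference the pointers:

package s3response

import (
	"encoding/xml"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

type bucketEntry struct {
	Name         string    `xml:"Name"`
	CreationDate time.Time `xml:"CreationDate"`
}

type listAllMyBucketsResult struct {
	XMLName xml.Name      `xml:"ListAllMyBucketsResult"`
	Buckets []bucketEntry `xml:"Buckets>Bucket"`
}

// toListBucketsResult dereferences the SDK pointer fields so the XML encoder
// has concrete values to write.
func toListBucketsResult(in []types.Bucket) listAllMyBucketsResult {
	var out listAllMyBucketsResult
	for _, b := range in {
		out.Buckets = append(out.Buckets, bucketEntry{
			Name:         aws.ToString(b.Name),
			CreationDate: aws.ToTime(b.CreationDate),
		})
	}
	return out
}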

document multi-tenant capabilities

This issue is just to track that we need to update IAM service interactions in the wiki, and expected gateway behavior when running multiple user accounts.

add requestid for better request tracking

add signal handler for shutdown

We need to catch termination signals and shutdown backends cleanly. I'm filing this as a bug since there is a shutdown method on the backend that never gets called currently.
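
A minimal sketch of the idea, assuming the existing Shutdown method on the backend mentioned above (the surrounding server startup is elided and the package name is a placeholder):

package gateway

import (
	"context"
	"log"
	"os/signal"
	"syscall"
)

type backend interface{ Shutdown() }

func waitForShutdown(be backend) {
	// ctx is cancelled on SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	<-ctx.Done() // block until a termination signal arrives
	log.Println("termination signal received, shutting down backend")
	be.Shutdown()
}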

CompleteMultipartUpload ETag error

Describe the bug
When calling CompleteMultipartUpload with incorrect ETag values in the parts, it does not return an error.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api complete-multipart-upload --multipart-upload file://mpustruct --bucket my-bucket --key 'multipart/01' --upload-id dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R

Expected behavior
It should return an InvalidParts error with a 400 status code.
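
Sketch of the missing check, assuming the backend keeps a record of each uploaded part's ETag; the mapping to a 400 response follows the expectation above:

package handlers

import "fmt"

// validateParts compares the ETags supplied in the CompleteMultipartUpload
// request against the ETags recorded when the parts were uploaded.
func validateParts(requested, stored map[int32]string) error {
	for partNum, etag := range requested {
		if stored[partNum] != etag {
			// Return this as an InvalidParts error with HTTP status 400.
			return fmt.Errorf("part %d: etag %q does not match uploaded part", partNum, etag)
		}
	}
	return nil
}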

GetObject doesn't set all the necessary headers

Describe the bug
GetObject action does not set the x-amz-tagging-count and content-range headers.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api get-object --bucket text-content --key dir/my_data --range bytes=8888-9999 my_data_range

Expected behavior
It should set all the necessary response headers.
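
Illustrative sketch of setting the missing headers for a ranged GetObject, using plain net/http for clarity rather than the gateway's actual router:

package handlers

import (
	"fmt"
	"net/http"
	"strconv"
)

// writeRangedGetHeaders sets Content-Range, x-amz-tagging-count, and the 206
// status for a partial-content response.
func writeRangedGetHeaders(w http.ResponseWriter, start, end, objectSize int64, tagCount int) {
	w.Header().Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", start, end, objectSize))
	w.Header().Set("Content-Length", strconv.FormatInt(end-start+1, 10))
	if tagCount > 0 {
		w.Header().Set("x-amz-tagging-count", strconv.Itoa(tagCount))
	}
	w.WriteHeader(http.StatusPartialContent)
}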

Investigate more scalable IAM service options

The internal IAM service is just meant to be a simple IAM service for a handful of local accounts. For a more scalable approach, we would probably want to interface with a more scalable system. We will want to maintain the internal IAM service for the simple cases with no outside dependencies.

Some ideas to investigate:
Redis cluster
etcd
keycloak
okta

We will want to make this modular as well, so that we can implement more of these as different sites have different needs. Part of this investigation will determine if the current IAM interface is sufficient for all of the implementations:

// IAMService is the interface for all IAM service implementations
type IAMService interface {
	CreateAccount(access string, account Account) error
	GetUserAccount(access string) (Account, error)
	DeleteUserAccount(access string) error
	ListUserAccounts() ([]Account, error)
}

The important one to implement is GetUserAccount; the others are only needed if we want to manage users through the versitygw admin API.
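
To get a feel for whether the interface is sufficient, here is a rough sketch of a lookup-only external implementation where only GetUserAccount does real work and account management is delegated to the external system; the lookup function is a placeholder and Account is the existing account type from the interface above:

package iam

import "errors"

// ErrNotSupported signals that account management happens in the external system.
var ErrNotSupported = errors.New("account management is handled by the external IAM service")

// ExternalIAM satisfies IAMService using an external store (Redis, etcd,
// Keycloak, etc.).
type ExternalIAM struct {
	lookup func(access string) (Account, error)
}

func (e *ExternalIAM) GetUserAccount(access string) (Account, error) {
	return e.lookup(access)
}

func (e *ExternalIAM) CreateAccount(access string, account Account) error { return ErrNotSupported }
func (e *ExternalIAM) DeleteUserAccount(access string) error              { return ErrNotSupported }
func (e *ExternalIAM) ListUserAccounts() ([]Account, error)               { return nil, ErrNotSupported }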

first pass authorization

We need to implement the put/get bucket/object ACLs for the backend. Then we need something that will parse the ACLs and decide if the request is authorized. Here is documentation on AWS ACLs:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html

Do we need to worry about policies here too?
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-iam-policies.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-alternatives-guidelines.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-s3-evaluates-access-control.html

The most common policy for anticipated multi-tenant use of the gateway would be a superuser creates a bucket, and then gives full access to the bucket to a specific account. Maybe this workflow can help us fast-path an MVP here.
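
For that fast-path MVP, a sketch of a bucket ACL check covering the common case: allow the request if the account owns the bucket or has been granted FULL_CONTROL (or the specific permission). The type and field names are assumptions, not the gateway's actual ACL representation:

package auth

type grant struct {
	Grantee    string // account access key
	Permission string // "FULL_CONTROL", "READ", "WRITE", ...
}

type bucketACL struct {
	Owner  string
	Grants []grant
}

// allowed reports whether access may perform an operation that needs permission.
func allowed(acl bucketACL, access, permission string) bool {
	if access == acl.Owner {
		return true // the bucket owner always has full access
	}
	for _, g := range acl.Grants {
		if g.Grantee != access {
			continue
		}
		if g.Permission == "FULL_CONTROL" || g.Permission == permission {
			return true
		}
	}
	return false
}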

Object ACLs

Since the first pass authorization only included Bucket ACLs, I'm opening a ticket to backlog the Object ACL work. This is lower priority right now, so we will come back to this one after a few other higher-priority tasks are done.

Test framework

Starting to put together a test framework for server validation and performance testing.

The initial idea is to split up tests into groups, such as "quick", "perf", "stress", etc. The framework would allow running all or a subset of tests against a running server.

There should be clear test pass/fail status, and maybe some debug output on failure for further investigation.

scoutfs backend filesystem optimizations

This is a general ticket covering the scoutfs filesystem optimizations. The backend is in place now, and the move-blocks multipart upload optimizations are done. Next up is interfaces for the tiered filesystem workflows.

user account is allowed to make buckets

Describe the bug
This will be non-standard (from AWS) behavior, but I think in the gateway case we only want to allow admin accounts to create buckets. This will allow admins to better control use of the system. We could decide later to add an option to allow user bucket creation.

To Reproduce
create user account

$ ./versitygw admin -a user -s password create-user -a acct1 -s password1 -r user

make a bucket

$ AWS_ACCESS_KEY_ID=acct1 AWS_SECRET_ACCESS_KEY=password1 aws --endpoint-url http://127.0.0.1:7070 s3 mb s3://bucket1
make_bucket: bucket1

Expected behavior
We should return an error that prevents the user from creating buckets.
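
A minimal sketch of the proposed restriction; the Role field is an assumption based on the -r flag used in create-user above:

package auth

import "errors"

// Account mirrors a gateway account with an assumed Role of "admin" or "user".
type Account struct {
	Access string
	Role   string
}

// checkCreateBucketAccess rejects bucket creation for non-admin accounts.
func checkCreateBucketAccess(acct Account) error {
	if acct.Role != "admin" {
		// Map this to an S3 AccessDenied error in the CreateBucket handler.
		return errors.New("only admin accounts may create buckets")
	}
	return nil
}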

ListObjectsV2 KeyCount property

Describe the bug
ListObjectsV2 action does not return the KeyCount property, which is the number of keys returned in the response.

To Reproduce
Create a bucket, upload an object in it. List the bucket objects with:
aws --endpoint-url http://localhost:7070 s3api list-objects --bucket my-bucket

Expected behavior
The result should include the KeyCount property.

ListObjects incorrect marker.

Describe the bug
ListObjects reads the marker from the wrong argument. It currently takes the continuation-token query arg as the marker; it should use the marker query arg instead.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api list-objects --bucket my-bucket --marker mrk

Expected behavior
It should take the marker value from the correct query argument.
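
Illustrative sketch of reading the right query argument, written against plain net/http rather than the gateway's actual router:

package handlers

import "net/http"

// listObjectsMarker pulls the v1 pagination marker from the request.
// ListObjects (v1) paginates with "marker"; "continuation-token" belongs to
// ListObjectsV2 only.
func listObjectsMarker(r *http.Request) string {
	return r.URL.Query().Get("marker")
}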

list buckets from user account gets error

Describe the bug
A user-level account is unable to list their buckets. Only admin is able to list buckets.

To Reproduce
create user account

$ ./versitygw admin -a user -s password create-user -a acct1 -s password1 -r user

issue list buckets

$ AWS_ACCESS_KEY_ID=acct1 AWS_SECRET_ACCESS_KEY=password1 aws --endpoint-url http://127.0.0.1:7070 s3 ls s3://

An error occurred (InternalError) when calling the ListBuckets operation (reached max retries: 2): We encountered an internal error, please try again.

server side error:

2023/07/21 09:08:59 Internal Error, only admin users have access to this resource
09:08:58 | 500 |      0s |       127.0.0.1 | GET     | /               

Expected behavior
A user issuing list buckets should get a list of their buckets.

CompleteMultipartUpload empty upload

Describe the bug
When calling CompleteMultipartUpload with an empty multipart-upload, it should return a MalformedXML error.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api complete-multipart-upload --bucket my-bucket --key 'multipart/01' --upload-id dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF

Expected behavior
It should return a MalformedXML error and 400 as the HTTP status code.

internal server error log

The internal server errors usually mean something happened server side that an admin would like to know about. We should consider an internal server error log so that we can monitor for when these types of errors happen.

Remove GetObject args etag

I think the etag should be retrieved from the backend and sent back in the response. So we probably don't want this as an arg passed to GetObject.

ListObjectsV2/ListObjects should ignore hidden dot files

Describe the bug
ListObjectsV2/ListObjects actions return .DS_Store files as S3 objects on macOS; .DS_Store is a file created in every directory, containing some metadata about the folder.

To Reproduce
aws --endpoint-url http://localhost:7070 s3api list-objects-v2 --bucket my-bucket
Run this in a bucket which contains directory objects.

Expected behavior
It should skip these files instead of listing them as S3 objects.
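
Sketch of the posix-side filtering, assuming the listing is built by walking the bucket directory; skipping anything whose base name starts with a dot covers .DS_Store and similar hidden files:

package posix

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// listKeys walks bucketDir and returns object keys, ignoring hidden dot files.
func listKeys(bucketDir string) ([]string, error) {
	var keys []string
	err := filepath.WalkDir(bucketDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if strings.HasPrefix(d.Name(), ".") && path != bucketDir {
			if d.IsDir() {
				return filepath.SkipDir // don't descend into hidden directories
			}
			return nil // skip hidden files such as .DS_Store
		}
		if !d.IsDir() {
			rel, rerr := filepath.Rel(bucketDir, path)
			if rerr != nil {
				return rerr
			}
			keys = append(keys, filepath.ToSlash(rel))
		}
		return nil
	})
	return keys, err
}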

quickstart documentation

Once we have the ability to start running the command, we should document a quickstart guide in the wiki.

gateway metrics

This issue is just to capture the feature for general metrics. This should be something that can track request rates, throughput, etc. At a minimum we should allow configuring a statsd endpoint, but like everything else we may want to allow configuration of different types of metrics endpoints.

first pass authentication

Let's see if we can re-use the aws sdk v4 auth signer that already exists:
https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/aws/signer/v4#Signer.PresignHTTP

The PresignHTTP does not modify the request, so we should be able to pass the incoming request to the PresignHTTP and validate signatures match.

Get the account from the request headers, and use the secret key from the GetIAMConfig() IAM service interface. We may want to store the full aws.Credentials instead of just secret key from the IAM interface.
https://pkg.go.dev/github.com/aws/[email protected]/aws#Credentials

Be careful of special cases such as:
The payloadHash is the hex encoded SHA-256 hash of the request payload, and must be provided. Even if the request has no payload (aka body). If the request has no payload you should use the hex encoded SHA-256 of an empty string as the payloadHash value.
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
Some services such as Amazon S3 accept alternative values for the payload hash, such as "UNSIGNED-PAYLOAD" for requests where the body will not be included in the request signature.

We should have an admin account defined outside of the IAM config so that the gateway can be useful without adding accounts (mainly useful for testing and quickstart, etc).

The rest of the accounts will come from GetIAMConfig() from an IAM service interface.

Authorization is done through ACL checks, and is out of scope for this feature.
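
A rough sketch of the re-sign-and-compare idea described above, using the SDK signer's SignHTTP on a copy of the incoming request (SignHTTP rather than PresignHTTP, since the client signed via the Authorization header). Note that only the headers listed in the client's SignedHeaders should be present on the copy for the signatures to match; that pruning and parsing of the credential scope are elided here:

package auth

import (
	"context"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
)

// emptyPayloadHash is the hex SHA-256 of an empty body, per the note above.
const emptyPayloadHash = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

// validSignature re-signs a copy of req with the account's credentials and
// compares the result to the Authorization header the client sent.
func validSignature(req *http.Request, creds aws.Credentials, payloadHash, region string, reqTime time.Time) (bool, error) {
	sent := req.Header.Get("Authorization")

	cloned := req.Clone(req.Context()) // leave the original request untouched
	cloned.Header.Del("Authorization")

	signer := v4.NewSigner()
	if err := signer.SignHTTP(context.Background(), creds, cloned, payloadHash, "s3", region, reqTime); err != nil {
		return false, err
	}
	return cloned.Header.Get("Authorization") == sent, nil
}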

audit logs

We need to think about a way to do audit logging. I think we will want flexibility here so that the user can select if they want:

  • none
  • syslog
  • webhooks
  • other?

And we need to decide how much detail goes into the audit logs. Probably almost everything about the incoming requests and the return status.

v4 auth signature fails in some cases

We need to add support for
UNSIGNED-PAYLOAD
STREAMING-UNSIGNED-PAYLOAD-TRAILER
STREAMING-AWS4-HMAC-SHA256-PAYLOAD
STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER
STREAMING-AWS4-ECDSA-P256-SHA256-PAYLOAD
STREAMING-AWS4-ECDSA-P256-SHA256-PAYLOAD-TRAILER

see: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html

Also, sometimes I am finding that the Authorization header has spaces after the commas, and other times not, for example:

Authorization: AWS4-HMAC-SHA256 Credential=user/20230701/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=0c5624bf2dcd65e02a3a9e90f7e4ee0bdde51a4851613d467659e1b5a9dfe0e3
Authorization: AWS4-HMAC-SHA256 Credential=user/20230701/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length,Signature=c79650f388295ef8dc7e82158a1134c50483814cc857e4f5b70ffd748c28d8aa
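
A small sketch of parsing that header in a way that tolerates both spacing variants above: split on commas, trim whitespace, then split each key=value pair:

package auth

import "strings"

// parseAuthHeader returns the Credential, SignedHeaders, and Signature fields
// of a SigV4 Authorization header, accepting either ", " or "," separators.
func parseAuthHeader(h string) map[string]string {
	out := map[string]string{}
	// Drop the algorithm prefix, e.g. "AWS4-HMAC-SHA256 ".
	if i := strings.Index(h, " "); i >= 0 {
		h = h[i+1:]
	}
	for _, part := range strings.Split(h, ",") {
		part = strings.TrimSpace(part) // handles both ", SignedHeaders=" and ",SignedHeaders="
		if k, v, ok := strings.Cut(part, "="); ok {
			out[k] = v
		}
	}
	return out
}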

head object doesn't handle attributes map

21:30:41 | 500 |      0s |       127.0.0.1 | HEAD    | /mybucket/file.prof  | xml: unsupported type: map[string]string
$ aws --endpoint-url http://127.0.0.1:7070 s3api head-object --bucket mybucket --key file.prof

An error occurred (500) when calling the HeadObject operation (reached max retries: 2): Internal Server Error

posix: list objects not honoring prefix

$ aws --endpoint-url http://127.0.0.1:7070 s3api list-objects --bucket mybucket --prefix file1
{
    "Contents": [
        {
            "Key": "file.prof",
            "LastModified": "2023-05-30T21:21:18.936778-07:00",
            "ETag": "",
            "Size": 132533,
            "StorageClass": ""
        }
    ]
}

Should not be listing any objects that don't match the prefix.
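
Sketch of the expected filtering: drop any key that does not start with the requested prefix before building the ListObjects response:

package posix

import "strings"

// filterByPrefix keeps only keys that begin with prefix.
func filterByPrefix(keys []string, prefix string) []string {
	if prefix == "" {
		return keys
	}
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}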

posix: temp file naming not correct

For the fallback openTmpFile, we are using os.CreateTemp(). The current args are not correct though. The file can't have a path separator, the directory must exist, and the file shouldn't end with a newline. We should also have the name include a '.' between the name and the random string.
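
A hedged sketch of a corrected fallback, assuming the temp file lives next to the object and uses the object's base name plus a '.' before the random suffix. os.CreateTemp substitutes the random string for the '*' in the pattern, and the pattern must not contain a path separator, so only the base name goes into it:

package posix

import (
	"os"
	"path/filepath"
)

// openTmpFile creates a temp file in the object's parent directory named
// "<objectname>.<random>".
func openTmpFile(objPath string) (*os.File, error) {
	dir := filepath.Dir(objPath)
	if err := os.MkdirAll(dir, 0o755); err != nil { // the directory must exist
		return nil, err
	}
	return os.CreateTemp(dir, filepath.Base(objPath)+".*")
}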

server logging

This is different from audit logs, which is a separate issue. This one will track server logging such as access and error logs, like would be typical from HTTP services.

Do we need this to be modular? So we can plug into other types of logging services?
Should we use syslog when available?
Allow enable/disable of logging?

cleanup backend interface args

We have some interface functions that take a bunch of args, and others that just get the s3 arg type. We need to just standardize on the s3 types if possible. This will allow us to pass all the info to the backends and not have to keep changing the interface as we add functionality.
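
Illustrative before/after of the idea, using ListObjects as the example; the exact method set is whatever the backend interface already has:

package backend

import "github.com/aws/aws-sdk-go-v2/service/s3"

type Backend interface {
	// Before: ad-hoc args that have to change whenever we support a new parameter.
	//   ListObjects(bucket, prefix, marker, delimiter string, maxkeys int32) (...)

	// After: pass the SDK input type straight through so backends see everything.
	ListObjects(*s3.ListObjectsInput) (*s3.ListObjectsOutput, error)
}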
