
storage's Introduction

chartmuseum/storage


Go library providing a common interface for working across multiple storage backends.

Supported storage backends:

  • Local filesystem
  • Amazon S3
  • Google Cloud Storage
  • Microsoft Azure Blob Storage
  • Alibaba Cloud OSS Storage
  • Openstack Object Storage
  • Oracle Cloud Infrastructure Object Storage
  • Baidu Cloud BOS Storage
  • Tencent Cloud Object Storage
  • Netease Cloud NOS Storage
  • etcd

This code was originally part of the Helm project ChartMuseum, but has since been released as a standalone package for others to use in their own projects.

Primary Components

Backend (interface)

Backend is a common interface that is implemented by all the supported storage backends and their associated types:

type Backend interface {
    ListObjects(prefix string) ([]Object, error)
    GetObject(path string) (Object, error)
    PutObject(path string, content []byte) error
    DeleteObject(path string) error
}
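
To illustrate the contract, here is a minimal in-memory implementation of the interface. This is a hypothetical sketch (useful mainly for tests), not part of the library; for simplicity it keeps full paths and fakes LastModified:

package memory

import (
	"fmt"
	"strings"
	"time"

	"github.com/chartmuseum/storage"
)

// MemoryBackend is a hypothetical in-memory Backend, handy for unit tests.
type MemoryBackend struct {
	objects map[string][]byte
}

func NewMemoryBackend() *MemoryBackend {
	return &MemoryBackend{objects: make(map[string][]byte)}
}

// ListObjects returns every stored object whose path starts with prefix.
func (b *MemoryBackend) ListObjects(prefix string) ([]storage.Object, error) {
	var objects []storage.Object
	for path := range b.objects {
		if strings.HasPrefix(path, prefix) {
			objects = append(objects, storage.Object{Path: path})
		}
	}
	return objects, nil
}

// GetObject returns the object at path, or an error if it does not exist.
func (b *MemoryBackend) GetObject(path string) (storage.Object, error) {
	content, ok := b.objects[path]
	if !ok {
		return storage.Object{}, fmt.Errorf("object %s not found", path)
	}
	return storage.Object{Path: path, Content: content, LastModified: time.Now()}, nil
}

// PutObject stores content under path, overwriting any existing object.
func (b *MemoryBackend) PutObject(path string, content []byte) error {
	b.objects[path] = content
	return nil
}

// DeleteObject removes the object at path; deleting a missing key is a no-op.
func (b *MemoryBackend) DeleteObject(path string) error {
	delete(b.objects, path)
	return nil
}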

Object (struct)

Object is a struct that represents a single storage object:

type Object struct {
    Path         string
    Content      []byte
    LastModified time.Time
}

ObjectSliceDiff (struct)

ObjectSliceDiff is a struct that represents overall changes between two Object slices:

type ObjectSliceDiff struct {
    Change  bool
    Removed []Object
    Added   []Object
    Updated []Object
}

GetObjectSliceDiff (function)

GetObjectSliceDiff is a function that takes two Object slices, compares them, and returns an ObjectSliceDiff:

func GetObjectSliceDiff(prev []Object, curr []Object, timestampTolerance time.Duration) ObjectSliceDiff
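
For example, here is a sketch of detecting changes between a cached object listing and a fresh one; the paths and the one-second tolerance are illustrative:

package main

import (
	"fmt"
	"time"

	"github.com/chartmuseum/storage"
)

func main() {
	prev := []storage.Object{
		{Path: "charts/mychart-0.1.0.tgz", LastModified: time.Now().Add(-time.Hour)},
	}
	curr := []storage.Object{
		{Path: "charts/mychart-0.1.0.tgz", LastModified: time.Now().Add(-time.Hour)},
		{Path: "charts/mychart-0.2.0.tgz", LastModified: time.Now()},
	}

	// Timestamps within one second of each other are treated as unchanged,
	// so only the new 0.2.0 chart shows up as added.
	diff := storage.GetObjectSliceDiff(prev, curr, time.Second)
	fmt.Printf("changed: %v, added: %d, removed: %d, updated: %d\n",
		diff.Change, len(diff.Added), len(diff.Removed), len(diff.Updated))
}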

Usage

Simple example

The following is a simple program that will upload a file either to an Azure Blob Storage bucket (container) or a Google Cloud Storage bucket based on the command line options provided:

// Usage: go run example.go <cloud> <bucket> <file>

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"

	"github.com/chartmuseum/storage"
)

type (
	Uploader struct {
		Backend storage.Backend
	}
)

func NewUploader(cloud string, bucket string) *Uploader {
	var backend storage.Backend
	switch cloud {
	case "azure":
		backend = storage.NewMicrosoftBlobBackend(bucket, "")
	case "google":
		backend = storage.NewGoogleCSBackend(bucket, "")
	default:
		panic("cloud provider " + cloud + " not supported")
	}
	uploader := Uploader{Backend: backend}
	fmt.Printf("uploader created (cloud: %s, bucket: %s)\n", cloud, bucket)
	return &uploader
}

func (uploader *Uploader) Upload(filename string) {
	basename := filepath.Base(filename)
	content, err := ioutil.ReadFile(filename)
	if err != nil {
		panic(err)
	}
	err = uploader.Backend.PutObject(basename, content)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s successfully uploaded\n", basename)
}

func main() {
	args := os.Args[1:]
	uploader := NewUploader(args[0], args[1])
	uploader.Upload(args[2])
}

Example of using this program to upload the file index.html to an Azure bucket:

go run example.go azure mycontainer index.html

Example of using this program to upload the file index.html to a Google Cloud bucket:

go run example.go google mybucket index.html

Per backend

Each supported storage backend has its own type that implements the Backend interface. All available types are described in detail on GoDoc.

In addition, authentication methods are based on the runtime environment and vary from cloud to cloud.


storage's Issues

Tencent storage time parse error

In Tencent storage, there is an error when parsing time:
https://github.com/chartmuseum/storage/blob/master/tencent.go#L117

lastModified, _ := http.ParseTime(obj.LastModified)

The obj.LastModified format is time.RFC3339, such as 2019-08-29T06:58:42.000Z, so we can't use http.ParseTime to parse it. We should use time.Parse(time.RFC3339, obj.LastModified) instead.
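
A minimal, runnable demonstration of the difference, using the timestamp from the issue:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	lastModified := "2019-08-29T06:58:42.000Z" // RFC3339, as returned by the Tencent COS API

	// http.ParseTime only understands HTTP date formats (RFC1123 and friends),
	// so it fails on an RFC3339 string:
	if _, err := http.ParseTime(lastModified); err != nil {
		fmt.Println("http.ParseTime:", err)
	}

	// time.Parse with the RFC3339 layout succeeds:
	t, err := time.Parse(time.RFC3339, lastModified)
	if err != nil {
		panic(err)
	}
	fmt.Println("time.Parse:", t)
}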

Also, DebugRequestTransport should be set to false; otherwise there will be a lot of logs:

client := cos.NewClient(baseURL, &http.Client{
	Transport: &cos.AuthorizationTransport{
		SecretID:  secretID,
		SecretKey: secretKey,
		Transport: &debug.DebugRequestTransport{
			RequestHeader: true,
			// Note: when uploading a large file with RequestBody enabled, an out-of-memory error may occur.
			RequestBody:    false,
			ResponseHeader: true,
			ResponseBody:   true,
		},
	},
})

Amazon S3 needs credentials in the configuration

Hi

While using the Amazon S3 functionality, I came across the following error:

NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors

I'm not sure what the best solution for this is, but when I include some credentials in the aws.Config object it works. But the NewAmazonS3Backend function has no parameter to pass credentials.

func NewAmazonS3Backend(bucket string, prefix string, region string, endpoint string, sse string) *AmazonS3Backend {
	service := s3.New(session.New(), &aws.Config{
		Credentials:      credentials.NewStaticCredentials("foo", "foo", ""), // <-- adding this fixes my issue
		Region:           aws.String(region),
		Endpoint:         aws.String(endpoint),
		DisableSSL:       aws.Bool(strings.HasPrefix(endpoint, "http://")),
		S3ForcePathStyle: aws.Bool(endpoint != ""),
	})
...

I think it should also be possible to pass credentials to the function. Or, what steps should I take to have this implemented? I'm a newbie in Amazon-related stuff, and as there are multiple ways of creating credentials to connect to S3, I'm not sure what the best way is to fix this in a reasonably general way.
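
For illustration, here is a hedged sketch of what a constructor variant accepting explicit credentials might look like; newS3ServiceWithCredentials is hypothetical and not part of the library:

package storage

import (
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// newS3ServiceWithCredentials is a hypothetical variant of the service setup
// inside NewAmazonS3Backend that accepts an explicit credentials provider
// (e.g. credentials.NewStaticCredentials(id, secret, token)) instead of
// relying only on the default credential chain. The real constructor would
// wrap the returned service in an AmazonS3Backend as before.
func newS3ServiceWithCredentials(region, endpoint string, creds *credentials.Credentials) *s3.S3 {
	return s3.New(session.Must(session.NewSession()), &aws.Config{
		Credentials:      creds,
		Region:           aws.String(region),
		Endpoint:         aws.String(endpoint),
		DisableSSL:       aws.Bool(strings.HasPrefix(endpoint, "http://")),
		S3ForcePathStyle: aws.Bool(endpoint != ""),
	})
}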

GetAllCharts takes too much time when there is a considerable number of charts in backend storage

The time analysis of the /api/:repo/charts API is as follows:

2020-03-11T19:18:55.184+0800 DEBUG [1] Incoming request: /api/datacollect/charts {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:06.957+0800 DEBUG [1] index-cache.yaml loaded {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:06.957+0800 DEBUG [1] Fetching chart list from storage {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.519+0800 DEBUG [1] start get object slice {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.531+0800 DEBUG [1] objects length {"o1": 57376, "o2": 57370, "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.531+0800 DEBUG [1] start get object slice diff {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}

2020-03-11T19:19:58.470+0800 DEBUG [1] Change detected between cache and storage {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}


When the back-end storage (BOS) holds 57,376 charts, timing analysis shows that it takes 8 seconds to fetch all the files from the back-end, and 44 seconds to compute the difference between the cache and the back-end data via cm_storage.GetObjectSliceDiff(objects, fo.objects).
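
If the diff is computed by nested scans over the two slices, comparing ~57k objects against ~57k is O(n²), which would explain the 44 seconds. Below is a sketch of an index-by-path alternative that runs in O(n); it is not the library's actual implementation, and it ignores the timestampTolerance parameter for brevity:

package storageutil

import "github.com/chartmuseum/storage"

// diffByPath sketches an O(n) diff: index the previous slice by path once,
// then walk the current slice with constant-time lookups.
func diffByPath(prev, curr []storage.Object) storage.ObjectSliceDiff {
	var diff storage.ObjectSliceDiff

	prevByPath := make(map[string]storage.Object, len(prev))
	for _, o := range prev {
		prevByPath[o.Path] = o
	}

	seen := make(map[string]bool, len(curr))
	for _, o := range curr {
		seen[o.Path] = true
		old, ok := prevByPath[o.Path]
		switch {
		case !ok:
			diff.Added = append(diff.Added, o)
		case !old.LastModified.Equal(o.LastModified):
			diff.Updated = append(diff.Updated, o)
		}
	}

	// Anything in prev that was never seen in curr has been removed.
	for _, o := range prev {
		if !seen[o.Path] {
			diff.Removed = append(diff.Removed, o)
		}
	}

	diff.Change = len(diff.Added)+len(diff.Updated)+len(diff.Removed) > 0
	return diff
}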


Is the etcd backend ready for release?

I see the etcd support in master. Is there going to be a tagged release with that support that could be used in the chartmuseum project?

Test harness for etcd backend

Currently the etcd backend tests are failing due to the hardcoded endpoints = "https://127.0.0.1:2379".

This is somewhat different from the other backends. Ideally, we could start an etcd server (or mock server) locally on a random port and use it dynamically in the unit tests.
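
One possible approach is sketched below. It assumes etcd v3.5's embed package (the LCUrls/ACUrls/LPUrls/APUrls field names are version-specific; newer releases rename them) and uses the common pick-a-free-port test pattern:

package etcdtest

import (
	"fmt"
	"net"
	"net/url"
	"testing"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

// freePort asks the kernel for an unused TCP port (a common, slightly racy test pattern).
func freePort(t *testing.T) int {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port
}

// startEtcd starts a throwaway single-node etcd for one test and returns its client URL.
func startEtcd(t *testing.T) string {
	clientURL := fmt.Sprintf("http://127.0.0.1:%d", freePort(t))
	peerURL := fmt.Sprintf("http://127.0.0.1:%d", freePort(t))

	cfg := embed.NewConfig()
	cfg.Dir = t.TempDir()
	cfg.LCUrls = parseURLs(t, clientURL)
	cfg.ACUrls = parseURLs(t, clientURL)
	cfg.LPUrls = parseURLs(t, peerURL)
	cfg.APUrls = parseURLs(t, peerURL)
	cfg.InitialCluster = cfg.Name + "=" + peerURL

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(e.Close)

	select {
	case <-e.Server.ReadyNotify():
	case <-time.After(30 * time.Second):
		t.Fatal("etcd took too long to start")
	}
	return clientURL
}

func parseURLs(t *testing.T, s string) []url.URL {
	u, err := url.Parse(s)
	if err != nil {
		t.Fatal(err)
	}
	return []url.URL{*u}
}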

Feature request: upgrade aws-sdk-go dependency to support IRSA

Amazon recently released support for IRSA (IAM Roles for Service Accounts), which is supported starting in aws-sdk-go 1.23.13.

From my understanding, the IRSA workflow checks environment variables for the relevant credentials. If that's true, supporting this should just require updating aws-sdk-go to version >= 1.23.13.


S3 storage ListObjects bug

When the S3 bucket has objects like this:

test-0.0.1.tgz
test/consul-0.0.1.tgz

using the S3Backend ListObjects method in a test:

func Test_ListObjects(t *testing.T) {
	backend := NewAmazonS3Backend("a-bucket-test", "", "ap-southeast-1", "s3-ap-southeast-1.amazonaws.com", "")
	objects, err := backend.ListObjects("test")
	if err != nil {
		fmt.Println(err)
		t.Fail()
	}
	for _, obj := range objects {
		fmt.Println(obj.Path)
	}
}

the result lists both files:

test-0.0.1.tgz
test/consul-0.0.1.tgz

Then, when fetching the repo's index.yaml, a "no such key" error comes back from S3: it tries to get the file test/test-0.0.1.tgz, but that key does not exist.
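
The likely cause is that the raw prefix "test" matches both keys above. A hedged sketch of one possible guard (cleanPrefix is hypothetical) that anchors the prefix at a path boundary before listing:

package storage

import "strings"

// cleanPrefix anchors a non-empty prefix at a path boundary, so that
// ListObjects("test") matches "test/consul-0.0.1.tgz" but not the
// sibling key "test-0.0.1.tgz".
func cleanPrefix(prefix string) string {
	if prefix != "" && !strings.HasSuffix(prefix, "/") {
		prefix += "/"
	}
	return prefix
}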

Path issues for etcd backend

hi all,

I've been using the etcd backend for testing and ran into two issues (see the sketch after this list):

  1. The new path is generated by newpath := e.base+path; it should use pathutil.Join like the other backends

  2. In ListObjects, object.Path is assigned from kv.Key; the prefix should first be removed using removePrefixFromObjectPath, like the other backends
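
A sketch of both fixes, assuming pathutil is the standard path package under an alias (as in the other backends) and with stripPrefix standing in for the package's internal removePrefixFromObjectPath helper:

package storage

import (
	pathutil "path" // alias for the standard path package
	"strings"
)

// joinPath sketches fix 1: normalize separators with pathutil.Join instead of
// concatenating with e.base + path.
func joinPath(base, path string) string {
	return pathutil.Join(base, path)
}

// stripPrefix sketches fix 2: remove the backend prefix (and any leading
// separator) from a raw etcd key before exposing it as Object.Path.
func stripPrefix(prefix, key string) string {
	return strings.TrimPrefix(strings.TrimPrefix(key, prefix), "/")
}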

Object missing metadata fields to store the chart metadata information

Related to chartmuseum's #22 and #220 , we still have a bug of resolving chart deletion between cache and the storage , we temporarily fix it by treating the cache one as no content chart file and parsing the delete chart version by ourselves to find the correct version to delete.

But parsing the version correctly is hard even if the stored chart version is non-semver , I raise a pr#363 to fix the issue but it can not cover all the cases. So I think parsing the delete version by ourselves maybe is not the right way to fix this issue.

The root cause of this issue is the storage.Object can not store the metadata of our chart file , if we can add the such field into the storage.Object, the issue can be fixed in the right way .

/cc @jdolitsky , if ok , I can work on another PR to fix this issue .

Request: Support atomic writes for local storage backend

It looks like there could be a risk of partially-written files with the local backend:

err = ioutil.WriteFile(fullpath, content, 0644)

That code appears to write directly to the real location of the file. So if the write is large and gets interrupted partway through (e.g. chartmuseum is SIGKILL'd, or the file is copied/snapshotted by an external backup program), a partially written file could be left on disk.

Minio solves this problem by writing all files to a temporary directory, then atomically moving them into their real location after the write is fully complete: https://github.com/minio/minio/blob/d0862ddf866e6ac358155e3ca660f36610d8834e/cmd/fs-v1.go#L1105
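
A sketch of the same temp-file-then-rename pattern for the local backend (ioutil.TempFile matches the era of the code above; atomicWriteFile is hypothetical):

package local

import (
	"io/ioutil"
	"os"
	"path/filepath"
)

// atomicWriteFile writes to a temp file in the same directory (so the rename
// stays on one filesystem), then atomically moves it into place.
func atomicWriteFile(fullpath string, content []byte, perm os.FileMode) error {
	dir := filepath.Dir(fullpath)
	tmp, err := ioutil.TempFile(dir, ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename

	if _, err := tmp.Write(content); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), perm); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), fullpath)
}

A call site would then replace ioutil.WriteFile(fullpath, content, 0644) with atomicWriteFile(fullpath, content, 0644).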

This feature would allow users to back up their chartmuseum data the naive way (cp-ing or tar-ing all the files) without having to stop/quiesce the application ahead of time.

Thoughts?

Openstack: Perpetual index regeneration

In Openstack object store, getting the last modified date from the list command return something like:

$openstack object list container --long
==> for object it shows last-modified as `2019-02-27T20:14:09.203080`

While getting the same date from a show command return:

$openstack object show container object
==> last-modified  as `Wed, 27 Feb 2019 20:14:10 GMT`

Which is somehow always rounded up to the second. The golang client show the same behavior.
So this cause the index to always be regenerated as times are always different when comparing the cache to the storage
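
This is exactly the kind of skew the timestampTolerance parameter of GetObjectSliceDiff can absorb. A sketch of the comparison such a tolerance implies, using the two timestamps above:

package main

import (
	"fmt"
	"time"
)

// withinTolerance treats two modification times as equal when they differ by
// no more than the tolerance (here, the sub-second rounding observed between
// Openstack's list and show responses).
func withinTolerance(a, b time.Time, tolerance time.Duration) bool {
	d := a.Sub(b)
	if d < 0 {
		d = -d
	}
	return d <= tolerance
}

func main() {
	listTime, _ := time.Parse("2006-01-02T15:04:05.000000", "2019-02-27T20:14:09.203080")
	showTime, _ := time.Parse(time.RFC1123, "Wed, 27 Feb 2019 20:14:10 GMT")

	fmt.Println(withinTolerance(listTime, showTime, time.Second)) // true
}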

feature: support for minio server

I love using Minio as a self-hosted alternative to AWS S3. Seeing as chartmuseum/storage makes the whole thing even easier, I really would love to make a PR for a Minio storage interface implementation.

GitOps Storage: add support for specifying a Git provider for storage backend

TL;DR: We are proposing to add a new storage backend for GitHub/GitLab to store/mirror the charts.


GitOps is having a Git repository that always contains declarative descriptions of the infrastructure currently desired in the production environment, and an automated process to make the production environment match the described state in the repository. [1]

Let's assume we have a self-managed GitLab server, and we prefer not to use cloud services for external storage.

Here is the idea for GitLab (assume GitLab hostname gitlab.internal.com and base group /helm-charts):

  1. We want to store the https://falcosecurity.github.io/charts charts in a repo called falcosecurity/charts
  2. Create a root group called /helm-charts, if it does not exist
  3. Create a subgroup called falcosecurity, if it does not exist
  4. Create a repo called charts, if it does not exist
  5. Force-push all the files (or commits) to store them

We can create custom gitlab.go and github.go providers for this if it makes sense!

cc @jdolitsky

Footnotes

  1. https://www.gitops.tech/#what-is-gitops
