
Pravega Operator


Overview

Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.

The Pravega Operator manages Pravega clusters deployed to Kubernetes and automates tasks related to operating a Pravega cluster. The operator itself is built with the Operator Framework.

Project status

The project is currently beta. While no breaking API changes are currently planned, we reserve the right to address bugs and change the API before the project is declared stable.

Quickstart

To set up Pravega quickly, check our complete Pravega Installation script. To install Pravega on Minikube, refer to minikube.

Install the Operator

To understand how to deploy the Pravega Operator, refer to Operator Deployment.

Upgrade the Operator

To upgrade the Pravega Operator, refer to the Operator Upgrade document.

Features

Development

Check out the development guide.

Releases

The latest Pravega Operator releases can be found on the GitHub Release project page.

Contributing and Community

We strive to build a welcoming and open community for anyone who wants to use the operator or contribute to it. Here we describe how to contribute to the Pravega Operator. Contact the developers and community on Slack (signup) if you need any help.

Troubleshooting

Check out Pravega troubleshooting for Pravega issues and operator troubleshooting for operator issues.

License

The Pravega Operator is licensed under the Apache 2.0 license. See LICENSE for details.


pravega-operator's Issues

Operator crashes if Spec is invalid

If the PravegaCluster resource spec is invalid the operator crashes. It should handle invalid specifications without crashing.

For example, the following specification will crash the operator because Options expects a map[string]string, but numbers have been specified for some values:

apiVersion: "pravega.pravega.io/v1alpha1"
kind: "PravegaCluster"
metadata:
  name: "nautilus"
  namespace: {{ .Values.pravegaNamespace }}
spec:
  zookeeperUri: nautilus-pravega-zookeeper-client:2181

  bookkeeper:
    image:
      repository: pravega/bookkeeper
      tag: 0.3.0
      pullPolicy: IfNotPresent

    replicas: 3

    storage:
      ledgerVolumeClaimTemplate:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 10Gi

      journalVolumeClaimTemplate:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 10Gi

    autoRecovery: true

  pravega:
    controllerReplicas: 1
    segmentStoreReplicas: 3

    cacheVolumeClaimTemplate:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 20Gi

    image:
      repository: pravega/pravega
      tag: 0.3.0
      pullPolicy: IfNotPresent

    options:
      metrics.enableStatistics: "true"
      metrics.statsdHost: telegraf
      metrics.statsdPort: 8125

    tier2:
      filesystem:
        persistentVolumeClaim:
          claimName: nautilus-pravega-tier2

This results in the operator crashing while parsing the options, which should all be strings:

    options:
      metrics.enableStatistics: "true"
      metrics.statsdHost: "telegraf"
      metrics.statsdPort: "8125"

Ensure data protection (DU/DL)

Operator should ensure that components (e.g. bookies) are deployed in a reasonable way from a data protection perspective. Kubernetes has features including anti-affinity, multi-zone, and pod disruption budgets.

Achieving adequate data protection likely has deployment- and upgrade-time considerations, e.g. anti-affinity to prevent DL, pod disruption budget to prevent DU during upgrade/maintenance. Note that Kubernetes has support for "planned" node maintenance operations, e.g. kubectl drain (ref). Such operations respect the pod disruption budget.

For a conceptual overview, please read the Disruptions section of the Kubernetes documentation.
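
As a rough sketch of how these primitives could be wired up for the bookies (the app=bookie label, names, and numbers below are illustrative, not the operator's actual implementation):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	labels := map[string]string{"app": "bookie"} // hypothetical label

	// PodDisruptionBudget: keep a quorum of bookies available during voluntary
	// disruptions such as `kubectl drain`.
	minAvailable := intstr.FromInt(2)
	pdb := &policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "bookie-pdb"},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector:     &metav1.LabelSelector{MatchLabels: labels},
		},
	}

	// Pod anti-affinity: prefer spreading bookies across nodes so that losing a
	// single node cannot take out multiple ledger replicas.
	affinity := &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			PreferredDuringSchedulingIgnoredDuringExecution: []corev1.WeightedPodAffinityTerm{{
				Weight: 100,
				PodAffinityTerm: corev1.PodAffinityTerm{
					LabelSelector: &metav1.LabelSelector{MatchLabels: labels},
					TopologyKey:   "kubernetes.io/hostname",
				},
			}},
		},
	}

	fmt.Println(pdb.Name, affinity != nil)
}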

Push docker image to docker hub

We are going to open pravega-operator to the public, so we need to move the Docker images from their private location to a well-known place for the Pravega community.

Admission Controller for PravegaCluster resource

Currently the resource contains the Bookkeeper and Pravega image definitions. The operator should have sensible defaults set in its configuration and apply these to the resource via a mutating Admission Controller.
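
A minimal sketch of the defaulting such a webhook (or the reconciler) might apply; the trimmed-down types and field names below are assumptions based on the sample spec above, not the operator's actual API:

package main

import "fmt"

// Hypothetical, trimmed-down versions of the CRD types, for illustration only.
type ImageSpec struct {
	Repository string
	Tag        string
	PullPolicy string
}

type PravegaClusterSpec struct {
	BookkeeperImage ImageSpec
	PravegaImage    ImageSpec
}

// applyDefaults fills in any image fields the user left empty, as a mutating
// admission webhook (or the reconciler) might do before persisting the resource.
func applyDefaults(spec *PravegaClusterSpec) {
	defaults := map[*ImageSpec]ImageSpec{
		&spec.BookkeeperImage: {Repository: "pravega/bookkeeper", Tag: "0.3.0", PullPolicy: "IfNotPresent"},
		&spec.PravegaImage:    {Repository: "pravega/pravega", Tag: "0.3.0", PullPolicy: "IfNotPresent"},
	}
	for img, def := range defaults {
		if img.Repository == "" {
			img.Repository = def.Repository
		}
		if img.Tag == "" {
			img.Tag = def.Tag
		}
		if img.PullPolicy == "" {
			img.PullPolicy = def.PullPolicy
		}
	}
}

func main() {
	spec := &PravegaClusterSpec{}
	applyDefaults(spec)
	fmt.Printf("%+v\n", spec)
}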

Add support for multiple Bookie ledger directories

Bookkeeper supports writing to multiple ledger directories for better performance. Each ledger directory should be a separate volume, supplied to the bookie via ledgerDirs, a comma-separated list of mounted directories.
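
A small sketch of how the comma-separated ledgerDirs value could be derived from the number of ledger volumes; the /bk/ledgersN mount paths are illustrative:

package main

import (
	"fmt"
	"strings"
)

// ledgerDirs builds the comma-separated list of ledger directories that would be
// passed to the bookie, assuming one volume is mounted per directory under /bk.
func ledgerDirs(count int) string {
	dirs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		dirs = append(dirs, fmt.Sprintf("/bk/ledgers%d", i))
	}
	return strings.Join(dirs, ",")
}

func main() {
	// e.g. "/bk/ledgers0,/bk/ledgers1,/bk/ledgers2"
	fmt.Println(ledgerDirs(3))
}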

Support Pravega upgrades

As new Pravega versions are released, users will want to upgrade to a newer version. The operator should provide support for such upgrades with minimal to no disruption.

Handle Go errors correctly

The ReconcilePravegaCluster() function does not seem to handle Go errors correctly:

deployBookie(pravegaCluster)
if err != nil {
	return err
}

In this case, err is never set to the error returned by deployBookie(), so failures are silently dropped. We should change the first line in this scenario to err = deployBookie(pravegaCluster).
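
For clarity, the corrected form of that snippet would be:

// Capture and propagate the error returned by deployBookie.
err = deployBookie(pravegaCluster)
if err != nil {
	return err
}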

Pravega IO workload failed when Pravega Controller end point is accessed from outside cluster using 'kubectl' port forward

Running the Pravega-Benchmark IO tool against the Pravega controller endpoint through 'kubectl' port forwarding fails. This may be because the pravega-segment-store endpoints are not being forwarded through 'kubectl'.

However, the expectation is that a client should be able to access Pravega using the Pravega Controller endpoint from outside the Kubernetes cluster. For example, if a customer wants to process or run analytics on existing IO workloads and IoT sensors that live outside the Kubernetes environment, is there a way to achieve that with Pravega on a Kubernetes cluster?

[root@rhel ~]# kubectl port-forward -n default pravega-pravega-controller-7489c9776d-vdqxl 9090:9090 10080:10080
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from 127.0.0.1:10080 -> 10080
Handling connection for 9090
Handling connection for 9090

Error logs:
[root@rhel pravega-benchmark]# ./pravega-benchmark/bin/pravega-benchmark  --controller tcp://127.0.0.1:9090 --stream StreamName-r1  -producers 1 --size 1000  -eventspersec 300000 --runtime 60 --randomkey true  --writeonly true
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] INFO io.pravega.client.stream.impl.ControllerImpl - Controller client connecting to server at 127.0.0.1:9090
[StreamManager-Controller-1] INFO io.pravega.client.stream.impl.ControllerResolverFactory - Updating client with controllers: [[addrs=[/127.0.0.1:9090], attrs={}]]
[grpc-default-executor-0] WARN io.pravega.client.stream.impl.ControllerImpl - Scope already exists: Scope
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] INFO io.pravega.client.stream.impl.ControllerImpl - Controller client connecting to server at 127.0.0.1:9090
[main] INFO io.pravega.client.admin.impl.StreamManagerImpl - Creating scope/stream: Scope/StreamName-r1 with configuration: StreamConfiguration(scope=Scope, streamName=StreamName-r1, scalingPolicy=ScalingPolicy(scaleType=FIXED_NUM_SEGMENTS, targetRate=0, scaleFactor=0, minNumSegments=1), retentionPolicy=null)
[grpc-default-executor-0] WARN io.pravega.client.stream.impl.ControllerImpl - Stream already exists: StreamName-r1
[pool-1-thread-1] INFO io.pravega.client.stream.impl.ControllerResolverFactory - Updating client with controllers: [[addrs=[/127.0.0.1:9090], attrs={}]]
Current segments of the stream: StreamName-r1 = 1
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] INFO io.pravega.client.stream.impl.ControllerResolverFactory - Shutting down ControllerNameResolver
[main] INFO io.pravega.client.stream.impl.ClientFactoryImpl - Creating writer for stream: StreamName-r1 with configuration: EventWriterConfig(initalBackoffMillis=1, maxBackoffMillis=20000, retryAttempts=10, backoffMultiple=10, transactionTimeoutTime=29999)
[main] INFO io.pravega.client.stream.impl.SegmentSelector - Refreshing segments for stream StreamImpl(scope=Scope, streamName=StreamName-r1)
[clientInternal-1] INFO io.pravega.client.segment.impl.SegmentOutputStreamImpl - Fetching endpoint for segment Scope/StreamName-r1/3.#epoch.1, writerID: b9782b6b-ca3f-472c-81bd-2413d69a7887
[clientInternal-1] INFO io.pravega.client.segment.impl.SegmentOutputStreamImpl - Establishing connection to PravegaNodeUri(endpoint=10.32.0.8, port=12345) for Scope/StreamName-r1/3.#epoch.1, writerID: b9782b6b-ca3f-472c-81bd-2413d69a7887
[epollEventLoopGroup-4-1] INFO io.pravega.client.netty.impl.ClientConnectionInboundHandler - Connection established ChannelHandlerContext(ClientConnectionInboundHandler#0, [id: 0xa981360b])
[epollEventLoopGroup-4-1] WARN io.pravega.client.netty.impl.ClientConnectionInboundHandler - Keep alive failed, killing connection 10.32.0.8 due to DefaultChannelPromise@31ffd58e(uncancellable)
[epollEventLoopGroup-4-1] WARN io.pravega.client.segment.impl.SegmentOutputStreamImpl - b9782b6b-ca3f-472c-81bd-2413d69a7887 Failed to connect:
java.util.concurrent.CompletionException: io.pravega.shared.protocol.netty.ConnectionFailedException: java.nio.channels.ClosedChannelException
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1088)
        at java.util.concurrent.CompletableFuture$BiApply.tryFire(CompletableFuture.java:1070)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at io.pravega.client.netty.impl.ConnectionFactoryImpl$2.operationComplete(ConnectionFactoryImpl.java:166)
        at io.pravega.client.netty.impl.ConnectionFactoryImpl$2.operationComplete(ConnectionFactoryImpl.java:155)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
        at io.netty.channel.epoll.AbstractEpollChannel.doClose(AbstractEpollChannel.java:163)
        at io.netty.channel.epoll.AbstractEpollStreamChannel.doClose(AbstractEpollStreamChannel.java:686)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:763)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:740)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:611)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1301)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465)
        at io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:973)
        at io.netty.channel.AbstractChannel.close(AbstractChannel.java:238)
        at io.pravega.client.netty.impl.ClientConnectionInboundHandler.close(ClientConnectionInboundHandler.java:164)
        at io.pravega.client.netty.impl.ClientConnectionInboundHandler$KeepAliveTask.run(ClientConnectionInboundHandler.java:195)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:126)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.pravega.shared.protocol.netty.ConnectionFailedException: java.nio.channels.ClosedChannelException
        ... 34 more
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.channel.epoll.AbstractEpollChannel.doClose()(Unknown Source)
[epollEventLoopGroup-4-1] ERROR io.pravega.shared.protocol.netty.ExceptionLoggingHandler - Uncaught exception on connection 10.32.0.8
java.nio.channels.ClosedChannelException

Note:
Here, the 10.32.0.8 IP address belongs to pravega-segment-store-2; the Pravega-Benchmark client tries to access the segment store directly from outside the cluster and eventually fails.

Add extra options to configurations

The PravegaCluster resource allows extra options to be specified for both Bookkeeper and Pravega; however, these are currently NOT added to the configuration ConfigMaps.
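
A sketch of how the reconciler could fold these options into the generated ConfigMap data; the function and field names are illustrative, not the operator's actual code:

package main

import "fmt"

// addExtraOptions merges the user-supplied options from the PravegaCluster spec
// into the data map used to build the configuration ConfigMap. Keys already set
// by the operator take precedence; everything else is passed straight through.
func addExtraOptions(configData, specOptions map[string]string) {
	for key, value := range specOptions {
		if _, exists := configData[key]; !exists {
			configData[key] = value
		}
	}
}

func main() {
	configData := map[string]string{
		"CLUSTER_NAME": "example",
	}
	specOptions := map[string]string{
		"metrics.enableStatistics": "true",
		"metrics.statsdHost":       "telegraf",
		"metrics.statsdPort":       "8125",
	}
	addExtraOptions(configData, specOptions)
	fmt.Println(configData)
}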

Allow k8 service account to be configured for the PravegaCluster resource

In some environments, the various components of the Pravega cluster may need to run under specific K8 service accounts. Such service accounts would have different types of secrets, permissions and/or annotations.

Two components of interest are the controller and segment stores. Webhooks may attach different secrets to the service accounts associated with the pods running these components. Pravega's auth plugin implementations will often require a way to inject credentials-related data.

Taking as an example an implementation where the controller has a plugin that can validate an OAuth token, the controller will need some issuer-related configuration injected. The segment store will need some secret/configuration in order to obtain a token when talking to the controller.

All this can be facilitated with opaque (to the operator) annotations on the K8 service accounts that get associated with the pods. What we need here is a way to specify which K8 service account need to be associated with the controller and the segment store.

Something along the lines of:

...
pravega:
   controller-service-account: foo
   segment-store-service-account: bar
...

The chart deploying the PravegaCluster resource will be responsible for requesting foo and bar. But the operator should be responsible for setting those service account names on the corresponding pods.
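
A sketch of how the operator might apply such settings to the pod specs it generates; the spec field names below are placeholders for whatever names are ultimately chosen:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Hypothetical spec fields mirroring the proposal above.
type PravegaSpec struct {
	ControllerServiceAccountName   string
	SegmentStoreServiceAccountName string
}

// applyServiceAccount sets the configured service account on a pod spec,
// falling back to the namespace default when none is given.
func applyServiceAccount(pod *corev1.PodSpec, name string) {
	if name != "" {
		pod.ServiceAccountName = name
	}
}

func main() {
	spec := PravegaSpec{
		ControllerServiceAccountName:   "foo",
		SegmentStoreServiceAccountName: "bar",
	}

	controllerPod := &corev1.PodSpec{}
	segmentStorePod := &corev1.PodSpec{}

	applyServiceAccount(controllerPod, spec.ControllerServiceAccountName)
	applyServiceAccount(segmentStorePod, spec.SegmentStoreServiceAccountName)

	fmt.Println(controllerPod.ServiceAccountName, segmentStorePod.ServiceAccountName)
}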

Support cluster rescaling

Support scale changes for the various components (e.g. bookies, segment stores).

Scale down might not be supported by all components, or may require special attention. For example, it may be necessary to drain the tier-1 storage when tearing down a given bookie. Similar functionality may be implied by #60.

Zookeeper resource structure

Proposal for initial zookeeper resource structure:

[diagram: proposed Zookeeper resource structure]

Zookeeper should be defined as a StatefulSet with Read Write Once (RWO) volumes identified for each Pod. A single zookeeper service provides access to the cluster.

Update ReadMe with further information

The ReadMe file needs to be updated with the following:

  • References to Zookeeper Operator removed
  • Specify that an instance of Zookeeper 3.5 must be installed (perhaps suggest the Zookeeper Operator)
  • An example of using kubectl get all -l apps=example to show the running resources

Add SegmentStore RocksDB cache Volume

The segment store uses a RocksDB cache that should be on a dedicated, fast volume. The volume is ephemeral and is not required if the process is restarted.

From Pravega Configuration (https://github.com/pravega/pravega/blob/master/config/config.properties#L347):

#Path to the working directory where RocksDB can store its databases. The contents of this folder can be discarded after
#the process exits (and it will be cleaned up upon startup), but Pravega requires exclusive use of this while running.
#Recommended values: a path to a locally mounted directory that sits on top of a fast SSD.
#rocksdb.dbDir=/tmp/pravega/cache
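
A sketch of what the cache volume could look like on the segment store pod, assuming an emptyDir is acceptable given that the contents are discardable (a per-pod, SSD-backed PVC would be the alternative); names and paths are illustrative:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// The cache contents can be discarded on restart, so an emptyDir scoped to
	// the pod lifetime is one option; a fast (SSD-backed) claim per pod via a
	// volumeClaimTemplate is the alternative.
	cacheVolume := corev1.Volume{
		Name: "cache",
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	}

	cacheMount := corev1.VolumeMount{
		Name:      "cache",
		MountPath: "/tmp/pravega/cache", // matches rocksdb.dbDir above
	}

	fmt.Println(cacheVolume.Name, cacheMount.MountPath)
}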

Automatically create a PersistentVolumeClaim for tier-2 storage

To reduce the amount of manual work needed to deploy a cluster, the operator should automatically create a claim for tier-2 based on a template. The current implementation relies on an existing claim. It seems useful to support both options.

The claim would always use the ReadWriteMany access mode. The storageClassName is the more important input, in addition to resources.

The template approach will combine nicely with CSI plugins, e.g. gcp-filestore-csi-driver, which automatically create PVs given PVCs.
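
A sketch of the claim the operator might create from such a template; the name, storage class, and size are illustrative:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	storageClass := "nfs" // illustrative; e.g. backed by an NFS or CSI provisioner

	// The tier-2 claim must be shared by all segment stores, hence ReadWriteMany.
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "example-pravega-tier2"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteMany},
			StorageClassName: &storageClass,
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("50Gi"),
				},
			},
		},
	}

	fmt.Println(pvc.Name, pvc.Spec.AccessModes)
}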

Support for `PravegaCluster` resource status

Report the high-level status / conditions via the status section of the custom resource. A key scenario is to enable Helm to --wait on the deployment of the cluster.

Ideally the status would accurately indicate whether Pravega is ready to create streams. Note that the Pravega controller startup routine involves more than exposing a TCP endpoint; some system streams are created before Pravega is truly ready. Could the true readiness be detected somehow?

The solution will involve adding a status section to the CRD, and using the UpdateStatus method.

Note that the status section is expected to have some well-known elements, e.g. conditions. The word conditions has a meaning like "weather conditions" not "conditional logic". There is a mindful design underlying this (example).
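
A hypothetical sketch of what the status types could look like, following the usual conditions pattern; none of these names are final:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ClusterConditionType names a high-level condition, e.g. whether all pods are ready.
type ClusterConditionType string

const ClusterConditionPodsReady ClusterConditionType = "PodsReady"

// ClusterCondition follows the conventional "conditions" pattern: a type, a
// True/False/Unknown status, and human-readable reason/message fields.
type ClusterCondition struct {
	Type               ClusterConditionType
	Status             corev1.ConditionStatus
	Reason             string
	Message            string
	LastTransitionTime metav1.Time
}

// PravegaClusterStatus is a hypothetical status section for the CRD; the
// reconciler would populate it and persist it with an UpdateStatus call.
type PravegaClusterStatus struct {
	Conditions    []ClusterCondition
	ReadyReplicas int32
}

func main() {
	status := PravegaClusterStatus{
		Conditions: []ClusterCondition{{
			Type:               ClusterConditionPodsReady,
			Status:             corev1.ConditionFalse,
			Reason:             "Deploying",
			LastTransitionTime: metav1.Now(),
		}},
	}
	fmt.Printf("%+v\n", status)
}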

Controller resource structure

Proposal for initial controller resource structure:

[diagram: proposed controller resource structure]

Controller should be defined as either a Deployment or ReplicaSet. A single controller service provides access to the cluster.

Bookkeeper resource structure

Proposal for initial bookkeeper resource structure:

[diagram: proposed Bookkeeper resource structure]

Bookkeeper should be defined as a StatefulSet with Read Write Once (RWO) volumes identified for each Pod. A single bookkeeper service provides access to the cluster.

Identify Pravega top-level structure

The Pravega project is made up of a number of components, which will each be defined as independent Kubernetes resources. For each component, we must evaluate the appropriate resource type, the resource requirements (connectivity, volumes, configuration, etc.), and the inter-dependencies.

Segment store resource structure

There are a few open questions regarding the proper design for the segment store resources:

  • Does the segment store require a persistent volume, or should the local disk be considered ephemeral?
  • Should the Bookkeeper be included within the segment store pod? If these are always expected to scale in concert, then it would make sense to keep them together.
  • Segment stores find the Bookkeeper instances via Zookeeper. Does each segment store write directly to the associated bookie?

Handle Resource Deletions

Currently the Operator only handles the creation of resources when a PravegaCluster resource is created. It should also handle the PravegaCluster resource being deleted and destroy all corresponding Kubernetes resources for that PravegaCluster.
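
One common way to achieve this (named here as a suggestion, not the chosen design) is to set an owner reference on every child object so that Kubernetes garbage-collects them when the PravegaCluster is deleted; the values below are illustrative:

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// ownerReference builds an OwnerReference pointing at the PravegaCluster, so
// that deleting the PravegaCluster garbage-collects every child resource that
// carries this reference.
func ownerReference(name string, uid types.UID) metav1.OwnerReference {
	controller := true
	return metav1.OwnerReference{
		APIVersion: "pravega.pravega.io/v1alpha1",
		Kind:       "PravegaCluster",
		Name:       name,
		UID:        uid,
		Controller: &controller,
	}
}

func main() {
	ref := ownerReference("nautilus", types.UID("1234"))
	// The operator would append ref to ObjectMeta.OwnerReferences of every
	// StatefulSet, Deployment, Service, and ConfigMap it creates.
	fmt.Println(ref.Kind, ref.Name)
}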

Operator versioning strategy

The operator will use SemVer for versioning. The initial release will be 0.1.0, with the following meaning:

  • 0: indicates initial development. Anything may change at any time. The public API should not be considered stable.
  • 1: first set of features.
  • 0: no bug fixes yet.

After each release, we will immediately append the +git suffix to the version number to indicate that there are changes on top of that version. E.g. after the first release, we will change the version to 0.1.0+git. This is what the Operator SDK and other operators are also doing.

Error trying to run the operator locally

I'm trying to run the operator locally as instructed in the operator-sdk user guide using the operator-sdk up local command. However, the operator panics and outputs the following log:

$ operator-sdk up local
INFO[0000] Go Version: go1.11                                                                                                       
INFO[0000] Go OS/Arch: linux/amd64                                                                                                  
INFO[0000] operator-sdk Version: 0.0.5+git                                                                                          
INFO[0000] Watching pravega.pravega.io/v1alpha1, PravegaCluster, default, 5                                                        
panic: No Auth Provider found for name "gcp"

goroutine 1 [running]:
github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes/typed/admissionregistration/v1alpha1.NewForConfigOrDie(0xc0001ef0e0, 0xc0002b82d0)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes/typed/admissionregistration/v1alpha1/admissionregistration_client.go:58 +0x65                                                               
github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes.NewForConfigOrDie(0xc0001ef0e0, 0x0)                        
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes/clientset.go:529 +0x49
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.mustNewKubeClientAndConfig(0x56, 0xc00018bc20, 0xe83800)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:138 +0x68
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.newSingletonFactory()          
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:52 +0x34
sync.(*Once).Do(0x1b9d370, 0x114b4b8)
        /home/adrian/.gvm/gos/go1.11/src/sync/once.go:44 +0xb3
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.GetResourceClient(0x10e5c3e, 0x1b, 0x10da599, 0xe, 0xc00003e450, 0x7, 0xc000429380, 0xc00018be10, 0xe8268e, 0xc0000e14f0, ...)                                     
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:70 +0x3d
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.Watch(0x10e5c3e, 0x1b, 0x10da599, 0xe, 0xc00003e450, 0x7, 0x12a05f200, 0x0, 0x0, 0x0)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/api.go:45 +0x84
main.main()
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/cmd/pravega-operator/main.go:33 +0x287     
exit status 2
Error: failed to run operator locally: exit status 1

I did some quick research and found sources that suggest importing the gcp plugin from the k8s.io/client-go/plugin/pkg/client/auth/gcp package in the main.go file. I'll give it a try and submit a PR if it fixes the issue.
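
For reference, the change in question is a side-effect (blank) import in main.go:

import (
	// Blank import registers the GCP auth provider with client-go so the
	// operator can authenticate against GKE clusters when run locally.
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
)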

Allow Authorization to be configured

Currently the operator configures Pravega with authorization turned off. This should be modified to allow authorization to be configured and turned on.

An example from Pravega controller configuration:

configData := map[string]string{
	"CLUSTER_NAME":           pravegaCluster.Name,
	"ZK_URL":                 pravegaCluster.Spec.ZookeeperUri,
	"JAVA_OPTS":              strings.Join(javaOpts, " "),
	"REST_SERVER_PORT":       "10080",
	"CONTROLLER_SERVER_PORT": "9090",
	"AUTHORIZATION_ENABLED":  "false",
	"TOKEN_SIGNING_KEY":      "secret",
	"USER_PASSWORD_FILE":     "/etc/pravega/conf/passwd",
	"TLS_ENABLED":            "false",
	"WAIT_FOR":               pravegaCluster.Spec.ZookeeperUri,
}
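
A sketch of how these values could instead be driven by the spec; the AuthSpec type and its fields are assumptions, not the operator's actual API:

package main

import (
	"fmt"
	"strconv"
)

// Hypothetical auth-related fields on the PravegaCluster spec.
type AuthSpec struct {
	AuthorizationEnabled bool
	TokenSigningKey      string
	UserPasswordFile     string
}

// controllerAuthConfig derives the controller ConfigMap entries from the spec
// instead of hard-coding AUTHORIZATION_ENABLED to "false".
func controllerAuthConfig(auth AuthSpec) map[string]string {
	return map[string]string{
		"AUTHORIZATION_ENABLED": strconv.FormatBool(auth.AuthorizationEnabled),
		"TOKEN_SIGNING_KEY":     auth.TokenSigningKey,
		"USER_PASSWORD_FILE":    auth.UserPasswordFile,
	}
}

func main() {
	cfg := controllerAuthConfig(AuthSpec{
		AuthorizationEnabled: true,
		TokenSigningKey:      "secret",
		UserPasswordFile:     "/etc/pravega/conf/passwd",
	})
	fmt.Println(cfg)
}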

Update ReadMe with GKE Tier2 Storage Options

The current 'demo' uses the NFS provisioner to provide some basic Tier2 storage. ECS is obviously not an option and Google Cloud Storage, although it has an HDFS connector, does not support the HDFS Append command required by Pravega.

Google FileStore presents itself as an NFS share and would therefore be a compatible native storage option for installing on GKE.

Identify Operator goals and required actions

Some operator goals are fairly obvious (like deployment and scaling), but we should identify which goals are required for the operator to be "minimally viable" for release. The separation of duties between the Pravega Operator and other projects (Nautilus, Flink, etc.) should also be considered.

This issue should track the goals considered for initial release. Once identified, these will be tracked as issues and added to the 0.0.1 milestone.

Add an initContainer field to the PravegaClusterResource

Motivation:

To facilitate customizing a Pravega cluster installation, it would be very useful to have an initContainer "hook" so that one can inject files into the controller's runtime environment. The concrete use for now is the ability to provide plugin implementations (the auth handler being the primary one at the moment, I think).

Proposal:

 pravega:
    controllerReplicas: 1
    segmentStoreReplicas: 3

    plugins:  <-- new element
      images: [ "repohost/reponame/imagename:tag" ]  

    cacheVolumeClaimTemplate:

When the plugin element is seen, the operator would do the following:

  1. add an empty volume in the controller pod
  2. mount this volume at /opt/pravega/plugin/lib in an init-container spec added to the main controller deployment spec, using the specified image from the plugins spec
  3. map that same volume to the same location in the controller container itself.

Let k8 do the rest. :)

The requirement for the init-container image implementation is simply to:

  1. contain a fat jar of the desired plugin implementation
  2. have an entry point/command that automatically does a cp of the jar into /opt/pravega/plugin/lib.

This will allow said jar files to show up in the controller's main container under /opt/pravega/plugin/lib and be ready to use.

Note: the proposal uses a list for images, as there could be multiple init containers back to back which, according to the Kubernetes documentation, execute one after the other in list order.
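
A sketch of the pod-level wiring described in steps 1-3 above; the image name and mount path come from the proposal, everything else is illustrative:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	const pluginDir = "/opt/pravega/plugin/lib"

	// 1. An empty volume shared between the init container and the controller.
	pluginVolume := corev1.Volume{
		Name:         "plugins",
		VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
	}

	// 2. The init container copies its plugin jar(s) into the shared volume.
	initContainer := corev1.Container{
		Name:         "plugin-0",
		Image:        "repohost/reponame/imagename:tag", // from the plugins.images list
		VolumeMounts: []corev1.VolumeMount{{Name: "plugins", MountPath: pluginDir}},
	}

	// 3. The controller container mounts the same volume at the same location,
	// so the jars show up in its /opt/pravega/plugin/lib directory at startup.
	controllerMount := corev1.VolumeMount{Name: "plugins", MountPath: pluginDir}

	fmt.Println(pluginVolume.Name, initContainer.Image, controllerMount.MountPath)
}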

Cleanup Zookeeper MetaData on PravegaCluster Delete

When deleting a PravegaCluster, the Kubernetes resources are removed but the metadata for the cluster remains in Zookeeper. This metadata should be removed in order to clean everything up and to stop it from interfering if a cluster with the same name is recreated.

Bookkeeper's ledger & journal volumes are not getting cleanup after destroying Pravega cluster

After destroying the Pravega cluster, the Bookkeeper volumes are not cleaned up, so they contain data from older deployments. Due to these stale volumes, new deployments of bookie pods fail with the error below:

2018-10-31 10:27:44,400 - INFO  - [main-SendThread(10.100.200.42:2181):ClientCnxn$SendThread@1381] - Session establishment complete on server 10.100.200.42/10.100.200.42:2181, sessionid = 0x1000384e32b00a7, negotiated timeout = 10000
2018-10-31 10:27:44,402 - INFO  - [main-EventThread:ZooKeeperWatcherBase@131] - ZooKeeper client is connected now.
2018-10-31 10:27:44,491 - INFO  - [main:BookieNettyServer@382] - Shutting down BookieNettyServer
2018-10-31 10:27:44,497 - ERROR - [main:BookieServer@435] - Exception running bookie server :
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4
bookieHost: "10.200.84.29:3181"
journalDir: "/bk/journal"
ledgerDirs: "1\t/bk/ledgers"
instanceId: "f3db659f-6039-4399-b76e-8d5bedbf2bd7"
] is not matching with [4
bookieHost: "10.200.59.7:3181"
journalDir: "/bk/journal"
ledgerDirs: "1\t/bk/ledgers"
instanceId: "f3db659f-6039-4399-b76e-8d5bedbf2bd7"
]
       at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:141)
        at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:152)
        at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(Bookie.java:329)
        at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:687)
        at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:124)
        at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:100)
        at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:418)

Add versioning

Since we are about to open up the repo and make the first release, we need to add versioning to the project.

Validate Tier2 Configuration with Admissions Webhook

Validate that the provided Tier2 configuration is correct before the PravegaCluster Resource is committed to the API. It is assumed that an Admissions Webhook would be put into the controller in order to perform this validation.
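
A sketch of the kind of check such a webhook could run; the Tier2Spec type and its fields are assumptions based on the examples elsewhere in this document:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical tier-2 spec mirroring the examples in this document: exactly one
// backing store (filesystem PVC, HDFS, or ECS) should be configured.
type Tier2Spec struct {
	FileSystemClaimName string
	HDFSUri             string
	ECSUri              string
}

// validateTier2 is the sort of check a validating admission webhook could run
// before the PravegaCluster resource is accepted by the API server.
func validateTier2(t Tier2Spec) error {
	configured := 0
	for _, v := range []string{t.FileSystemClaimName, t.HDFSUri, t.ECSUri} {
		if v != "" {
			configured++
		}
	}
	if configured != 1 {
		return errors.New("tier2: exactly one of filesystem, hdfs or ecs must be configured")
	}
	return nil
}

func main() {
	fmt.Println(validateTier2(Tier2Spec{FileSystemClaimName: "nautilus-pravega-tier2"}))
	fmt.Println(validateTier2(Tier2Spec{}))
}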

Allow empty namespace to be passed in

Currently the operator requires a namespace; however, passing in a blank namespace "" specifies that the operator should watch all namespaces, and this should be an allowed value.

Support external connectivity

Overview

The Pravega clusters that are produced by the operator should support external connectivity (i.e. connectivity from outside the Kubernetes cluster). The specific endpoints in question are the controller RPC/REST ports, and the segment store RPC port.

Challenges

Pravega ingests data directly from the client to a dynamic set of segment stores, unlike a conventional service that relies on a stable, load-balanced endpoint. The client discovers the segment stores with the help of the controller, which is aware of the active segment stores and their endpoint addresses. Specific challenges include:

  • advertising usable addresses to the client
  • facilitating transport encryption (TLS) to the segment store (e.g. supporting hostname verification)
  • optimizing internal connectivity vs external connectivity (e.g. avoiding an expensive route when possible)

Vendor Specifics

  • PKS: has option to use NSXT for Ingress. Istio is apparently on the roadmap.
  • GKE: see references at bottom

Implementation

For conventional services, external connectivity is generally accomplished with an Ingress resource. Ingress primarily supports HTTP(s) and it is unclear whether gRPC (which is HTTP/2-based) is supported (ref).

Ingress is probably not suitable for exposing the segment store. For workloads that are similar to Pravega, e.g. Kafka, the typical solution is to use a NodePort.

Keep in mind that Ingress and services of type LoadBalancer may incur additional costs in cloud environments (GCP pricing).

Multiple Advertised Addresses

Certain Pravega clients will be internal to the cluster, others external. Imagine that the segment store advertised only an external address (backed by a NodePort or other type of service); would the performance of internal clients suffer due to a needlessly expensive route? A mitigation would be to introduce support for multiple advertised addresses ("internal"/"external"). Given a prioritized list, the client could strive to connect to the cheapest endpoint.

This idea could extend to full-fledged multi-homing, where the segment store binds to a separate interface/port per endpoint, possibly with a separate SSL configuration per endpoint.

NodePort Details

Be sure to set the externalTrafficPolicy field of the Service to Local. This will ensure that traffic entering a given VM is routed to the segment store on that same VM.

One limitation of NodePort is that only a single segment store may be scheduled on a given cluster node. If multiple were to be scheduled, some would fail with a port-unavailable error. One way to avoid this is to use a DaemonSet to manage the segment store pods.
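
A sketch of a per-segment-store NodePort Service with externalTrafficPolicy set to Local; the labels and names are illustrative, and port 12345 is the segment store port seen in the logs above:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "pravega-segmentstore-external"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app": "pravega-segmentstore"}, // illustrative label
			// Local keeps traffic on the node it entered, so a client reaching a
			// node's port talks to the segment store running on that same node.
			ExternalTrafficPolicy: corev1.ServiceExternalTrafficPolicyTypeLocal,
			Ports: []corev1.ServicePort{{
				Name:       "segmentstore",
				Port:       12345,
				TargetPort: intstr.FromInt(12345),
			}},
		},
	}
	fmt.Println(svc.Name, svc.Spec.ExternalTrafficPolicy)
}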

References
