
Longhorn

A CNCF Incubating Project. Visit longhorn.io for the full documentation.


Longhorn is a distributed block storage system for Kubernetes. Longhorn is cloud-native storage built using Kubernetes and container primitives.

Longhorn is lightweight, reliable, and powerful. You can install Longhorn on an existing Kubernetes cluster with one kubectl apply command or by using Helm charts. Once Longhorn is installed, it adds persistent volume support to the Kubernetes cluster.

Longhorn implements distributed block storage using containers and microservices. Longhorn creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. The storage controller and replicas are themselves orchestrated using Kubernetes. Here are some notable features of Longhorn:

  1. Enterprise-grade distributed storage with no single point of failure
  2. Incremental snapshot of block storage
  3. Backup to secondary storage (NFSv4 or S3-compatible object storage) built on efficient change block detection (see the backup target example below)
  4. Recurring snapshot and backup
  5. Automated non-disruptive upgrade. You can upgrade the entire Longhorn software stack without disrupting running volumes!
  6. Intuitive GUI dashboard

You can read more technical details of Longhorn here.
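
As a concrete illustration of the backup feature above, the backup target is configured as a URL pointing at the secondary storage. These example values follow the documented URL schemes, but the bucket, region, and server names are placeholders; check the docs for the exact setting:

```
# S3-compatible object storage: s3://<bucket>@<region>/<path>
s3://backupbucket@us-east-1/backupstore

# NFSv4 export: nfs://<server>:<export path>
nfs://nfs.example.com:/opt/backupstore
```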

Releases

NOTE:

  • <version>* means the release branch is under active support and will have periodic follow-up patch releases.
  • Latest release means the version is the latest release of the newest release branch.
  • Stable release means the version is stable and has been widely adopted by users.
  • Release EOL: one year after the first stable version. For details, please refer to Release Support.

https://github.com/longhorn/longhorn/releases

| Release | Latest Version | Stable Versions | Release Note | Important Note | Supported |
| ------- | -------------- | --------------- | ------------ | -------------- | --------- |
| 1.6*    | 1.6.1 | 1.6.1 | 🔗 | 🔗 | ✅ |
| 1.5*    | 1.5.5 | 1.5.4, 1.5.3 | 🔗 | 🔗 | ✅ |
| 1.4     | 1.4.4 | 1.4.4, 1.4.3, 1.4.2, 1.4.1 | 🔗 | 🔗 | |
| 1.3     | 1.3.3 | 1.3.3, 1.3.2 | 🔗 | 🔗 | |
| 1.2     | 1.2.6 | 1.2.6, 1.2.5, 1.2.4, 1.2.3, 1.2.2 | 🔗 | 🔗 | |
| 1.1     | 1.1.3 | 1.1.3, 1.1.2 | 🔗 | | |

Roadmap

https://github.com/longhorn/longhorn/wiki/Roadmap

Components

Longhorn is 100% open-source software. Project source code is spread across several repositories:

| Component | What it does | GitHub repo |
| --- | --- | --- |
| Longhorn Backing Image Manager | Backing image download, sync, and deletion in a disk | longhorn/backing-image-manager |
| Longhorn Instance Manager | Controller/replica instance lifecycle management | longhorn/longhorn-instance-manager |
| Longhorn Manager | Longhorn orchestration, including the CSI driver for Kubernetes | longhorn/longhorn-manager |
| Longhorn Share Manager | NFS provisioner that exposes Longhorn volumes as ReadWriteMany volumes (see the PVC example below) | longhorn/longhorn-share-manager |
| Longhorn UI | The Longhorn dashboard | longhorn/longhorn-ui |

| Library | What it does | GitHub repo |
| --- | --- | --- |
| Longhorn Engine | V1 core controller/replica logic | longhorn/longhorn-engine |
| Longhorn SPDK Engine | V2 core controller/replica logic | longhorn/longhorn-spdk-engine |
| iSCSI Helper | V1 iSCSI client and server libraries | longhorn/go-iscsi-helper |
| SPDK Helper | V2 SPDK client and server libraries | longhorn/go-spdk-helper |
| Backup Store | Backup libraries | longhorn/backupstore |
| Common Libraries | Common libraries | longhorn/go-common-libs |
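
Because the Share Manager exposes volumes over NFS, a ReadWriteMany volume can be requested with an ordinary PersistentVolumeClaim. A minimal sketch, assuming the default `longhorn` StorageClass is installed (name and size are illustrative):

```yaml
# Request an RWX Longhorn volume; the Share Manager serves it over NFS
# so multiple pods on different nodes can mount it simultaneously.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data           # illustrative name
spec:
  accessModes:
    - ReadWriteMany           # handled by the Longhorn Share Manager
  storageClassName: longhorn  # assumes the default Longhorn StorageClass
  resources:
    requests:
      storage: 5Gi
```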

Longhorn UI

[Image: the Longhorn UI dashboard]

Get Started

Requirements

For the installation requirements, refer to the Longhorn documentation.

Installation

NOTE: The master branch is for development of the upcoming feature release. For an official release installation or upgrade, please use one of the methods below.

Longhorn can be installed on a Kubernetes cluster in several ways, for example with a single kubectl apply or via Helm charts, as shown below.
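
The two documented paths look roughly like this (the version tag is illustrative; check the releases page for the current one):

```bash
# Option 1: kubectl, applying the manifest published with a release tag
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.1/deploy/longhorn.yaml

# Option 2: Helm, from the official chart repository
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
```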

Documentation

The official Longhorn documentation is here.

Get Involved

Discussion, Feedback

For any discussion or feedback, feel free to file a discussion.

Feature Requests, Bug Reports

If you run into any issues, feel free to file an issue. We have a weekly community issue review meeting to review all reported issues and enhancement requests.

When filing a bug, please attach the support bundle to the issue or send it to longhorn-support-bundle.

Report Vulnerabilities

If any vulnerabilities are found, please report them to longhorn-security.

Community

Longhorn is open-source software, so contributions are very welcome. Please read the Code of Conduct and the Contributing Guideline before contributing.

Contributing code is not the only way of contributing. We value feedback very much, and many Longhorn features originated from users' feedback. If you have any feedback, feel free to file an issue and talk to the developers in the CNCF #longhorn Slack channel.

For discussion, feedback, requests, issues, or security reports, please use the channels described above.

Community Meeting and Office Hours

Hosted by the core maintainers of Longhorn: 4th Friday of every month at 09:00 (CET) or 16:00 (CST) at https://community.cncf.io/longhorn-community/.

Longhorn Mailing List

Stay up to date on the latest news and events: https://lists.cncf.io/g/cncf-longhorn

You can read more about the community and its events here: https://github.com/longhorn/community

License

Copyright (c) 2014-2022 The Longhorn Authors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Longhorn is a CNCF Incubating Project

longhorn's People

Contributors

arthurvardevanyan, c3y1huang, chanyilin, chriscchien, damiasan, derekbit, ejweber, frankyang0529, ibrokethecloud, innobead, james-munson, janeczku, janesser, jenting, joshimoo, mantissahz, mattmattox, muicoder, nickyboy89, phanle1010, rajpratik71, sheng-liang, shuo-wu, tedder, ttpcodes, weizhe0422, wjimenez5271, yangchiu, yardenshoham, yasker


longhorn's Issues

Progress report for backup

Backup should report progress.

This is not straightforward in the current engine architecture, because backups are performed out of band, without the engine knowing about them.

Design needed.

volume created via WebUI not usable directly

Hi,

Going further in testing Longhorn, I've found something strange.
I created 3 volumes via the WebUI.
On a node, here's what I have:

docker volume ls
DRIVER              VOLUME NAME
local               033816fae07fd2075579606ef1e89647198fdba359c17f26a99ddde6b6b83905
local               19c66b299999da7281a2b3790d7c8c35cdb80f9dbe6c6b09a00f5b429c6d9133
local               4491e34682b18821a9f15b135524db252e2b00d29bb3c7571c0668ae45b42e3e
local               7a4a251058bde34b71689bbcaeea2a6c306fa28160d2755f2cf7c469088a937a
local               96e1437ddba02155fc21eac5e7241d43ef6145dc0a539a66335c2efdba154713
local               bc93ca1ba1449d6759ce50bb3364ec5a85f4cb4b1739983f3c9aceb927201101
local               c0b3f21dec349e0319c92c9ceda97b730328255333aef17a4fc803cc4fa4af51
local               de645596f80be62b0f29d8f4503f011977abfbbe1f3619573e31d78d33f85f08
local               eb8245263edea416da72985e895d6c75f78094f526d008104eb0626ca3da5f02

If I run docker volume create -d longhorn vol2XinityTest, I suddenly have these volumes available:

docker volume ls
DRIVER              VOLUME NAME
local               033816fae07fd2075579606ef1e89647198fdba359c17f26a99ddde6b6b83905
local               19c66b299999da7281a2b3790d7c8c35cdb80f9dbe6c6b09a00f5b429c6d9133
local               4491e34682b18821a9f15b135524db252e2b00d29bb3c7571c0668ae45b42e3e
local               6e9a04b525cbae78023af6c82282b76ea7cd029a6ab41a67a8c4e73b47d0eb45
local               7a4a251058bde34b71689bbcaeea2a6c306fa28160d2755f2cf7c469088a937a
local               8e1c21a1b1c9245ca160faa0390230181030ebf72ba9d6915c2f236183116fde
local               96e1437ddba02155fc21eac5e7241d43ef6145dc0a539a66335c2efdba154713
longhorn            Xinity_Vol1
longhorn            Xinity_Vol2
longhorn            Xinity_Vol3
local               bc93ca1ba1449d6759ce50bb3364ec5a85f4cb4b1739983f3c9aceb927201101
local               c0b3f21dec349e0319c92c9ceda97b730328255333aef17a4fc803cc4fa4af51
local               de645596f80be62b0f29d8f4503f011977abfbbe1f3619573e31d78d33f85f08
local               eb8245263edea416da72985e895d6c75f78094f526d008104eb0626ca3da5f02
longhorn            vol1XinityTest
longhorn            vol2XinityTest

Did I do something wrong?
How can I make the volumes created via the WebUI directly available?

Thanks a lot !

Missing tools and/or documentation on how to recover from a server/cluster crash

I managed to overload my testing Kubernetes cluster and all nodes rebooted at the same time.
Now I have 13 volumes (replicas per node) in an Error state and only one working.

Are there tools or documentation describing how to recover / rebuild / run fsck on the failed replicas?
I'm still running tests, so this was not a catastrophe (and no backups exist), but I'd like to know if there is a way to manually recover at least some of the replicas.

`storage-longhorn` code access

Hey,

While trying to understand the fundamentals of getting Longhorn to work across multiple nodes (not in an HA multi-host environment, but accessing volumes on other nodes while keeping storage on one host), I realised that the code repo for the storage-longhorn component isn't publicly available.

Is this something that will be released in due course, or will it remain closed?

For reference, the following components have public repositories:

longhorn-ui - https://github.com/rancher/longhorn-ui
longhorn-engine - https://github.com/rancher/longhorn-engine
longhorn-manager - https://github.com/rancher/longhorn-manager

Just missing a crucial part of the equation - storage-longhorn

Project milestone

This project seems very interesting XD

It would be nicer if this project had a public milestone for alpha/beta releases :)
Maybe that could give our dev team the courage to give it a try.

Anyway, thanks for this powerful project

Recurring Snapshot and Backup doesn't seem to work correctly

If I create a recurring snapshot / backup for a simple volume, nothing happens...

The manager logs:

WARN[57750] Dropping Longhorn volume longhorn/pvc-9f111b98-53a3-11e8-90d5-e0d55e2e2e44 out of the queue: fail to sync longhorn/pvc-9f111b98-53a3-11e8-90d5-e0d55e2e2e44: fail to update recurring jobs for pvc-9f111b98-53a3-11e8-90d5-e0d55e2e2e44: CronJob.batch "pvc-9f111b98-53a3-11e8-90d5-e0d55e2e2e44-recurring1525882946558-recurring" is invalid: [spec.jobTemplate.spec.template.spec.containers[0].name: Invalid value: "pvc-9f111b98-53a3-11e8-90d5-e0d55e2e2e44-recurring1525882946558-recurring": must be no more than 63 characters, metadata.name: Invalid value: "pvc-9f111b98-53a3-11e8-90d5-e0d55e2e2e44-recurring1525882946558-recurring": must be no more than 52 characters]

cloud-init.config.url file not found

Using docker-machine, vSphere 6.5, and RancherOS 1.1.0.

I'm trying to get my cloud-config file from a GitLab snippet (public), but the log says "File not found".

time="2017-11-16T13:12:22Z" level=info msg="cloud-init: Checking availability of "cloud-drive"" 
time="2017-11-16T13:12:22Z" level=info msg="cloud-init: Checking availability of "VMWare"" 
time="2017-11-16T13:12:22Z" level=info msg="cloud-init: Datasource available: VMWare:  (lastError: %!s(<nil>))" 
time="2017-11-16T13:12:22Z" level=info msg="Fetching user-data from datasource VMWare:  (lastError: %!s(<nil>))" 
time="2017-11-16T13:12:22Z" level=info msg="Read from "cloud-init.data.encoding": ""
" 
time="2017-11-16T13:12:22Z" level=info msg="Read from "cloud-init.config.data": ""
" 
time="2017-11-16T13:12:22Z" level=info msg="Read from "cloud-init.config.url": "http://git.bucker.net/snippets/1/raw"
" 
time="2017-11-16T13:12:22Z" level=error msg="no such file or directory" 
time="2017-11-16T13:12:22Z" level=info msg="cloud-init: Datasource unavailable, skipping: cloud-drive: /media/config-2 (lastError: no such file or directory)" 

I tried to fetch it with wget from the command line:

$ wget http://git.bucker.net/snippets/2
Connecting to git.bucker.net (192.168.86.108:80)
wget: error getting response

Failed to create load balancer

Hi,
I deployed on an Ubuntu Kubernetes cluster and received this error while deploying:
"Failed to ensure load balancer for service longhorn-system/longhorn-frontend: Unsupported load balancer affinity: ClientIP"

Sometimes no volumes are displayed in longhorn-ui

  • Volume listing keeps loading and loading
  • Host count is displayed, but the volume count is always 0
  • Sometimes, though, volumes are listed and counts are displayed
  • What might be causing this?

More information:

  • The longhorn-frontend service always remains in a pending state
  • No errors in the longhorn-ui logs

Live upgrade support for Longhorn Engine

It's possible for the Longhorn Engine to implement live upgrade support.

The basic idea is that we should be able to create a new version of the Longhorn Engine running with the same data as the old one, connected to the same TGT server as before. The TGT server should be able to redirect the traffic to the new Longhorn Engine at some point. In general, it should work like this:

  1. TGT pauses receiving data from the block device for a moment. Disk IO stops at this point.
  2. TGT waits for the existing requests issued to the old version of the engine to complete.
  3. TGT redirects the traffic to the new Longhorn Engine. Disk IO resumes.
  4. TGT signals the switch complete. The old version of the engine is safe to remove.

Mount command failed, status: Failure, The volume is existed and is attached

During an upgrade / rolling update of a deployment, the system tries to create the new pod and attach the volume before shutting down the old pods. In this process, the new replica set's pods are unable to attach the volume because the old ones still have it attached.

Error :

MountVolume.SetUp failed for volume "myservice-db" : mount command failed, status: Failure, reason: myservice-db is existed and is attached

Delete volumes entirely with Kubernetes

Hi,

I have launched Longhorn with my Kubernetes setup. Attaching and detaching volumes works fine, but how can I enforce that a volume gets really deleted and not only detached? Adding persistentVolumeReclaimPolicy: Delete to the PersistentVolume seems to be the wrong option...
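
For context: in Kubernetes, the deletion behavior of dynamically provisioned volumes is normally driven by the StorageClass reclaimPolicy, which provisioned PVs inherit as persistentVolumeReclaimPolicy. A sketch under that assumption, reusing the provisioner name that appears elsewhere in this document; the class name is illustrative:

```yaml
# StorageClass whose dynamically provisioned PVs are removed from the
# backend when their PVC is deleted. "Delete" is the Kubernetes default;
# it is spelled out here for clarity. Use Retain to keep the volume.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-delete
provisioner: rancher.io/longhorn  # provisioner name as used elsewhere in this document
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
```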

Multiple disk with capacity based scheduling

Longhorn replicas currently use the directory /var/lib/rancher/longhorn to store data. We should support multiple user-specified locations on the node.

As a first step, we can support the same set of locations for all nodes (e.g., in addition to the default location, the user can add /opt/storage as a second location and make sure the directory is available on all the nodes), assuming we don't need to store node-specific information (hopefully).

In the end, the user should be able to update the node config and set the desired disks (each mounted to a node directory) to store Longhorn volume data. Longhorn can then choose where to place volume data depending on how much space each disk has left; a hypothetical sketch follows.
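
A hypothetical sketch of the per-node disk configuration described above. This is illustrative only; the field names are assumptions, not a confirmed Longhorn API:

```yaml
# Hypothetical node disk config (field names are illustrative).
# Each entry maps a directory mounted on the node to a schedulable disk,
# letting the scheduler place replicas by remaining capacity.
disks:
  default-disk:
    path: /var/lib/rancher/longhorn  # the current default location
    allowScheduling: true
  extra-disk:
    path: /opt/storage               # the second location from the example above
    allowScheduling: true
    storageReserved: 10Gi            # capacity the scheduler must leave free
```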

Mount fail in Kubernetes deployment

I checked that iscsiadm is OK. Any idea?

ERRO[0000] Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host
Environment check failed: Invalid mount namespace /host/proc/1/ns/mnt, error Failed to execute: nsenter [--mount=/host/proc/1/ns/mnt mount], output nsenter: failed to execute mount: No such file or directory
, error exit status 1

After snapshot revert Input/output error

Hi

I created a single-box setup and created a volume with 2 replicas and 20 GB, then followed these steps:

mkfs.xfs /dev/longhorn/test
mount /dev/longhorn/test /mnt
cd /mnt
touch test1 test2

In the UI I took a snapshot. After that I ran:
rm test1
Then back in the UI, I clicked the snapshot and reverted:
cd /mnt
ls
ls: cannot open directory '.': Input/output error

Named targetPort causes canal/calico to crash

Trying to deploy on Rancher Server Beta 3 > Custom Nodes,
version: v1.8.9-rancher1

Using the default config: https://raw.githubusercontent.com/rancher/longhorn/v0.2/deploy/longhorn.yaml
I get the following errors:

kubectl -n longhorn-system logs longhorn-flexvolume-driver-deployer-74869d8788-wd2kd
WARN[0030] Failed to detect flexvolume dir, fall back to default: %!(EXTRA *errors.withStack=cannot discover Flexvolume Dir: cannot reach node configURI /api/v1/proxy/nodes/do-node1/configz: Get https://10.43.0.1:443/api/v1/proxy/nodes/do-node1/configz: dial tcp 10.43.0.1:443: i/o timeout)
FATA[0060] Error deploying Flexvolume driver: Get https://10.43.0.1:443/apis/extensions/v1beta1/namespaces/longhorn-system/daemonsets/longhorn-flexvolume-driver: dial tcp 10.43.0.1:443: i/o timeout

Using a modified config with FLEXVOLUME_DIR = "/var/lib/kubelet/volumeplugins" (path from ps aux | grep kubelet),
I get the following error:

kubectl -n longhorn-system logs longhorn-flexvolume-driver-deployer-77cffbcb74-849dj
FATA[0030] Error deploying Flexvolume driver: Get https://10.43.0.1:443/apis/extensions/v1beta1/namespaces/longhorn-system/daemonsets/longhorn-flexvolume-driver: dial tcp 10.43.0.1:443: i/o timeout

Longhorn running, hosts activated, volume attached but pods timing out waiting for volume

The volume shows as "Healthy" in longhorn-ui, but the pod is timing out waiting for the volume. Longhorn is deployed in a separate namespace from the requesting pod; I hope that is not the issue here.

Logs from longhorn-manager :

DEBU[1477] volume myservice-db state is healthy
DEBU[1477] volume myservice-db desire state is healthy
DEBU[1477] volume myservice-db state is healthy
DEBU[1482] volume myservice-db state is healthy
DEBU[1482] volume myservice-db desire state is healthy
DEBU[1482] volume myservice-db state is healthy

Thanks in anticipation.

Attaching volume fails due to high revision counter

When attempting to attach a volume after a crash, I noticed the volume controller failing to start with the following error:
2018/05/17 20:08:20 json: cannot unmarshal number 1.648295e+06 into Go value of type int64
If I examine the revision.counter file on one of the replicas, I see 1648295, which lines up with the number in the error. This appears to prevent the controller from starting and the volume from being attachable.

I launched Longhorn from the Rancher 2 catalog; the controllers and replicas are running longhorn-engine:de88734.

Attach support for Flexvolume driver

We may need to support attach/detach in addition to the current mount/umount. When Kubernetes thinks a pod is down on one node, it will create the same pod on another node. In the process, Kubernetes will try to detach the volume from the old node if possible, then attach it to the new node and use it for the restarted pod.

The Flexvolume document regarding attach support: https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#driver-invocation-model

But it also mentions that:
This call-out (attach) does not pass "secrets" specified in Flexvolume spec. If your driver requires secrets, do not implement this call-out and instead use "mount" call-out and implement attach and mount in that call-out.

We need to check whether it's worth doing, or whether simply doing detach inside the mount call is enough.

Helm Chart: cannot detect valid service IP

longhorn version: longhorn-manager:1ebf5cb / longhorn-engine:de88734

Error output when using a different app name:

+ echo Dependency checking
++ nsenter --mount=/host/proc/1/ns/mnt -- nsenter -t 1 -n findmnt --version
Dependency checking
+ OUT='findmnt from util-linux 2.27.1'
++ nsenter --mount=/host/proc/1/ns/mnt -- nsenter -t 1 -n curl --version
+ OUT='curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets '
++ nsenter --mount=/host/proc/1/ns/mnt -- nsenter -t 1 -n blkid -v
+ OUT='blkid from util-linux 2.27.1  (libblkid 2.27.0, 02-Nov-2015)'
+ exit 0
Detecting backend service IP for longhorn-backend
Cannot detect valid service IP, retrying
(... the same message repeated 20 times ...)
Fail to detect valid service IP. Aborting

LONGHORN_BACKEND_SVC needs to be settable in longhorn-manager as a dynamic input; it is currently set to longhorn-backend by default.

Expected result: the Longhorn backend service is configurable through an environment parameter:

...
   env:
     - name: LONGHORN_BACKEND_SVC
       value: {{ .Release.Name }}-backend

Unable to attach storage to pod due to timeout issues

Hi,

Using Longhorn as the storage system. I am able to attach a volume to a host using the Longhorn UI, but the pods cannot use the storage volume. Same issue with the OpenEBS storage driver too.


kubectl describe pod jenkins
Name:         jenkins
Namespace:    default
Node:         kub1/<IP>
Start Time:   Sun, 01 Apr 2018 09:48:49 +0530
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
Containers:
  jenkins:
    Container ID:
    Image:          jenkins/jenkins:lts
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/jenkins_home from jenkins-home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vhpq2 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  jenkins-home:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins-data
    ReadOnly:   false
  default-token-vhpq2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vhpq2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age              From               Message
  ----     ------                 ----             ----               -------
  Warning  FailedScheduling       6m               default-scheduler  PersistentVolumeClaim is not bound: "jenkins-data" (repeated 3 times)
  Normal   Scheduled              6m               default-scheduler  Successfully assigned jenkins to kub1
  Normal   SuccessfulMountVolume  6m               kubelet, kub1      MountVolume.SetUp succeeded for volume "default-token-vhpq2"
  Warning  FailedMount            1m (x2 over 4m)  kubelet, kub1      Unable to mount volumes for pod "jenkins_default(c767552b-3563-11e8-a4b1-027e1d654f4a)": timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins". list of unattached/unmounted volumes=[jenkins-home]


Storage class and related manifests:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn
provisioner: rancher.io/longhorn
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10G
---
apiVersion: v1
kind: Pod
metadata:
  name: jenkins
  namespace: default
spec:
  containers:
  - name: jenkins
    image: jenkins/jenkins:lts
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: jenkins-home
      mountPath: "/var/jenkins_home"
    ports:
    - containerPort: 80
  volumes:
  - name: jenkins-home
    persistentVolumeClaim:
      claimName: jenkins-data

Segmentation Fault

longhorn-manager is giving the following error:

sac@node2:~$ kubectl logs longhorn-manager-j6v7v -n longhorn-system
INFO[0000] Detected IP is 10.32.0.12
INFO[0000] Kubernetes orchestrator is ready
INFO[0002] Listening on 10.32.0.12:9500
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x110aec9]

goroutine 102 [running]:
github.com/rancher/longhorn-manager/manager.(*VolumeManager).VolumeCreateBySpec(0xc4203fad20, 0xc420146da0, 0x17, 0x0, 0x0)
	/go/src/github.com/rancher/longhorn-manager/manager/manager.go:406 +0x529
created by github.com/rancher/longhorn-manager/manager.(*VolumeManager).startProcessing
	/go/src/github.com/rancher/longhorn-manager/manager/process.go:38 +0x3c7

For a second-time setup, do I need to clean any directories on the node?

Progress report for restore

The restore process is different from rebuild/backup. The restore is done as part of replica startup, so at that point the controller hasn't started yet.

On the manager side, restore progress should be reported periodically as part of the replica status, filled in by the replica controller.

Rolling back snapshots

Hey everyone!

I started experimenting with Longhorn for persistent storage; however, when I try to roll back a snapshot I experience some problems:

ls: reading directory .: Input/output error

Is there currently a problem with snapshots in Longhorn, or have I made an error in my setup?

RancherOS support

The TGT frontend doesn't work on RancherOS, but TCMU does (tried with these commands). Is there a way to tell the manager to use TCMU mode?

Some changes on Wiki

Since we can't open pull requests against the wiki, I'm posting a remark here:

On the "Multi Host Setup Guide" wiki page, the docker test commands should use the --rm flag to remove test containers after they run. This avoids ending up with X stopped containers after doing X tests.

The lines concerned are :

docker run --net longhorn-net rancher/longhorn-manager curl http://[etcd_ip]:2379/v2/stats/leader

and

docker run --net longhorn-net --privileged rancher/longhorn-manager /bin/bash -c "mount -t nfs4 [nfs_ip]:[nfs_path] /mnt && umount /mnt"

that will become:

docker run --rm --net longhorn-net rancher/longhorn-manager curl http://[etcd_ip]:2379/v2/stats/leader

and

docker run --rm --net longhorn-net --privileged rancher/longhorn-manager /bin/bash -c "mount -t nfs4 [nfs_ip]:[nfs_path] /mnt && umount /mnt"

unable to create CRDs

Error stack from longhorn-manager.
Kubernetes: 1.9.x
Cloud: bare-metal Ubuntu 16.x

unable to create CRDs
github.com/rancher/longhorn-manager/datastore.NewCRDStore
	/go/src/github.com/rancher/longhorn-manager/datastore/k8scrd.go:50
main.RunManager
	/go/src/github.com/rancher/longhorn-manager/main.go:124
github.com/rancher/longhorn-manager/vendor/github.com/urfave/cli.HandleAction
	/go/src/github.com/rancher/longhorn-manager/vendor/github.com/urfave/cli/app.go:485
github.com/rancher/longhorn-manager/vendor/github.com/urfave/cli.(*App).Run
	/go/src/github.com/rancher/longhorn-manager/vendor/github.com/urfave/cli/app.go:259
main.main
	/go/src/github.com/rancher/longhorn-manager/main.go:84
runtime.main
	/usr/local/go/src/runtime/proc.go:185
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:2197
fail to create CRD store
main.RunManager

Use with Rancher Cattle

You mention in the blog that Longhorn is compatible with Swarm and Kubernetes - what about Cattle?

S3 backups

Is there any plan to support S3 as a backend for backups?

Rancher 2.0 longhorn is not working.

rancher version: v2.0.0

kubernetes:
[screenshot]

longhorn version:
[screenshot]

Result:
The volume can't be created automatically according to the storage class after deploying from the catalog.

deployment:
[screenshot]

storage class:
[screenshot]

Can't bind the volume because the volume didn't get created:
[screenshot]

longhorn-ui can't find any volumes:
[screenshot]

Attempt to use volume failed - timeout waiting for API

I have four servers (RHEL 7.3 Maipo) in a Docker Swarm cluster and followed https://github.com/rancher/longhorn/wiki/Multi-Host-Setup-Guide.

ID                            HOSTNAME                  STATUS              AVAILABILITY        MANAGER STATUS
39usnnx3hel30rc8mu2ifut4j *   rancher-longhorn-lab004   Ready               Active              Leader
k9tnmupa05bnn3ezbfkmmtysk     rancher-longhorn-lab002   Ready               Active
sk83z70y863oyzfseium4oeb5     rancher-longhorn-lab001   Ready               Active
u79op8ofsihs61sc1ghl2x0g8     rancher-longhorn-lab003   Ready               Active

The cluster seems to be OK: in the UI I see all four servers, and volumes and replicas were created. But I'm unable to do a basic thing: mount volumes from a different host than the one used the first time. My steps:

  1. Create a volume on server1
    docker volume create -d longhorn vol1
  2. Use the volume on server2
    docker run -it --volume-driver longhorn -v vol1:/vol1 ubuntu bash
    So far no problem; stop the ubuntu test container, and then
  3. Try to use the volume on server3
    docker run -it --volume-driver longhorn -v vol1:/vol1 ubuntu bash
docker: Error response from daemon: error while mounting volume \
'/var/lib/rancher/volumes/longhorn/vol1': VolumeDriver.Mount: unable to attach volume: \
failed to start the controller for volume 'vol1': \
Fail to create controller for vol1: fail to process schedule request: failed to process schedule: \
fail to wait for api endpoint at \
http://10.0.0.14:9501/v1: timeout waiting for http://10.0.0.14:9501/v1.

longhorn-manager log

ERRO[0739] Attempting to Write an unknown type:
ERRO[0770] fail to start controller vol2-controller of vol2, cleaning up: fail to wait for api endpoint at http://10.0.0.17:9501/v1: timeout waiting for http://10.0.0.17:9501/v1
WARN[0770] HTTP handling error unable to attach volume: failed to start the controller for volume 'vol2': Fail to create controller for vol2: fail to process schedule request: failed to process schedule: fail to wait for api endpoint at http://10.0.0.17:9501/v1: timeout waiting for http://10.0.0.17:9501/v1
ERRO[0770] Error in request: unable to attach volume: failed to start the controller for volume 'vol2': Fail to create controller for vol2: fail to process schedule request: failed to process schedule: fail to wait for api endpoint at http://10.0.0.17:9501/v1: timeout waiting for http://10.0.0.17:9501/v1

[Chart] Better presentation of volume plugin directory choices

It's easy to overlook the important volume plugin option (flexvolumepath) in the chart.

We can have multiple choices:

  1. RKE: /var/lib/kubelet/volumeplugins
  2. GKE: /home/kubernetes/flexvolume/
  3. Customized, by default: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/

Can we make it more obvious for the user to select the correct value?

For example, could we have a radio button presenting all the directories we know for each distribution, then let the user fill in the blank if they are not using a commonly known distro? A hedged sketch of such values follows.
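
A hedged sketch of how the choices could be pre-filled in the chart's values file; the key name follows the flexvolumepath option mentioned above, and the exact schema is an assumption:

```yaml
# Hypothetical values.yaml fragment; only one entry would be active.
flexvolumePath: /var/lib/kubelet/volumeplugins  # RKE
# flexvolumePath: /home/kubernetes/flexvolume/  # GKE
# flexvolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/  # generic default
```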

can't attach volume to a swarm node

hi here,

Trying to attach a successfully created volume to a swarm node doesn't work, with these log messages:

ERRO[1195] fail to start controller Xinity_Vol1-controller of Xinity_Vol1, cleaning up: fail to wait for api endpoint at http://10.20.1.14:9501/v1: timeout waiting for http://10.20.1.14:9501/v1 
WARN[1195] HTTP handling error unable to attach volume: failed to start the controller for volume 'Xinity_Vol1': Fail to create controller for Xinity_Vol1: fail to process schedule request: failed to process schedule: fail to wait for api endpoint at http://10.20.1.14:9501/v1: timeout waiting for http://10.20.1.14:9501/v1 
ERRO[1195] Error in request: unable to attach volume: failed to start the controller for volume 'Xinity_Vol1': Fail to create controller for Xinity_Vol1: fail to process schedule request: failed to process schedule: fail to wait for api endpoint at http://10.20.1.14:9501/v1: timeout waiting for http://10.20.1.14:9501/v1 

The setup was made with docker-machine and VirtualBox.
OS: RancherOS, latest release

I'm following the wiki documentation for the multi-node setup.
