
mig-operator's Introduction

Crane Operator

Crane Operator (mig-operator) installs a system of migration components for moving workloads from OpenShift 3 to 4.

Installable Component               Repository
Velero + Custom Migration Plugins   velero, openshift-migration-plugin
Migration Controller                mig-controller
Migration UI                        mig-ui


Development

See hacking.md for instructions on installing unreleased versions of mig-operator.

Crane Operator Installation

See OpenShift Version Compatibility for guidance on which Crane versions are appropriate for your OpenShift versions.

OpenShift 4

Crane Operator is installable on OpenShift 4 via OperatorHub.

Installing released versions

  1. Visit the OpenShift Web Console.
  2. Navigate to Operators => OperatorHub.
  3. Search for Crane Operator.
  4. Install the desired Crane Operator version.

Installing latest

See hacking.md

OpenShift 3

Crane Operator is installable on OpenShift 3 via OpenShift manifest.

Installing released versions

Obtain the operator.yml and controller-3.yml for the desired version. Visit Quay for a list of valid tags. Stable releases are tagged as release-x.y.z. The latest tag is used for development and may be unstable.

podman cp $(podman create quay.io/konveyor/mig-operator-container:<tag>):/operator.yml ./
podman cp $(podman create quay.io/konveyor/mig-operator-container:<tag>):/controller-3.yml ./
oc create -f operator.yml

Crane Operator Upgrades

See the MTC Upgrade Documentation.

Component Installation and Configuration

Component installation and configuration are accomplished by creating or modifying a MigrationController CR.

Installation Topology

You must install Crane Operator and components on all OpenShift clusters involved in a migration.

Use Case                          Recommended Topology
Migrating from OpenShift 3 => 4   Install Velero on all clusters. Install the Controller and UI on the OpenShift 4 cluster.
Migrating from OpenShift 3 => 3   Install Velero on all clusters. Install the Controller and UI on the target cluster.

Customizing your Installation

You can choose which components to install by setting parameters in the MigrationController CR spec.

Parameter Name         Usage                                               Recommended Setting
migration_velero       Set to true to install Velero and Restic.           Set to true on all clusters.
migration_controller   Set to true to install the Migration Controller.    Set to true on only one cluster, preferably OpenShift 4.
migration_ui           Set to true to install the Migration UI.            Set to true only where migration_controller: true.
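As an illustration, a MigrationController spec for an OpenShift 3 source cluster in a 3 => 4 migration (Velero only) might look like the sketch below; the CR name and namespace are placeholders and vary with the operator version:

apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  name: migration-controller
  namespace: openshift-migration
spec:
  migration_velero: true
  migration_controller: false
  migration_ui: false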

Installing Crane Components

Creating a MigrationController CR will tell Crane Operator to install Migration Components.

OpenShift 4

  1. In the OpenShift console navigate to Operators => Installed Operators.
  2. Click on Crane Operator.
  3. Find MigrationController on the Provided APIs page and click Create Instance.
  4. On OpenShift 4.5+, click the Configure via: YAML view radio button.
  5. Customize settings (component selections, migration size limits) in the YAML editor, and click Create.

OpenShift 3

  1. Review the controller-3.yml retrieved earlier.
  2. Adjust settings (component selections, migration size limits) if desired.
  3. Set mig_ui_cluster_api_endpoint to point at the Controller cluster's API server URL and port (see the sketch below).
  4. Run oc create -f controller-3.yml
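As a rough sketch, the relevant portion of controller-3.yml for a 3 => 3 target cluster running the Controller and UI might end up looking like the following; the API server URL and port are placeholders:

spec:
  migration_velero: true
  migration_controller: true
  migration_ui: true
  mig_ui_cluster_api_endpoint: https://master.example.com:8443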

Additional Settings

Additional settings can be applied by editing the MigrationController CR.

oc edit migrationcontroller -n openshift-migration

Restic Timeout

spec:
  restic_timeout: 1h

The default restic_timeout is 1 hour, specified as 1h. You can increase it if you anticipate large backups that will take longer than 1 hour, so that those backups can complete. The downside to increasing this value is that it may delay returning from unanticipated errors in some scenarios. Valid units are s, m, and h, which stand for seconds, minutes, and hours.
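For example, a hedged one-liner to raise the timeout without opening an editor, assuming the CR is named migration-controller in the openshift-migration namespace:

oc patch migrationcontroller migration-controller -n openshift-migration --type=merge -p '{"spec":{"restic_timeout":"2h"}}'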

Migration Limits

spec:
  mig_pv_limit: '100'
  mig_pod_limit: '100'
  mig_namespace_limit: '10'

These settings cap the number of resources allowed in a single Migration Plan. The default limits serve as a recommendation to break large-scale migrations into several smaller Migration Plans.

Migration Example Using API

CORS (Cross-Origin Resource Sharing) Configuration

You must follow the CORS configuration steps only if:

  • You are installing Crane Operator 1.1.1 or older
  • You are installing Migration Controller and Migration UI on OpenShift 3

Removing Crane Operator

To clean up all the resources created by the operator, run the following:

oc delete namespace openshift-migration

oc delete crd backups.velero.io backupstoragelocations.velero.io deletebackuprequests.velero.io downloadrequests.velero.io migrationcontrollers.migration.openshift.io podvolumebackups.velero.io podvolumerestores.velero.io resticrepositories.velero.io restores.velero.io schedules.velero.io serverstatusrequests.velero.io volumesnapshotlocations.velero.io

oc delete clusterrole migration-manager-role manager-role

oc delete clusterrolebindings migration-operator velero mig-cluster-admin migration-manager-rolebinding manager-rolebinding

oc delete oauthclient migration

Resources Migrated By Crane Operator

Refer to Resources Migrated By Crane Operator for details on which objects and resources the Crane Operator migrates.

Direct Migration Requirements

In MTC 1.4.0, a new feature called Direct Migration is available that yields significant time savings for most customers migrating persistent volumes and/or internal images. Direct Migration moves persistent volumes and internal images directly from the source cluster to the destination cluster without an intermediary replication repository. This is a significant performance enhancement and also provides better error and progress reporting to the end user. Please refer to the Direct Migration Requirements documentation for more details.

mig-operator's People

Contributors

alaypatel07, aufi, awels, danil-grigorev, djwhatle, djzager, dymurray, eriknelson, fbladilo, ibolton336, jaydipgabani, jmontleon, jortel, jwmatthews, mansam, mvazquezc, newgoliath, pranavgaikwad, rayfordj, sergiordlr, seshapad, shawn-hurley, shubham-pampattiwar, sseago


mig-operator's Issues

white page at point 4

Hi,
I did the setup on an OCP 4.1 cluster and an OCP 3.11 cluster and it worked well.
When I migrate a project without a PVC there is no problem at all, but when I try to migrate a project with a PVC, at point 4 "Storage Class" I get a white page.
This happens with Firefox 68.0 and Chrome 76.0.3809.100 on Fedora Core 30.

Even though I got a white page, the Plan was created, but I cannot delete it with Chrome.

regards

Operator should use supported CORS config API for 4.2, and have fallback behavior for other versions

Not sure what the fallback behavior should be if the new CORS API doesn't exist, but at least with 4.2, we need the operator to configure CORS using the officially introduced API here: https://jira.coreos.com/browse/MSTR-717

It's my understanding that the ability for us to detect the version of the cluster has been removed, with the recommendation for us to instead use the API discovery mechanism to determine if the capability exists. I'm not sure how it would be possible to dynamically detect if this feature is present or not. Will follow up with what we expect to do.

Make route edge/redirect

Connecting to the migration route via HTTP should redirect to HTTPS. We can do this via route configuration.

This is especially helpful when getting the route value from the command line, where the scheme portion of the URI is absent.
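A minimal sketch of the desired route configuration, reusing the route, service, and port names seen elsewhere in this repo (the host value is illustrative):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: migration
  namespace: openshift-migration
spec:
  host: migration-openshift-migration.apps.example.com
  to:
    kind: Service
    name: migration-ui
  port:
    targetPort: port-9000
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect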

Installing mig operator from OLM on 4.3

I tried installing the mig operator from the OperatorHub catalog on an OpenShift 4.3 cluster (via the console UI). When the deployment is created, the image is image: registry.redhat.io/rhcam-1-0/openshift-migration-rhel7-operator@sha256:c27f7293d019a8c033b3944f31e4812ca3eee53bd129561656ba79f557e0b7a8, which fails to load when the pod is created because that repository requires credentials. Since the operator pod uses a service account that is created via the operator installation, I'm not sure how to provide the registry credentials prior to installation. What's the preferred method for installing the operator from OperatorHub? I know I can create the secret and update the service account after the fact, so I'm just wondering if there's a way to set things up so that I can provide the registry credentials ahead of time and the operator works after installation.
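A sketch of the after-the-fact workaround mentioned above, assuming the operator namespace is openshift-migration; the secret name and credentials are placeholders:

oc create secret docker-registry redhat-registry \
  --docker-server=registry.redhat.io \
  --docker-username=<user> --docker-password=<password> \
  -n openshift-migration
oc secrets link migration-operator redhat-registry --for=pull -n openshift-migration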

documentation issue

When I added a cluster in the UI, using the token from
oc sa get-token -n openshift-migration mig
didn't work.
On OCP 3.11 I had to use the token from the migration-operator SA instead.

Keep mig-ui-config in sync with oauthclient secret

If the oauthclient CR already exists, the operator should read it in using k8s_facts and update the mig-ui-config configmap secret value with the same value if it is not already the same.

In order for this to take effect, the mig-ui pod has to be restarted. I'm not sure if we can achieve this with the k8s module, some other ansible module, or some other means.
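A rough sketch of the read-and-sync step using k8s_facts, as described above; the task names, namespace variable, and configmap data key are assumptions:

- name: Read existing migration oauthclient
  k8s_facts:
    api_version: oauth.openshift.io/v1
    kind: OAuthClient
    name: migration
  register: oauthclient

- name: Sync mig-ui-config with the oauthclient secret
  k8s:
    state: present
    definition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: mig-ui-config
        namespace: "{{ mig_namespace }}"
      data:
        oauth_secret: "{{ oauthclient.resources[0].secret }}"
  when: oauthclient.resources | length > 0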

Cannot install operator+controller in OCP 3.11 source cluster

Hello,

When deploying migration operator and controller in OCP3, the deployment fails and the "mig" service account is not created

What I do:

  • oc create -f operator.yml
  • Since it is the ocp3 source cluster, I edit controller.yaml so that
    migration_controller: false
    migration_ui: false
  • oc create -f controller.yml

What I expect:

  • I expect all the pods to be running properly, and "mig" service account created.

What happens:

  • Service account with name "mig" is not created.
(v_env) [sregidor@sregidor mig-operator]$ oc get pods
NAME                                  READY     STATUS    RESTARTS   AGE
migration-operator-6d47cc9948-pw2p7   2/2       Running   0          1m
restic-6c2f9                          1/1       Running   0          23s
restic-8jlmh                          1/1       Running   0          23s
restic-b79b4                          1/1       Running   0          23s
restic-ctbgt                          1/1       Running   0          23s
restic-wskh2                          1/1       Running   0          23s
velero-7559946c5c-hqvh8               1/1       Running   0          23s
(v_env) [sregidor@sregidor mig-operator]$ oc get sa
NAME                 SECRETS   AGE
builder              2         1m
default              2         1m
deployer             2         1m
migration-operator   2         1m
velero               2         29s
  • There is an error in operator logs
oc logs migration-operator-6d47cc9948-pw2p7 -c operator
.
.
.
\nTASK [migrationcontroller : Set up migration CRDs] *****************************\r\ntask path: /opt/ansible/roles/migrationcontroller/tasks/main.yml:86\nok: [localhost] => (item=cluster-registry-crd.yaml) => {\"changed\": false, \"item\": \"cluster-registry-crd.yaml\", \"method\": \"delete\", \"result\": {}}\nok: [localhost] => (item=migration_v1alpha1_migcluster.yaml) => {\"changed\": false, \"item\": \"migration_v1alpha1_migcluster.yaml\", \"method\": \"delete\", \"result\": {}}\nok: [localhost] => (item=migration_v1alpha1_migmigration.yaml) => {\"changed\": false, \"item\": \"migration_v1alpha1_migmigration.yaml\", \"method\": \"delete\", \"result\": {}}\nok: [localhost] => (item=migration_v1alpha1_migplan.yaml) => {\"changed\": false, \"item\": \"migration_v1alpha1_migplan.yaml\", \"method\": \"delete\", \"result\": {}}\nok: [localhost] => (item=migration_v1alpha1_migstorage.yaml) => {\"changed\": false, \"item\": \"migration_v1alpha1_migstorage.yaml\", \"method\": \"delete\", \"result\": {}}\n\r\nTASK [migrationcontroller : Set up mig controller] *****************************\r\ntask path: /opt/ansible/roles/migrationcontroller/tasks/main.yml:97\nfatal: [localhost]: FAILED! => {\"changed\": false, \"msg\": \"Failed to find exact match for migration.openshift.io/v1alpha1.MigCluster by [kind, name, singularName, shortNames]\"}\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost                  : ok=8    changed=0    unreachable=0    failed=1   \r\n\n","job":"1837425794803595142","name":"migration-controller","namespace":"mig","error":"exit status 2","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:190"}

Please, tell me if you need more information. I will gladly provide it.

Thank you,

BR.

Move to Operator Metadata Bundle Format

RHT operators migration timeline: April 17th, 2020 - Jun 17th

Will update this issue as details become clearer about what the actual expectation is. Expect documentation in early March: documentation by Brian and Ralph for operator pipelines is due Mar 6th, 2020.

Usage Guide: Need example fleshed out for how to use AWS S3 as an option as well (include IAM Role)

Let's edit below and add an example for using AWS S3 as well
https://github.com/fusor/mig-operator/blob/master/docs/usage/1.md#16-object-storage-setup

Need to include the below in an IAM Role


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        }
    ]
}

See migtools/mig-controller#221 for some more info

Get Azure "host" cluster resourceGroup from Azure metadata endpoint

Currently we prompt the user for an azure_resource_group on the MigrationController CR. It's not necessary for the user to supply this information since the mig-operator pod can query this information from the compute.resourceGroupName JSON path available from the metadata endpoint accessible to all VMs running on Azure:

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service#retrieving-all-metadata-for-an-instance

For the best possible user experience, we should always try to query this endpoint and check whether we get back a response that looks like what Azure's endpoint would provide. We shouldn't make the user tick an "azure: true" box; it's not necessary.

curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2019-03-11"
{
  "compute": {
    "azEnvironment": "AzurePublicCloud",
    "customData": "",
    "location": "westus",
    "name": "jubilee",
    "offer": "Windows-10",
    "osType": "Windows",
    "placementGroupId": "",
    "plan": {
        "name": "",
        "product": "",
        "publisher": ""
    },
    "platformFaultDomain": "1",
    "platformUpdateDomain": "1",
    "provider": "Microsoft.Compute",
    "publicKeys": [],
    "publisher": "MicrosoftWindowsDesktop",
    "resourceGroupName": "myrg",
    "resourceId": "/subscriptions/xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx/resourceGroups/myrg/providers/Microsoft.Compute/virtualMachines/negasonic",
    "sku": "rs4-pro",
    "subscriptionId": "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx",
    "tags": "Department:IT;Environment:Prod;Role:WorkerRole",
    "version": "17134.345.59",
    "vmId": "13f56399-bd52-4150-9748-7190aae1ff21",
    "vmScaleSetName": "",
    "vmSize": "Standard_D1",
    "zone": "1"
  },
  "network": {
    "interface": [
      {
        "ipv4": {
          "ipAddress": [
            {
              "privateIpAddress": "10.1.2.5",
              "publicIpAddress": "X.X.X.X"
            }
          ],
          "subnet": [
            {
              "address": "10.1.2.0",
              "prefix": "24"
            }
          ]
        },
        "ipv6": {
          "ipAddress": []
        },
        "macAddress": "000D3A36DDED"
      }
    ]
  }
}
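If the operator only needs the resource group, the leaf value can likely be queried directly; the format=text option for leaf nodes and the exact api-version should be treated as assumptions to verify:

curl -s -H Metadata:true "http://169.254.169.254/metadata/instance/compute/resourceGroupName?api-version=2019-03-11&format=text"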

mig-operator does not remove deployed `mig` components when CR vars are set to false after deploy

I successfully deployed velero, the controller, and the ui with the following CR content:

kind: MigrationController
[...]
  migration_controller: true
  migration_ui: true
  migration_velero: true

Later, I decided I wanted to remove some (or all) of these components.

kind: MigrationController
[...]
  migration_controller: false
  migration_ui: false
  migration_velero: false

Reconcile ran in response to my change, but the components were not removed from the cluster.

stable: Restic DS pods bouncing on pending on OCP 3.7

I think this issue boils down to this article related to DS and node placement on OCP 3.7 :

https://docs.openshift.com/container-platform/3.11/dev_guide/daemonsets.html

The issue is only affecting 3.7 restic DS pods. Could not reproduce with other 3.x or 4.x releases.

Basically, OCP 3.7 will attempt to place pods onto master and infra nodes that are not schedulable by regular pods. This is what we see:

[fbladilo@fbladilo:~/git-projects/mig-ci] (fix_operator_ns *%)$ oc -n openshift-migration get pods -o wide
NAME                                  READY     STATUS    RESTARTS   AGE       IP         NODE
migration-operator-1921234097-stbp7   2/2       Running   0          31m       10.1.2.3   node2.ci-stable-37-41-v3-44.internal
restic-5w22m                          1/1       Running   0          30m       10.1.2.4   node2.ci-stable-37-41-v3-44.internal
restic-fjqcx                          1/1       Running   0          30m       10.1.6.5   node1.ci-stable-37-41-v3-44.internal
restic-mcfbj                          0/1       Pending   0          1s        <none>     master1.ci-stable-37-41-v3-44.internal
restic-rq866                          0/1       Pending   0          1s        <none>     infranode1.ci-stable-37-41-v3-44.internal
velero-717797469-8fxvx                1/1       Running   0          30m       10.1.6.6   node1.ci-stable-37-41-v3-44.internal

The pods continue to restart themselves endlessly, which creates load on the OCP 3.7 cluster.
I verified that applying the patch for the node selector on OCP 3.7 allows the restic pods to be scheduled successfully.
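A sketch of such a node-selector patch (not necessarily the exact one applied here; the compute label below matches what OCP 3.x worker nodes carry):

oc -n openshift-migration patch daemonset restic --type=merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/compute":"true"}}}}}'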

How to reproduce :

  1. Deploy operator and controller on stable tag on OCP 3.7
  2. Inspect openshift-migration namespace

Set up mig ui, "Failed to create object: Could not determine if we should add owner ref"

Saw the below failure on a fresh, brand-new OCP 4 cluster with nothing installed prior.

image: quay.io/ocpmigrate/mig-operator:latest
imageID: quay.io/ocpmigrate/mig-operator@sha256:28004497e365ff9bf812059f11bfec26871989c55444937a9a26555eec193265

$ oc logs migration-operator-bfbc7f445-nk4gz -c ansible

TASK [migrationcontroller : Set up mig ui] *************************************
task path: /opt/ansible/roles/migrationcontroller/tasks/main.yml:158
fatal: [localhost]: FAILED! => {"changed": false, "error": 400, "msg": "Failed to create object: Could not determine if we should add owner ref\n", "reason": "Bad Request", "status": 400}

PLAY RECAP *********************************************************************
localhost : ok=21 changed=0 unreachable=0 failed=1

$ oc get migrationcontrollers.migration.openshift.io migration-controller -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"migration.openshift.io/v1alpha1","kind":"MigrationController","metadata":{"annotations":{},"name":"migration-controller","namespace":"mig"},"spec":{"cluster_name":"host","migration_controller":true,"migration_ui":true,"migration_velero":true}}
  creationTimestamp: "2019-07-10T16:06:15Z"
  generation: 1
  name: migration-controller
  namespace: mig
  resourceVersion: "88844"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/mig/migrationcontrollers/migration-controller
  uid: a546e88e-a32c-11e9-924b-063ca044fe88
spec:
  cluster_name: host
  migration_controller: true
  migration_ui: true
  migration_velero: true
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2019-07-10T17:04:50.319287
      failures: 1
      ok: 21
      skipped: 5
    lastTransitionTime: "2019-07-10T17:04:50Z"
    message: |
      Failed to create object: Could not determine if we should add owner ref
    reason: Failed
    status: "False"
    type: Failure
  - lastTransitionTime: "2019-07-10T17:04:50Z"
    message: Running reconciliation
    reason: Running
    status: "True"
    type: Running

$ oc get oauthclient migration -o yaml
apiVersion: oauth.openshift.io/v1
grantMethod: auto
kind: OAuthClient
metadata:
  annotations:
    operator-sdk/primary-resource: mig/migration-controller
    operator-sdk/primary-resource-type: MigrationController.migration.openshift.io
  creationTimestamp: "2019-07-10T16:06:33Z"
  name: migration
  resourceVersion: "74538"
  selfLink: /apis/oauth.openshift.io/v1/oauthclients/migration
  uid: b008a0a1-a32c-11e9-a782-0a580a800018
redirectURIs:

$ oc get all -n mig
NAME READY STATUS RESTARTS AGE
pod/controller-manager-79bf7cd7d-mwsgs 1/1 Running 0 54m
pod/migration-operator-bfbc7f445-nk4gz 2/2 Running 0 55m
pod/restic-fg8v4 1/1 Running 0 54m
pod/restic-p24mt 1/1 Running 0 54m
pod/restic-t92l4 1/1 Running 0 54m
pod/velero-5dcfc7fcb7-f9llv 1/1 Running 0 54m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/controller-manager-service ClusterIP 172.30.87.249 443/TCP 54m
service/migration-operator ClusterIP 172.30.192.72 8383/TCP 55m

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/restic 3 3 3 3 3 54m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/controller-manager 1/1 1 1 54m
deployment.apps/migration-operator 1/1 1 1 55m
deployment.apps/velero 1/1 1 1 54m

NAME DESIRED CURRENT READY AGE
replicaset.apps/controller-manager-79bf7cd7d 1 1 1 54m
replicaset.apps/migration-operator-bfbc7f445 1 1 1 55m
replicaset.apps/velero-5dcfc7fcb7 1 1 1 54m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/migration migration-mig.apps.cluster-jwm0710ocp4a.jwm0710ocp4a.mg.dog8code.com migration-ui port-9000 edge None

Issue seen with deploying to OCP 4.4

@Danil-Grigorev saw the below deploying operator to a OCP 4.4 nightly

TASK [migrationcontroller : Set up velero controller] **************************
task path: /opt/ansible/roles/migrationcontroller/tasks/main.yml:147
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to find exact match for apps/v1beta1.Deployment by [kind, name, singularName, shortNames]"}

See: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.9.md#apps

For background, the decision was to use the older apps/v1beta1 so our operator would work with OCP 3.7 - OCP 4.3.

Now that OCP 4.4 is out, it looks like the apps/v1beta1 Deployment was removed, so we probably need two logic paths: one that works with older clusters (OCP 3.7) and one for newer ones.
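A quick way to check which Deployment API groups a given cluster actually serves, as a basis for choosing between the two code paths (sketch only):

oc api-versions | grep '^apps/'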

Use migration-controller SA instead of mig SA for remote connection

We just ran into a confusing issue where GVR was working when the controller was running on the destination 4 cluster but not on the source 3 cluster.

The reason is that when the controller runs on the destination and makes calls against the source, it uses the mig SA token, which is more privileged (basically full cluster-admin).

When the controller runs on the source cluster and makes queries against the source, it uses the migration-controller SA.

The proposal is, when time permits, to make an update that creates the migration-controller SA regardless of whether the migration-controller is installed, drops the mig SA, and instructs users to use this SA token for the controller remote connection so the permissions are identical on both sides.
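Under that proposal, users would fetch the remote-connection token with something like the following, assuming the namespace is openshift-migration:

oc sa get-token -n openshift-migration migration-controller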

Documentation inaccurate

The documentation for the installation on OCP3 says

"controller-3.yml and controller-4.yml in the deploy/ocp3-operator/latest, stable, and v1.0.0 directories contain the recommended settings for OCP 3 and 4 respectively."

The path is incorrect; it's "deploy/non-olm/latest, stable, and v1.0.0".

Use deployment instead of deploymentconfig and stateful set

Right now we create a deployment (velero), daemonset (restic), deploymentconfig (ui), and statefulset (controller). Unless we have a need, as in the case of restic, we should be consistent in the resource types we create.

After talking to Fabian, the only benefit of a statefulset is PVC templating. As the controller doesn't use a PVC, we can move it to a deployment. The primary benefit of a dc over a deployment is related to imagestreams, so we can use a deployment here too.

Specify different image Repos

Hi!!

I am interested in using mig-operator to perform a migration of some projects between clusters on OCP 3.11; however, the client uses internal image repositories and does not allow any connections to quay.

How can we specify the images for velero, restic, migration-ui, controller-manager, or any other necessary containers? It should be an option on the operator CRD, no?

Cheers,
[email protected]

Expose resource requests for NooBaa

The default resources requested by NooBaa cause the noobaa-core container to be stuck in pending on a default 3 worker cluster. This can be adjusted like so:

spec:
  dbResources:
    requests:
      cpu: 0.5
  coreResources:
    requests:
      cpu: 0.5

Issue with OAuthClient, seeing a redirectURL of 'localhost'

{"error":"invalid_request","error_description":"The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}

$ oc logs oauth-openshift-5c55549456-sfcfb -n openshift-authentication
...
...
E0724 22:55:06.635251 1 osinserver.go:91] internal error: urls cannot be blank.
I0724 22:56:48.900788 1 log.go:172] http: TLS handshake error from 10.128.2.7:45840: remote error: tls: unknown certificate
I0724 22:56:48.901197 1 log.go:172] http: TLS handshake error from 10.128.2.7:45842: remote error: tls: unknown certificate
E0724 22:57:18.294481 1 osinserver.go:91] internal error: urls cannot be blank.
I0724 22:57:18.519711 1 log.go:172] http: TLS handshake error from 10.128.2.7:46152: remote error: tls: unknown certificate


$ oc get oauthclient migration -o yaml
apiVersion: oauth.openshift.io/v1
grantMethod: auto
kind: OAuthClient
metadata:
  annotations:
    operator-sdk/primary-resource: mig/migration-controller
    operator-sdk/primary-resource-type: MigrationController.migration.openshift.io
  creationTimestamp: "2019-07-24T22:24:47Z"
  name: migration
  resourceVersion: "2728968"
  selfLink: /apis/oauth.openshift.io/v1/oauthclients/migration
  uid: d859f4b2-ae61-11e9-9185-0a580a800017
redirectURIs:
- https://migration-mig.apps.cluster-jwm0716ocp4a.jwm0716ocp4a.mg.dog8code.com/login/callback
- https://localhost
secret: ODI3OWVhMWItZjMzZi01NWMwLWJiNWEtYWMyZDg3MmYzYjgw

non-olm latest is consistently inaccurate and forgotten; how can we ensure it stays up to date with the versioned operator.ymls?

I had a previous PR that proposed turning the latest operator.yml into a symlink to the latest versioned operator.yml file, but I don't think that's sufficient, because there is something fundamentally different between deploying latest (I want the latest images out of quay built from master) and deploying images built from the release tags.

I'm wondering if we could instead write a script that takes the version as an argument and runs some command to swap out the release tags for "latest" so it actually grabs the latest images built from master.

Open to some thoughts here, I'm going to experiment a bit and maybe submit a proposal PR.
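Something along these lines, as a rough sketch; the paths and the release-tag pattern are assumptions:

sed 's/:release-[0-9][0-9.]*/:latest/g' deploy/non-olm/v1.1.0/operator.yml > deploy/non-olm/latest/operator.yml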

[non-admin] Add MigrationTenant CRD

With the non-admin work, CAM will introduce a new CRD MigrationSandbox which will imply a namespace-scoped installation of CAM in the current namespace. The associated ansible role that is executed will call the migrationcontroller role with sandbox: true. This will allow for all of the MigrationController configuration fields to be exposed on the MigrationSandbox CR as well.

The MigrationSandbox CR creation should trigger a deployment of a new migration-controller with SANDBOX_NAMESPACE set to wherever the MigrationSandbox CR was created.

missing rolebinding

The controller-manager pod had a lot of errors like the following:

E0827 13:55:00.314547 1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:openshift-migration-operator:migration-controller" cannot list resource "deployments" in API group "apps" at the cluster scope

I had to add the rolebinding for the migration-controller SA as well.
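For reference, a hedged sketch of such a binding; the clusterrolebinding and cluster role names are illustrative, and the role simply needs list access to deployments:

oc create clusterrolebinding migration-controller-deployments \
  --clusterrole=migration-manager-role \
  --serviceaccount=openshift-migration-operator:migration-controller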

Migration UI Login redirect not working when using custom CA in the default ingress router

After adding a custom self-signed default certificate to the default ingress router, the web UI fails to log in with the following error:

{"error":"invalid_request","error_description":"The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}

The OAuth server pods show this message in the log:

http: TLS handshake error from <ingress_pod_ip>:43440 : remote error: tls: unknown certificate

It seems that the web UI is not using the bundle from /etc/ssl/certs to validate the certificate presented by the ingress router.

Point to latest non-olm .yml file for operator.yml in docs/usage/1.md

In section "1.2 Setup with mig-operator"
where it states "A manifest for deploying mig-operator to OpenShift 3.x and 4.x has been made available as operator.yml",
the operator.yml is a link to https://raw.githubusercontent.com/fusor/mig-operator/master/deploy/non-olm/v1.0.0/operator.yml.
This should be changed to https://raw.githubusercontent.com/fusor/mig-operator/master/deploy/non-olm/v1.1.0/operator.yml,
since version 1.0.0 of operator.yml has issues with deploying the MigController in section 1.3.

Improve Snapshot documentation with "Why"

Avital brought some good feedback regarding the snapshot documentation, requesting that it be improved to include why someone would choose snapshots over filesystem copy. We should also explain the pros/cons of each.

latest: Potential RBAC issue with latest controller

I'm seeing these RBAC issues when migrating, during the ResticRestarted phase, affecting daemonsets:

{"level":"info","ts":1585070675.613397,"logger":"migration|zpr5t","msg":"[RUN]","migration":"mssql-app-mig-1585069805-stage-phase","stage":true,"phase":"ResticRestarted"}
{"level":"error","ts":1585070676.4000652,"logger":"migration|zpr5t","msg":"","migration":"mssql-app-mig-1585069805-stage-phase","error":"daemonsets.extensions \"restic\" is forbidden: User \"system:serviceaccount:openshift-migration:migration-controller\" cannot get daemonsets.extensions in the namespace \"openshift-migration\": no RBAC policy matched","stacktrace":"github.com/konveyor/mig-controller/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/konveyor/mig-controller/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/konveyor/mig-controller/pkg/logging.Logger.Error\n\t/go/src/github.com/konveyor/mig-controller/pkg/logging/logger.go:75\ngithub.com/konveyor/mig-controller/pkg/logging.Logger.Trace\n\t/go/src/github.com/konveyor/mig-controller/pkg/logging/logger.go:81\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*Task).haveResticPodsStarted\n\t/go/src/github.com/konveyor/mig-controller/pkg/controller/migmigration/pod.go:92\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*Task).Run\n\t/go/src/github.com/konveyor/mig-controller/pkg/controller/migmigration/task.go:282\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*ReconcileMigMigration).migrate\n\t/go/src/github.com/konveyor/mig-controller/pkg/controller/migmigration/migrate.go:78\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*ReconcileMigMigration).Reconcile\n\t/go/src/github.com/konveyor/mig-controller/pkg/controller/migmigration/migmigration_controller.go:195\ngithub.com/konveyor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/konveyor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/konveyor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/konveyor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/konveyor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/konveyor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/konveyor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/konveyor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/konveyor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/konveyor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1585070683.5990846,"logger":"migration|nljtb","msg":"[RUN]","migration":"mssql-app-mig-1585069805-stage-phase","stage":true,"phase":"ResticRestarted"}

Not affecting the release-1.1 or release-1.0 branches as far as I can tell... The operator image is:

Containers:
  ansible:
    Container ID:  cri-o://c68214624a1682347d5c6e0b3c52f417b87f4c0467e44357fef22b7fb458d6cb
    Image:         quay.io/konveyor/mig-operator-container:latest
    Image ID:      quay.io/konveyor/mig-operator-container@sha256:65710f8af49fbe456e405f1effc3d2e2172a9306776df87ef619dd6061f265d6

mig-operator does not restore 'host' MigCluster if deleted

It seems that mig-operator creates a host MigCluster on clusters where you create a MigrationController CR set up to deploy mig-controller.

This host MigCluster does not get restored if accidentally deleted, which uncovered an issue with mig-ui here: migtools/mig-ui#384

I think it would be desirable to have mig-operator restore the components it deployed if they are accidentally removed.

latest: broken with permission issues OCP3

I was not sure whether this operator is the correct repo for this one. Using latest images when migrating demo apps from OCP 3.11 to 4.2-nightly, we hit permission issues; the controller logs show the following:

{"level":"info","ts":1567207920.9164975,"logger":"cluster|g4d9f","msg":"[rWatch] Starting manager"}
{"level":"info","ts":1567207920.9165099,"logger":"cluster|g4d9f","msg":"[rWatch] Manager started"}
{"level":"info","ts":1567207920.9165144,"logger":"cluster|g4d9f","msg":"Remote watch started.","cluster":"migcluster-remote"}
{"level":"info","ts":1567207922.3167105,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"remotewatcher-controller"}
{"level":"info","ts":1567207922.4169009,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"remotewatcher-controller","worker count":1}
{"level":"info","ts":1567208064.1786604,"logger":"migration|wqk4v","msg":"Migration [RUN]","name":"sock-shop-mig-1567207912","stage":false,"phase":""}
{"level":"info","ts":1567208064.186142,"logger":"migration|85dp4","msg":"Migration [RUN]","name":"sock-shop-mig-1567207912","stage":false,"phase":"Started"}
{"level":"info","ts":1567208064.282569,"logger":"migration|9dnbl","msg":"Migration [RUN]","name":"sock-shop-mig-1567207912","stage":false,"phase":"Prepare"}
E0830 23:34:25.108110       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:openshift-migration:migration-controller" cannot list resource "serviceaccounts" in API group "" at the cluster scope
E0830 23:34:26.109746       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:openshift-migration:migration-controller" cannot list resource "serviceaccounts" in API group "" at the cluster scope
E0830 23:34:27.110940       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:openshift-migration:migration-controller" cannot list resource "serviceaccounts" in API group "" at the cluster scope
E0830 23:34:28.112164       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:openshift-migration:migration-controller" cannot list resource "serviceaccounts" in API group "" at the cluster scope
E0830 23:34:29.113389       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:openshift-migration:migration-controller" cannot list resource "serviceaccounts" in API group "" at the cluster scope
E0830 23:34:30.114438       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:openshift-migration:migration-controller" cannot list resource "serviceaccounts" in API group "" at the cluster scope

I also tried checking out PR #62 to test, but the issue persists; I could not reproduce this with stable-tag CI runs. Something to note is that when the controller gets into this state it will not run subsequent migrations. I even tried destroying the controller pod, but it basically came back and resumed logging these error messages.
The affected phase is "Prepare", as seen below:

[fbladilo@fbladilo:~/git-projects/mig-ci] (controller_destroy_rolebindings *%)$ oc -n openshift-migration describe migmigration
Name:         sock-shop-mig-1567207912
Namespace:    openshift-migration
Labels:       controller-tools.k8s.io=1.0
Annotations:  touch=f6e2939c-2b3e-4c50-ae7e-e9dda437990c
API Version:  migration.openshift.io/v1alpha1
Kind:         MigMigration
Metadata:
  Creation Timestamp:  2019-08-30T23:34:24Z
  Generation:          3
  Owner References:
    API Version:     migration.openshift.io/v1alpha1
    Kind:            MigPlan
    Name:            sock-shop-migplan-1567207912
    UID:             6408167d-cb7e-11e9-bb43-02b963518610
  Resource Version:  175264
  Self Link:         /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/sock-shop-mig-1567207912
  UID:               b30bcd39-cb7e-11e9-bb43-02b963518610
Spec:
  Mig Plan Ref:
    Name:        sock-shop-migplan-1567207912
    Namespace:   openshift-migration
  Quiesce Pods:  true
  Stage:         false
Status:
  Conditions:
    Category:              Advisory
    Last Transition Time:  2019-08-30T23:34:24Z
    Message:               Step: 3/27
    Reason:                Prepare
    Status:                True
    Type:                  Running
    Category:              Required
    Last Transition Time:  2019-08-30T23:34:24Z
    Message:               The migration is ready.
    Status:                True
    Type:                  Ready
  Phase:                   Prepare
  Start Timestamp:         2019-08-30T23:34:24Z
Events:                    <none>

To reproduce :

  1. Deploy using latest operator/controller images on OCP 3.11 and 4.x (we run controllers on 4.x)
  2. Deploy sock-shop app from https://github.com/fusor/mig-demo-apps or https://github.com/fusor/mig-e2e on OCP3
  3. Attempt a migration to 4.x cluster

UPDATE:
This issue affects all supported OCP 3.x releases migrating to 4.x.

Updating velero_image doesn't bounce restic pods

When the MigrationController is updated with a new velero_image, the operator updates the velero deployment and the restic daemonset. The deployment update causes openshift to bounce the velero pod. The daemonset update does not cause the restic pods to bounce. The pods need to be deleted when updating the restic daemonset so that they will be restarted with the right configuration.
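Until the operator handles this, a manual workaround sketch is to delete the restic pods so the daemonset recreates them with the new image; the label selector here is an assumption:

oc -n openshift-migration delete pods -l name=restic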

Need to update to use 'quay.io/ocpmigrate/migration-plugin:latest' for the velero plugin

I hit the below during a Stage; my restoring pod on the destination side was hung.

@pranavgaikwad recommended I needed to update the velero plugin to use
'quay.io/ocpmigrate/migration-plugin:latest'

Below is what the error looks like


$ oc get all -n mysql-persistent
NAME                      READY   STATUS    RESTARTS   AGE
pod/mysql-1-x26sf-stage   0/1     Pending   0          24m
$ oc get pod mysql-1-x26sf-stage -o yaml -n mysql-persistent
apiVersion: v1
kind: Pod
metadata:
  annotations:
    backup.velero.io/backup-volumes: mysql-data
    openshift.io/backup-registry-hostname: 172.30.197.242:5000
    openshift.io/backup-server-version: "1.11"
    openshift.io/restore-registry-hostname: image-registry.openshift-image-registry.svc:5000
    openshift.io/restore-server-version: "1.13"
    openshift.io/scc: node-exporter
    snapshot.velero.io/mysql-data: 2fee5f84
  creationTimestamp: "2019-07-10T22:10:32Z"
  labels:
    migmigration: 54ab6430-a35f-11e9-8773-063ca044fe88
    migration-included-stage-backup: 54ab6430-a35f-11e9-8773-063ca044fe88
    migration-stage-pod: "true"
    velero.io/backup-name: 54985200-a35f-11e9-a81b-a514154a7515-bnm4z
    velero.io/restore-name: 54985200-a35f-11e9-a81b-a514154a7515-67fdv
  name: mysql-1-x26sf-stage
  namespace: mysql-persistent
  resourceVersion: "171533"
  selfLink: /api/v1/namespaces/mysql-persistent/pods/mysql-1-x26sf-stage
  uid: 89051849-a35f-11e9-88f3-0ac4ab43e1ee
spec:
  containers:
  - args:
    - infinity
    command:
    - sleep
    image: registry.access.redhat.com/rhel7
    imagePullPolicy: Always
    name: sleep-0
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
        - SYS_CHROOT
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/mysql/data
      name: mysql-data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-86p2l
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-g8d52
  initContainers:
  - args:
    - 888358ef-a35f-11e9-88f3-0ac4ab43e1ee
    env:
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    image: gcr.io/heptio-images/velero-restic-restore-helper:latest
    imagePullPolicy: Always
    name: restic-wait
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /restores/mysql-data
      name: mysql-data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-86p2l
      readOnly: true
  nodeSelector:
    node-role.kubernetes.io/compute: "true"
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c8,c2
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: mysql-data
    persistentVolumeClaim:
      claimName: mysql
  - name: default-token-86p2l
    secret:
      defaultMode: 420
      secretName: default-token-86p2l
status:
  conditions:
  - lastProbeTime: "2019-07-10T22:35:48Z"
    lastTransitionTime: "2019-07-10T22:10:32Z"
    message: '0/6 nodes are available: 6 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
