Unable to redeploy the certificates post-expiry in openshift 3.11

Question

I have deployed OpenShift (OKD) 3.11 using https://github.com/openshift/openshift-ansible/tree/release-3.11. I want to reproduce a scenario where the certificates expire, and then test how they can be renewed.

To do that, I set the following variables in the inventory to 1 day (so that the certificates expire quickly):

openshift_hosted_registry_cert_expire_days=1
openshift_ca_cert_expire_days=1
openshift_master_cert_expire_days=1
etcd_ca_default_days=1
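
To confirm on the master which certificates have actually lapsed, a check along these lines may help. This is only a sketch: cert_status is a hypothetical helper name, and apart from /etc/origin/master/ca-bundle.crt (which appears in the playbook output below) the paths are assumptions about a default 3.11 layout, so adjust them to your hosts.

```shell
#!/bin/sh
# Report whether a PEM certificate is still valid right now.
# Prints "valid", "expired", or "missing" followed by the path.
# Note: a file that openssl cannot parse is also reported as "expired".
cert_status() {
    cert="$1"
    if [ ! -r "$cert" ]; then
        echo "missing $cert"
        return 2
    fi
    # "-checkend 0" exits 0 only if the certificate has not expired yet.
    if openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
        echo "valid $cert"
    else
        echo "expired $cert"
        return 1
    fi
}

# Assumed locations; only ca-bundle.crt is taken from the output in this post.
for cert in /etc/origin/master/ca-bundle.crt \
            /etc/origin/master/ca.crt \
            /etc/origin/master/master.server.crt \
            /etc/etcd/ca.crt \
            /etc/etcd/server.crt; do
    cert_status "$cert" || true
done
```

If the CA files themselves show up as expired, that would suggest the CA needs to be redeployed before the certificates it signs.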

As expected, after 1 day the oc commands were not working and the master-api and master-etcd pods were in the Exited state. To renew all the certificates, I ran the redeploy-certificates playbook, following https://docs.openshift.com/container-platform/3.11/install_config/redeploying_certificates.html#redeploying-all-certificates-current-ca

ansible-playbook -i openshift-ansible/playbooks/inventory.ini openshift-ansible/playbooks/redeploy-certificates.yml

But this Ansible play aborts with the following error:

.
.
.
.
TASK [Wait for master to restart] **********************************************************************************************************
skipping: [master.167.254.204.228.nip.io]

TASK [Wait for master API to come back online] *********************************************************************************************
skipping: [master.167.254.204.228.nip.io]

TASK [openshift_control_plane : restart master] ********************************************************************************************
changed: [master.167.254.204.228.nip.io] => (item=api)
changed: [master.167.254.204.228.nip.io] => (item=controllers)

RUNNING HANDLER [openshift_control_plane : verify API server] ******************************************************************************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
.
.
.
FAILED - RETRYING: verify API server (2 retries left).
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.167.254.204.228.nip.io]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "--silent",
        "--tlsv1.2",
        "--max-time",
        "2",
        "--cacert",
        "/etc/origin/master/ca-bundle.crt",
        "https://master.167.254.204.228.nip.io:8443/healthz/ready"
    ],
    "delta": "0:00:00.012426",
    "end": "2020-11-29 22:56:24.445762",
    "rc": 7,
    "start": "2020-11-29 22:56:24.433336"
}

MSG:

non-zero return code


RUNNING HANDLER [openshift_control_plane : verify Local API server] ************************************************************************
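
For reading the "rc": 7 in the failure above: curl exit code 7 means the connection to the host could not be established at all, which points at nothing listening on port 8443 rather than at a certificate verification failure (that would be exit code 60). A small lookup sketch for the codes most relevant here; explain_curl_rc is a hypothetical helper name, not part of openshift-ansible:

```shell
#!/bin/sh
# Map a few common curl exit codes to a short explanation.
explain_curl_rc() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect to host" ;;
        28) echo "operation timed out" ;;
        35) echo "SSL connect error" ;;
        60) echo "peer certificate could not be verified against the CA bundle" ;;
        *)  echo "see EXIT CODES in the curl man page" ;;
    esac
}

# The handler above failed with rc 7.
explain_curl_rc 7
```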

Please let me know if I am missing anything while redeploying the certificates, or if there is an alternative way to renew them.

Update

I have also tried the redeploy-openshift-ca.yml playbook with -e openshift_redeploy_openshift_ca=true:

ansible-playbook -i openshift-ansible/playbooks/inventory.ini openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml -e openshift_redeploy_openshift_ca=true

But this play also fails at the same task as before, where it waits for master-api to be running.

The master-api docker logs show:

.
.
I1202 18:02:55.930375       1 plugins.go:84] Registered admission plugin "SecurityContextDeny"
I1202 18:02:55.930387       1 plugins.go:84] Registered admission plugin "ServiceAccount"
I1202 18:02:55.930396       1 plugins.go:84] Registered admission plugin "DefaultStorageClass"
I1202 18:02:55.930408       1 plugins.go:84] Registered admission plugin "PersistentVolumeClaimResize"
I1202 18:02:55.930418       1 plugins.go:84] Registered admission plugin "StorageObjectInUseProtection"
F1202 18:03:25.933354       1 start_api.go:68] dial tcp 167.254.204.228:2379: connect: connection refused

The etcd docker logs show:

2020-12-02 18:05:14.459240 I | embed: ready to serve client requests
2020-12-02 18:05:14.459730 I | embed: serving client requests on 167.254.204.228:2379
WARNING: 2020/12/02 18:05:14 Failed to dial 167.254.204.228:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

I think for the first output the issue is that your CA is also expired, thus redeploying all certificates will not resolve the issue. In the second output you are not executing the same playbook. What is the result when you execute the `redeploy-certificates.yml` playbook with `-e openshift_redeploy_openshift_ca=true`? — Simon, Dec 03 '20 at 18:42
I have now tried what you suggested, but the result is the same: it fails at the same point, i.e. the play aborts when master-api fails to start — Rakesh Kotian, Dec 03 '20 at 20:05
