CentOS CI Infra Outage Preparation

During a scheduled outage in which we are likely to lose all network access to a rack, or between racks, it is advisable to shut down the following services:

Duffy
CentOS CI OpenShift prod/stg
Legacy CI Jenkins
Legacy OKD
keepalived on gateway02.ci.centos.org

Legacy OKD

bstinson, as the only person on the team who has access to the legacy OKD cluster, must handle all tasks related to this cluster.

OCP

https://github.com/centosci/ocp4-docs/blob/master/sops/create_etcd_backup.md
https://github.com/centosci/ocp4-docs/blob/master/sops/cordoning_nodes_and_draining_pods.md
https://github.com/centosci/ocp4-docs/blob/master/sops/graceful_shutdown_ocp_cluster.md

Admin nodes:
Prod: ocp-admin.ci.centos.org
Stg: n4-136.cloud.ci.centos.org

Take an etcd backup and store it on the admin node associated with prod/stg
Cordon and drain all nodes
Gracefully shut down the cluster
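
The cordon, drain, and shutdown steps might look roughly like the sketch below. It is not taken from the linked SOPs, which remain the authoritative procedure: the `oc` invocations follow the upstream OpenShift documentation, the `--delete-emptydir-data` flag assumes a reasonably recent `oc` client, and the helper names (`run`, `cordon_and_drain`, `shutdown_node`) and the `DRY_RUN` switch are hypothetical. The etcd backup is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: cordon and drain each node, then schedule a shutdown on it.
# Assumes `oc` is already logged in with cluster-admin rights.

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mark one node unschedulable, then evict its pods.
cordon_and_drain() {
    local node="$1"
    run oc adm cordon "$node"
    run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Ask one node to halt in one minute via a debug pod.
shutdown_node() {
    local node="$1"
    run oc debug "node/$node" -- chroot /host shutdown -h 1
}

# Node names are passed as arguments, for example the output of:
#   oc get nodes -o jsonpath='{.items[*].metadata.name}'
for node in "$@"; do cordon_and_drain "$node"; done
for node in "$@"; do shutdown_node "$node"; done
```

Running the script with node names and the default `DRY_RUN=1` prints the planned commands for review; setting `DRY_RUN=0` would execute them.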

Duffy

Switch off the Duffy workers. For reference, they are launched with:
- source duffy2-venv/bin/activate; FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf python scripts/worker.py
Switch off the Duffy server. For reference, it is launched with:
- FLASK_APP=duffy DUFFY_SETTINGS=/etc/duffy.conf flask run -h 0.0.0.0 -p 8080
Legacy Jenkins (ci.centos.org): in the web UI, go to Manage Jenkins and select Prepare for Shutdown
- ssh jenkins
- systemctl restart jenkins
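
One possible way to switch off the Duffy worker and server is to signal them by matching their command lines. The sketch below assumes both were started by hand with the commands above, so there is no systemd unit to stop, and that `pgrep`/`pkill` from procps are available on the host. The function name `stop_by_cmdline` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: stop Duffy processes by matching their full command lines.
# Assumption: the processes were launched manually, not through systemd.

# With DRY_RUN=1 (the default in this sketch) nothing is signalled.
DRY_RUN="${DRY_RUN:-1}"

stop_by_cmdline() {
    local pattern="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: pkill -TERM -f -- $pattern"
        return 0
    fi
    # List the matches first so the operator can see what will be signalled.
    pgrep -af -- "$pattern" || { echo "no process matches: $pattern"; return 0; }
    # SIGTERM lets the process exit cleanly.
    pkill -TERM -f -- "$pattern"
}

# Workers first, then the server, mirroring the order of the steps above.
stop_by_cmdline 'scripts/worker.py'
stop_by_cmdline 'flask run -h 0.0.0.0 -p 8080'
```

The patterns are treated as regular expressions by `pgrep` and `pkill`, so it is worth checking the dry-run output and the `pgrep` listing before setting `DRY_RUN=0`.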

keepalived on Gateway nodes

Shut down keepalived on gateway02.ci.centos.org:
sudo systemctl stop keepalived
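
After stopping the service, it may be worth confirming that it is really down. The helper below is a minimal sketch: `unit_is_stopped` is a hypothetical name, and it relies only on `systemctl is-active`, which prints the unit state (for example `active` or `inactive`).

```shell
#!/usr/bin/env bash
# Sketch only: report whether a systemd unit is no longer running.

unit_is_stopped() {
    local unit="$1" state
    # is-active exits non-zero for anything but "active", so ignore the status
    # and look at the printed state instead.
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
    case "$state" in
        inactive|failed) return 0 ;;
        *) echo "$unit is still '$state'" >&2; return 1 ;;
    esac
}

if unit_is_stopped keepalived; then
    echo "keepalived is stopped"
fi
```

An empty state in the warning usually means `systemctl` itself could not be run on the host.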