Bare-metal host deploy operation¶
This process can be used to add a new bare-metal node to the CentOS Infra inventory.
The node can be hosted either within the Community Cage (Red Hat) DC, or as a dedicated/hosted server provided by a CentOS sponsor.
DataCenter we control (Red Hat DC)¶
Through an internal ticket with PNT/DevOps, we ensure that the machine/chassis is racked and documented. We also add it to the Internal Inventory and start "reserving" the IP addresses needed for the IPMI/iDrac/mgmt vlan interface and for the Operating System.
We will probably also have to create another ticket on the internal portal to ensure that the ToR switches (which we don't control) have their ports configured correctly (enabled, set to the correct VLAN PVID, etc).
Hardware initialization¶
There is a very small IP range available in the mgmt vlan for newly connected nodes. On the internal dhcpd node (see in the inventory which server currently holds the boot-server ansible role), you can always verify whether the new machine has leased an IP from the oob/management vlan.
Once we have dial tone on the hardware side (oob/mgmt vlan), we need to ensure that we :
- change the default credentials to randomly generated ones
- configure alerting for hardware issues
- correctly set up the raid array if we have a hardware raid controller
Preparing PXE/UEFI boot env¶
If we want ansible to automatically deploy it, we just have to add the node to the inventory and ensure that the following variables are set :
- ipmi_ip, ipmi_user, ipmi_pass : used to remotely pxe boot the node
- ip, gateway, netmask and dns (usually, apart from ip, which is unique, the rest comes through inheritance)
- based on group inheritance, ensure that the variables documented in adhoc-provision-node.yml are also defined
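
As an illustration of the variables listed above, a per-host variables file could look like the sketch below. The file location, the vaulted variable name and all values are assumptions made for the example (the addresses come from the reserved documentation ranges), and the exact format expected for dns is not stated here, so check adhoc-provision-node.yml before copying anything.

```yaml
# host_vars/<new-node>.yml  (file layout is an assumption, adapt to the real inventory)

# Out-of-band access, used to remotely pxe boot the node
ipmi_ip: 192.0.2.10                  # placeholder: the IP reserved in the mgmt vlan
ipmi_user: admin                     # placeholder
ipmi_pass: "{{ vault_ipmi_pass }}"   # hypothetical vaulted variable, never a clear-text password

# Operating System network settings
ip: 198.51.100.20                    # unique per host

# Usually NOT set here but inherited from a group; shown only for completeness
gateway: 198.51.100.1
netmask: 255.255.255.0
dns: 198.51.100.53                   # assumption: a single resolver given as a string
```

Keeping gateway, netmask and dns at group level means that a new host in the same network usually only needs its ipmi_* values and its own ip.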
Note
We can deploy both CentOS and RHEL : if you define rhel_version, RHEL will be deployed; otherwise the deployment defaults to CentOS and centos_version, which is normally 8-stream for now.
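
A minimal sketch of how that choice could be expressed in the inventory is shown below. Only the variable names and the 8-stream value come from this page; the value given to rhel_version and the split between group and host level are assumptions.

```yaml
# Group level (inherited by most nodes): default to CentOS
centos_version: 8-stream

# Host level, only for a node that should be deployed with RHEL instead.
# Hypothetical value: the exact format expected by the playbook is an assumption.
rhel_version: 9
```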
Deploying the machine¶
If the previous steps are done and the network switch port[s] are working, we can now proceed with ansible :
ansible-playbook-prod playbooks/adhoc-provision-node.yml
[WARNING] Nodes to be fully wiped/reinstalled with CentOS => : <my_new_node[s]>
In summary, that playbook will (through delegate_to ansible tasks) :
- prepare the kickstart needed for the host to be deployed (jinja2 template)
- prepare the pxe/tftp/grub settings to boot from network (on the tftpd node)
- use ipmi to reset the hardware node and force booting over pxe
- wait for sshd to be available on the freshly deployed node
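
The steps above can be pictured with the following sketch. It is not the content of adhoc-provision-node.yml: the target_nodes and boot_server variables, the template name and the destination path are hypothetical, and the real playbook may use other modules than community.general.ipmi_boot and community.general.ipmi_power (which exist in the community.general collection and need the pyghmi Python library on the machine running them).

```yaml
# Illustrative sketch only, assuming the inventory variables described earlier.
- name: Reinstall a bare-metal node over pxe (sketch)
  hosts: "{{ target_nodes }}"            # hypothetical variable
  gather_facts: false                    # the node may not be reachable yet
  tasks:
    - name: Render the kickstart for this host on the boot server
      ansible.builtin.template:
        src: kickstart.cfg.j2            # hypothetical template name
        dest: "/var/www/html/ks/{{ inventory_hostname }}.cfg"   # hypothetical path
      delegate_to: "{{ boot_server }}"   # hypothetical variable

    - name: Ask the BMC to boot from the network on next start
      community.general.ipmi_boot:
        name: "{{ ipmi_ip }}"
        user: "{{ ipmi_user }}"
        password: "{{ ipmi_pass }}"
        bootdev: network
      delegate_to: localhost

    - name: Reset the hardware node
      community.general.ipmi_power:
        name: "{{ ipmi_ip }}"
        user: "{{ ipmi_user }}"
        password: "{{ ipmi_pass }}"
        state: reset
      delegate_to: localhost

    - name: Wait for sshd on the freshly deployed node
      ansible.builtin.wait_for:
        host: "{{ ip }}"
        port: 22
        delay: 300      # skip the window where the old system may still answer
        timeout: 3600   # arbitrary upper bound for the install
      delegate_to: localhost
```

Every task is delegated because the node being reinstalled cannot run anything itself until the new operating system is up.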
Warning
Attention : this will wipe the existing operating system, which is why the playbook uses ansible vars_prompt to wait for input that you need to verify. You can also specify a group of machines to be deployed, so a wrong input would destroy/reinstall existing nodes.
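
For readers unfamiliar with vars_prompt, the sketch below shows the general pattern of prompting for the target before the play runs. The init_hosts variable name is hypothetical and the real playbook may be structured differently; only the prompt text is taken from the output shown earlier.

```yaml
# Illustrative sketch of a vars_prompt guard, not the real playbook.
- name: Ask which node[s] to wipe before doing anything
  hosts: "{{ init_hosts }}"          # hypothetical variable, filled in by the prompt below
  gather_facts: false
  vars_prompt:
    - name: init_hosts
      prompt: "[WARNING] Nodes to be fully wiped/reinstalled with CentOS => "
      private: false                 # echo the input so it can be double-checked
  tasks:
    - name: Show each host that matched the input
      ansible.builtin.debug:
        msg: "About to reinstall {{ inventory_hostname }}"
```

Note that ansible skips the prompt when the same variable is already passed with --extra-vars, which removes the safety net, so it is safer to type the host or group name interactively.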
Sponsored machine¶
When we receive a new dedicated server, hosted in another DC that we don't control (no pxe/dhcp), the process usually goes like this :
- through email exchanged with sponsor, we agree on a minimal setup
- we receive initial credentials
- we collect the needed information (like ipv4/ipv6 address[es], dns resolvers, etc)
- we remotely reinstall the machine onto itself (without remote console access), following our standards; this is faster than auditing the state in which we received the machine
- we add the node in dns/ansible (see the Common section)