
Requirements for new Configuration Management System

Reasons for changing (or rethinking our setup)

  • Our CFEngine installation is getting old. A new version is out that is not compatible with the old one, and the version we run no longer receives bug fixes or new features.
  • We have started losing control of what is inside. The complexity is too high, and it acts like another sysadmin doing things we don't know about. Every change we make takes more time and is riskier. It's a big unknown monster we have learned to play with.
  • It needs a cleanup. There are things inside that are not used anymore. There are also things that started out very simple but have grown too complex for that simplicity and need to be rewritten. In particular, the currently defined classes are not consistent with each other (e.g. separate arc01/arc02 classes, but a common LCG_CE).
  • We need to use the system of classes more intelligently, perhaps grouping them into meta-groups for clarity, taking into account a pre-production system, a cluster split, and hardware classes (a common use case across multiple project phases). Rewriting the classes would require an almost complete cfengine rewrite anyway.
  • Currently there is a separation between hardware and software configuration: cfengine manages software and the newmachine script manages hardware. This is undesirable, and it is not even done very well. We need a system where everything is in its place; the newmachine script should bring a system to a state able to run cfengine over the network, and ONLY THAT.
  • We want a mechanism that allows repeatable installations (bit-for-bit if possible). This calls for local repository mirrors for OS & gLite, STABLE & UNSTABLE.

Before choosing the right CM, we need to define what we need from it in order to select some candidates.

Classes redefinition

Here is a proposal for how the classes should look.
  • Each machine would have its own custom, individual features, like MACs, IPs... (defined within the CM).
  • It should include one (and just one) class from each group, from all groups (this would also allow for better "dsh groups" with common names); see the sketch after the group lists below.
  • Each group has disjoint class members.

Machine type

  • CE (LRMS, LCG, CREAM, ARC, ARGUS)
  • SE (CORE, POOL)
  • NFS
  • LUSTRE (MDS, OSS)
  • WN
  • XENHOST
  • BDII (TOP, SITE)

Cluster

  • PROD1, PROD2 *[1]
  • PPS1
  • PPS2
(note [1]: when we make a change, we may want to make it in the 'pre' tree first, and develop a script to commit/merge the changes into the 'prod' tree when we consider the change finished)

Hardware

  • SUN_X4170
  • IBM_1234
  • XEN

Container (optional?)

  • Rack3
  • Xen01
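
As an illustration of the "one class from each group" rule, here is a minimal sketch in Puppet syntax (the tool tried later on this page); the node name and class names are purely hypothetical.

  # Hypothetical sketch: a node includes exactly one class from each group
  # (machine type, cluster, hardware, container). All names are illustrative.
  node 'wn042.example.org' {
    class { 'type::wn': }           # machine type
    class { 'cluster::prod1': }     # cluster
    class { 'hw::sun_x4170': }      # hardware
    class { 'container::rack3': }   # container (optional)
  }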

Capabilities

Classes don't do all the work; in the end the CM has to configure real services. The idea is that we should not have to maintain the same code/config in two different places: if WNs and CEs both have to mount Lustre, we don't write the piece of code that configures a Lustre client twice. Also, to configure Lustre (to continue with this example) the code has to be class-aware to configure it the right way, for example:
  • Lustre is different on Xen guests (runs over TCP)
  • There can be more than one Lustre instance, for example pre-production and production, and if we split the MDSs/MDTs there would be two production clusters.
  • Lustre is different for clients and servers, so maybe there should be two different capabilities (or config items, whatever we call them): lustre-server and lustre-client, which are independent of each other (see the sketch after this list).
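
To make this concrete, here is a minimal sketch of such a capability as a parametrized Puppet class; the module name, parameters and the server address format are assumptions, not the final design.

  # Hypothetical sketch: one lustre-client capability, reused by WNs and CEs,
  # parametrized by instance (prod/pps) and network type (IB, or TCP on Xen guests).
  class lustre::client ($instance = 'prod', $network = 'o2ib') {
    package { 'lustre-client':
      ensure => installed,
    }
    mount { "/lustre/${instance}":
      ensure  => mounted,
      fstype  => 'lustre',
      device  => "mds-${instance}@${network}:/${instance}",   # placeholder server address
      require => Package['lustre-client'],
    }
  }

  # A Xen guest would declare the same class over TCP:
  #   class { 'lustre::client': instance => 'prod', network => 'tcp' }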

We should create a list of configuration items, or capabilities, that we may need to configure. We should list all possible things, so that we can choose a tool that can handle all kinds of situations:

NETWORK

  • IP(s), hostname(s), subnet(s), hosts, routes, ...

SECURITY

  • known_hosts, authorized_keys, sshd_config, iptables, ...

FILESYSTEMS

  • nfs exports, nfs mounts, xfs format, ext3 format, lvm, fstab, lustre-client, lustre-server, ...

VIRTUALIZATION

  • xen_host_filesystem, xen_guest_config, ...

DCACHE

  • dcache_headnode, make_pool, ...

... ETCETERA!

Requirements inside a machine setup

Installation

  • Ethernet MAC assignment (in case of VM)
  • Hostname / public IP
  • 10.10 IP assignment
  • DHCP
  • PXE boot
  • Kickstart file (with disk drivers for OS, and initial partition table)
  • Installation and setup of Cluster Configuration Management system (CCM)
Up to this point the machine is installed with the 10.10 network only, with no internet access (maybe only to download software, maybe not)

Either Installation or CCM

The following things can be done either from the bootstrap or later, preferably by the CCM.
  • Drivers for IB card
  • Assignment of Public IP
  • Security (basic firewall, ssh keys)
  • Partition / data

CCM only

  • Repositories
  • Packages
  • Service configuration
  • Yaim
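
A rough sketch of the CCM-only part in Puppet: a local repository mirror, a package pinned to a fixed version for reproducibility, and a service. The mirror URL and the package version are placeholders; Yaim would presumably be wrapped in its own module.

  # Hypothetical sketch: repository, pinned package version and service,
  # all managed by the CCM. URL and version are placeholders.
  yumrepo { 'cscs-stable':
    baseurl  => 'http://install.example.org/mirror/sl6/stable',   # placeholder mirror
    enabled  => 1,
    gpgcheck => 0,
  }

  package { 'ntp':
    ensure  => '4.2.4p8-3.el6',          # fixed version (placeholder) for reproducible installs
    require => Yumrepo['cscs-stable'],
  }

  service { 'ntpd':
    ensure  => running,
    enable  => true,
    require => Package['ntp'],
  }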

Testing Puppet

From the different good alternatives that are not image-based (Puppet, CFEngine 3, Quattor, Chef) we have decided to try Puppet, because it seems to fulfill the requirements (except for the installation, which will be done with kickstart), its learning curve is reasonable, and it has the biggest community around it.

The idea is to deploy a worker node fully controlled by Puppet that can be used in production. After ensuring this works well, we could then deploy the rest of the worker nodes with it, and continue with the rest of the systems when the time comes.

Preparation of a Worker Node

Here are the steps that need to be followed to achieve this. The idea is to have this finished by the end of September 2012.

The list is just a reference; it will surely grow as we start working on it.

This work will be first done inside a pre-production environment, with virtual machines. Then we need to move it to real hardware.

Prepare installation server (to be finished by 12th of March)

  • To host everything needed to kickstart a new basic machine DONE
  • With a copy of all repositories needed
  • With puppet running DONE
  • Dashboard running in VM (with sl6) DONE
  • Puppetize the KS file generation for each machine (see the sketch after this list)
  • Allow the puppet master to run against a user's local working copy, and clients to fetch it. DONE
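
To puppetize the KS file generation (item above), a possible sketch: the puppet master builds one kickstart file per machine from an ERB template and per-node data. The define name, paths and parameters are assumptions.

  # Hypothetical sketch: one kickstart file per machine, generated from a template.
  define installserver::kickstart ($mac, $ip, $disk_layout = 'default') {
    file { "/srv/install/ks/${name}.cfg":
      ensure  => file,
      owner   => 'root',
      group   => 'root',
      mode    => '0644',
      content => template('installserver/kickstart.cfg.erb'),
    }
  }

  # Usage, one resource per machine to be installed:
  #   installserver::kickstart { 'wn042.example.org':
  #     mac => '00:16:3e:aa:bb:cc',
  #     ip  => '10.10.0.42',
  #   }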

Prepare node with HW/OS layer (to be finished by 15th of April - before Phoenix moves)

  • Puppet client (may not be the same as with KS) DONE
  • Bring up public network (VM: ethernet only) keeping the 10.10 active. DONE
  • resolv.conf DONE
  • /etc/hosts
  • Time sync with public ntp servers DONE
  • SSH setup
    • Unique/shared known-hosts file
    • password-less configuration (and possibly hostbased, for WNs only?) DONE
    • fully-controlled authorized-keys (removing the config removes the key from the file; see the sketch after this list) DONE
  • Iptables (with the ability to add rules from other components) DONE
  • Mail agent DONE
  • Logrotate DONE
  • Syslog DONE
  • Repository setup and installation of packages (with fixed/reproducible rpm versions) DONE
  • Implement a system to replicate RPM sets across different machines (ideas: snapshotted repo, rpm_clone, or puppet RPM lists)
  • Shared scripts in /opt/cscs
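
For the fully-controlled authorized-keys item above, a minimal sketch of the usual Puppet pattern: declared keys are managed individually and anything not declared is purged. Key names and values are placeholders.

  # Hypothetical sketch: every allowed key is declared, and anything Puppet
  # does not know about is removed from authorized_keys.
  ssh_authorized_key { 'admin@mgmt':
    ensure => present,
    user   => 'root',
    type   => 'ssh-rsa',
    key    => 'AAAAB3Nza...placeholder...',
  }

  # Purge any ssh_authorized_key not declared in the manifests:
  resources { 'ssh_authorized_key':
    purge => true,
  }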

Prepare the upper layers: mount points, monitoring, middleware (to be finished by 1st of August)

  • Ganglia
  • Nagios
  • NFS mounts (see the sketch after this list)
  • GPFS client
  • Pbs_mom
  • Yaim
  • Glexec
  • Sudoers
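
For the NFS mounts item above, a minimal sketch with the built-in mount type; server, export and mount point are placeholders.

  # Hypothetical sketch: an NFS mount fully described in Puppet.
  file { '/experiment-software':
    ensure => directory,
  }

  mount { '/experiment-software':
    ensure  => mounted,
    device  => 'nfs01.example.org:/export/software',   # placeholder server/export
    fstype  => 'nfs',
    options => 'ro,hard,intr',
    atboot  => true,
    require => File['/experiment-software'],
  }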

Prepare physical machine (to be finished by 15th of September)

  • Disk Partitioning (this complicates the installation process)
  • Hardware and software RAID
  • Infiniband
  • Tuning sysctl.conf (see the sketch after this list)
  • Benchmarking (optional?)
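
For the sysctl.conf tuning item above, a minimal sketch: the whole file is managed from a template and re-applied when it changes. The template path is an assumption.

  # Hypothetical sketch: manage /etc/sysctl.conf and re-apply it on change.
  file { '/etc/sysctl.conf':
    ensure  => file,
    content => template('tuning/sysctl.conf.erb'),
    notify  => Exec['apply-sysctl'],
  }

  exec { 'apply-sysctl':
    command     => '/sbin/sysctl -p',
    refreshonly => true,
  }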

Node Classification

I have found that the node inheritance mechanism in Puppet is not really flexible and can lead to errors, because the way it works is not intuitive (you think a variable, a class or a tag is being declared, but that only holds for some parts of the manifests). It turns out that Puppet has almost ten different ways to deal with this (http://puppetlabs.com/blog/the-problem-with-separating-data-from-puppet-code/), but I found it can be narrowed down to three:
  • Use the built-in Puppet capabilities, with node inheritance, using parametrized classes when needed. This may be suitable for small or simple setups, but it presents the problems described above.
  • Use an External Node Classifier (ENC), possibly the one included in the Puppet Dashboard. This has some strong points, since it is easy to use from the Dashboard, but it is not yet ready for arrays and hashes (http://groups.google.com/group/puppet-users/browse_thread/thread/c0ec096d7daabbaf), and it depends on a database, which complicates working with stages (users running a copy of the classifier, modifying it, testing it, and merging it with the production one). The Foreman has another classifier of this kind (very similar to the Puppet Dashboard), and you could also build one yourself: http://docs.puppetlabs.com/guides/external_nodes.html
  • Use Hiera as an ENC (see the sketch below):
    • http://www.devco.net/archives/2011/06/11/puppet_backend_for_hiera_part_2.php
    • http://puppetlabs.com/blog/first-look-installing-and-using-hiera/
    • https://github.com/puppetlabs/hiera-puppet/tree/master/example
    • http://www.mail-archive.com/puppet-users@googlegroups.com/msg28134.html
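
A minimal sketch of the Hiera approach, assuming the hiera_include function from hiera-puppet and a per-node YAML data file; paths and class names are hypothetical.

  # site.pp stays minimal: the classes applied to a node come from Hiera data,
  # which can live in a version-controlled tree that users can branch and merge.
  node default {
    hiera_include('classes')
  }

  # A per-node data file in the Hiera datadir (e.g. wn042.example.org.yaml)
  # would then list the classes, for instance:
  #   classes:
  #     - type::wn
  #     - cluster::prod1
  #     - lustre::client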

References

Puppet function/configuration/type references:

Examples from other sites:

For production, check:

For making puppet remove the configuration when you remove a class, check this out (it's called the Truth Enforcer): https://github.com/jordansissel/puppet-examples/tree/master/nodeless-puppet

-- PabloFernandez - 2010-07-23
