Hardware Card for Thors
Short description of the hardware.
Specifications
What is inside the hardware; the relevant items depend on the type of machine.
- Disk controller model. Linux driver needed?
- Network card model
- Space it takes in the rack (rack units, U)
- Max CPUs, Max Disks, Max Mem slots.
- CDROM?
- Dual power supply?
- Front / Back picture (maybe available online?)
- Any other?
Power consumption (measured before and during the CPU/Disk tests)
Environmental details
- External working temperature
- Normal CPU/memory internal temperature
- Air flow (cubic meters per hour)
- Noise (dB)
Operations
Useful information on how to handle the machine, in particular anything that is not trivial.
How to get into the ILOM
Power up/down procedures
Commands to issue in an internal console
Firmware updates
Replacement of internal components
Replacing a disk
Before changing a disk, identify:
- The device name. You probably already have it.
- The raid it belongs to:
cat /proc/mdstat
- The physical location of the disk:
hd
Print the result (beware, it may differ depending on the machine).
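As a rough illustration of the second item, the lookup in /proc/mdstat can be scripted. This is a minimal sketch, not part of the original procedure: the function name md_array_of is hypothetical, and it assumes the usual /proc/mdstat layout in which each array line looks like "md0 : active raid5 sdah1[4] sda1[0] ...".

```shell
#!/bin/sh
# Hypothetical helper (not from the original card): print the md array(s)
# that list a given partition. Reads /proc/mdstat-formatted text on stdin.
md_array_of() {
    awk -v dev="$1" '
        # Array lines look like: "md0 : active raid5 sda1[0] sdb1[1](F)"
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*$/, "", name)   # drop the "[N]" index and any "(S)"/"(F)" flag
                if (name == dev) print $1
            }
        }'
}
```

Typical use would be `md_array_of sdah1 < /proc/mdstat`. Empty output means the partition is not listed in any array, which is also the state to expect before physically pulling the disk.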
Check in /proc/mdstat that the disk is no longer part of the RAID array, then replace the disk (this can be done while the machine is up). Afterwards:
- Check with dmesg that the new disk is detected and that its firmware revision is the same as that of the other disks
- Partition the new disk by copying the partition table from a working disk. E.g.:
sfdisk -d /dev/sda | sfdisk --force /dev/sdah
- Add the disk to the raid it belonged to:
mdadm --add /dev/md0 /dev/sdah1
Check that the disk is visible (as a spare) in the RAID array. Then:
- Back up the old configuration file:
cp /etc/mdadm.conf /etc/mdadm.conf.backup
- IMPORTANT: Regenerate the file with the new disk:
sh /root/makemdconf.sh
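The check after mdadm --add (that the new disk shows up in the array) could be scripted along these lines. This is only a sketch: the function name md_member_state and its output words are hypothetical, and it assumes the standard /proc/mdstat markers "(S)" for a spare and "(F)" for a faulty member. A member with no marker may be either in sync or still rebuilding, so this does not replace looking at the recovery progress.

```shell
#!/bin/sh
# Hypothetical helper (not from the original card): report how a partition
# appears in one md array. Reads /proc/mdstat-formatted text on stdin.
# Prints one of: spare, faulty, member, absent.
md_member_state() {
    awk -v arr="$1" -v dev="$2" '
        $1 == arr && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*$/, "", name)          # keep only the device name
                if (name != dev) continue
                if ($i ~ /\(S\)/)      state = "spare"
                else if ($i ~ /\(F\)/) state = "faulty"
                else                   state = "member"   # in sync or rebuilding
            }
        }
        END { print (state == "" ? "absent" : state) }'
}
```

For the example above this would be called as `md_member_state md0 sdah1 < /proc/mdstat`.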
Installation notes
Instructions on how to set up a newly arrived machine, covering things like:
BIOS settings
RAID configuration
Drivers required / kernel compatibility
Check firmware homogeneity with other machines in the cluster
Benchmarks
Information about benchmarks performed on the machine.
Disk benchmarks (e.g. Bonnie++)
Example:
CPU benchmarks (e.g. HEP-SPEC06)
Network benchmarks (e.g. iperf)
Monitoring
Instructions about monitoring the hardware
Power Consumption
Raid Sanity
Other?
Manuals
External links to manuals
Issues
Information about issues found with this hardware, and how to deal with them
Issue1
Issue2