Activities Overview from September 2010 to August 2011

Achieved stability and high availability on PhaseC

LCG software is distributed among many different services that need to be installed on tens of machines. Servers nowadays have multiple cores and larger amounts of memory and disk, and it would be a waste to dedicate one physical server to each service. Therefore, we decided to virtualize many of the middleware pieces. Because of disk latencies on virtual machines, storage was not virtualized, but other services were, such as:

  • Vobox (one per VO), Mon (for accounting), Ganglia (monitoring), User Interface, and a security bastion host.
In addition, a number of critical systems were deployed using different High Availability mechanisms, also making use of virtualization techniques. This means that, even if a problem arises in one server, another one takes over the same task:
  • Lcg-CE, the old gLite computing element. Two independent servers, both accepting jobs and sharing the incoming load (active/active).
  • Cream-CE, the new gLite computing element. Again, another two independent servers (active/active).
  • Arc-CE, the Nordugrid computing element. Two independent servers (active/active).
  • PBS/Batch server with the scheduler, deployed as two servers in active/passive mode: if the active one fails, the passive one takes over.
  • BDII (information system), with three instances behind a DNS-balanced alias (active/active); a DNS round-robin sketch follows below.
In sum, 18 production servers were deployed using 5 physical hosts.
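
To illustrate the DNS-balanced BDII setup mentioned above, the following Python sketch resolves a service alias and lists the instances behind it: the alias publishes one A record per BDII server, and round-robin DNS spreads clients across them. The hostname bdii.example.org is a placeholder, not the real production name.

    import socket

    # Hypothetical alias; the real production hostname is not shown here.
    BDII_ALIAS = "bdii.example.org"
    BDII_PORT = 2170  # standard BDII/LDAP port

    try:
        # A DNS-balanced alias publishes one A record per BDII instance.
        # Resolving it returns all of them, and the DNS server rotates the
        # order (round robin), so clients are spread across the instances.
        infos = socket.getaddrinfo(BDII_ALIAS, BDII_PORT, socket.AF_INET)
        addresses = sorted({info[4][0] for info in infos})
        print(f"{BDII_ALIAS} resolves to {len(addresses)} instance(s):")
        for ip in addresses:
            print("  ", ip)
    except socket.gaierror:
        print(f"placeholder name {BDII_ALIAS} does not resolve")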

Other services in PhaseC were not virtualized because they need to deliver high throughput:

  • Central storage, with dCache, consisted of 38 pool nodes (data) and 2 service nodes (metadata and other services).
  • Scratch filesystem, with Lustre, consisted of almost 400 spindles connected to 8 IO servers and 2 metadata servers.
  • NFS servers, for the experiment software, consisted of 2 servers in High Availability (active/passive); a simplified failover sketch follows this list.
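
As a rough illustration of the active/passive NFS pair, the sketch below only checks whether the active server still answers on the NFS port and reports if the passive node should take over. The hostnames are placeholders, and the real setup used a dedicated HA stack that performs the promotion automatically rather than this simplified probe.

    import socket

    # Hypothetical hostnames for the active/passive NFS pair.
    ACTIVE_NFS = "nfs1.example.org"
    PASSIVE_NFS = "nfs2.example.org"
    NFS_PORT = 2049  # standard NFS port

    def is_alive(host, port=NFS_PORT, timeout=3.0):
        """Return True if the host accepts TCP connections on the NFS port."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if is_alive(ACTIVE_NFS):
        print(f"{ACTIVE_NFS} is serving; {PASSIVE_NFS} stays on standby.")
    else:
        # In the real HA setup the cluster stack performs this promotion
        # automatically; here we only report it.
        print(f"{ACTIVE_NFS} is down; promote {PASSIVE_NFS} to active.")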

Old Worker Nodes from PhaseB distributed among Swiss universities

PhaseB worker nodes were decommissioned in Summer 2010 and replaced by Sun Blades in PhaseC. The storage was not replaced at that time, because it was expected to last one or two more years. This left a significant amount of computing resources that could be used elsewhere, so it was decided to split them among the WLCG universities in Switzerland: Bern and Geneva.

This has proven to be a well-received and very useful activity for the universities, and we will continue offering them decommissioned CSCS hardware if they are interested.

Other activities

The following activities were carried out after October 2010:

  • Deployment of NFS with HA for the experiments' software releases.
  • Twiki migration to wiki.chipp.ch
  • Maui scheduler replaced by Moab, purchased with support for Torque.
  • New accounting system (from MON to APEL) deployment
  • Security enhancement: ssh access restricted to keys only, and review of firewall rules.
  • New 64-bit User Interface and local glite-Nagios.
  • Optimization of queue closure times before scheduled maintenances (see the sketch after this list).
  • Reinstallation of Solaris storage nodes with Linux, for homogeneity.
  • Lustre reformatting to ext4 FS to improve performance.
  • PhaseD hardware and software deployment to meet the pledges in March 2011.
  • GPFS installation as Scratch FS in parallel to Lustre, as a possible replacement.
  • Pre-Production virtual system deployment to be able to test software before production.
  • Decommissioning of the old Lcg-CE middleware.
  • glExec and Argus deployment for centrally managed blacklisting of users
  • Early Adopters for EMI1 software releases (Cream, Argus, Apel, WN)
  • Introduction of CHIPP system administrators to Phoenix activities, as backup in case of an emergency.
  • Router in HA mode for Infiniband and Ethernet separation from the same physical network, to simplify the internal networking software stacks.
  • Preparations started for the move to Lugano.
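
To illustrate the queue-closure optimization mentioned in the list above, here is a minimal sketch of the underlying arithmetic: a queue must stop accepting jobs early enough that a job using its full walltime can still finish before the downtime starts, so the closure time is simply the downtime start minus the queue's maximum walltime. The queue names and walltimes below are made-up examples, not the production values.

    from datetime import datetime, timedelta

    # Hypothetical queues and maximum walltimes; not the production values.
    QUEUE_MAX_WALLTIME = {
        "short": timedelta(hours=2),
        "long": timedelta(hours=24),
        "verylong": timedelta(hours=72),
    }

    def closure_times(downtime_start, queues=QUEUE_MAX_WALLTIME):
        """Latest moment each queue may still accept jobs so that a job
        using the full walltime finishes before the downtime begins."""
        return {name: downtime_start - walltime
                for name, walltime in queues.items()}

    downtime = datetime(2011, 3, 15, 8, 0)  # example maintenance start
    for queue, close_at in sorted(closure_times(downtime).items()):
        print(f"close queue '{queue}' at {close_at:%Y-%m-%d %H:%M}")

Longer queues are therefore closed first, while short queues can keep accepting jobs almost until the maintenance window, which maximizes cluster utilization before a downtime.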
-- PabloFernandez - 2011-09-06