Atlas technical discussion Meeting on 2011-04-07

Facts and problem definition

  • LHCb was having random timeouts on their software. This was due to excessive use of metadata operations on the scratch filesystem.
  • The CSCS investigation of scratch FS usage showed that this was caused mainly by random user usage patterns, and also by Atlas production/pilots performing an excessive amount of background I/O operations.
  • Atlas has a different memory usage pattern from the rest of the VOs. Some of their jobs use 3 or even 4 GB of RAM, relying on swap to do the work.
  • If a node starts swapping, its performance drops, affecting the other VOs.

Actions taken:

  • On March 29th CSCS confined Atlas to the new PhaseD nodes, with 2k HS06, using GPFS. Two days later, this space was moved

So we have two problems that we would like to solve as soon as possible. The best approach may be to address each one separately, and then try to find a combined solution.

Scratch FS problem

Attendees

  • person

Minutes

  • item

Action items

  • item
Topic revision: r1 - 2011-04-07 - PabloFernandez
 