
All News

Major downtime Wed, 26. 5. 2021 to Thu, 27. 5. 2021 -- 20. 04. 2021 DerekFeichtinger
Downtime due to construction work for upgrading the PSI compute center's power and cooling capacities.
New worker nodes t3wn70-73 adding 512 new slots -- 12. 04. 2021 DerekFeichtinger
Four powerful worker nodes based on AMD EPYC were added to the Tier-3: t3wn[70-73]
New NFS /work area -- 12. 03. 2021 DerekFeichtinger
A faster, backed-up /work NFS service. You can read up on the changes in accessing snapshots by following the links.
June 05-07 Downtime for SE reconfiguration and upgrade -- 02. 06. 2020 DerekFeichtinger
The SE will be reconfigured to allow additional forms of access to its files. NFS4.1 will allow direct access to numpy data sets for your Python-based analysis. The downtime will start on Fri 8AM and last until Mon 8AM.
Test of EOS on SLURM Compute Nodes -- 02. 03. 2020 NinaLoktionova
Dear all,

If there are volunteers willing to try accessing EOS from the batch system, please let me know.

Cheers, Nina


17-18 Feb 2020 T3 DOWNTIME -- 10. 02. 2020 NinaLoktionova
Due to a dCache upgrade, no services will be available on 17-18 February.
T3 shutdown on 10-13 January 2020 -- 09. 12. 2019 NinaLoktionova
In January 2020 the annual system test (maintenance day) will be conducted across the entire PSI. IT services will be unavailable or only partially available from Friday, 10 January 2020, 20:00 until Sunday, 12 January 2020, 20:00.
Progress of T3 migration to Slurm -- 08. 11. 2019 NinaLoktionova
Dear all,

This is a reminder that we are in the process of migrating the WNs from SGE to Slurm.

We plan to finish this migration in December.

For those who still need SGE, t3wn11-29 are available.

For more detailed information, please check the mailing list

(Sent: Thursday, September 5, 2019 11:28 AM To: cms-tier3-users@lists.psi.ch Subject: [Cms-tier3-users] request to test analysis software on T3 "CC7" Slurm Batch system for all t3 physics groups/users)

Welcome to test Slurm GPU and CPU nodes -- 10. 05. 2019 NinaLoktionova
Dear T3 Users,

Please let us know if you are interested in testing the new batch system and software on RHEL7.


Short downtime on Feb 26 from 3 to 4 PM -- 25. 02. 2019 NinaLoktionova
Dear Users of T3,

Please note that tomorrow, Feb. 26, all UIs, WNs and SLURM test nodes will be unavailable around 3-4 PM.

Storage Maintenance on 4 February 2019 -- 01. 02. 2019 NinaLoktionova
The NetApp head node will be replaced as planned, without interrupting Tier-3 operation.

Shutdown of Tier3 on 4-6 January 2019 -- 21. 12. 2018 NinaLoktionova
Because of the central PSI shutdown, the T3 will be unavailable from the afternoon of 4.01 until the afternoon of 7.01.
Change of T3 home location -- 13. 12. 2018 NinaLoktionova
All T3 users have to migrate to /t3home/${USER}, a new login and workspace area hosted on the central PSI NFS service.

Migration means re-registration of the home location on the administrator side (and a corresponding change of the $HOME variable).

The migration deadline is 20.02.19.

Upgrade of T3 Storage -- 13. 09. 2018 NinaLoktionova
Since August we have put four new storage servers into production, replacing old hardware and expanding the dCache storage. As a result, the /pnfs space is currently about 1.2PB.
UI security updates on September 10 at 2 PM -- 07. 09. 2018 NinaLoktionova
On Sept 10, 2 PM, there will be a service interruption on the T3 user interfaces for ~30 minutes because of security updates.


T3 will NOT be available because of a dCache upgrade on Apr. 19 -- 09. 04. 2018 NinaLoktionova
The scheduled time for this intervention is from 9:30 to 19:30.
PSI Systems Test on 6-7 January 2018 -- 14. 12. 2017 NinaLoktionova
The annual system test will take place on Saturday, 6 and Sunday, 7 January 2018.

Please note that on these days it will not be possible to work anywhere at PSI (office/laboratory), since the power supply, the computer network and the cooling supply can be temporarily switched off.

T3 downtime October 5 at 14:00 -- 02. 10. 2017 NinaLoktionova
The reasons are software security updates and firmware updates on the storage hardware to fix a controller battery failure.

Tier-3 Security Update Downtime On 25.07 At 10:00 -- 13. 07. 2017 NinaLoktionova
Dear all, we expect an interruption of services of about 2 hours.
Emergency Downtime of T3 due to HW failure -- 08. 06. 2017 DerekFeichtinger
Due to a controller failure on the central home file system storage, the T3 was taken offline on 6. 6. 2017. We must wait for delivery of the replacement part (expected on 8. 6. 2017).
Shutdown of T3 on Mon Apr 3 due to security update -- 30. 03. 2017 DerekFeichtinger
We are required to perform an urgent security update to some of the T3 hosts, else we risk suspension from the grid services.
Snapshots of /shome now taken only on a daily basis -- 26. 02. 2017 DerekFeichtinger
Snapshots of the home file system /shome will only be taken on a daily basis, since hourly snapshots led to frequent problems with users not being able to free space.
Annual power test at PSI on 07/08.01.2017 -- 08. 12. 2016 JoosepPata
PSI will perform a power test on the aforementioned dates, meaning any T3 computing will be unavailable at this time.
/shome reboot on 27.10.2016, 12:30-13:30 -- 26. 10. 2016 FabioMartinelli
/shome reboot on 27.10.2016, 12:30-13:30 in order to upgrade both the RAID FW and the related Linux driver
New T3 UIs -- 12. 10. 2016 FabioMartinelli
Installed t3ui01 (PSI), t3ui02 (ETHZ), t3ui03 (UniZ)
T3 scheduled downtime on Fri 23.09.2016 - 13:30-18:00 -- 19. 09. 2016 FabioMartinelli
In order to upgrade both the /pnfs and the /shome file service the T3 will be in scheduled downtime on Fri 23.09.2016 - 13:30-18:00
T2 scheduled downtime - 21/09/2016 from 08:00 till 20:00 -- 13. 09. 2016 FabioMartinelli
T2 scheduled downtime - 21/09/2016 from 08:00 till 20:00
T3 scheduled downtime - Friday 01/07/2016 from 14:00 till 15:00 -- 16. 06. 2016 FabioMartinelli
On Friday 01/07/2016 from 14:00 till 15:00 the /shome file service is going to be updated
New PSI mailing list service -- 18. 05. 2016 FabioMartinelli
The PSI mailing list service has been migrated from mailman to Sympa
New 576 CPU cores available at T3 -- 03. 05. 2016 FabioMartinelli
Deployed 9 t3wn* servers featuring 64 CPU cores / 128GB RAM / 10Gbps cards
T3 scheduled downtime - Friday 06/05/2016 from 10:00 till 17:00 -- 02. 05. 2016 FabioMartinelli
On Friday 06/05/2016 from 10:00 till 17:00 we're upgrading both the /pnfs file service and the T3 networking, the latter to 10Gbps
New T3 shome & swshare -- 22. 03. 2016 FabioMartinelli
New T3 shome & swshare mounted in /mnt/t3nfs01/data01/{shome,swshare}
T3 downtime 8th/9th Jan 2016 -- 10. 12. 2015 FabioMartinelli
PSI is going to perform its yearly electrical tests
/pnfs scheduled downtime - Friday 25/09/2015 from 13:30 till 14:30 -- 22. 09. 2015 FabioMartinelli
The /pnfs file service will be updated from version 2.13.7 to 2.13.9
/pnfs scheduled downtime - Friday 11/09/2015 from 13:00 till 20:00 -- 01. 09. 2015 FabioMartinelli
The /pnfs file service will be upgraded from version 2.10 to 2.13
T3 downtime from 24/07/2015 at 16:00 till 27/07/2015 at 14:00 -- 14. 07. 2015 FabioMartinelli
The T3 will be completely stopped for ~3 days
/pnfs scheduled downtime - Friday 12/06/2015 from 13:00 till 14:00 -- 10. 06. 2015 FabioMartinelli
On Friday we're testing if dCache properly reboots from scratch.
Possible T3 power cut on Thursday 30th April from 19:00 till 19:30 -- 30. 04. 2015 FabioMartinelli
Unexpected electrical maintenance at T3
T3 is in downtime on Friday 20th March from 12:00 till 22:00 -- 16. 03. 2015 FabioMartinelli
We're upgrading dCache from version 2.6 to version 2.10
All the SL5 UI servers except t3ui05 will be decommissioned -- 11. 03. 2015 FabioMartinelli
On Monday 16th March we're stopping all the old SL5 t3ui0* servers except t3ui05
VOID !! T3 is in downtime on Friday 20th Feb from 12:00 till 22:00 -- 16. 02. 2015 FabioMartinelli
VOID !! We're upgrading dCache from version 2.6 to version 2.10
On 19.12.2014 at 12:30 all the t3ui are going to be rebooted -- 19. 12. 2014 FabioMartinelli
Because of a Linux security update all the t3ui* servers will be rebooted
All the t3wn will be migrated to SL6 by the end of Jan 2015 -- 15. 12. 2014 FabioMartinelli
Both CMSSW and WLCG expect an SL6 T3 in order to work properly, so all our SL5 t3wn* will be reinstalled as SL6 t3wn*
T3 will be in downtime from Fri 9th Jan at 16:00 till Mon 12th Jan at 14:00 -- 08. 12. 2014 FabioMartinelli
PSI is performing its yearly electrical tests
How to use both IPython and CMSSW IPython on SL6 -- 26. 11. 2014 FabioMartinelli
/cvmfs/cms.cern.ch will replace /swshare/cms on Fri 21-11-2014 at 16:00 -- 17. 11. 2014 FabioMartinelli
/cvmfs is the standard technology used in the CMS Grid to distribute CMSSW releases around the world.
gfalFS -- 14. 11. 2014 FabioMartinelli
On the SL6 t3ui* servers the gfalFS tool can be used to mount one or more CMS SEs as local directories.
/pnfs downtime on Friday 12th Sep from 13:00 until ~19:00 -- 03. 09. 2014 FabioMartinelli
The /pnfs file service will be in downtime to be updated to its latest version
New SL6 WNs -- 15. 08. 2014 FabioMartinelli
The T3 batch queues also use the new SL6 WNs t3wn[41,43,44,50] ( 100GB RAM, 32 cores )
New SL6 UIs -- 16. 07. 2014 FabioMartinelli
New SL6 t3ui[12,15,16,17,18,19] providing 1.7TB /scratch RAID10
Scheduled t3ui02,03,06,07 downtime on Fri 20th Jun from 14:00 to 16:00 -- 18. 06. 2014 FabioMartinelli
Next Friday 20th June at 14:00 we're going to enlarge the t3ui0[2,3,6,7]:/scratch by ~100GB
New tool 'dc_find' to quickly search into /pnfs -- 04. 04. 2014 FabioMartinelli
By using the tool dc_find the T3 users can easily list their own, their group's or all /pnfs files and propose to the admins what should be deleted.
CERN has introduced the new gfal CLIs and APIs to interact with the Grid SEs -- 25. 03. 2014 FabioMartinelli
During 2014 Grid users have to replace their use of the lcg-* commands with the new gfal-* commands
Scheduled downtime from Friday 10th January 08:00 am to Monday 13th January 11:00 am -- 17. 12. 2013 FabioMartinelli
As every year, PSI has a forced shutdown of most IT services because of its general electrical maintenance.
Scheduled network downtime on Wed 18th Dec 2013 - from 9:00 am to 10:00 am -- 13. 12. 2013 FabioMartinelli
The T3 will be unavailable on Wed 18th Dec 2013 from 9:00 am to 10:00 am because of network recabling.
Scheduled downtime to upgrade dCache from version 2.2 to 2.6 -- 8. 11. 2013 FabioMartinelli
On Friday 8th Nov at 12:00 we're going to upgrade dCache from version 2.2 to 2.6 DONE
Network slowness on the nodes t3wn30-40 -- 03. 10. 2013 FabioMartinelli
Today the nodes t3wn[30-40] are limited to 10MB/s, the PSI network team will check our switches.
1h downtime on Friday 13 Sep 2013 9:30 am -- 10. 09. 2013 FabioMartinelli
Because of a HW error we need to stop the /pnfs file service for ~1h.
devtools-1.1 utilities installed on the t3ui,t3wn servers -- 04. 09. 2013 FabioMartinelli
With these utilities users can choose between the 2012 gcc compilers (ver. 4.7.2) and the default 2008 gcc compilers (ver. 4.1.2)
Scheduled downtime to introduce the new Storage System -- 15. 08. 2013 FabioMartinelli
On Friday 16th morning we'll stop the /pnfs file services to quickly introduce the new Storage System.
New WNs Grid middleware to be tested before 1st June. -- 06. 05. 2013 FabioMartinelli
A temporary batch queue short.q.validation has been created to validate the new WNs Grid middleware.
T3 Scheduled Downtime on Monday 6th May ~9:30 am -- 29. 04. 2013 FabioMartinelli
We need to stop several VMs for ~2h so that they can be migrated into the new PSI VMware cluster.
T3 Scheduled Downtime on March 28th -- 27. 02. 2013 FabioMartinelli
We're going to upgrade both dCache to ver. 2.2 and Postgresql to ver. 9.2.3
Doodle about the next T3 downtime in March -- 18. 02. 2013 FabioMartinelli
Doodle opened, it will be closed on Feb 25th
/pnfs/psi.ch/cms NFS Read-only mounted on each UI -- 18. 02. 2013 FabioMartinelli
/pnfs/psi.ch/cms is now mounted read-only on each UI to give users a convenient way to navigate /pnfs.
Reinstallation of t3ui02 t3ui03 t3ui04 on Feb 8th -- 07. 02. 2013 FabioMartinelli
After this final reinstallation all our UIs t3ui0[1-9] will offer 261GB /scratch.
Reinstallation of t3ui05 t3ui06 t3ui07 on Feb 6th -- 04. 02. 2013 FabioMartinelli
We're going to reinstall t3ui05 t3ui06 t3ui07 to increase both size and speed of their /scratch.
Downtime for the annual PSI IT maintenance on Fri, Jan 11th, 16h until Mon 14th, evening -- 17. 12. 2012 FabioMartinelli
Annual PSI IT maintenance, which we will also use to migrate dCache from 1.9.5 to 1.9.12.
Downtime for major upgrade of the SE on Thu Nov 29th - Fri 30th -- 02. 11. 2012 DerekFeichtinger
dCache, the Storage Element software, will be upgraded. The upgrade involves a complete migration of the underlying database to a new format (chimera). Therefore, all operations involving access to the SE must be stopped for the duration of the upgrade.
Enforced flexible Job RAM limits -- 26. 07. 2012 FabioMartinelli
Each job will request 3GB of RAM by default, but it is permitted to explicitly request up to 6GB.
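The rule in this entry can be sketched as a small check. This is only an illustration: the function name `resolve_ram_request` and its error handling are hypothetical and are not part of the T3 batch configuration; only the 3GB default and the 6GB ceiling come from the announcement.

```python
from typing import Optional

DEFAULT_RAM_GB = 3.0  # default request per job, from the announcement
MAX_RAM_GB = 6.0      # largest explicit request that is permitted


def resolve_ram_request(requested_gb: Optional[float] = None) -> float:
    """Return the RAM (in GB) a job would be granted under the announced limits.

    Hypothetical helper: a job that asks for nothing gets the default,
    an explicit request is accepted only up to the maximum.
    """
    if requested_gb is None:
        return DEFAULT_RAM_GB
    if requested_gb <= 0:
        raise ValueError("RAM request must be positive")
    if requested_gb > MAX_RAM_GB:
        raise ValueError(
            f"requested {requested_gb}GB, but at most {MAX_RAM_GB}GB is permitted"
        )
    return float(requested_gb)
```

For example, `resolve_ram_request()` yields the 3GB default, while `resolve_ram_request(8)` is rejected.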
Added new WNs t3wn[30-40] -- 05. 06. 2012 FabioMartinelli
Introduced an additional 176 job slots, each ~1.2 times faster than the previous 160 slots.
Major Downtime March 14/15 for T3 upgrades -- 09. 02. 2012 DerekFeichtinger
Major upgrades to the storage and compute infrastructure of the T3 require a complete exchange of the current network switching.
Introduced /tmp and /scratch disks quota on UIs and WNs -- 13. 01. 2012 FabioMartinelli
To prevent any single user from filling up a shared partition, and to keep track of the other users' space usage.
Downtime Jan 6-8 -- 23. 12. 2011 DerekFeichtinger
Due to PSI computing center maintenance the Tier-3 will go on downtime from Fri Jan 6, 15h in the afternoon until Monday Jan 9 in the early morning.
Introduction of new walltime limits for all.q, long.q. New interactive debugging queue -- 23. 11. 2011 DerekFeichtinger
On Nov 28, based on the agreed policies, we will introduce a limit of 10h for jobs on the all.q and 96h on the long.q. The new interactive debugging queue is now also accessible for users.
Testing of new batch system policies -- 11. 10. 2011 DerekFeichtinger
In order to keep some fast turnaround resources free during normal work hours, we are testing out a number of new batch system policies. Submit to the short.q (up to 90 min jobs) to benefit from the free slots.
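As a rough illustration of how the walltime limits quoted in the batch system announcements (90 min on short.q, 10h on all.q, 96h on long.q) translate into a queue choice, here is a minimal sketch. The helper `pick_queue` and the idea of always choosing the shortest suitable queue are assumptions made for this example, not an official T3 tool or policy.

```python
from datetime import timedelta

# Walltime limits as quoted in the announcements; the ordering (shortest
# first) and this helper are illustrative assumptions.
QUEUE_LIMITS = [
    ("short.q", timedelta(minutes=90)),
    ("all.q", timedelta(hours=10)),
    ("long.q", timedelta(hours=96)),
]


def pick_queue(expected_runtime: timedelta) -> str:
    """Return the first queue whose walltime limit covers the expected runtime."""
    for name, limit in QUEUE_LIMITS:
        if expected_runtime <= limit:
            return name
    raise ValueError(f"no queue allows a runtime of {expected_runtime}")
```

With these limits, a job expected to run for 5 hours would go to all.q, and anything longer than 96 hours is rejected.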
Downtime Jan 7-9 for PSI compute center maintenance -- 05. 01. 2011 DerekFeichtinger
Due to maintenance and systems testing in the PSI compute centers we must shut down the Tier-3 for the weekend of Jan 8/9. The downtime will begin on Friday evening, 17h. The system will be brought up again on Sunday morning.
Maintenance downtime Tue, Dec 21st 2010 FINISHED DONE -- 06. 12. 2010 DerekFeichtinger
Need to do some end of the year maintenance + enable shome quotas.
SE impaired due to fileserver problem - RESOLVED DONE -- 19. 10. 2010 DerekFeichtinger
Problem with one fileserver (t3fs07) where the disk failover and spare replacement did not work correctly. Solved by replacement parts from SUN/Oracle support on Oct 22.
Short downtime on Mon, June 14, 9h-10h for NFS server reboot -- 14. 06. 2010 DerekFeichtinger
The management processor on the NFS home server is in an unresponsive state. A total reboot + firmware upgrade is needed (as announced on t3 user mailing list).
RAM upgrade on Fri, May 21 -- 17. 05. 2010 DerekFeichtinger
Upgrade of the PSI Tier-3 worker nodes to 24 GB RAM per node (3GB per core).
Downtime Thu Mar 11th (+ 12th in case of problems) -- 09. 03. 2010 DerekFeichtinger
Upgrade of dCache to 'golden' production release. Pool migrations
Downtime Mon Jan 25 - Tue Jan 26 -- 20. 01. 2010 DerekFeichtinger
Upgrade of WNs to SL5, reinstallation of the batch system, and another attempt at upgrading the dCache storage manager to 1.9.5-11
Downtime 8th to 11th Jan 2010 -- 16. 12. 2009 DerekFeichtinger
Due to a power shutdown at PSI on Sat 9th, all systems need to go down.
Emergency downtime Thu Dec 11 2009 - Sun Dec 13 -- 11. 12. 2009 DerekFeichtinger
Due to a repeated failure of dCache on file server t3fs05 we had to take a downtime. The fix requires a Solaris OS update (reinstallation). The UI will also be unavailable from Friday noon.
Downtime July 30/31 2009 (finished) -- 21. 07. 2009 DerekFeichtinger
Downtime for a number of upgrades and the introduction of larger NFS area
Downtime May 8th - 10th (finished) -- 05. 05. 2009 DerekFeichtinger
Basic OS and MW updates and adaptation of the SE information system to get us correctly registered
Quotas for /shome filesystem -- 10. 01. 2009 DerekFeichtinger
To protect the system from filling up we enforced user quotas on /shome (15GB soft / 25GB hard limit)
PSI CMS Tier-3 cluster is now online -- 03. 11. 2008 DerekFeichtinger
The PSI CMS Tier-3 cluster is now online. This is the common cluster for CMS members of ETHZ, University of Zurich and PSI.
Tier-3 users' test phase -- 05. 10. 2008 DerekFeichtinger
The cluster is ready for test users. CMSSW jobs run fine. Data can be ordered with PhEDEx. CRAB jobs work ok except for some options like -resubmit. User feedback is found on the CMSTier3Log1 page
Tier-3 Installation phase (2) -- 30. 08. 2008 DerekFeichtinger
Most low level problems solved. Registering to Grid and CMS services. Todo list.
Tier-3 Installation Phase -- 18. 08. 2008 DerekFeichtinger
The Tier-3 is in the install phase. First successful tests with the storage have been done, but a major OpenSolaris problem is slowing us down.

-- DerekFeichtinger - 12 May 2009

Topic revision: r2 - 2017-07-13 - NinaLoktionova
Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.