<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ ATLAS GPU Challenge on 2020-03-05

   * *zoom link*

---++ Agenda

   * ATLAS description of the GPU challenge
      * <b>Testing of containerised AI and DL workflows on GPU on the Grid</b>
      * <u>Activity by ADC (WMS) + software groups. Original call</u>:

<p><i>Dear All,</i></p>
<p><i>since the New Year is all about starting fresh with new challenges and ideas, we’d like to throw a challenge for you. This is inspired by some recent successes:</i><br />
<i>(1) Lukas and Alessandra have been able to run DL training (including hyperparameter optimisation) on GPUs in containerised jobs, with the resources accessed via grid machinery</i><br />
<i>(2) Rui and Sau Lan have been able to run DL training on Summit using the GPUs, but by direct personal access, not via the grid</i></p>
<p><i>The challenge is as follows: can we combine the two activities such that we could use grid machinery and containerised jobs to access the GPUs on a “difficult” HPC in the US (such as Summit), to run some user-submitted DL training application? I have no idea how difficult this is - presumably there will be all kinds of policy issues related to running containers and network access on a machine such as Summit - but I think this is a good practical challenge to enable us to start using these machines for real, in a way that could help physics analyses and combined performance.</i></p>
<p><i>Please let us know what you think of this plan. If you like it and think it is worth pursuing, please forward this to whoever you think would be interested. Perhaps we could have some kind of talk or session on this at the Lancaster SW&C meeting in June?</i></p>
<p><i>Cheers,</i></p>
<p><i>James & Ale</i></p>

      * <u>Aiming at having ~10 users successfully running small-scale workflows on the Grid (volunteered resources)</u>
         * Make GPUs "popular" with interested users
         * Develop WMS integration solutions to fully support such workflows
   * *Technical aspects, proposed starting configuration for CSCS*
      * 1 or more GPUs available
         * Easier with a dedicated partition; more complex (but doable) if using one of the existing partitions (might attract unwanted jobs on the GPU node(s), e.g. ops tests etc.)
      * ARC5+RTE (*) is OK for requesting 1 GPU statically. With ARC6, requests can be adjusted dynamically.
      * Preferred to start with a "generic node": no CVMFS, Singularity; network connectivity not clear, hopefully not needed (or could be via squid proxies).
      * Container built off-site, staged in via ARC.
   * *Request to CSCS:*
      * Name of the partition to use to start testing.
      * Useful to know the node specs.
   * <b>(*) RTE for ARC5:</b>

<verbatim>
###########################
###       RTE Code      ###
###########################
if [ "x$1" = "x0" ]; then
    # RTE Stage 0
    # You can do something on the ARC host at this stage.
    # Here is the right place to modify joboption_ variables.
    # Note that this code runs under the mapped user account already!
    # No root privileges!
    export joboption_nodeproperty_0="${joboption_nodeproperty_0} --constraint=gpu --gres=gpu:1"
fi
</verbatim>

   * Discussion

---++ Attendants

   * person

---++ Minutes

   * item

---++ Action items

   * item
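
---++ Appendix: checking the stage 0 RTE fragment locally

The ARC5 RTE fragment in the agenda can be sanity-checked outside ARC with a small harness. The following is a minimal sketch under stated assumptions: the function name =rte_stage= and the starting value =--partition=example= are hypothetical and are not taken from the CSCS configuration. It only exercises the string handling of =joboption_nodeproperty_0= and says nothing about how ARC or the batch system later consumes that variable.

```shell
#!/bin/sh
# Hypothetical local harness: calls the RTE logic with the stage number as $1,
# mirroring the test on "$1" used by the fragment in the agenda.
rte_stage() {
    if [ "x$1" = "x0" ]; then
        # Stage 0: append the GPU request to whatever node properties are already set.
        joboption_nodeproperty_0="${joboption_nodeproperty_0} --constraint=gpu --gres=gpu:1"
        export joboption_nodeproperty_0
    fi
}

# Placeholder starting value (assumption, not a real partition name).
joboption_nodeproperty_0="--partition=example"

rte_stage 0
echo "after stage 0: ${joboption_nodeproperty_0}"

# Any other stage leaves the variable untouched.
rte_stage 1
echo "after stage 1: ${joboption_nodeproperty_0}"
```

Note that the fragment appends with a leading space, so if =joboption_nodeproperty_0= is empty beforehand the resulting value starts with a space. Whether that matters depends on how the value is used downstream, which this sketch does not cover.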
Topic revision: r2 - 2020-03-03 - GianfrancoSciacca
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.