NOTE: This information is somewhat sensitive and should be kept private. The content can't be read if you're not in the
TwikiAdminGroup.
   * Set ALLOWTOPICVIEW = TwikiAdminGroup
Service classification by risk
This layered classification implies that a security breach in one layer should not affect services in the layers above it.
Users work on the bottom layer; the higher the layer, the further it is from them, as it should be.
| Class | Services | Description |
| Palladium | Firewall, Syslog endpoint (blackbox, flight recorder) | This is where our own grid syslog server will be. It will listen only for incoming syslog traffic, nothing else. It is a kind of black box and should be the most protected layer, used only by us during an incident investigation. |
| Platinum | (Backup, Syslog, DNS, NTP) @CSCS, Snort, IDS, ciscoVPN | This layer is composed of critical services that we must rely on, mostly generic CSCS services. As long as this layer is not compromised, we can easily go back to a previously known state by using the backup. |
| Gold | Pub, OpenVPN, Repository, Installation server, NFS, cfengine, DHCP, PXE, Xen Hosts, Management ports (ILOMs) | This third layer is composed of cluster support services, the infrastructure needed to install and manage the cluster. Users have nothing to do with it, and most services here do not affect production if they are temporarily lost (the exception is the Xen*1 hosts). On the other hand, an attacker here can do serious damage to the production environment, but as long as they do not climb to the upper layers we can recover. |
| Silver | Lustre backend, Mon, BDIIs, dCache, LRMS | This layer is composed of production services that the grid cluster requires at run time, but to which users have no direct access (they can only read or write files, not execute them). |
| Bronze | UIs, WNs, Lcgce, Cream, Arc, VOBoxes | The lowest layer is the one closest to the users. It is the only layer where they have the right to execute arbitrary binaries. |
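The layering rule can be sketched in a few lines of Python. This is only an illustration, not part of our tooling: the names `Layer`, `exposed_layers` and `recoverable_from_backup` are hypothetical, and the recovery rule is an assumption read from the Platinum row (backups are usable as long as Platinum itself is not compromised).

```python
from enum import IntEnum
from typing import List


class Layer(IntEnum):
    """Risk classes from the table; a higher value means a more protected layer."""
    BRONZE = 1
    SILVER = 2
    GOLD = 3
    PLATINUM = 4
    PALLADIUM = 5


def exposed_layers(breached: Layer) -> List[Layer]:
    """Layers considered at risk: the breached layer and every layer below it.

    Layers above the breached one should stay unaffected by design.
    """
    return [layer for layer in Layer if layer <= breached]


def recoverable_from_backup(breached: Layer) -> bool:
    """Assumption: recovery via backup works while Platinum is intact."""
    return breached < Layer.PLATINUM


if __name__ == "__main__":
    print([layer.name for layer in exposed_layers(Layer.GOLD)])
    print(recoverable_from_backup(Layer.GOLD))
```

For a breach in Gold, the sketch reports Bronze, Silver and Gold as exposed and still treats the situation as recoverable.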
Network split
We have five network areas. They enforce security for both the installation process and the remote management ports (ILOMs), and they separate the Internet from the CSCS public network and the LCG network. They are structured as follows:
- Internet is the lowest security zone, where the attacks come from. The CSCS public network and the LCG network are each protected from it by their own firewall.
- CSCS Public network (actually composed of several networks), with IP ranges 148.187.[3,12,17,18,130,140,224], is owned by CSCS but not managed by the Grid team. Services such as DNS, nagios, twiki and svn live here. It is behind a firewall managed centrally by the CSCS BUS team.
- LCG Public network has the range 148.187.64.0/22 and is behind a separate firewall managed by the HCS team. This is where all our systems are and where we expect attacks to arrive.
- Installation network. This private network, with the range 10.10.64.0/22 (mirrored IPs from the LCG public network), exists for installation purposes. Every physical PhaseC machine has an ethernet interface on it. Even though it is a private network, it is on the same VLAN as the LCG Public network (it is on the Force10 switches with PhaseB), so we should consider it only as secure as the LCG Public network: any compromised machine can simply change its IP and switch from one network to the other. FG: we need to verify if shared.
- ILOM is the most secure network. It is separated from the rest in a different VLAN, and only the Xen Hosts are entry points to it. All the remote administration interfaces are connected there, including STONITH interfaces and power switch monitoring.
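As an illustration of the two /22 ranges above, the following Python sketch classifies an address by zone and derives the Installation address that mirrors an LCG Public one. The function names are hypothetical, and the mirroring rule (same host offset inside the /22) is an assumption about what "mirrored IPs" means here.

```python
import ipaddress

# Ranges as stated in the list above.
LCG_PUBLIC = ipaddress.ip_network("148.187.64.0/22")
INSTALLATION = ipaddress.ip_network("10.10.64.0/22")


def zone_of(addr: str) -> str:
    """Return the zone name for an address, or 'other' if it is in neither /22."""
    ip = ipaddress.ip_address(addr)
    if ip in LCG_PUBLIC:
        return "LCG Public"
    if ip in INSTALLATION:
        return "Installation"
    return "other"


def mirrored_installation_ip(public_addr: str) -> str:
    """Map an LCG Public address to its Installation counterpart.

    Assumption: the mirror keeps the same offset from the network address.
    """
    ip = ipaddress.ip_address(public_addr)
    if ip not in LCG_PUBLIC:
        raise ValueError(f"{public_addr} is not in {LCG_PUBLIC}")
    offset = int(ip) - int(LCG_PUBLIC.network_address)
    return str(INSTALLATION.network_address + offset)


if __name__ == "__main__":
    print(zone_of("148.187.65.10"))
    print(mirrored_installation_ip("148.187.65.10"))
```

Because both networks share a VLAN, a mapping like this is trivial for a compromised machine to apply as well, which is why the Installation network cannot be treated as more secure than LCG Public.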
Entry points
We have to administer this complex network, so we need entry points to it. These are the machines that have access to the different parts of the system:
- XEN_HOSTS (xen11,12,13,14,15): They are connected to all the networks: LCG Public, Installation and ILOM. Access is allowed only from CSCS subnets and denied from anywhere else.
- The installation server is also an entry point to the Installation 10.10.64.0/22 network. Since this network is in the same VLAN as LCG Public, it is an entry point for convenience only. Access is allowed only from CSCS subnets and denied from anywhere else.
- Pub virtual machine: This is connected to the same three networks as the xen_hosts. This is where the admins will have home directories with their password-protected ssh private keys, and it is considered a secure host in that respect. It will also be possible to enter the ILOM network through a VPN, to allow remote control from outside CSCS. Access is permitted strictly for ssh-authenticated connections, from anywhere. Port-specific or sequence-specific configuration can be considered.
- UI virtual machines: Connected to the LCG Public subnet, but they can later be placed in a special DMZ for containment purposes. Access is permitted for ssh-authenticated connections from anywhere; outbound traffic is allowed, inbound is under negotiation.
All machines are required to run a denyhosts mechanism that explicitly allows traffic from the LCG Public /22 or from a smaller subset of it, but from no less than pub, nagios and the xen hosts.
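The entry points listed above can be written down as a small lookup table, which makes it easy to ask which machines lead into a given network. This is a hypothetical sketch: the dictionary keys and function names are invented for illustration, and each entry records only the networks and ssh source that its bullet states explicitly.

```python
from typing import List

# Networks and allowed ssh sources per entry point, as listed in the bullets.
ENTRY_POINTS = {
    "xen_hosts": {
        "networks": {"LCG Public", "Installation", "ILOM"},
        "ssh_from": "CSCS subnets",
    },
    "installation_server": {
        "networks": {"Installation"},  # only the network named in its bullet
        "ssh_from": "CSCS subnets",
    },
    "pub": {
        "networks": {"LCG Public", "Installation", "ILOM"},
        "ssh_from": "anywhere",
    },
    "ui": {
        "networks": {"LCG Public"},
        "ssh_from": "anywhere",
    },
}


def entry_points_for(network: str) -> List[str]:
    """Entry points attached to the given network, sorted by name."""
    return sorted(
        name for name, info in ENTRY_POINTS.items() if network in info["networks"]
    )


def reachable_from_outside(network: str) -> List[str]:
    """Entry points for the network that accept ssh from outside CSCS."""
    return [
        name
        for name in entry_points_for(network)
        if ENTRY_POINTS[name]["ssh_from"] == "anywhere"
    ]


if __name__ == "__main__":
    print(entry_points_for("ILOM"))
    print(reachable_from_outside("ILOM"))
```

Under these assumptions, Pub is the only way into the ILOM network from outside CSCS.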
Sysadmin access
- All machines in the cluster will have ssh root access restricted to administrators. They will not have user accounts (except for Pub, the entry point). They will accept connections on the ssh port only from the CSCS IP range.
- Administrators should store their private ssh key only on a host "considered trusted", and type its password only there. A trusted host is one managed by the administrator, and it must be inside CSCS. All accesses from outside should come through Pub first.
-- FotisGeorgatos - 2010-03-19