Each node in the LAN runs a lightweight PeerMon daemon process that collects and exchanges information about system-wide resource usage. Applications that use PeerMon data interact with their local daemon to obtain system-wide resource data.
Each PeerMon daemon is a multithreaded process. One thread, the Sender thread,
periodically collects load information about its own node and sends to three
peers its own load information and the load information it has about other nodes in
the system (stored in its hash map of node-load info data).
The Listener thread receives messages from other PeerMon peers containing
system-wide load information and updates hash map entries with any new data
it receives. Applications that use PeerMon data communicate with their
local PeerMon daemon using the Client Interface thread.
The Client Interface thread uses only local information in its own hash
map to provide system-wide load information to the client application program
(for example, to smarterSSH).
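The Sender and Listener roles described above can be illustrated with a minimal Python sketch. It is not PeerMon's implementation: the function names, the dictionary layout, and the `timestamp` field are hypothetical, and choosing the three peers uniformly at random is an assumption, since the selection rule is not stated here.

```python
import random

def pick_peers(known_nodes, self_addr, count=3):
    """Sender side: choose up to `count` peers, excluding this node.

    Uniform random choice is an assumption made for this sketch.
    """
    candidates = [addr for addr in known_nodes if addr != self_addr]
    return random.sample(candidates, min(count, len(candidates)))

def merge_entries(load_map, received):
    """Listener side: keep an incoming entry only if it is newer.

    Both arguments map a node address to a dict with a hypothetical
    'timestamp' field. Returns the number of entries that were updated.
    """
    updated = 0
    for addr, entry in received.items():
        current = load_map.get(addr)
        if current is None or entry["timestamp"] > current["timestamp"]:
            load_map[addr] = entry
            updated += 1
    return updated
```

Because every message carries data about many nodes, a node's hash map can fill in after only a few exchanges, which is why the Client Interface thread can answer queries from local data alone.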
Its peer-to-peer design makes PeerMon a scalable and fault-tolerant monitoring system for efficiently collecting system-wide resource usage information. Experiments evaluating PeerMon's performance show that it adds little additional overhead to the system and that it scales well to large LANs. PeerMon was initially designed to be used by system services that provide load balancing and job placement; however, it can be easily extended to provide monitoring data for other system-wide services.
We implemented three tools (smarterSSH, autoMPIgen, and a dynamic DNS binding system) that use PeerMon data to pick "good" nodes for job or process placement in a LAN. Tools using PeerMon data for job placement can greatly improve the performance of applications running on general purpose LANs.
Our LISA'10 paper describes the design of PeerMon in more detail and presents experimental results running and using PeerMon in our LAN.
Once set up, it is easy to run the peermon daemon and the client programs that use PeerMon data, including smarterSSH and autoMPIgen. Command line options can be used to specify the runtime behavior of peermon and the client programs.
In addition, we provide several example system and run scripts that can be used to automatically start peermon on a system and keep it running.
There are two ways to build and install PeerMon. The first is to follow the 4 build and config steps listed below. The second is to do an automated build from the .deb file (this will install in /usr/sbin).
Building PeerMon requires that the GNU C++ compiler and Python be installed on your system.
Setting up PeerMon on your system should be pretty easy, consisting of just a few steps:
OR install peermon on a system, and config to run at start-up by following these steps:
Example config files are in the utils/ subdirectory. You can edit these with your machine information, and then run the installation script setup.sh or run make install to install peermon executables and these config files on your system.
The contents of the config file should be lines of the form IP:listen_portnum of a handful of other PeerMon nodes in the system (we currently list 3 in our config file that is shared by every node). For example:

    130.58.68.76:1288
    130.58.68.165:1288
    130.58.68.160:1288

The listen_portnum should match the port number passed to peermon with the required -p command line argument.
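As an illustration of the IP:listen_portnum format, here is a minimal Python sketch that reads such a file's contents into (ip, port) pairs. The function name `parse_peer_config` is hypothetical and this is not how peermon itself parses the file; skipping blank lines and rejecting malformed lines are assumptions of the sketch.

```python
def parse_peer_config(text):
    """Return a list of (ip, port) tuples from IP:listen_portnum lines."""
    peers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines (an assumption of this sketch)
        ip, sep, port = line.rpartition(":")
        if not sep or not ip or not port.isdigit():
            raise ValueError(f"expected IP:listen_portnum, got {line!r}")
        peers.append((ip, int(port)))
    return peers

sample = """130.58.68.76:1288
130.58.68.165:1288
130.58.68.160:1288
"""
for ip, port in parse_peer_config(sample):
    print(ip, port)
```

A check like this can be useful before deploying a shared config file, for example to confirm that every listed port matches the -p value you plan to use.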
As long as peermon is eventually started on at least one of the hosts listed in the config file, the peermon daemon started with this config file will be added to the P2P network and learn about all other nodes in the network after a few exchanges of messages. The other configuration file is optional and contains IP prefixes which determine the validity of the sender IP as well as the IP addresses in the data to be sent out by peermon. The default location for this file is in /etc/peermon/valid_ips.txt.
You can also run peermon with the -i command line argument to specify a different config file. An example of the contents of the valid_ips.txt file is:

    130.58.68
    256.256.256

This will tell peermon to accept only data coming from IP addresses conforming to this prefix and ignore any data it receives about a host whose IP does not conform to the prefix. The absence or emptiness of this file implies that peermon will accept and pass along data about any IP without restrictions.
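The prefix rule can be sketched in a few lines of Python. This is an illustration only: the function name `is_valid_ip` is hypothetical, and matching on whole dotted components (so that 130.58.68 does not match 130.58.680.1) is an assumption; peermon's actual comparison may differ.

```python
def is_valid_ip(ip, prefixes):
    """True if no prefixes are configured or ip falls under one of them."""
    if not prefixes:
        return True  # absent or empty file: accept any IP
    # Match whole dotted components (an assumption of this sketch).
    return any(ip == p or ip.startswith(p + ".") for p in prefixes)

print(is_valid_ip("130.58.68.76", ["130.58.68"]))
print(is_valid_ip("10.0.0.1", ["130.58.68"]))
print(is_valid_ip("10.0.0.1", []))
```

The same check would apply both to the sender's address and to each address contained in a received message.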

    sudo adduser --system peermon

Without this user account, peermon will not run (as a daemon).
You can run peermon without the peermon user (see the -u option and the install.sh script documentation below).
For Linux/amd64 platforms, you can install the daemonized version of peermon on your system using the Debian package: peermon_1.1_amd64.deb. For other platforms, or for building a non-daemonized version, use Option 1. To install using the Debian package, run: sudo dpkg -i peermon_1.1_amd64.deb.
This will install all binaries, config files, and start-up files necessary to run the peermon daemon on your system. The dpkg -i will install all peermon executables in /usr/sbin, peermon config files in /etc/peermon, and peermon start scripts in /etc/init.d/.
There are two things you will need to change by hand on your system:

    # after editing startcommand and creating a file with
    # a list of machines on which to run commands (machinefile), run these:
    ./install.sh machinefile your_user_name
    ./isup.sh machinefile your_user_name
    ./killall.sh machinefile your_user_name

$ ln -s /etc/init.d/peermon.sh /etc/rc2.d/S99-peermon
Here is an example peermon.sh script (this example is also in utils/init.d-peermon.sh).

    /etc/init.d/peermon.sh stop
    /etc/init.d/peermon.sh start

An example script that can be run periodically as a cron job is in utils/peerhealth.sh:

    #!/bin/bash
    # This script checks whether the peermon daemon is running and, if not,
    # restarts it. The script should be run from root's cron.
    if ! ps -A | grep -q peermon
    then
        /etc/init.d/peermon.sh stop
        /etc/init.d/peermon.sh start
    fi

peermon command line arguments:

    peermon -p portnum [-h] [-c] [-f configfile] [-l portnum] [-n secs] [-i ipconfigfile]
      -p portnum:   use portnum as the listen port for peermon and portnum+1 as the send port
      -c:           run this peermon daemon in collector-only mode
      -f conf_file: run w/conf_file instead of using the default config file in /etc/peermon/machines.txt
      -i ip_file:   run w/ip_file instead of using the default
      -l portnum:   use portnum for client interface (default is port 1981)
      -n secs:      how often peermon daemon sends its info to peers (default 20)
      -u            run as a regular user daemon process started by any user vs.
                    as the peermon user daemon started at start-up. This is a way
                    to easily start and stop an instance of a peermon P2P NW
                    whenever by whomever. You should use the -p and -l options
                    to not interfere with other peermon networks running on
                    same set of machines
      -h:           print out this help message

Running in collector-only mode specifies that the node is a consumer of PeerMon resource data, but is not a provider of resources to other PeerMon nodes. We use this on our DNS server to allow it to use PeerMon data to choose the best nodes to dynamically bind names to, but to prevent other PeerMon nodes from choosing our DNS server as a target for job placement.
With the peermon distribution, we include two client programs that use PeerMon data: smarterSSH and autoMPIgen. We also include directions for how to use the PeerMon client interface to enable dynamic DNS mapping that uses PeerMon data to pick the "best" nodes for ssh placement. In addition, we include a simple client example with the distribution that can be used as a starting point for implementing your own client programs that use PeerMon data.
smarterSSH command line options:

    -n num        to select number of machines to print (default is 1)
    -v            for full printout of stats for each machine listed
    -I            to list machines by ip address instead of hostname
    -c            to sort by CPU load, and -m to sort by free memory
                  if neither is specified, a combination is used
    -m            to order results by free memory (default is combination CPU mem)
    -r            to return a list of machines randomly selected (no ordering by usage)
    -i or --info  just display results and exit
    -h or --help  show this help menu and exit
    -p port_num   to specify peerMon port number (default 1981)

The default port number of the peermon client thread is 1981. If you run peermon with the -l flag to specify a different client port number, then you also need to run smarterSSH (and autoMPIgen) with the -p option specifying the client port number.
Some example smarterSSH command lines:

    # list top 20 machines
    smarterSSH -v -i -n 20

    # ssh into the top machine based on cpu utilization
    smarterSSH -c

autoMPIgen command line options:

    [hostfile]    is the compulsory argument specifying the name of the output file.
    -n num        to select number of machines to output. Default is 1.
    -v            a full printout of stats for each machine listed.
    -c            to order results by CPU load
    -m            to order results by free memory (default is CPU and mem)
    -r            to return a list of randomly selected machines
                  (no ordering by usage stats)
    -s slots      list the specified number of slots with each host in the hostfile
    -q            include the number of CPUs on a host as slots with each machine
                  in hostfile (-q and -s are incompatible)
    -x            to interpret the -n num value as CPUs rather than nodes
                  (this option is only valid with both -q and -n)
    -h or --help  to see this help menu.
    -p port_num   use port_num to connect to peerMon (default 1981)
    -I            output IPs instead of hostnames
    -i or --info  display the results and exit (no hostfile generated here)

Some example autoMPIgen commands:

    # create a host file named myhostfile, from 20 top machines
    autoMPIgen -n 20 myhostfile

    # create a host file named myhostfile, from 20 top machines ordered by cpu
    autoMPIgen -n 20 -c myhostfile

    # create a host file named myhostfile, from 20 top machines
    # list hosts by their IP, and for each entry include slot=4
    autoMPIgen -n 20 -I -s 4 myhostfile

The combination of these DELTA values and the random ordering of "equally good node" results by smarterSSH and autoMPIgen can result in many different targets each time they are run. For example, if there are currently 20 equally good nodes, then each time smarterSSH is called it will randomly return one of these 20 nodes. The idea is to ignore small differences in RAM or CPU load that do not represent significant load differences, and to help distribute smarterSSH and autoMPIgen load over more of these equally good nodes.
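The idea of treating nodes within a DELTA of each other as equally good, and ordering them randomly, can be sketched as follows. This is a simplified illustration using a single CPU-load metric. The threshold `CPU_DELTA`, the function name `rank_nodes`, and the grouping rule (each group starts at its lowest-load node) are all hypothetical and are not taken from the smarterSSH or autoMPIgen source.

```python
import random

CPU_DELTA = 0.15  # hypothetical threshold, not PeerMon's actual value

def rank_nodes(loads, delta=CPU_DELTA, rng=random):
    """Order nodes by load, shuffling nodes whose loads are within delta.

    loads maps a node name to its CPU load (lower is better).
    """
    ordered = sorted(loads.items(), key=lambda item: item[1])
    result = []
    group = []         # nodes currently considered equally good
    group_base = None  # lowest load in the current group
    for name, load in ordered:
        if group and load - group_base > delta:
            rng.shuffle(group)  # random order among equally good nodes
            result.extend(group)
            group = []
        if not group:
            group_base = load
        group.append(name)
    rng.shuffle(group)
    result.extend(group)
    return result

loads = {"a": 0.10, "b": 0.12, "c": 0.90, "d": 0.95}
print(rank_nodes(loads))  # "a" and "b" come first, in either order
```

With this kind of ranking, repeated calls spread requests over all nodes in the best group instead of always returning the single node with the marginally lowest load.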
Using PeerMon data to select a set of "best" nodes has several benefits over BIND's support for load distribution that selects a host to bind to using either round-robin or random selection from a fixed set of possible hosts. First, it allows for the "best" host to be selected based on current system resource load, thus adapting to dynamic changes in system resource usage and resulting in better load distribution. Second, it is resilient to nodes being unreachable due to temporary network partitioning, node failure, or to deliberate shut-down of nodes in order to save on energy consumption during times of low use. In BIND, if the selected host is not reachable, then ssh hangs. Using our system, unreachable or failed nodes will not be included in the set of "best" targets. When a node is reachable again, PeerMon will discover it and the node may make its way back into the set of "best" targets. An additional benefit for system administrators is less editing of the DNS data files. If a machine is taken out for service, it is automatically (within a minute or two) removed from the pool of best-available machines, requiring no manual editing of the DNS data files. When a machine is restarted, it will quickly be added back into the PeerMon network and will automatically be a candidate target for dynamic DNS binding.

    ./peermon -p 2222 -c

You need to have peermon running on some of the regular nodes in your system in regular mode, and then start peermon on the DNS server in collector-only mode (using the -c command line option). Running in collector-only mode means that other peermon nodes will exclude the node from being a target of smarterSSH, autoMPIgen, or any other tool using PeerMon.
For these two steps, first enable the dynamic update feature of BIND 9 by adding an "allow-update" sub-statement to your DNS zone config file. For example:

    zone "cs.swarthmore.edu" {
        type master;
        file "cs.db";
        allow-update { 127.0.0.1; 130.58.68.10; };
    };

Next, write a script to update DNS records based on PeerMon data and add a cron job to run the script periodically. We run our script every minute. One way to do this is to write a script that:
python smarterSSH -ips -n 5 > tempfile

    update delete cslab.cs.swarthmore.edu.
    update add cslab.cs.swarthmore.edu. 30 IN A 130.58.68.41
    update add cslab.cs.swarthmore.edu. 30 IN A 130.58.68.70
    update add cslab.cs.swarthmore.edu. 30 IN A 130.58.68.162
    update add cslab.cs.swarthmore.edu. 30 IN A 130.58.68.74
    update add cslab.cs.swarthmore.edu. 30 IN A 130.58.68.148

    $ host cslab.cs.swarthmore.edu
    cslab.cs.swarthmore.edu has address 130.58.68.70
    cslab.cs.swarthmore.edu has address 130.58.68.74
    cslab.cs.swarthmore.edu has address 130.58.68.148
    cslab.cs.swarthmore.edu has address 130.58.68.162
    cslab.cs.swarthmore.edu has address 130.58.68.41

    $ sudo rndc freeze
    (edit the data files here, being sure to update the serial number)
    $ sudo rndc thaw

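The step that turns a list of selected IPs into nsupdate commands could be sketched in Python as follows. This is only an illustration: the function name `build_nsupdate_commands` is hypothetical, the IP list is assumed to have already been read from the smarterSSH output, and the trailing `send` line (a standard nsupdate command that submits the update) is added here and does not appear in the example commands shown earlier.

```python
def build_nsupdate_commands(name, ips, ttl=30):
    """Build nsupdate input that rebinds `name` to the given A records."""
    # nsupdate expects a fully qualified name, so ensure a trailing dot.
    fqdn = name if name.endswith(".") else name + "."
    lines = [f"update delete {fqdn}"]
    lines += [f"update add {fqdn} {ttl} IN A {ip}" for ip in ips]
    lines.append("send")  # submit the update (added in this sketch)
    return "\n".join(lines) + "\n"

best = ["130.58.68.41", "130.58.68.70", "130.58.68.162"]
print(build_nsupdate_commands("cslab.cs.swarthmore.edu", best), end="")
```

The resulting text can be written to a file and passed to nsupdate by the cron script. The short TTL of 30 seconds, taken from the example records, keeps clients from caching a binding long after the set of best nodes has changed.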
To use peermonlib:

    from peermonlib import PeermonLib

If you run your client in a directory different from the one in which peermonlib.py is installed, then you also need to add the directory containing peermonlib to the path before this line. For example:

    import sys
    sys.path.insert(2, "/usr/sbin")  # insert at index 2 of my Python library path


    lib = PeerMonLib()
    entries = lib.nodes_list

The distribution includes peermon, autoMPIgen, smarterSSH, and example config files and a simple client example from which you can build custom peermon clients. All documentation for building, installing, and using peermon is on this webpage.