User:Merlissimo/Sandbox


 * For the newtask command on Solaris, see batch project (outdated)

Job scheduling is the primary method by which tools should be started on the Toolserver. Jobs (i.e., tools) are submitted to the scheduler, which then starts each job on an appropriate host, based on factors like current load and the resources needed. Using batch scheduling means you don't need to worry about where to start a job, or whether the job should be started during off-peak hours. Job scheduling can be used for any sort of tool, whether it is a one-off job, a tool like a bot which needs to run permanently, or a regular job run from cron.

While it's possible to run jobs on a server directly, without using job scheduling, this is strongly discouraged, since it makes it harder for the Toolserver administrators to manage server resources and load.

Introduction
The Toolserver uses Sun Grid Engine 6.2 (SGE) for scheduling jobs.

First, you submit a job and specify the resources it will need. If a host has the requested resources available, your job will be started there. If the system is busy, there might be no free resources, and jobs will be queued until more resources become available. At present, it's very unlikely that jobs will be queued for more than a few minutes in this way, except during maintenance.

If a host crashes or shuts down, all jobs will be restarted on other hosts if the host is unavailable for more than an hour.

Unlike in earlier versions of this documentation, you don't have to care about different queues; SGE will handle this for you. You only have to provide some information about the maximum runtime, expected memory usage and other resources needed by your job when you submit it.

Submitting jobs
To submit a job, use the qsub or qcronsub command.

qcronsub has exactly the same syntax as qsub. The only difference is that the job is not submitted if a job with the same name is already running or queued. For more details about this, see Submitting jobs from cronie below. The examples below always use qsub.

$ qsub -l h_rt=0:30:00 -l virtual_free=100M $HOME/mytool.py
Your job 80570 ("mytool.py") has been submitted

The scheduler will place the job in the queue, and eventually (probably immediately) run it on a suitable host. Once the job has finished, it will be removed from the system.

-l h_rt=0:30:00 specifies the runtime limit of the job ([dd:]hh:mm:ss), in this case 30 minutes. You should set this to the expected maximum runtime; if the job runs any longer, it will be killed. If you don't specify a limit, the default is 6 hours. For tools that should never stop, such as IRC bots, you can specify -l h_rt=INFINITY.
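To sanity-check a runtime limit before handing it to qsub, the [dd:]hh:mm:ss format described above can be converted to seconds. The following is a minimal Python sketch; the helper name h_rt_to_seconds is hypothetical (it is not part of SGE or the Toolserver tools), and it assumes the format is exactly as stated above, plus the special value INFINITY.

```python
def h_rt_to_seconds(value):
    """Convert an h_rt string in [dd:]hh:mm:ss form to seconds.

    Hypothetical helper for illustration only; returns None for INFINITY.
    """
    if value.upper() == "INFINITY":
        return None  # no runtime limit requested
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 3:
        parts = [0] + parts  # the optional day field was omitted
    if len(parts) != 4:
        raise ValueError("expected [dd:]hh:mm:ss, got %r" % value)
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(h_rt_to_seconds("0:30:00"))     # the 30-minute limit used above -> 1800
print(h_rt_to_seconds("1:00:00:00"))  # one day -> 86400
```

A check like this makes it easy to confirm, for example, that a soft limit is really lower than the hard limit before the job is submitted.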

-l virtual_free=100M specifies the peak memory usage of the job during runtime, in this case 100 megabytes. The maximum memory you can request per job is 1000M. SGE will ensure that the jobs scheduled on one host have not requested more than 1G of virtual_free in total (Account limits).

h_rt and virtual_free are the only resources you must always include in your request.

Currently, jobs that do not request these two resources are still accepted, but this will change in the future. In these cases, -l h_rt=6:00:00 -l virtual_free=50M is used as the default.

In general, you must request all resources that your job needs. If you don't, your job may be scheduled on a host that does not have the resource. All resources you request must be maximum or peak values. For example, if you run a bot that normally uses about 100 MB of memory but could rise to 500 MB in error cases, you have to request -l virtual_free=500M. It is not a problem if your job uses fewer resources than requested (requested resources are not reserved fully exclusively). Jobs that need many resources or have a long runtime get a lower scheduling priority and less CPU time if the system is busy.
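If jobs are submitted from a script, a small wrapper can make sure the two mandatory resources are never forgotten. This is only a sketch: qsub_command is a hypothetical helper, and it uses nothing beyond the -l options described above.

```python
def qsub_command(script, h_rt, virtual_free, extra_resources=None):
    """Build a qsub argument list that always requests h_rt and virtual_free.

    Hypothetical helper; pass peak values, not typical ones.
    """
    cmd = ["qsub", "-l", "h_rt=%s" % h_rt, "-l", "virtual_free=%s" % virtual_free]
    # Any further resources are added as additional -l requests.
    for name, value in sorted((extra_resources or {}).items()):
        cmd += ["-l", "%s=%s" % (name, value)]
    cmd.append(script)
    return cmd


# A bot that normally uses about 100 MB but can peak at 500 MB:
print(" ".join(qsub_command("mytool.py", "6:00:00", "500M")))
```

On a host where qsub is installed, the returned list could be passed to subprocess.call to submit the job.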

Rather than specifying arguments to qsub every time the job is run, you can instead put them in the script itself, using special directives starting with #$:

#! /usr/bin/python
#$ -l h_rt=0:30:00
#$ -j y
#$ -o $HOME/mytool.out
... rest of script ...

If you want to receive mail when a job finishes, use -m e. To receive mail when a job starts and when it finishes, use -m be.

If you want a warning before your job is killed, specify s_rt with a value lower than h_rt, for example:

$ qsub -l h_rt=1:00:00 -l s_rt=0:55:00 slowjob.py

This will send a SIGUSR1 signal after 55 minutes, which you can catch to perform cleanup before the job ends. After 1 hour, SIGKILL will be sent.

Converting an existing cron job to use the scheduler
Example: You have a tool, mytool.py, which runs from cronie at 0300 UTC every day:

0 3 * * * $HOME/mytool.py

To run this tool under the job scheduler, change it to:

0 3 * * * cronsub -s mytool $HOME/mytool.py

The first argument, "mytool", tells cronsub the name of your tool. The output from the tool will be written to $HOME/mytool.out.

Converting a Phoenix tool to use the scheduler
Example: You have a tool, mytool.py, which runs under Phoenix so it's automatically (re)started:

*/10 * * * * phoenix $HOME/phoenix-mytool $HOME/mytool.py

To run this tool under the job scheduler, change it to:

*/10 * * * * cronsub -sl mytool $HOME/mytool.py

The output from the tool will be written to $HOME/mytool.out.

Receiving mail when the job starts or finishes
To receive mail when the job finishes, add this line:

#! /usr/bin/python
#$ -m ae

To receive mail when it starts as well:

#$ -m bae

Submitting jobs from cronie
While it's sometimes useful to run a single job from the command line, most tools need to run regularly, using cron. We provide a script called cronsub to make this easier. cronsub is a wrapper around qsub which provides some additional functionality.

Example: if you wanted <tt>mytool.py</tt> to be submitted at 0300h UTC every day, you could use an entry like this in your cronietab:

0 3 * * * cronsub mytool $HOME/mytool.py

The first argument (<tt>mytool</tt>) will be used as the name of the job, and the second argument is the command to run. The output file is named after the job: <tt>$HOME/NAME.out</tt>, in this case <tt>$HOME/mytool.out</tt>.

Among other things, <tt>cronsub</tt> will prevent a job from running if a job of the same name already exists. This means that if your job is queued, or takes longer to run than expected, a second duplicate job won't be started.

You can specify <tt>qsub</tt> arguments to <tt>cronsub</tt>:

cronsub mytool -l h_rt=0:30:00 $HOME/mytool.py

... but generally it's easier to use <tt>#$</tt> lines in the script itself.

submit.toolserver.org
We have set up a pair of redundant hosts to act as SGE job submission servers. These work by sharing each user's cronietab between both hosts, and executing jobs on whichever server is working. This avoids the problem where jobs run from cronie on one login server (such as willow) fail to run if that host is down, even when other login servers are available.

To use the new hosts, log into submit.toolserver.org and set up a cronietab (*not* a crontab) as normal.

Note that these hosts are *only* for submitting SGE jobs, not for running tools on.

Submitting long-running jobs
Some tools, like bots, are meant to run continuously, and restart if they exit. These tools are not suitable for running in the default queue (<tt>all.q</tt>); instead, we provide a separate queue called <tt>longrun</tt>. To start a job in the <tt>longrun</tt> queue:

$ qsub -q longrun $HOME/longtool.py

However, a better way to start such tools is using <tt>cronsub</tt>. Since <tt>cronsub</tt> won't start duplicate jobs, you can try to start your long-running tools regularly (for example, every 10 minutes); if the job is running, nothing will happen, but if it has exited for some reason, it will be restarted. An example of using <tt>cronsub</tt> this way might be:

*/10 * * * * cronsub -sl longtool $HOME/longtool.py

This will run <tt>cronsub</tt> every 10 minutes. The <tt>-l</tt> argument instructs <tt>cronsub</tt> to start the job in the <tt>longrun</tt> queue.

Running jobs under screen
Many tools need to run under <tt>screen</tt> because they require a terminal to work. You can still run these jobs using the job scheduler, but you need to create a small wrapper script to start <tt>screen</tt>. For <tt>mytool.py</tt>, such a script might be called <tt>mytool.sh</tt> and look like this:

#! /bin/sh
# screen doesn't produce any output, so use /dev/null to avoid creating empty files
#$ -j y
#$ -o /dev/null
exec screen -D -m -S mytool python $HOME/mytool.py

You can then use <tt>cronsub</tt>, as above, to submit the job from <tt>cronie</tt>:

*/10 * * * * cronsub -l mytool $HOME/mytool.sh

This will create a screen session named <tt>mytool</tt> and start <tt>mytool.py</tt> inside it. Tools that run under <tt>screen</tt> are almost always meant to run forever, so here we used the <tt>longrun</tt> queue (<tt>cronsub -l</tt>).

You can attach to the <tt>screen</tt> session using <tt>screen -r mytool</tt> as normal, but first you need to check which host the job is running on:

$ qstat | grep mytool
80463 0.55500 mytool    rriver       r     11/17/2010 03:48:26 longrun@willow.toolserver.org      1

Here the job is running on <tt>willow</tt>, so that's where you need to run <tt>screen -r</tt>.

Scheduling SQL queries
When writing batch jobs that perform SQL queries, the most important resource is often available SQL capacity rather than CPU or memory. In this case, it is possible to specify that your job needs to run an SQL query on one or more clusters:

#! /bin/sh
#$ -N sqltest
#$ -l sqlprocs-s1=1
mysql -h sql-s1 -BNe 'select count(*) from revision' enwiki_p

The line <tt>#$ -l sqlprocs-s1=1</tt> indicates that this script needs 1 execution slot on the sql-s1 cluster. If free slots are available, the job will run immediately; otherwise, it will wait for a slot to become available. You can also configure this on the <tt>qsub</tt> command line:

$ qsub -l sqlprocs-s1=1 sql.sh

Currently, 10 SQL slots are configured for each server, and each query running for longer than 60 seconds counts as using a slot. Replication lag is currently not taken into account, but this will probably change soon.

Note: For long-running jobs (as opposed to jobs which run once then exit), do not reserve any SQL slots; since the program runs continuously, it will take the slots forever and prevent other jobs from running.

Allowing jobs to be automatically restarted or migrated
By default, when a cluster node crashes or reboots, all jobs on it are terminated and will not be restarted, because it's not always safe to restart a job that was previously running. If you would like your job to be restarted when this happens, you can start it as a restartable job using <tt>-r y</tt>. There is no need to do this for jobs in the <tt>longrun</tt> queue, since jobs in that queue are restartable by default.

Migration allows jobs to be moved between nodes while they're running, which improves load distribution and results in better performance. Migration relies on checkpointing -- the ability of a job to save its state and resume when restarted.

We do not provide any automatic checkpointing system; if you wish your job to be migrated, you need to implement this yourself. Examples of jobs that are suitable for migration include:
 * Jobs which work by removing work items from a queue and processing them; when migrated, the job just starts from the top of the queue
 * Jobs which are event-based and wait for work to do, e.g. most IRC bots or recentchanges bots
 * Jobs which regularly save their working state and can resume from the saved state if they are restarted

Most jobs in the <tt>longrun</tt> queue are probably suitable for migration, but it is not enabled by default. To mark a job as a checkpointing (migratable) job, start it with the <tt>-ckpt default</tt> argument.
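A job of the last kind listed above, one that regularly saves its working state and resumes from it, could be structured roughly as follows. This is a minimal Python sketch under stated assumptions: the state file name, the run function and the handle callback are all hypothetical, and the state file is assumed to live in a directory (here the home directory) that is visible from every host the job might be moved to.

```python
import json
import os

# Hypothetical location; assumed to be on a filesystem shared by all hosts.
STATE_FILE = os.path.expanduser("~/mytool.state.json")


def load_state(path=STATE_FILE):
    """Return the saved state, or a fresh one if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (IOError, ValueError):
        return {"next_item": 0}


def save_state(state, path=STATE_FILE):
    """Write to a temporary file first so a kill never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.rename(tmp, path)  # atomic on POSIX within one filesystem


def run(items, handle, path=STATE_FILE, checkpoint_every=10):
    """Process items in order, resuming from the last saved position."""
    state = load_state(path)
    for i in range(state["next_item"], len(items)):
        handle(items[i])
        state["next_item"] = i + 1
        if state["next_item"] % checkpoint_every == 0:
            save_state(state, path)
    save_state(state, path)
    return state
```

If the job is stopped between two checkpoints, the items handled since the last save are processed again after the restart, so handle should be safe to repeat. Saving more often reduces the repeated work at the cost of more disk writes.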