Jamie Fargen's Weblog

Archive for April, 2011

SGE Workshop

Posted on Apr. 07, 2011, under Uncategorized

Here are the notes from the workshop I recently gave on using Sun/Oracle Grid Engine.

SGE User Administration

User Roles

  • Managers have full capabilities to manipulate the Grid Engine system. By default, the superusers of the master host and of any machine that hosts a queue instance have manager privileges.
  • Operators can perform many of the same commands as managers, except that operators cannot add, delete, or modify queues.
  • Users have certain access permissions, as described in Configuring Users, but users have no cluster or queue management capabilities.

Displaying users by role type

  • To show the list of managers – qconf -sm
  • To show the list of operators – qconf -so (not used by RC)
  • To show the list of users – qconf -suserl

Managing user roles

  • To add a user to the manager role – qconf -am <netid>
  • To remove a user from the manager role – qconf -dm <netid>

Note: We don’t use the Operators role, and users are added when they request an account at https://rc.usf.edu/signup/account.php.

User Access Lists

A general user has access to a certain set of hardware, called the general queue. For now this consists of rcninb.q, rcnib2.q, smp.q, rcnx.q, and the *.volatile.q queues. We use Access Lists to grant people access to additional queues, which in turn gives them access to more groups of hardware. Depending on the queue, this could mean hundreds more processors, or machines with up to 128GB of RAM.

  • To view all of the Access Lists – qconf -sul
  • To view the users in a specific Access list – qconf -su mumcuUsers
  • To add a user to a list – qconf -au <netid> accessList
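
As a rough sketch, the add-then-verify step for an Access List could be wrapped in a small shell function. The function name grant_access, the QCONF override variable, and the example user jdoe are made up for illustration; only the qconf -au and qconf -su options come from Grid Engine itself.

```shell
#!/bin/sh
# QCONF can be overridden (for example QCONF=echo) for a dry run;
# by default the real qconf binary is used.
QCONF="${QCONF:-qconf}"

# grant_access NETID ACCESS_LIST  (hypothetical helper, not part of SGE)
grant_access() {
    if [ "$#" -ne 2 ]; then
        echo "usage: grant_access <netid> <accessList>" >&2
        return 2
    fi
    # qconf -au expects the user name first, then the access list name.
    "$QCONF" -au "$1" "$2" || return 1
    # Print the list afterwards so the new entry can be checked by eye.
    "$QCONF" -su "$2"
}

# Example (requires manager privileges on the cluster):
#   grant_access jdoe mumcuUsers
```

Running it with QCONF=echo first prints the two qconf command lines without touching the cluster, which is a cheap way to check the argument order.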

Dealing with Queues

  • To show a list of queues – qconf -sql
  • To show queue properties – qconf -sq <queue name>

Note: In the properties you can find the user list for the queue.
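
To pull just the access information out of the queue properties, something like the following may help. It assumes the properties are printed one attribute per line with the attribute name in the first column, and that the relevant attributes are named user_lists and xuser_lists; the helper name queue_user_lists is hypothetical.

```shell
#!/bin/sh
# QCONF can be overridden for testing; it defaults to the real qconf.
QCONF="${QCONF:-qconf}"

# queue_user_lists QUEUE  (hypothetical helper, not part of SGE)
# Prints only the allowed (user_lists) and denied (xuser_lists)
# access list lines from the queue configuration.
queue_user_lists() {
    "$QCONF" -sq "$1" | awk '$1 == "user_lists" || $1 == "xuser_lists"'
}

# Example:
#   queue_user_lists smp.q
```

Very long attribute values may be wrapped across lines in the real output, in which case this sketch would only show the first line of the value.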

Queue Status

  • To see the status of your jobs in the queue – qstat
  • To see the status of all jobs in the queue for all users – qstat -u '*'
  • To see the full display of info for a queue – qstat -f
  • To see all the nodes in an Error state – qstat -f | grep E
  • To show the status of all of the queues – qs
  • To show all nodes – qs, which takes the arguments -u, -E, and -d
  • To show the nodes a user can access – gq <netid>
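
The grep E filter above also matches any line that merely contains a capital E, such as a job name. A slightly stricter sketch is shown below. It assumes the usual qstat -f layout, where each queue instance line starts with queue@host and carries its state flags in the sixth column; the function name error_instances is made up.

```shell
#!/bin/sh
# error_instances  (hypothetical helper, not part of SGE)
# Reads `qstat -f` output on stdin and prints the queue instances whose
# state column contains E, together with their full state flags.
error_instances() {
    # $1 ~ /@/  keeps only queue instance lines (queue@host);
    # NF >= 6   skips instances with an empty state column;
    # $6 ~ /E/  keeps those whose state flags include E.
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print $1, $6 }'
}

# Example:
#   qstat -f | error_instances
```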

Dealing with Jobs

  • To see the status of a specific user's jobs – qstat -u <netid>
  • To see the properties of a certain job – qstat -j <job id>
  • To submit a job to the scheduler – qsub <job script>
  • To kill a job submitted to the scheduler – qdel -f <job id>
  • To alter a scheduled or running job – qalter
  • To clear an error state in a job – qmod -cj <job id>
  • To clear an error state in a queue – qmod -cq <queue name>
  • To clear all error states – qmod -c '*'
  • To disable a queue – qmod -d <queue name>
  • To enable a queue – qmod -e <queue name>
  • To reschedule a job – qmod -rj <job id>
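
For anyone who has not written one before, here is a minimal sketch of a job script that could be handed to qsub. The job name hello_sge and the report function are invented for this example; the #$ lines use standard qsub options, and JOB_ID and QUEUE are environment variables Grid Engine sets inside a running job.

```shell
#!/bin/sh
# Lines starting with "#$" are read by qsub as command-line options.
# Job name (hypothetical):
#$ -N hello_sge
# Run the job from the directory it was submitted from:
#$ -cwd
# Merge stderr into stdout:
#$ -j y

# The fallbacks after ":-" only matter when the script is run by hand,
# outside the scheduler, where JOB_ID and QUEUE are not set.
report() {
    echo "job=${JOB_ID:-none} queue=${QUEUE:-none} host=$(uname -n)"
}

report
```

Saved as hello_sge.sh (again, a made-up name), it would be submitted with qsub hello_sge.sh, inspected with qstat -j and the job id that qsub prints, and removed with qdel and the same id.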

Much of this information was originally found at http://wikis.sun.com/display/GridEngine/+Sun+Grid+Engine.
