ga_irods Package

ga_irods is a Django+Celery -> iRODS connector. The idea is that you can use this module to write webapps that call a data grid (iRODS) in a web-scale manner. Every iCommand is a Celery task. All iRODS environments are stored as model instances in a database.

This is all very well and good, but how do you use it? Assuming you know a bit about iRODS, have (or know someone who has) an account, and are familiar with the icommands client tools (commands analogous to the Unix file system commands, but with an i- prepended), using this app is quite simple. First, add ‘ga_irods’ to your list of INSTALLED_APPS in Django’s settings.py. Then run:

$ python manage.py syncdb

That will add the RodsEnvironment model to the admin tool. Now, assuming you’re an admin on your Django installation (you should be if you can run manage.py at all), you can add RodsEnvironments for each User in the system. I capitalize User because the owner of every RodsEnvironment must be an actual User in the django.contrib.auth application. Finally, make sure that celeryd is running and that django-celery is installed and listed in INSTALLED_APPS.
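As a rough sketch, the relevant part of settings.py might look like the fragment below. The label 'djcelery' is the app name that django-celery registers; the django.contrib entries are listed only for illustration, and your project will have its own set.

```python
# settings.py (fragment) - a sketch, not a complete settings file.
INSTALLED_APPS = (
    'django.contrib.auth',          # RodsEnvironment owners are auth Users
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.admin',         # admin tool used to add RodsEnvironments
    'djcelery',                     # django-celery
    'ga_irods',
)
```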

Once the environments are set up, you can write a webapp that manipulates iRODS. One approach is to attach a RodsEnvironment to the session object and then use it in a view, as in this example views.py:

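from django.contrib.auth.decorators import login_required
from django.shortcuts import redirect, render_to_response

from ga_irods import tasks as itasks

@login_required
def my_view(request):
    if 'rodsenvironment' not in request.session:
        # Redirect so the user can select one of his/her environments.
        # 'select_environment' is a placeholder URL name for this example.
        return redirect('select_environment')
    # Tasks take the environment first (see ILs.run below), then the options.
    environment = request.session['rodsenvironment']
    stdout, stderr = itasks.ils(environment, request.GET['path'])
    return render_to_response('mytemplate.html', {'lsresults': stdout})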

All icommands are available. See the documentation for the tasks module below.

icommands Module

Originally written by Antoine de Torcy

class ga_irods.icommands.Session(root, icommands_path, session_id='default_session')[source]

Bases: object

A set of methods to start, close and manage multiple iRODS client sessions at the same time, using icommands.

admin(*args)[source]

Runs the iadmin icommand with optional argument list and returns tuple (stdout, stderr) from subprocess execution.

create_environment(myEnv)[source]

Creates session files in temporary directory.

Argument myEnv must be an instance of RodsEnv defined above. This method is to be called prior to calling self.runCmd(‘iinit’).

delete_environment()[source]

Deletes temporary sessionDir recursively.

To be called after self.runCmd(‘iexit’).

run(icommand, data=None, *args)[source]

Runs an icommand with optional argument list and returns tuple (stdout, stderr) from subprocess execution.

Set of valid commands can be extended.
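To illustrate what "returns tuple (stdout, stderr) from subprocess execution" means in practice, here is a minimal standalone sketch that uses only the standard library. It is not the Session.run implementation; the name run_command is made up for this example, and the demonstration launches the Python interpreter instead of a real icommand so that it works without iRODS installed.

```python
import subprocess
import sys


def run_command(argv, data=None):
    """Run argv as a subprocess and return (stdout, stderr) as text.

    `data`, if given, is written to the child's standard input.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    stdout, stderr = proc.communicate(data)
    return stdout, stderr


# Stand-in for an icommand: a child Python process that echoes its stdin.
echo = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read())']
out, err = run_command(echo, data='hello')
```

Here out holds whatever the child wrote to standard output and err whatever it wrote to standard error, which is the same shape of result the Session methods are documented to return.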

run_safe(icommand, data=None, *args)[source]
runbatch(*icommands)[source]
session_file_exists()[source]

Checks for the presence of .irodsEnv in temporary sessionDir.

username[source]

Returns current irodsUserName from .irodsEnv or an empty string if the file does not exist.

zone[source]

Returns current zone name from .irodsEnv or an empty string if the file does not exist.
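The ordering constraints noted for create_environment and delete_environment (create, then iinit, then work, then iexit, then delete) can be wrapped in a context manager. The following is a sketch under stated assumptions: it treats the documented run method as the one the docstrings call runCmd, it assumes iinit takes the password through the data argument, and RecordingSession is a hypothetical stand-in used only to show the call order without a real iRODS server.

```python
from contextlib import contextmanager


@contextmanager
def irods_session(session, env, password):
    """Set up an iRODS client session, yield it, and always clean up.

    `session` is expected to behave like ga_irods.icommands.Session.
    """
    session.create_environment(env)      # must precede iinit
    try:
        # Assumption: iinit reads the password from the `data` argument.
        session.run('iinit', password)
        try:
            yield session
        finally:
            session.run('iexit')
    finally:
        session.delete_environment()     # must follow iexit


class RecordingSession(object):
    """Hypothetical stand-in that only records the calls made to it."""

    def __init__(self):
        self.calls = []

    def create_environment(self, env):
        self.calls.append('create_environment')

    def run(self, icommand, data=None, *args):
        self.calls.append(icommand)
        return '', ''

    def delete_environment(self):
        self.calls.append('delete_environment')


fake = RecordingSession()
with irods_session(fake, env=None, password='secret') as s:
    s.run('ils')
```

The nested try/finally blocks mean the temporary session directory is removed even when iinit or the body of the with block raises.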

exception ga_irods.icommands.SessionException(exitcode, stdout, stderr)[source]

Bases: exceptions.Exception
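The constructor signature suggests the exception carries the exit status and both output streams of a failed icommand. Which Session methods raise it, and what the attributes are called, is not documented here, so the sketch below uses a stand-in class (SessionError) and a hypothetical helper (check_result) purely to show the pattern.

```python
class SessionError(Exception):
    """Stand-in mirroring the (exitcode, stdout, stderr) constructor."""

    def __init__(self, exitcode, stdout, stderr):
        super(SessionError, self).__init__(
            'icommand exited with status %d' % exitcode)
        # Attribute names are assumptions made for this example.
        self.exitcode = exitcode
        self.stdout = stdout
        self.stderr = stderr


def check_result(exitcode, stdout, stderr):
    """Return (stdout, stderr) on success, raise SessionError otherwise."""
    if exitcode != 0:
        raise SessionError(exitcode, stdout, stderr)
    return stdout, stderr


try:
    check_result(3, '', 'ERROR: collection does not exist')
except SessionError as exc:
    failure = (exc.exitcode, exc.stderr)
```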

models Module

class ga_irods.models.RodsEnvironment(*args, **kwargs)[source]

Bases: django.db.models.base.Model

RodsEnvironment(id, owner_id, host, port, def_res, home_coll, cwd, username, zone, auth)

exception RodsEnvironment.DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception RodsEnvironment.MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

RodsEnvironment.objects = <django.db.models.manager.Manager object>
RodsEnvironment.owner
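The tasks below accept "a dict or primary key" for the environment. As an illustration, a dict could be assembled from the field names in the RodsEnvironment signature above. Whether the tasks expect exactly these keys is an assumption, the helper environment_dict is hypothetical, and every value shown is a placeholder.

```python
# Field names taken from the RodsEnvironment signature (id, owner_id omitted).
ENVIRONMENT_FIELDS = (
    'host', 'port', 'def_res', 'home_coll', 'cwd', 'username', 'zone', 'auth',
)


def environment_dict(**values):
    """Build an environment dict, rejecting unknown or missing fields."""
    unknown = set(values) - set(ENVIRONMENT_FIELDS)
    missing = set(ENVIRONMENT_FIELDS) - set(values)
    if unknown or missing:
        raise ValueError('unknown: %s; missing: %s'
                         % (sorted(unknown), sorted(missing)))
    return dict(values)


# All values below are placeholders, not a real iRODS account.
env = environment_dict(
    host='irods.example.org',
    port=1247,
    def_res='demoResc',
    home_coll='/tempZone/home/alice',
    cwd='/tempZone/home/alice',
    username='alice',
    zone='tempZone',
    auth='not-a-real-password',
)
```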

tasks Module

The following is a list of the iRODS Celery tasks and a brief description of what each does:

  • iadmin - perform irods administrator operations (irods admins only).
  • ibun - upload/download structured (tar) files.
  • ichksum - checksum one or more data-objects or collections.
  • ichmod - change access permissions to collections or data-objects.
  • icp - copy a data-object (file) or collection (directory) to another.
  • iexecmd - remotely execute special commands.
  • ifsck - check if local files/directories are consistent with the associated objects/collections in iRODS.
  • iget - get a file from iRODS.
  • ilocate - search for data-object(s) OR collections (via a script).
  • ils - list collections (directories) and data-objects (files).
  • ilsresc - list iRODS resources and resource-groups.
  • imcoll - manage mounted collections and associated cache.
  • imeta - add/remove/copy/list/query user-defined metadata.
  • imiscsvrinfo - retrieve basic server information.
  • imkdir - make an irods directory (collection).
  • imv - move/rename an irods data-object (file) or collection (directory).
  • ipasswd - change your irods password.
  • iphybun - physically bundle files (admin only).
  • iphymv - physically move a data-object to another storage resource.
  • ips - display iRODS agent (server) connection information.
  • iput - put (store) a file into iRODS.
  • iqdel - remove a delayed rule (owned by you) from the queue.
  • iqmod - modify certain values in existing delayed rules (owned by you).
  • iqstat - show the queue status of delayed rules.
  • iquest - issue a question (query on system/user-defined metadata).
  • iquota - show information on iRODS quotas (if any).
  • ireg - register a file or directory/files/subdirectories into iRODS.
  • irepl - replicate a file in iRODS to another storage resource.
  • irm - remove one or more data-objects or collections.
  • irmtrash - remove data-objects from the trash bin.
  • irsync - synchronize collections between a local/irods or irods/irods.
  • irule - submit a rule to be executed by the iRODS server.
  • iscan - check if local file or directory is registered in irods.
  • isysmeta - show or modify system metadata.
  • itrim - trim down the number of replicas of data-objects.
  • iuserinfo - show information about your iRODS user account.
  • ixmsg - send/receive iRODS xMessage System messages.
class ga_irods.tasks.IAdmin[source]

Bases: ga_irods.tasks.IRODSTask

iadmin(environment, command, *options)[source]

Usage: iadmin [-hvV] [command]

A blank execute line invokes the interactive mode, where it prompts and executes commands until ‘quit’ or ‘q’ is entered. Single or double quotes can be used to enter items with blanks.

Commands are:

  • lu [name[#Zone]] (list user info; details if name entered)
  • lua [name[#Zone]] (list user authentication (GSI/Kerberos Names, if any))
  • luan Name (list users associated with auth name (GSI/Kerberos))
  • lt [name] [subname] (list token info)
  • lr [name] (list resource info)
  • ls [name] (list directory: subdirs and files)
  • lz [name] (list zone info)
  • lg [name] (list group info (user member list))
  • lgd name (list group details)
  • lrg [name] (list resource group info)
  • lf DataId (list file details; DataId is the number (from ls))
  • mkuser Name[#Zone] Type (make user)
  • moduser Name[#Zone] [ type | zone | comment | info | password ] newValue
  • aua Name[#Zone] Auth-Name (add user authentication-name (GSI/Kerberos))
  • rua Name[#Zone] Auth-Name (remove user authentication name (GSI/Kerberos))
  • rmuser Name[#Zone] (remove user, where userName: name[@department][#zone])
  • mkdir Name [username] (make directory(collection))
  • rmdir Name (remove directory)
  • mkresc Name Type Class Host [Path] (make Resource)
  • modresc Name [name, type, class, host, path, status, comment, info, freespace] Value (mod Resc)
  • rmresc Name (remove resource)
  • mkzone Name Type(remote) [Connection-info] [Comment] (make zone)
  • modzone Name [ name | conn | comment ] newValue (modify zone)
  • rmzone Name (remove zone)
  • mkgroup Name (make group)
  • rmgroup Name (remove group)
  • atg groupName userName[#Zone] (add to group - add a user to a group)
  • rfg groupName userName[#Zone] (remove from group - remove a user from a group)
  • atrg resourceGroupName resourceName (add (resource) to resource group)
  • rfrg resourceGroupName resourceName (remove (resource) from resource group)
  • at tokenNamespace Name [Value1] [Value2] [Value3] (add token)
  • rt tokenNamespace Name [Value1] (remove token)
  • spass Password Key (print a scrambled form of a password for DB)
  • dspass Password Key (descramble a password and print it)
  • pv [date-time] [repeat-time(minutes)] (initiate a periodic rule to vacuum the DB)
  • ctime Time (convert an iRODS time (integer) to local time; & other forms)
  • suq User ResourceName-or-‘total’ Value (set user quota)
  • sgq Group ResourceName-or-‘total’ Value (set group quota)
  • lq [Name] List Quotas
  • cu (calculate usage (for quotas))
  • rum (remove unused metadata (user-defined AVUs))
  • asq ‘SQL query’ [Alias] (add specific query)
  • rsq ‘SQL query’ or Alias (remove specific query)
  • help (or h) [command] (this help, or more details on a command)

Also see ‘irmtrash -M -u user’ for the admin mode of removing trash and similar admin modes in irepl, iphymv, and itrim. The admin can also alias as any user via the ‘clientUserName’ environment variable.

Parameters:
  • environment
  • options
Returns:

name = 'iadmin'
class ga_irods.tasks.IBundle[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.IBundle'
run(environment, command, *options)[source]
Usage : ibun -x [-hb] [-R resource] structFilePath irodsCollection

Usage : ibun -c [-hf] [-R resource] [-D dataType] structFilePath irodsCollection

Bundle file operations. This command allows structured files such as tar files to be uploaded and downloaded to/from iRODS.

A tar file containing many small files can be created with the normal unix tar command on the client and then uploaded to the iRODS server as a normal iRODS file. The ‘ibun -x’ command can then be used to extract/untar the uploaded tar file. The extracted subfiles and subdirectories will appear as normal iRODS files and sub-collections. The ‘ibun -c’ command can be used to tar/bundle an iRODS collection into a tar file.

For example, to upload a directory mydir to iRODS:

tar -chlf mydir.tar -C /x/y/z/mydir .
iput -Dtar mydir.tar .
ibun -x mydir.tar mydir

Note the use of -C option with the tar command which will tar the content of mydir but without including the directory mydir in the paths. The ‘ibun -x’ command extracts the tar file into the mydir collection. The target mydir collection does not have to exist nor be empty. If a subfile already exists in the target collection, the ingestion of this subfile will fail (unless the -f flag is set) but the process will continue.
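When the tar file is produced from Python rather than from the shell, a similar effect to the -C option can be had with the standard tarfile module by giving each entry an archive name that omits the parent directory. This is only a sketch: the function name tar_directory_contents is made up, it does not reproduce the -h and -l flags of the tar command above, and the demonstration builds a throwaway directory so it can run anywhere.

```python
import os
import tarfile
import tempfile


def tar_directory_contents(src_dir, tar_path):
    """Archive the contents of src_dir without the top-level directory name."""
    with tarfile.open(tar_path, 'w') as tar:
        for entry in sorted(os.listdir(src_dir)):
            # arcname drops the leading path, much like `tar -C src_dir .`
            tar.add(os.path.join(src_dir, entry), arcname=entry)


# Demonstration on a throwaway directory.
work = tempfile.mkdtemp()
mydir = os.path.join(work, 'mydir')
os.makedirs(os.path.join(mydir, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(mydir, rel), 'w') as handle:
        handle.write('data')

tar_path = os.path.join(work, 'mydir.tar')
tar_directory_contents(mydir, tar_path)
with tarfile.open(tar_path) as tar:
    names = sorted(tar.getnames())
```

The member names in the resulting archive start at the contents of mydir, so extracting with ‘ibun -x’ into a mydir collection does not create a nested mydir/mydir.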

It is generally a good practice to tag the tar file using the -Dtar flag when uploading the file using iput. But if the tag is not made, the server assumes it is a tar dataType. The dataType tag can be added afterward with the isysmeta command. For example: isysmeta mod /tempZone/home/rods/mydir.tar datatype ‘tar file’

The following command bundles the iRODS collection mydir into a tar file:

ibun -cDtar mydir1.tar mydir

If a copy of a file to be bundled does not exist on the target resource, a replica will automatically be made on the target resource. Again, if the -D flag is not used, the bundling will be done using tar.

The -b option, when used with the -x option, specifies bulk registration, which does up to 50 registrations at a time to reduce overhead.

Options are:

  • -b

    bulk registration when used with -x to reduce overhead

  • -R

    resource - specifies the resource to store to. This is optional in your environment

  • -D

    dataType - the struct file data type. Valid only if the struct file does not exist. Currently only one dataType - ‘t’ which specifies a tar file type is supported. If -D is not specified, the default is a tar file type

  • -x

    extract the structFile and register the extracted files and directories under the input irodsCollection

  • -c

    bundle the files and sub-collection underneath the input irodsCollection and store it in the structFilePath

  • -f

    force overwrite the struct file (-c) or the subfiles (-x).

  • -h

    this help

Parameters:
  • environment
  • options
Returns:

class ga_irods.tasks.IChksum[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.IChksum'
run(environment, *options)[source]
class ga_irods.tasks.IGet[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.IGet'
run(environment, path, callback=None, post=None, post_name=None, *options)[source]

Usage: iget [-fIKPQrUvVT] [-n replNumber] [-N numThreads] [-X restartFile] [-R resource] srcDataObj|srcCollection ... destLocalFile|destLocalDir

Usage : iget [-fIKPQUvVT] [-n replNumber] [-N numThreads] [-X restartFile] [-R resource] srcDataObj|srcCollection

Usage : iget [-fIKPQUvVT] [-n replNumber] [-N numThreads] [-X restartFile] [-R resource] srcDataObj ... -

Get data-objects or collections from irods space, either to the specified local area or to the current working directory.

If the destLocalFile is ‘-’, the files read from the server will be written to the standard output (stdout). Similar to the UNIX ‘cat’ command, multiple source files can be specified.

The -X option specifies that the restart option is on and the restartFile input specifies a local file that contains the restart info. If the restartFile does not exist, it will be created and used for recording subsequent restart info. If it exists and is not empty, the restart info contained in this file will be used for restarting the operation. Note that the restart operation only works for uploading directories and the path input must be identical to the one that generated the restart file.

The -Q option specifies the use of the RBUDP transfer mechanism which uses the UDP protocol for data transfer. The UDP protocol is very efficient if the network is very robust with few packet losses. Two environment variables - rbudpSendRate and rbudpPackSize are used to tune the RBUDP data transfer. rbudpSendRate is used to throttle the send rate in kbits/sec. The default rbudpSendRate is 600,000. rbudpPackSize is used to set the packet size. The default rbudpPackSize is 8192. The -V option can be used to show the loss rate of the transfer. If the loss rate is more than a few %, the send rate should be reduced.

The -T option will renew the socket connection between the client and server after 10 minutes of connection. This gets around the problem of sockets getting timed out by the firewall as reported by some users.

Options are:

  • -f

    force - write local files even if they exist already (overwrite them)

  • -I

    redirect connection - redirect the connection to connect directly to the best (determined by the first 10 data objects in the input collection) resource server.

  • -K

    verify the checksum

  • -n

    replNumber - retrieve the copy with the specified replica number

  • -N

    numThreads - the number of threads to use for the transfer. A value of 0 means no threading. By default (-N option not used) the server decides the number of threads to use.

  • -P

    output the progress of the download.

  • -r

    recursive - retrieve subcollections

  • -R

    resource - the preferred resource

  • -T

    renew socket connection after 10 minutes

  • -Q

    use RBUDP (datagram) protocol for the data transfer

  • -v

    verbose

  • -V

    Very verbose

  • -X

    restartFile - specifies that the restart option is on and the restartFile input specifies a local file that contains the restart info.

  • -h

    this help

Parameters:
  • environment – a dict or primary key of the RodsEnvironment model that governs this session
  • path – the path to get from
  • callback – a registered Celery task that can be called as a subtask with the entire contents of the file that was gotten (file must fit in memory)
  • post – a URL to which the results of the iget can be POSTed. File can be larger than available memory.
  • post_name – the filename that the POST will be given.
  • options – any of the above command line options.
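One consequence of this parameter order follows from ordinary Python argument binding: because callback, post and post_name come before *options, extra positional arguments are bound to those three names first. The stand-in function below has the same parameter list as the documented run method and simply returns what it received; it does not call iRODS, and how Celery's calling conventions interact with this is outside the sketch.

```python
def run(environment, path, callback=None, post=None, post_name=None, *options):
    """Stand-in with the same parameter list as IGet.run; returns its inputs."""
    return {
        'callback': callback,
        'post': post,
        'post_name': post_name,
        'options': options,
    }


# Intended as an option, but '-f' is bound to `callback`.
wrong = run(1, '/tempZone/home/alice/data.bin', '-f')

# Filling the three optional parameters first lets '-f' reach *options.
right = run(1, '/tempZone/home/alice/data.bin', None, None, None, '-f')
```

In the first call the options tuple stays empty; in the second it contains '-f'.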
Returns:

class ga_irods.tasks.ILs[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.ILs'
run(environment, *options)[source]
Display data Objects and collections stored in irods. Options are:
  • -A

    ACL (access control list) and inheritance format

  • -l

    long format

  • -L

    very long format

  • -r

    recursive - show subcollections

  • -v

    verbose

  • -V

    Very verbose

  • -h

    this help

Parameters:
  • environment – a dict or primary key of the RodsEnvironment model that governs this session
  • options – any of the above command line options
Returns:

stdout, stderr tuple of the command.
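Since the task hands back raw stdout, a view usually has to split it into entries before rendering. The sketch below assumes the plain (no -l, -L or -A) ils layout: a first line with the collection path followed by a colon, then indented data-object names, with sub-collections prefixed by ‘C- ’. The function name parse_ils and the sample text are illustrative, and other option combinations produce different layouts that this does not handle.

```python
def parse_ils(stdout):
    """Split plain `ils` output into (collection, data_objects, subcollections)."""
    collection = None
    data_objects = []
    subcollections = []
    for line in stdout.splitlines():
        text = line.strip()
        if not text:
            continue
        if text.startswith('C- '):
            subcollections.append(text[3:])
        elif collection is None and text.endswith(':'):
            collection = text[:-1]
        else:
            data_objects.append(text)
    return collection, data_objects, subcollections


# Hand-written sample in the assumed layout, not captured from a server.
sample = (
    '/tempZone/home/alice:\n'
    '  data.bin\n'
    '  notes.txt\n'
    '  C- /tempZone/home/alice/sub\n'
)
listing = parse_ils(sample)
```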

class ga_irods.tasks.IPut[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.IPut'
run(environment, path, data, *options)[source]
Usage : iput [-abfIkKPQrTUvV] [-D dataType] [-N numThreads] [-n replNum] [-p physicalPath] [-R resource] [-X restartFile] [--link] localSrcFile|localSrcDir ... destDataObj|destColl

Usage : iput [-abfIkKPQTUvV] [-D dataType] [-N numThreads] [-n replNum] [-p physicalPath] [-R resource] [-X restartFile] [--link] localSrcFile

Store a file into iRODS. If the destination data-object or collection is not provided, the current irods directory and the input file name are used.

The -X option specifies that the restart option is on and the restartFile input specifies a local file that contains the restart info. If the restartFile does not exist, it will be created and used for recording subsequent restart info. If it exists and is not empty, the restart info contained in this file will be used for restarting the operation. Note that the restart operation only works for uploading directories and the path input must be identical to the one that generated the restart file.

If the -f option is used to overwrite an existing data-object, the copy in the resource specified by the -R option will be picked if it exists. Otherwise, one of the copies in the other resources will be picked for the overwrite. Note that a copy will not be made in the specified resource if a copy in the specified resource does not already exist. The irepl command should be used to make a replica of an existing copy.

The -I option specifies the redirection of the connection so that it can be connected directly to the resource server. This option can improve the performance of uploading a large number of small (<32 Mbytes) files. This option is only effective if the source is a directory and the -f option is not used.

The -Q option specifies the use of the RBUDP transfer mechanism which uses the UDP protocol for data transfer. The UDP protocol is very efficient if the network is very robust with few packet losses. Two environment variables - rbudpSendRate and rbudpPackSize are used to tune the RBUDP data transfer. rbudpSendRate is used to throttle the send rate in kbits/sec. The default rbudpSendRate is 600,000. rbudpPackSize is used to set the packet size. The default rbudpPackSize is 8192. The -V option can be used to show the loss rate of the transfer. If the loss rate is more than a few %, the send rate should be reduced.

The -T option will renew the socket connection between the client and server after 10 minutes of connection. This gets around the problem of sockets getting timed out by the firewall as reported by some users.

The -b option specifies bulk upload operation which can do up to 50 uploads at a time to reduce overhead. If the -b is specified with the -f option to overwrite existing files, the operation will work only if there is no existing copy at all or if there is an existing copy in the target resource. The operation will fail if there are existing copies but not in the target resource because this type of operation requires a replication operation and bulk replication has not been implemented yet. The bulk option does work for mounted collections which may represent the quickest way to upload a large number of small files.

Options are:

  • -a

    all - update all existing copies

  • -b

    bulk upload to reduce overhead

  • -D

    dataType - the data type string

  • -f

    force - write data-object even if it exists already; overwrite it

  • -I

    redirect connection - redirect the connection to connect directly to the resource server.

  • -k

    checksum - calculate a checksum on the data

  • -K

    verify checksum - calculate and verify the checksum on the data

  • --link

    ignore symlink.

  • -N

    numThreads - the number of threads to use for the transfer. A value of 0 means no threading. By default (-N option not used) the server decides the number of threads to use.

  • -p

    physicalPath - the physical path of the uploaded file on the server

  • -P

    output the progress of the upload.

  • -Q

    use RBUDP (datagram) protocol for the data transfer

  • -R

    resource - specifies the resource to store to. This can also be specified in your environment or via a rule set up by the administrator.

  • -r

    recursive - store the whole subdirectory

  • -T

    renew socket connection after 10 minutes

  • -v

    verbose

  • -V

    Very verbose

  • -X

    restartFile - specifies that the restart option is on and the restartFile input specifies a local file that contains the restart info.

  • -h

    this help

Parameters:
  • environment – a dict or primary key of the RodsEnvironment model that governs this session
  • path – the path to store the object in
  • data – the data object to store
  • options – any of the above command line options.
Returns:

stdout, stderr of the command.

class ga_irods.tasks.IRODSTask[source]

Bases: celery.app.Task

collection(name)[source]
mount(local_name, collection=None)[source]
name = 'ga_irods.tasks.IRODSTask'
run(environment)[source]
session[source]
unmount(local_name)[source]
class ga_irods.tasks.Ichmod[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ichmod'
run(environment, *options)[source]
class ga_irods.tasks.Icp[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Icp'
run(environment, *options)[source]
class ga_irods.tasks.Iexecmd[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iexecmd'
run(environment, *options)[source]
class ga_irods.tasks.Ifsck[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ifsck'
run(environment, *options)[source]
class ga_irods.tasks.Ilocate[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ilocate'
run(environment, *options)[source]
class ga_irods.tasks.Ilsresc[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ilsresc'
run(environment, *options)[source]
class ga_irods.tasks.Imcoll[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Imcoll'
run(environment, *options)[source]
class ga_irods.tasks.Imeta[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Imeta'
run(environment, *options)[source]
class ga_irods.tasks.Imiscserverinfo[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Imiscserverinfo'
run(environment, *options)[source]
class ga_irods.tasks.Imkdir[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Imkdir'
run(environment, *options)[source]
class ga_irods.tasks.Imv[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Imv'
run(environment, *options)[source]
class ga_irods.tasks.Iphybun[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iphybun'
run(environment, *options)[source]
class ga_irods.tasks.Iphymv[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iphymv'
run(environment, *options)[source]
class ga_irods.tasks.Ips[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ips'
run(environment, *options)[source]
class ga_irods.tasks.Iqdel[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iqdel'
run(environment, *options)[source]
class ga_irods.tasks.Iqmod[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iqmod'
run(environment, *options)[source]
class ga_irods.tasks.Iqstat[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iqstat'
run(environment, *options)[source]
class ga_irods.tasks.Iquest[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iquest'
run(environment, *options)[source]
class ga_irods.tasks.Iquota[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iquota'
run(environment, *options)[source]
class ga_irods.tasks.Ireg[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ireg'
run(environment, *options)[source]
class ga_irods.tasks.Irepl[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Irepl'
run(environment, *options)[source]
class ga_irods.tasks.Irm[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Irm'
run(environment, *options)[source]
class ga_irods.tasks.Irmtrash[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Irmtrash'
run(environment, *options)[source]
class ga_irods.tasks.Irsync[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Irsync'
run(environment, *options)[source]
class ga_irods.tasks.Irule[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Irule'
run(environment, *options)[source]
class ga_irods.tasks.Iscan[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iscan'
run(environment, *options)[source]
class ga_irods.tasks.Isysmeta[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Isysmeta'
run(environment, *options)[source]
class ga_irods.tasks.Itrim[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Itrim'
run(environment, *options)[source]
class ga_irods.tasks.Iuserinfo[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Iuserinfo'
run(environment, *options)[source]
class ga_irods.tasks.Ixmsg[source]

Bases: ga_irods.tasks.IRODSTask

name = 'ga_irods.tasks.Ixmsg'
run(environment, *options)[source]

views Module
