eachElem                 package:nws                 R Documentation

_A_p_p_l_y _a _F_u_n_c_t_i_o_n _i_n _P_a_r_a_l_l_e_l _o_v_e_r _a _S_e_t _o_f _L_i_s_t_s _a_n_d _V_e_c_t_o_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     'eachElem' executes function 'fun' multiple times in parallel with
     a varying set of arguments, and returns the results in a list.  It
     is functionally similar to the standard R 'lapply' function, but
     is more flexible in the way that the function arguments can be
     specified.

_U_s_a_g_e:

       ## S4 method for signature 'sleigh':
       eachElem(.Object, fun, elementArgs=list(), fixedArgs=list(), 
           eo=NULL, DEBUG=FALSE)

_A_r_g_u_m_e_n_t_s:

 .Object: sleigh class object.

     fun: the function to be evaluated by the sleigh. In the case of
          functions like '+', '%*%', etc., the function name must be
          quoted.

elementArgs: list of vectors, lists, matrices, and data frames that
          specify (some of) the arguments to be passed to 'fun'. Each
          element should correspond to an argument of 'fun'.

fixedArgs: list of additional arguments to be passed to 'fun'. Each
          element should correspond to an argument of 'fun'.

      eo: list specifying environment options. See the section
          Environment Options below.

   DEBUG: logical; should 'browser' function be called upon entry to
          'eachElem'? The default is 'FALSE'.

_D_e_t_a_i_l_s:

     The 'eachElem' function forms argument sets from objects passed in
     via 'elementArgs' and 'fixedArgs'.   The elements of
     'elementArgs' are used to specify the arguments that are
     changing, or varying, from task to task, while the elements of
     'fixedArgs' are used to specify the arguments that do not vary
     from task to task.  The number of tasks that are executed by a
     call to 'eachElem' equals the length of the longest
     vector (or list, etc) in 'elementArgs'.  If any elements of
     'elementArgs' are shorter, then their values are recycled, using
     the standard R rules.

     The elements of 'elementArgs' may be vectors, lists, matrices, or
     data frames.  The vectors and lists are always iterated over by
     element, or '"cell"', but matrices and data frames can also be
     iterated over by row or column.  This is controlled by the 'by'
     option, specified via the 'eo' argument.  See below for more
     information.
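
     As a rough local illustration of the three iteration schemes, the
     sketch below uses only base R (no sleigh is involved) to count
     how many tasks a 2 x 3 matrix would yield under each setting.  The
     exact shape in which a row or column is handed to 'fun', and the
     order in which cells are visited, are assumptions here; only the
     task counts follow from the description above.

```r
# Local, single-process illustration in base R; no sleigh is involved.
m <- matrix(1:6, nrow = 2)

# One task per row, per column, or per cell (assumed shapes: plain vectors).
byRow    <- lapply(seq_len(nrow(m)), function(i) m[i, ])
byColumn <- lapply(seq_len(ncol(m)), function(j) m[, j])
byCell   <- lapply(seq_along(m), function(k) m[[k]])

# Number of tasks under each iteration scheme.
taskCounts <- c(row = length(byRow),
                column = length(byColumn),
                cell = length(byCell))
taskCounts
```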

     For example:

     'eachElem(s, '+', elementArgs=list(1:4), fixedArgs=list(100))'

     This will submit four tasks, since the length of 1:4 is four.  The
     four tasks will be to add the arguments 1 and 100, 2 and 100, 3
     and 100, and 4 and 100.  The result is a list containing the four
     values 101, 102, 103, and 104.

     Another way to do the same thing is with:

     'eachElem(s, '+', elementArgs=list(1:4, 100))'

     Since the second element of 'elementArgs' has length one, its
     value is recycled four times, thus specifying the same set of
     tasks as in the previous example.  This method also has the
     advantage of making it easy to put fixed values before varying
     values, without the need for the 'eo$argPermute' option, discussed
     later.  For example:

     'eachElem(s, '-', elementArgs=list(100, 1:4))'

     is similar to the R statement:

     '100 - 1:4'

     Note that in simple examples like these, where the results are
     numeric values, the standard R 'unlist' function can be very
     useful for converting the resulting list into a vector.
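
     The argument sets formed in the examples above can be mimicked on
     a single machine with base R's 'mapply', which applies the same
     recycling rules.  This is only a local sketch of which calls are
     made; it does not use a sleigh and says nothing about how nws
     distributes the work.

```r
# Local, single-process illustration in base R; no sleigh is involved.

# One varying argument plus one fixed argument (the role of fixedArgs).
withFixed <- mapply("+", 1:4, MoreArgs = list(100), SIMPLIFY = FALSE)

# The same four tasks, with the length-one value recycled four times.
recycled <- mapply("+", 1:4, 100, SIMPLIFY = FALSE)

# A fixed value placed before the varying one.
fixedFirst <- mapply("-", 100, 1:4, SIMPLIFY = FALSE)

# Each result is a list; unlist() flattens it into a numeric vector.
unlist(withFixed)    # 101 102 103 104
unlist(fixedFirst)   # 99 98 97 96
```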

_E_n_v_i_r_o_n_m_e_n_t _O_p_t_i_o_n_s:

     The 'eo' argument is a list that can be used to specify various
     options.  The following options are recognized: 

     _e_l_e_m_e_n_t_F_u_n_c The 'eo$elementFunc' option can be used to specify a
          callback function that provides the varying arguments for
          'fun' in place of 'elementArgs' (that is, you can't specify
          both 'eo$elementFunc' and 'elementArgs').  'eachElem' calls
          the 'eo$elementFunc' function to get a list of arguments for
          one invocation of 'fun', and will keep calling it until
          'eo$elementFunc' signals that there are no more tasks to
          execute by calling the 'stop' function with no arguments.
          'eachElem' appends any values specified by 'fixedArgs' to the
          list returned by 'eo$elementFunc' just as if 'elementArgs'
          had been specified.

          'eachElem' passes the number of the desired task (starting
          from 1) as the first argument to 'eo$elementFunc', and the
          value of the 'eo$by' option as the second argument.  Note
          that the use of the 'eo$elementFunc' function is an advanced
          feature, but is very useful when executing a large number of
          tasks, or when the arguments are coming from a database
          query, for example.  For that reason, the 'eo$loadFactor'
          option should usually be used in conjunction with
          'eo$elementFunc' (see description below).

     _a_c_c_u_m_u_l_a_t_o_r The 'eo$accumulator' option can be used to specify a
          callback function that will receive the results of the task
          execution as soon as they are complete, rather than returning
          all of the task results as a list when 'eachElem' completes. 
          In other words, 'eachElem' will call the 'eo$accumulator'
          function with task results as soon as it receives them from
          the sleigh workers, rather than saving them in memory until
          all the tasks are complete.  Note that if the tasks are
          _chunked_ (using the 'eo$chunkSize' option described below),
          then the 'eo$accumulator' function will receive multiple task
          results, which is why the task results are always passed to
          the 'eo$accumulator' function in a list.

          The first argument to the 'eo$accumulator' function is a list
          of results, where the length of the list is equal to
          'eo$chunkSize'. The second argument is a vector of task
          numbers, starting from 1, where the length of the vector is
          also equal to 'eo$chunkSize'. The task numbers are very
          important, because the results are not guaranteed to be
          returned in order.  'eo$accumulator' is another advanced
          feature, and like 'eo$elementFunc', is very useful when
          executing a large number of tasks.  It allows you to process
          each result as it finishes, rather than forcing you to wait
          until all of the tasks are complete.  In conjunction with
          'eo$elementFunc' and 'eo$loadFactor', you can set up a
          pipeline, allowing you to process an unlimited number of
          tasks efficiently.  Note that when 'eo$accumulator' is
          specified, 'eachElem' returns NULL, not the list of results,
          since 'eachElem' doesn't save any of the results after
          passing them to the 'eo$accumulator' function.

     _b_y The 'eo$by' option specifies the iteration scheme to use for
          matrix and data frame elements in 'elementArgs'.  The default
          value is '"row"', but it can also be set to '"column"' or
          '"cell"'.  Vectors and lists in 'elementArgs' are not
          affected by this option.

     _c_h_u_n_k_S_i_z_e The 'eo$chunkSize' option is a tuning parameter that
          specifies the number of tasks that sleigh workers should
          allocate at a time.  The default value is 1, but if the tasks
          are small, performance can be improved by specifying a larger
          value, which decreases the overhead per task.

          If the 'fun' function executes very quickly, you may not be
          able to keep your workers busy, giving you poor performance. 
          In that case, consider setting the 'eo$chunkSize' option to a
          large enough number to increase the effective task execution
          time.

     _l_o_a_d_F_a_c_t_o_r The 'eo$loadFactor' option is a tuning parameter that
          specifies the maximum number of tasks per worker that are
          submitted to the sleigh at the same time.  If set, no more
          than '(loadFactor * workerCount)' tasks will be submitted at
          the same time.  This helps to control the resource demands
          that are made on the NetWorkSpaces server, which is
          especially important if there are a large number of tasks. 
          Note that this option is ignored if 'blocking' is set to
          'FALSE', since the two options are incompatible with each
          other.

          If in doubt, set the 'eo$loadFactor' option to 10.  That will
          almost certainly avoid putting a strain on the NetWorkSpaces
          server, and if that isn't enough to keep your workers busy,
          then you should really be using the 'eo$chunkSize' option to
          give the workers more to do.

     _b_l_o_c_k_i_n_g The 'eo$blocking' option is used to indicate whether to
          wait for the results, or to return as soon as the tasks have
          been submitted.  If set to 'FALSE', 'eachElem' will return a
          'sleighPending' object that is used to monitor the status of
          the tasks, and to eventually retrieve the results.  You must
          wait for the results to be complete before executing any
          further tasks on the sleigh, or an exception will be raised. 
          The default value is 'TRUE'.

     _a_r_g_P_e_r_m_u_t_e The 'eo$argPermute' option is used to reorder the
          arguments passed to 'fun'.  It is generally only useful if
          the 'fixedArgs' argument has been specified, and some of
          those arguments need to precede the arguments specified via
          'elementArgs'.  Note that by using recycling of elements in
          'elementArgs', the use of 'fixedArgs' and 'argPermute' can
          often be avoided entirely.
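
     To make the callback contracts of 'eo$elementFunc' and
     'eo$accumulator' concrete, the sketch below emulates them on a
     single machine in base R.  The driver 'runPipeline' and the other
     helper names are hypothetical stand-ins, not part of nws, and no
     sleigh or server is involved.  How nws treats a final, partial
     chunk is not stated above; this sketch simply delivers whatever
     remains.

```r
# Local, single-process sketch in base R; no sleigh is involved.
# runPipeline, nTasks and collected are hypothetical names.
nTasks <- 6

# Element function: returns the argument list for task i, or calls
# stop() with no arguments once the tasks are exhausted.
elementFunc <- function(i, by) {
  if (i <= nTasks) list(x = i) else stop()
}

# Accumulator: receives a list of results and the matching task numbers,
# and files each result under its task number (order is not guaranteed).
collected <- numeric(0)
accumulator <- function(resultList, taskVector) {
  for (k in seq_along(taskVector)) {
    collected[taskVector[k]] <<- resultList[[k]]
  }
}

# Hypothetical stand-in for the loop eachElem runs: pull argument lists
# until elementFunc stops, evaluate fun, and hand results over in chunks.
runPipeline <- function(fun, elementFunc, accumulator, chunkSize = 1) {
  results <- list()
  tasks <- integer(0)
  i <- 1L
  repeat {
    args <- tryCatch(elementFunc(i, "row"), error = function(e) NULL)
    done <- is.null(args)
    if (!done) {
      if (!is.list(args)) args <- list(args)   # wrap non-list values
      results[[length(results) + 1L]] <- do.call(fun, args)
      tasks <- c(tasks, i)
      i <- i + 1L
    }
    # Deliver a full chunk, or the remainder once the tasks run out.
    if (length(tasks) == chunkSize || (done && length(tasks) > 0)) {
      accumulator(results, tasks)
      results <- list()
      tasks <- integer(0)
    }
    if (done) break
  }
  invisible(NULL)  # like eachElem, nothing is returned with an accumulator
}

runPipeline(function(x) x^2, elementFunc, accumulator, chunkSize = 3)
collected

# Cap implied by eo$loadFactor: at most loadFactor * workerCount tasks
# are submitted at the same time (the worker count here is made up).
loadFactor <- 10
workers <- 4
maxOutstanding <- loadFactor * workers
```

     With a real sleigh 's', the same two callbacks would be passed as
     'eo=list(elementFunc=elementFunc, accumulator=accumulator,
     chunkSize=3, loadFactor=10)' in the call to 'eachElem'.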

_N_o_t_e:

     If 'elementArgs' or 'fixedArgs' isn't a list, 'eachElem' will
     automatically wrap it in a list.  This is a convenience that only
     works for passing in a single vector or matrix, however.

     If 'elementArgs' or 'fixedArgs' are named lists, then the names
     are used to map the values to the appropriate argument of 'fun'. 
     This can be used as another technique to avoid the use of
     'eo$argPermute'.

     The 'elementArgs' argument can be specified as a data frame. This
     works just like a named list, and therefore, the column names of
     the data frame must all correspond to arguments of 'fun'.  Note
     that if the data frame has many rows, the performance may not be
     good due to the overhead of subsetting data frames in R.
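
     The name-based mapping can be pictured locally with base R's
     'do.call', which also matches named list elements to the formal
     arguments of a function.  The function 'fun' and the data below
     are made up for illustration, and no sleigh is involved; the
     per-row loop is only an assumed picture of the tasks that a data
     frame passed as 'elementArgs' would describe.

```r
# Local, single-process illustration in base R; no sleigh is involved.
fun <- function(base, exponent) base^exponent

# Named elements are matched to the formals of 'fun' by name, so their
# order in the list does not matter.
byName <- do.call(fun, list(exponent = 2, base = 10))

# A data frame acts like a named list of columns: one task per row,
# with each column supplying the argument of the same name.
df <- data.frame(exponent = c(1, 2, 3), base = c(2, 2, 2))
perRow <- lapply(seq_len(nrow(df)),
                 function(i) do.call(fun, as.list(df[i, ])))
unlist(perRow)
```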

     If you have a huge number of tasks, consider using the
     'eo$elementFunc', 'eo$accumulator', and 'eo$loadFactor' options.

     If 'eo$elementFunc' returns a value that isn't a list, 'eachElem'
     will automatically wrap that value in a list.

     The 'eo$elementFunc' function doesn't have to define a second
     formal argument (the 'by' argument) if it's not needed.

     The 'eo$accumulator' function doesn't have to define a second
     formal argument (the 'taskVector' argument) if it's not needed.
     Just remember that the results are not guaranteed to come back in
     order.

_S_e_e _A_l_s_o:

     'eachWorker', 'sleighPending'

_E_x_a_m_p_l_e_s:

       ## Not run: 
     # create a sleigh
     s <- sleigh()

     # compute the mean of each list element
     x <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
     eachElem(s, mean, list(x))

     # median and quartiles for each list element
     eachElem(s, quantile, elementArgs=list(x), fixedArgs=list(probs=1:3/4))

     # use eo$elementFunc to supply 100 random values and eo$accumulator to
     # receive the results
     elementFunc <- function(i, by) {
       if (i <= 100) list(i=i, x=runif(1)) else stop()
     }
     accumulator <- function(resultList, taskVector) {
       if (resultList[[1]][[1]] != taskVector[1]) stop('assertion failure')
       cat(paste(resultList[[1]], collapse=' '), '\n')
     }
     eo <- list(elementFunc=elementFunc, accumulator=accumulator)
     eachElem(s, function(i, x) list(i=i, x=x, xsq=x*x), eo=eo)
       ## End(Not run)

