bigglm                 package:biglm                 R Documentation

_B_o_u_n_d_e_d _m_e_m_o_r_y _l_i_n_e_a_r _r_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     'bigglm' creates a generalized linear model object that uses only
     'p^2' memory for 'p' variables.

_U_s_a_g_e:

     bigglm(formula, data, family=gaussian(),...)
     ## S3 method for class 'data.frame':
     bigglm(formula, data,...,chunksize=5000)
     ## S3 method for class 'function':
     bigglm(formula, data, family=gaussian(),
          weights=NULL, sandwich=FALSE, maxit=8, tolerance=1e-7,
          start=NULL,...)
     ## S3 method for class 'bigglm':
     vcov(object,dispersion=NULL, ...)

_A_r_g_u_m_e_n_t_s:

 formula: A model formula

    data: See Details below. Method dispatch is on this argument

  family: A glm family object

chunksize: Size of chunks for processing the data frame

 weights: A one-sided, single term formula specifying weights

sandwich: 'TRUE' to compute the Huber/White sandwich covariance matrix
          (uses 'p^4' memory rather than 'p^2')

   maxit: Maximum number of Fisher scoring iterations

tolerance: Tolerance for change in coefficient (as multiple of standard
          error)

   start: Optional starting values for coefficients. If 'NULL', 'maxit'
          should be at least 2 as some quantities will not be computed
          on the first iteration

  object: A 'bigglm' object

dispersion: Dispersion parameter, or 'NULL' to estimate

     ...: Additional arguments

_D_e_t_a_i_l_s:

     The 'data' argument may be a function or a data frame.

     When it is a function, it must take a single argument 'reset'.
     When 'reset=FALSE' it returns a data frame with the next chunk of
     data, or 'NULL' if no more data are available. When 'reset=TRUE'
     it indicates that the data should be reread from the beginning by
     subsequent calls. The chunks need not be the same size or in the
     same order when the data are reread, but the same data must be
     provided in total. The 'bigglm.data.frame' method gives an example
     of how such a function might be written; another is in the
     Examples below.
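
     As a rough illustration of the 'reset' contract described above,
     the sketch below wraps an in-memory data frame in a closure. The
     helper name 'make_chunk_reader' and its 'chunksize' argument are
     hypothetical and not part of biglm, and this is not the code of
     the 'bigglm.data.frame' method, only a sketch of the same idea.
     Returning 'NULL' invisibly on reset is an assumption; the text
     above does not say what a reset call should return.

```r
# Hypothetical helper (not part of biglm): returns a function that
# follows the reset / next-chunk contract described in Details.
make_chunk_reader <- function(df, chunksize = 10) {
  pos <- 0                       # number of rows already handed out
  function(reset = FALSE) {
    if (reset) {
      pos <<- 0                  # next call starts again at row 1
      return(invisible(NULL))    # assumed: return value on reset is unused
    }
    if (pos >= nrow(df)) return(NULL)   # no more data available
    rows <- seq(pos + 1, min(pos + chunksize, nrow(df)))
    pos <<- max(rows)
    df[rows, , drop = FALSE]     # next chunk as a data frame
  }
}

# Walk through the built-in 'trees' data (31 rows) in chunks of 10.
reader <- make_chunk_reader(trees, chunksize = 10)
reader(reset = TRUE)
sizes <- integer(0)
while (!is.null(chunk <- reader(reset = FALSE))) {
  sizes <- c(sizes, nrow(chunk))
}
sizes   # 10 10 10 1
```

     A function built this way could then be supplied as the 'data'
     argument, for instance
     'bigglm(log(Volume)~log(Girth)+log(Height), data=reader)'.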

     The model formula must not contain any data-dependent terms, as
     these will not be consistent when updated.  Factors are permitted,
     but the levels of the factor must be the same across all data
     chunks (empty factor levels are ok).
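
     The following base R sketch illustrates why these restrictions
     matter. The two small data frames, the variable names 'x' and
     'grp', and the choice of 'scale()' as a data-dependent term are
     all made up for illustration; nothing here calls biglm itself.

```r
# Two chunks of one data set; level "c" of 'grp' is absent from chunk 1.
chunk1 <- data.frame(x = c(1, 2, 3),    grp = c("a", "b", "a"))
chunk2 <- data.frame(x = c(10, 20, 30), grp = c("c", "a", "b"))

# If each chunk infers its own levels, the design matrices disagree.
n1 <- ncol(model.matrix(~ factor(grp), chunk1))   # 2 columns
n2 <- ncol(model.matrix(~ factor(grp), chunk2))   # 3 columns

# Fixing the levels in advance keeps the columns identical across
# chunks; the empty level "c" in chunk 1 is harmless.
all_levels <- c("a", "b", "c")
chunk1$grp <- factor(chunk1$grp, levels = all_levels)
chunk2$grp <- factor(chunk2$grp, levels = all_levels)
m1 <- model.matrix(~ grp, chunk1)
m2 <- model.matrix(~ grp, chunk2)
identical(colnames(m1), colnames(m2))             # TRUE

# A data-dependent term such as scale(x) centres each chunk on its own
# mean, so the same raw value is coded differently in each chunk.
s1 <- attr(scale(chunk1$x), "scaled:center")      # 2
s2 <- attr(scale(chunk2$x), "scaled:center")      # 20
```

     One way to avoid the second problem is to apply any such
     transformation with constants fixed in advance, so that every
     chunk is transformed identically.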

_V_a_l_u_e:

     An object of class 'bigglm'

_R_e_f_e_r_e_n_c_e_s:

     Algorithm AS274, Applied Statistics (1992) Vol. 41, No. 2

_S_e_e _A_l_s_o:

     'biglm', 'glm'

_E_x_a_m_p_l_e_s:

     data(trees)
     ff<-log(Volume)~log(Girth)+log(Height)
     a <- bigglm(ff,data=trees, chunksize=10, sandwich=TRUE)
     summary(a)

     ## Not run: 
     ## requires internet access
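     make.data<-function(urlname, chunksize,...){
       conn<-NULL                  # connection shared across calls
       function(reset=FALSE){
         if(reset){
           ## (re)open the URL so the next call reads from the start
           if(!is.null(conn)) close(conn)
           conn<<-url(urlname,open="r")
         } else{
           ## read the next chunk; NULL signals no more data
           rval<-read.table(conn, nrows=chunksize,...)
           if (nrow(rval)==0) {
             close(conn)
             conn<<-NULL
             rval<-NULL
           }
           return(rval)
         }
       }
     }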

     airpoll<-make.data("http://faculty.washington.edu/tlumley/NO2.dat",
             chunksize=150,
             col.names=c("logno2","logcars","temp","windsp",
                         "tempgrad","winddir","hour","day"))

     b<-bigglm(exp(logno2)~logcars+temp+windsp,
              data=airpoll, family=Gamma(log),
              start=c(2,0,0,0),maxit=10)
     summary(b)         
     ## End(Not run)

