stepwiseqtl               package:qtl               R Documentation

_S_t_e_p_w_i_s_e _s_e_l_e_c_t_i_o_n _f_o_r _m_u_l_t_i_p_l_e _Q_T_L

_D_e_s_c_r_i_p_t_i_o_n:

     Performs forward/backward selection to identify a multiple QTL
     model, with model choice made via a penalized LOD score, with
     separate penalties on main effects and interactions.

_U_s_a_g_e:

     stepwiseqtl(cross, chr, pheno.col=1, qtl, formula, max.qtl=10, covar=NULL,
                 method=c("imp", "hk"), incl.markers=TRUE, refine.locations=TRUE,
                 additive.only=FALSE, scan.pairs=FALSE, penalties,
                 keeplodprofile=FALSE, keeptrace=FALSE, verbose=TRUE)

_A_r_g_u_m_e_n_t_s:

   cross: An object of class 'cross'. See 'read.cross' for details.

     chr: Optional vector indicating the chromosomes to consider in
          search for QTL.  This should be a vector of character strings
          referring to chromosomes by name; numeric values are
          converted to strings.  Refer to chromosomes with a preceding
          '-' to have all chromosomes but those considered.  A logical
          (TRUE/FALSE) vector may also be used.

pheno.col: Column number in the phenotype matrix which should be used
          as the phenotype.  One may also give character strings
          matching the phenotype names.  Finally, one may give a
          numeric vector of phenotypes, in which case it must have the
          length equal to the number of individuals in the cross, and
          there must be either non-integers or values < 1 or > no.
          phenotypes; this last case may be useful for studying
          transformations.

     qtl: Optional QTL object (of class '"qtl"', as created by
          'makeqtl') to use as a starting point.

 formula: Optional formula to define the QTL model to be used as a
          starting point.

 max.qtl: Maximum number of QTL to which forward selection should
          proceed.

   covar: Data frame of additive covariates.

  method: Indicates whether to use multiple imputation or Haley-Knott
          regression.

incl.markers: If FALSE, do calculations only at points on an evenly
          spaced grid.

refine.locations: If TRUE, use 'refineqtl' to refine the QTL locations
          after each step of forward and backward selection.

additive.only: If TRUE, allow only additive QTL models; if FALSE,
          consider also pairwise interactions among QTL.

scan.pairs: If TRUE, perform a two-dimensional, two-QTL scan at each
          step of forward selection.

penalties: Vector of three values indicating the penalty on main
          effects and heavy and light penalties on interactions.  See
          the Details below. If missing, default values are used that
          are based on simulations of backcrosses and intercrosses with
          genomes modeled after that of the mouse.

keeplodprofile: If TRUE, keep the LOD profiles from the last iteration
          as attributes to the output.

keeptrace: If TRUE, keep information on the sequence of models visited
          through the course of forward and backward selection as an
          attribute to the output.

 verbose: If TRUE, give feedback about progress.  If 'verbose' is an
          integer > 1, even more information is printed.

_D_e_t_a_i_l_s:

     We seek to identify the model with maximal penalized LOD score. 
     The penalized LOD score, defined in Manichaikul et al. (in
     preparation), is the LOD score for the model (the log10 likelihood
     ratio comparing the model to the null model with no QTL) with
     penalties on the number of QTL and QTL:QTL interactions.

     We consider QTL models allowing pairwise interactions among QTL
     but with an enforced hierarchy in which inclusion of a pairwise
     interaction requires the inclusion of both of the corresponding
     main effects.  Additive covariates may be included, but currently
     we do not explore QTL:covariate interactions.  Also, the penalized
     LOD score criterion is currently defined only for autosomal loci,
     and results with the X chromosome should be considered with
     caution.

     The penalized LOD score is of the form pLOD(g) = LOD(g) - Tm pm -
     Th ph - Tl pl where g denotes a model, pm is the number of QTL in
     the model ("main effects"), ph is the number of pairwise
     interactions that will be given a heavy interaction penalty, pl is
     the number of pairwise interactions that will be given a light
     interaction penalty, Tm is the penalty on main effects, Th is the
     heavy interaction penalty, and Tl is the light interaction
     penalty.  The 'penalties' argument is the vector (Tm, Th, Tl).  If
     Tl is missing ('penalties' has a vector of length 2), we assume Tl
     = Th, and so all pairwise interactions are assigned the same
     penalty.  

     The "heavy" and "light" interaction penalties can be a bit
     confusing.  Consider the clusters of QTL that are connected via
     one or more pairwise interactions.  To each such cluster, we
     assign at most one "light" interaction penalty, and give all other
     pairwise interactions the heavy interaction penalty.  In other
     words, if  pi is the total number of pairwise interactions for a
     QTL model, we let pl be the number of clusters of connected QTL
     with at least one pairwise interaction, and then let ph = pi - pl.

     Let us give an explicit example.  Consider a model with 6 QTL, and
     with interactions between QTL 2 and 3, QTL 4 and 5 and QTL 4 and 6
     (so we have the model formula 'y ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 +
     Q2:Q3 + Q4:Q5 + Q4:Q6'). There are three clusters of connected
     QTL: (1), (2,3) and (4,5,6).  We would assign 6 main effect
     penalties (Tm), 2 light interaction penalties (Tl), and 1 heavy
     interaction penalty (Th).  

     Manichaikul et al. (in prepartion) described a system for deriving
     the three penalties on the basis of permutation results from a
     two-dimensional, two-QTL genome scan (as calculated with
     'scantwo').  These may be calculated with the function
     'calc.penalties'.

     A forward/backward search method is used, with the aim to optimize
     the penalized LOD score criterion.  That is, we seek to identify
     the model with maximal the penalized LOD score.  The search
     algorithm was based closely on an algorithm described by Zeng et
     al. (1999).

     We use forward selection to a model of moderate size (say 10 QTL),
     followed by backward elimination all the way to the null model. 
     The chosen model is that which optimizes the penalized LOD score
     criterion, among all models visited.  The detailed algorithm is as
     follows.  Note that if 'additive.only=TRUE', no pairwise
     interactions are considered.


        1.  Start at the null model, and perform a single-QTL genome
           scan, and choose the position giving the largest LOD score. 
           If 'scan.pairs=TRUE', start with a two-dimensional, two-QTL
           genome scan instead.  If an initial QTL model were defined
           through the arguments 'qtl' and 'formula', start with this
           model and jump immediately to step 2.

        2.  With a fixed QTL model in hand:

           1.  Scan for an additional additive QTL.

           2.  For each QTL in the current model, scan for an
              additional interacting QTL.

           3.  If there are >= 2 QTL in the current model, consider
              adding one of the possible pairwise interactions.

           4.  If 'scan.pairs=TRUE' perform a two-dimensional, two-QTL
              scan, seeking to add a pair of novel QTL, either additive
              or interacting.

           5.  Step to the model that gives the largest value for the
              model comparison criterion, among those considered at the
              current step.


        3.  Refine the locations of the QTL in the current model (if
           'refine.locations=TRUE').  

        4.  Repeat steps 2 and 3 up to a model with some pre-determined
           number of loci.

        5.  Perform backward elimination, all the way back to the null
           model.  At each step, consider dropping one of the current
           main effects or interactions; move to the model that
           maximizes the model comparison criterion, among those
           considered at this step.  Follow this with a refinement of
           the locations of the QTL.

        6.  Finally, choose the model having the largest model
           comparison criterion, among all models visited.

     In this forward/backward algorithm, it is likely best to build up
     to an overly large model and then prune it back.  Note that there
     is no "stopping rule"; the chosen model is that which optimizes
     the model comparison criterion, among all models visited.  The
     search can be time consuming, particularly if a two-dimensional
     scan is performed at each forward step.  Such two-dimensional
     scans may be useful for identifying QTL linked in repulsion
     (having effects of opposite sign) or interacting QTL with limited
     marginal effects, but our limited experience suggests that they
     are not necessary; important linked or interacting QTL pairs can
     be picked up in the forward selection to a large model, and will
     be retained in the backward elimination phase.

_V_a_l_u_e:

     The output is a representation of the best model, as measured by
     the penalized LOD score (see Details), among all models visited.
     This is QTL object (of class '"qtl"', as produced by 'makeqtl'),
     with attributes '"formula"', indicating the model formula, and
     '"pLOD"' indicating the penalized LOD score.

     If 'keeplodprofile=TRUE', LOD profiles from the last pass through
     the refinement algorithm are retained as an attribute,
     '"lodprofile"', to the object.  These may be plotted with
     'plotLodProfile'.  

     If 'keeptrace=TRUE', the output will contain an attribute
     '"trace"' containing information on the best model at each step of
     forward and backward elimination.  This is a list of objects of
     class '"compactqtl"', which is similar to a QTL object (as
     produced by 'makeqtl') but containing just  a vector of chromosome
     IDs and positions for the QTL.  Each will also have attributes
     '"formula"' (containing the model formula) and '"pLOD"'
     (containing the penalized LOD score.

_M_e_t_h_o_d_s:

     *'imp'*: multiple imputation is used, as described by Sen and
     Churchill (2001).

     *'hk'*: Haley-Knott regression is used (regression of the
     phenotypes on the multipoint QTL genotype probabilities), as
     described by Haley and Knott (1992).

_A_u_t_h_o_r(_s):

     Karl W Broman, kbroman@biostat.wisc.edu

_R_e_f_e_r_e_n_c_e_s:

     Manichaikul, A., Moon, J. Y., Sen, \'S, Yandell, B. S. and Broman,
     K. W. (2009) A model selection approach for the identification of
     quantitative trait loci in experimental crosses, allowing
     epistasis. _Genetics_, to appear.

     Broman,  K. W. and Speed, T. P. (2002) A model selection approach
     for the identification of quantitative trait loci in experimental
     crosses (with discussion). _J Roy Stat Soc B_ *64*, 641-656,
     731-775.

     Haley, C. S. and Knott, S. A. (1992) A simple regression method
     for mapping quantitative trait loci in line crosses using flanking
     markers. _Heredity_ *69*, 315-324.

     Sen, \'S. and Churchill, G. A. (2001) A statistical framework for
     quantitative trait mapping.  _Genetics_ *159*, 371-387.

     Zeng, Z.-B., Kao, C.-H. and Basten, C. J. (1999) Estimating the
     genetic architecture of quantitative traits.  _Genetical
     Research_, *74*, 279-289.

_S_e_e _A_l_s_o:

     'calc.penalties', 'plotModel', 'makeqtl', 'fitqtl', 'refineqtl',
     'addqtl', 'addpair'

_E_x_a_m_p_l_e_s:

     data(fake.bc)

     fake.bc <- calc.genoprob(fake.bc, step=2.5)
     outsw <- stepwiseqtl(fake.bc, max.qtl=3, method="hk", keeptrace=TRUE)

     # best model
     outsw
     plotModel(outsw)

     # path through model space
     thetrace <- attr(outsw, "trace")

     # plot of these
     par(mfrow=c(3,3))
     for(i in seq(along=thetrace))
       plotModel(thetrace[[i]], main=paste("pLOD =",round(attr(thetrace[[i]],"pLOD"), 2)))

