scanone                 package:qtl                 R Documentation

_G_e_n_o_m_e _s_c_a_n _w_i_t_h _a _s_i_n_g_l_e _Q_T_L _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Genome scan with a single QTL model, with possible allowance for
     covariates, using any of several possible models for the phenotype
     and any of several possible numerical methods.

_U_s_a_g_e:

     scanone(cross, chr, pheno.col=1, model=c("normal","binary","2part","np"),
             method=c("em","imp","hk","ehk","mr","mr-imp","mr-argmax"),
             addcovar=NULL, intcovar=NULL, weights=NULL,
             use=c("all.obs", "complete.obs"), upper=FALSE,
             ties.random=FALSE, start=NULL, maxit=4000,
             tol=1e-4, n.perm, perm.Xsp=FALSE, perm.strata=NULL, verbose,
             batchsize=250)

_A_r_g_u_m_e_n_t_s:

   cross: An object of class 'cross'. See 'read.cross' for details.

     chr: Optional vector indicating the chromosomes for which LOD
          scores should be calculated.  This should be a vector of
          character strings referring to chromosomes by name; numeric
          values are converted to strings.  Refer to chromosomes with a
          preceding '-' to have all chromosomes but those considered. 
          A logical (TRUE/FALSE) vector may also be used.

pheno.col: Column number in the phenotype matrix which should be used
          as the phenotype.  This can be a vector of integers; for
          methods '"hk"' and '"imp"' this can be considerably faster
          than doing them one at a time.  One may also give a character
          strings matching the phenotype names.  Finally, one may give
          a numeric vector of phenotypes, in which case it must have
          the length equal to the number of individuals in the cross,
          and there must be either non-integers or values < 1 or > no.
          phenotypes; this last case may be useful for studying
          transformations.

   model: The phenotypic model: the usual normal model, a model for
          binary traits, a two-part model or non-parametric analysis

  method: Indicates whether to use the EM algorithm,  imputation,
          Haley-Knott regression, the extended Haley-Knott method, or
          marker regression.  Not all methods are available for all
          models. Marker regression is performed either by dropping
          individuals with missing genotypes ('"mr"'), or by first
          filling in missing data using a single imputation
          ('"mr-imp"') or by the Viterbi algorithm ('"mr-argmax"').

addcovar: Additive covariates; allowed only for the normal and binary
          models.

intcovar: Interactive covariates (interact with QTL genotype); allowed
          only for the normal and binary models.

 weights: Optional weights of individuals.  Should be either NULL or a
          vector of length n.ind containing positive weights.  Used
          only in the case 'model="normal"'.

     use: In the case that multiple phenotypes are selected to be
          scanned, this argument indicates whether to use all
          individuals,  including those missing some phenotypes, or
          just those individuals that have data on all selected
          phenotypes.

   upper: Used only for the two-part model; if true, the "undefined"
          phenotype is the maximum observed phenotype; otherwise, it is
          the smallest observed phenotype.

ties.random: Used only for the non-parametric "model"; if TRUE, ties in
          the phenotypes are ranked at random.  If FALSE, average ranks
          are used and a corrected LOD score is calculated.

   start: Used only for the EM algorithm with the normal model and no
          covariates.  If 'NULL', use the usual starting values; if
          length 1, use random initial weights for EM; otherwise, this
          should be a vector of length n+1 (where n is the number of
          possible genotypes for the cross), giving the initial values
          for EM.

   maxit: Maximum number of iterations for methods '"em"' and '"ehk"'.

     tol: Tolerance value for determining convergence for methods
          '"em"' and '"ehk"'.

  n.perm: If specified, a permutation test is performed rather than an
          analysis of the observed data.  This argument defines the
          number of permutation replicates.

perm.Xsp: If 'n.perm' > 0, so that a permutation test will be
          performed, this indicates whether separate permutations
          should be performed for the autosomes and the X chromosome,
          in order to get an X-chromosome-specific LOD threshold.  In
          this case, additional permutations are performed for the X
          chromosome.

perm.strata: If 'n.perm' > 0, this may be used to perform a stratified
          permutation test.  This should be a vector with the same
          number of individuals as in the cross data.  Unique values
          indicate the individual strata, and permutations will be
          performed within the strata.

 verbose: In the case 'n.perm' is specified, display information about
          the progress of the permutation tests.

batchsize: The number of phenotypes (or permutations) to be run as a
          batch; used only for methods '"hk"' and '"imp"'.

_D_e_t_a_i_l_s:

     Use of the EM algorithm, Haley-Knott regression, and the extended
     Haley-Knott method require that multipoint genotype probabilities
     are first calculated using 'calc.genoprob'.  The imputation method
     uses the results of 'sim.geno'.

     Individuals with missing phenotypes are dropped.

     In the case that 'n.perm'>0, so that a permutation test is
     performed, the R function 'scanone' is called repeatedly. If
     'perm.Xsp=TRUE', separate permutations are performed for the
     autosomes and the X chromosome, so that an X-chromosome-specific
     threshold may be calculated.  In this case, 'n.perm' specifies the
     number of permutations used for the autosomes; for the X
     chromosome, 'n.perm' * L_A/L_X permutations will be run, where L_A
     and L_X are the total genetic lengths of the autosomes and X
     chromosome, respectively.  More permutations are needed for the X
     chromosome in order to obtain thresholds of similar accuracy.  

     For further details on the models, the methods and the use of
     covariates, see below.

_V_a_l_u_e:

     If 'n.perm' is missing, the function returns a data.frame whose
     first two columns contain the chromosome IDs and cM positions.
     Subsequent columns contain the LOD scores for each phenotype. In
     the case of the two-part model, there are three LOD score columns
     for each phenotype: LOD(p,mu), LOD(p) and LOD(mu).   The result is
     given class '"scanone"' and has attributes  '"model"', '"method"',
     '"df"' and '"type"' (the latter is the type of cross analyzed). 

     If 'n.perm' is specified, the function returns the results of a
     permutation test and the output has class '"scanoneperm"'.  If
     'perm.Xsp=FALSE', the function returns a matrix with 'n.perm'
     rows, each row containing the genome-wide  maximum LOD score for
     each of the phenotypes.  In the case of the two-part model, there
     are three columns for each phenotype, corresponding to the three
     different LOD scores. If 'perm.Xsp=TRUE', the result contains
     separate permutation results for the autosomes and the X
     chromosome respectively, and an attribute indicates the lengths of
     the chromosomes and an indicator of which chromosome is X.

_M_o_d_e_l_s:

     *The normal model* is the standard model for QTL mapping (see
     Lander and Botstein 1989).  The residual phenotypic variation is
     assumed to follow a normal distribution, and analysis is analogous
     to analysis of variance.

     *The binary model* is for the case of a binary phenotype, which
     must have values 0 and 1.  The proportions of 1's in the different
     genotype groups are compared.  Currently only methods 'em' and
     'mr' are available for this model.  See Xu and Atchley (1996) and
     Broman (2003).

     *The two-part model* is appropriate for the case of a spike in the
     phenotype distribution (for example, metastatic density when many
     individuals show no metastasis, or survival time following an
     infection when individuals may recover from the infection and fail
     to die).  The two-part model was described by  Boyartchuk et al.
     (2001) and Broman (2003).  Individuals with QTL genotype g have
     probability p[g] of having an undefined phenotype (the spike),
     while if their phenotype is defined, it comes from a normal
     distribution with mean mu[g] and common standard deviation s.
     Three LOD scores are calculated: LOD(p,mu) is for the test of the
     hypothesis that p[g] = p and mu[g] = mu. LOD(p) is for the test
     that p[g] = p while the mu[g] may vary. LOD(mu) is for the test
     that mu[g] = mu while the p[g] may vary. 

     *With the non-parametric "model"*, an extension of the
     Kruskal-Wallis test is used; this is similar to the method
     described by Kruglyak and Lander (1995).  In the case of
     incomplete genotype information (such as at locations between
     genetic markers), the Kruskal-Wallis statistic is modified so that
     the rank for each individual is weighted by the genotype
     probabilities, analogous to Haley-Knott regression.  For this
     method, if the argument 'ties.random' is TRUE, ties in the
     phenotypes are assigned random ranks; if it is FALSE, average
     ranks are used and a corrected LOD score is calculate.  Currently
     the 'method' argument is ignored for this model.

_M_e_t_h_o_d_s:

     *'em'*: maximum likelihood is performed via the EM algorithm
     (Dempster et al. 1977), first used in this context by Lander and
     Botstein (1989).

     *'imp'*: multiple imputation is used, as described by Sen and
     Churchill (2001).

     *'hk'*: Haley-Knott regression is used (regression of the
     phenotypes on the multipoint QTL genotype probabilities), as
     described by Haley and Knott (1992).

     *'ehk'*: the extended Haley-Knott method is used (like H-K, but
     taking account of the variances), as described in Feenstra et al.
     (2006).

     *'mr'*: Marker regression is used.  Analysis is performed only at
     the genetic markers, and individuals with missing genotypes are
     discarded.  See Soller et al. (1976).

_C_o_v_a_r_i_a_t_e_s:

     Covariates are allowed only for the normal and binary models.  The
     normal model is y = b[q] + A g + Z d[q] + e where _q_ is the
     unknown QTL genotype, _A_ is a matrix of additive covariates, and
     _Z_ is a matrix of covariates that interact with the QTL genotype.
      The columns of _Z_ are forced to be contained in the matrix _A_. 
     The binary model is the logistic regression analog.

     The LOD score is calculated comparing the likelihood of the above
     model to that of the null model y = m + A g + e.

     Covariates must be numeric matrices.  Individuals with any missing
     covariates are discarded.

_X _c_h_r_o_m_o_s_o_m_e:

     The X chromosome must be treated specially in QTL mapping.  See
     Broman et al. (in press).

     If both males and females are included, male hemizygotes are
     allowed to be different from female homozygotes.  Thus, in a
     backcross, we will fit separate means for the genotype classes AA,
     AB, AY, and BY.  In such cases, sex differences in the phenotype
     could cause spurious linkage to the X chromosome, and so the null
     hypothesis must be changed to allow for a sex difference in the
     phenotype.

     Numerous special cases must be considered, as detailed in the
     following table. 

       *BC*               *Sexes*            *Null*            *Alternative*     *df*
                          both sexes          sex               AA/AB/AY/BY       2
                          all female       grand mean              AA/AB          1
                          all male         grand mean              AY/BY          1
                                                                                   
       *F2*  *Direction*  *Sexes*            *Null*            *Alternative*     *df*
             Both         both sexes  femaleF/femaleR/male  AA/ABf/ABr/BB/AY/BY   3
                          all female          pgm              AA/ABf/ABr/BB      2
                          all male         grand mean              AY/BY          1
             Forward      both sexes          sex               AA/AB/AY/BY       2
                          all female       grand mean              AA/AB          1
                          all male         grand mean              AY/BY          1
             Backward     both sexes          sex               AB/BB/AY/BY       2
                          all female       grand mean              AB/BB          1
                          all male         grand mean              AY/BY          1

     In the case that the number of degrees of freedom for the linkage
     test for the X chromosome is different from that for autosomes, a
     separate X-chromosome LOD threshold is recommended.  Autosome- and
     X-chromosome-specific LOD thresholds may be estimated by
     permutation tests with 'scanone' by setting 'n.perm'>0 and using
     'perm.Xsp=TRUE'.

_A_u_t_h_o_r(_s):

     Karl W Broman, kbroman@biostat.wisc.edu; Hao Wu

_R_e_f_e_r_e_n_c_e_s:

     Boyartchuk V. L., Broman, K. W., Mosher, R. E., D'Orazio S. E. F.,
     Starnbach, M. N. and Dietrich, W. F. (2001) Multigenic control of
     _Listeria monocytogenes_ susceptibility in mice. _Nature Genetics_
     *27*, 259-260.

     Broman,  K. W. (2003) Mapping quantitative trait loci in the case
     of a spike in the phenotype distribution. _Genetics_ *163*,
     1169-1175. 

     Broman, K. W., Sen, \'S, Owens, S. E., Manichaikul, A.,
     Southard-Smith, E. M. and Churchill G. A.  The X chromosome in
     quantitative trait locus mapping.  _Genetics_, to appear.

     Churchill, G. A. and Doerge, R. W. (1994) Empirical threshold
     values for quantitative trait mapping.  _Genetics_ *138*, 963-971.

     Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum
     likelihood from incomplete data via the EM algorithm.  _J. Roy.
     Statist. Soc._ B, *39*, 1-38.

     Feenstra, B., Skovgaard, I. M. and Broman, K. W. (2006) Mapping
     quantitative trait loci by an extension of the Haley-Knott
     regression method using estimating equations. _Genetics_, *173*,
     2111-2119. 

     Haley, C. S. and Knott, S. A. (1992) A simple regression method
     for mapping quantitative trait loci in line crosses using flanking
     markers. _Heredity_ *69*, 315-324.

     Kruglyak, L. and Lander, E. S. (1995) A nonparametric approach for
     mapping quantitative trait loci.  _Genetics_ *139*, 1421-1428. 

     Lander, E. S. and Botstein, D. (1989) Mapping Mendelian factors
     underlying quantitative traits using RFLP linkage maps. 
     _Genetics_ *121*, 185-199.

     Sen, \'S. and Churchill, G. A. (2001) A statistical framework for
     quantitative trait mapping.  _Genetics_ *159*, 371-387.

     Soller, M., Brody, T. and Genizi, A. (1976) On the power of
     experimental designs for the detection of linkage between marker
     loci and quantitative loci in crosses between inbred lines.
     _Theor. Appl. Genet._ *47*, 35-39. 

     Xu, S., and Atchley, W.R. (1996) Mapping quantitative trait loci
     for complex binary diseases using line crosses. _Genetics_ *143*,
     1417-1424.

_S_e_e _A_l_s_o:

     'plot.scanone',  'summary.scanone', 'scantwo', 'calc.genoprob',
     'sim.geno', 'max.scanone', 'summary.scanoneperm', '-.scanone',
     '+.scanone'

_E_x_a_m_p_l_e_s:

     ###################
     # Normal Model
     ###################
     data(hyper)

     # Genotype probabilities for EM and H-K
     hyper <- calc.genoprob(hyper, step=2.5)
     out.em <- scanone(hyper, method="em")
     out.hk <- scanone(hyper, method="hk")

     # Summarize results: peaks above 3
     summary(out.em, thr=3)
     summary(out.hk, thr=3)

     # An alternate method of summarizing:
     #     patch them together and then summarize
     out <- c(out.em, out.hk)
     summary(out, thr=3, format="allpeaks")

     # Plot the results
     plot(out.hk, out.em)
     plot(out.hk, out.em, chr=c(1,4), lty=1, col=c("blue","black"))

     # Imputation; first need to run sim.geno
     # Do just chromosomes 1 and 4, to save time
     hyper.c1n4 <- sim.geno(subset(hyper, chr=c(1,4)),
                            step=2.5, n.draws=8)
     out.imp <- scanone(hyper.c1n4, method="imp")
     summary(out.imp, thr=3)

     # Plot all three results
     plot(out.imp, out.hk, out.em, chr=c(1,4), lty=1,
          col=c("red","blue","black"))

     # extended Haley-Knott
     out.ehk <- scanone(hyper, method="ehk")
     plot(out.hk, out.em, out.ehk, chr=c(1,4))

     # Permutation tests
     ## Not run: 
     permo <- scanone(hyper, method="hk", n.perm=1000)
     ## End(Not run)

     # Threshold from the permutation test
     summary(permo, alpha=c(0.05, 0.10))

     # Results above the 0.05 threshold
     summary(out.hk, perms=permo, alpha=0.05)

     ####################
     # scan with square-root of phenotype
     #   (Note that pheno.col can be a vector of phenotype values)
     ####################
     out.sqrt <- scanone(hyper, pheno.col=sqrt(pull.pheno(hyper, 1)))
     plot(out.em - out.sqrt, ylim=c(-0.1,0.1),
          ylab="Difference in LOD")
     abline(h=0, lty=2, col="gray")

     ####################
     # Stratified permutations
     ####################
     extremes <- (nmissing(hyper)/totmar(hyper) < 0.5)

     ## Not run: 
     operm.strat <- scanone(hyper, method="hk", n.perm=1000,
                            perm.strata=extremes)
     ## End(Not run)

     summary(operm.strat)


     ####################
     # X-specific permutations
     ####################
     data(fake.f2)

     fake.f2 <- calc.genoprob(fake.f2, step=2.5)

     # genome scan
     out <- scanone(fake.f2, method="hk")

     # X-chr-specific permutations
     ## Not run: 
     operm <- scanone(fake.f2, method="hk", n.perm=1000, perm.Xsp=TRUE)
     ## End(Not run)

     # thresholds
     summary(operm)

     # scanone summary with p-values
     summary(out, perms=operm, alpha=0.05, pvalues=TRUE)


     ###################
     # Non-parametric
     ###################
     out.np <- scanone(hyper, model="np")
     summary(out.np, thr=3)

     # Plot with previous results
     plot(out.np, chr=c(1,4), lty=1, col="green")
     plot(out.imp, out.hk, out.em, chr=c(1,4), lty=1,
          col=c("red","blue","black"), add=TRUE)

     ###################
     # Two-part Model
     ###################
     data(listeria)

     listeria <- calc.genoprob(listeria,step=2.5)
     out.2p <- scanone(listeria, model="2part", upper=TRUE)
     summary(out.2p, thr=c(5,3,3), format="allpeaks")

     # Plot all three LOD scores together
     plot(out.2p, out.2p, out.2p, lodcolumn=c(2,3,1), lty=1, chr=c(1,5,13),
          col=c("red","blue","black"))

     # Permutation test
     ## Not run: 
     permo <- scanone(listeria, model="2part", upper=TRUE,
                      n.perm=1000)
     ## End(Not run)

     # Thresholds
     summary(permo)

     ###################
     # Binary model
     ###################
     binphe <- as.numeric(pull.pheno(listeria,1)==264)
     out.bin <- scanone(listeria, pheno.col=binphe, model="binary")
     summary(out.bin, thr=3)

     # Plot LOD for binary model with LOD(p) from 2-part model
     plot(out.bin, out.2p, lodcolumn=c(1,2), lty=1, col=c("black", "red"),
          chr=c(1,5,13))

     # Permutation test
     ## Not run: 
     permo <- scanone(listeria, pheno.col=binphe, model="binary",
                      n.perm=1000)
     ## End(Not run)

     # Thresholds
     summary(permo)

     ###################
     # Covariates
     ###################
     data(fake.bc)

     plot(fake.bc)
     fake.bc <- calc.genoprob(fake.bc, step=2.5)

     # genome scans without covariates
     out.nocovar <- scanone(fake.bc)

     # genome scans with covariates
     ac <- pull.pheno(fake.bc, c("sex","age"))
     ic <- pull.pheno(fake.bc, "sex")

     out.covar <- scanone(fake.bc, pheno.col=1,
                          addcovar=ac, intcovar=ic)
     summary(out.nocovar, thr=3)
     summary(out.covar, thr=3)
     plot(out.covar, out.nocovar, chr=c(2,5,10))

