Package 'qtl2pleio'

Title: Testing Pleiotropy in Multiparental Populations
Description: We implement an adaptation of Jiang & Zeng's (1995) <https://www.genetics.org/content/140/3/1111> likelihood ratio test for testing the null hypothesis of pleiotropy against the alternative hypothesis, two separate quantitative trait loci. The test differs from that in Jiang & Zeng (1995) <https://www.genetics.org/content/140/3/1111> and that in Tian et al. (2016) <doi:10.1534/genetics.115.183624> in that our test accommodates multiparental populations.
Authors: Frederick J Boehm [aut, cre]
Maintainer: Frederick J Boehm <[email protected]>
License: MIT + file LICENSE
Version: 1.4.3.9000
Built: 2024-11-05 05:44:15 UTC
Source: https://github.com/fboehm/qtl2pleio

Help Index


Add physical map contents to tibble

Description

Add physical map contents to tibble

Usage

add_pmap(tib, pmap)

Arguments

tib

a tibble with 3 columns: marker, trace, and profile lod values, typically outputted by calc_profile_lods()

pmap

a physical map for a single chromosome

Value

a tibble with 4 columns: marker, trait, profile_lod, marker_position

Examples

pm <- 1:3
names(pm) <- as.character(paste0('m', 1:3))
expand.grid(paste0('m', 1:3), paste0('m', 1:3)) %>%
    tibble::as_tibble() %>%
    dplyr::mutate(log10lik = rgamma(9, 5)) %>%
    calc_profile_lods() %>%
    add_pmap(pm)

Perform bootstrap sampling and calculate test statistic for each bootstrap sample

Description

Create a bootstrap sample, perform multivariate QTL scan, and calculate log10 LRT statistic

Usage

boot_pvl(
  probs,
  pheno,
  addcovar = NULL,
  kinship = NULL,
  start_snp = 1,
  n_snp,
  pleio_peak_index,
  nboot = 1,
  max_iter = 10000,
  max_prec = 1/1e+08,
  cores = 1
)

Arguments

probs

founder allele probabilities three-dimensional array for one chromosome only (not a list)

pheno

n by d matrix of phenotypes

addcovar

n by c matrix of additive numeric covariates

kinship

a kinship matrix, not a list

start_snp

positive integer indicating index within probs for start of scan

n_snp

number of (consecutive) markers to use in scan

pleio_peak_index

positive integer index indicating genotype matrix for bootstrap sampling. Typically acquired by using 'find_pleio_peak_tib'.

nboot

number of bootstrap samples to acquire and scan

max_iter

maximum number of iterations for EM algorithm

max_prec

stepwise precision for EM algorithm. EM stops once incremental difference in log likelihood is less than max_prec

cores

number of cores to use when calling mclapply to parallelize the bootstrap analysis.

Details

Performs a parametric bootstrap method to calibrate test statistic values in the test of pleiotropy vs. separate QTL. It begins by inferring parameter values at the 'pleio_peak_index' index value in the object 'probs'. It then uses these inferred parameter values in sampling from a multivariate normal distribution. For each of the 'nboot' sampled phenotype vectors, a two-dimensional QTL scan, starting at the marker indexed by 'start_snp' within the object 'probs' and extending for a total of 'n_snp' consecutive markers. The two-dimensional scan is performed via the function 'scan_pvl_clean'. For each two-dimensional scan, a log10 likelihood ratio test statistic is calculated. The outputted object is a vector of 'nboot' log10 likelihood ratio test statistics from 'nboot' distinct bootstrap samples.

Value

numeric vector of (log) likelihood ratio test statistics from 'nboot_per_job' bootstrap samples

References

Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.

Walling GA, Visscher PM, Haley CS (1998) A comparison of bootstrap methods to construct confidence intervals in QTL mapping. Genet. Res. 71: 171–180.

Examples

n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
boot_pvl(probs = probs, pheno = pheno,
        start_snp = 1, n_snp = 5, pleio_peak_index = 3, nboot = 1, cores = 1)

Calculate estimated allele effects, B matrix

Description

Calculate estimated allele effects, B matrix

Usage

calc_Bhat(X, Sigma_inv, Y)

Arguments

X

dn by df block-diagonal design matrix that incorporates genetic info for d markers. Note that we can use the same marker data twice.

Sigma_inv

dn by dn inverse covariance matrix, often composed as the inverse of KVg+InVeK \otimes V_g + I_n \otimes V_e

Y

dn by 1 matrix, ie, a column vector, of d phenotypes' measurements

Value

a df by 1 matrix of GLS-estimated allele effects

Examples

X1 <- as.matrix(rbinom(n = 100, size = 1, prob = 1 / 2))
X <- gemma2::stagger_mats(X1, X1)
Sigma_inv <- diag(200)
Y <- runif(200)
calc_Bhat(X, Sigma_inv, Y)

Calculate Vg and Ve from d-variate phenotype and kinship

Description

Calculate Vg and Ve from d-variate phenotype and kinship

Usage

calc_covs(
  pheno,
  kinship,
  X1pre = rep(1, nrow(kinship)),
  max_iter = 1e+06,
  max_prec = 1/1e+08,
  covariates = NULL
)

Arguments

pheno

n by d matrix of phenotypes

kinship

a kinship matrix, n by n

X1pre

n by c design matrix. c = 1 to ignore genotypes

max_iter

maximum number of EM iterations

max_prec

maximum precision for stepwise increments in EM algorithm

covariates

a n by n.cov matrix of numeric covariates

Value

a list with 2 named components, Vg and Ve. Each is a d by d covariance matrix.

Examples

calc_covs(pheno = matrix(data = rnorm(100), nrow = 50, ncol = 2), kinship = diag(50))

Calculate matrix inverse square root for a covariance matrix

Description

Calculate matrix inverse square root for a covariance matrix

Usage

calc_invsqrt_mat(A)

Arguments

A

covariance matrix


Calculate a likelihood ratio test statistic from the output of scan_pvl()

Description

Calculate a likelihood ratio test statistic from the output of scan_pvl()

Usage

calc_lrt_tib(scan_pvl_out)

Arguments

scan_pvl_out

outputted tibble from scan_pvl

Value

a number, the (log) likelihood ratio test statistic

Examples

rep(paste0('Marker', 1:3), times = 3) -> marker1
rep(paste0('Marker', 1:3), each = 3) -> marker2
runif(9, -1, 0) -> ll
tibble::tibble(marker1, marker2, ll) -> scan_out
calc_lrt_tib(scan_out)

Calculate profile lods for all traits

Description

Calculate profile lods for all traits

Usage

calc_profile_lods(scan_pvl_out)

Arguments

scan_pvl_out

tibble outputted from scan_pvl

Value

a tibble with 3 columns, indicating 'marker identity, trace (pleiotropy or profile1, profile2, etc.), and value of the profile lod (base 10) for that trace at that marker.


Calculate the phenotypes covariance matrix Sigma

Description

Calculate the phenotypes covariance matrix Sigma

Usage

calc_Sigma(Vg, Ve, kinship = NULL, n_mouse = nrow(kinship))

Arguments

Vg

d by d genetic covariance matrix for the d phenotypes

Ve

d by d error covariance matrix for the d phenotypes

kinship

optional n by n kinship matrix. if NULL, Vg is not used.

n_mouse

number of subjects

Value

dn by dn covariance matrix


Calculate matrix square root for a covariance matrix

Description

Calculate matrix square root for a covariance matrix

Usage

calc_sqrt_mat(A)

Arguments

A

covariance matrix


Check whether a vector, x, has all its entries equal to its first entry

Description

Check whether a vector, x, has all its entries equal to its first entry

Usage

check_identical(x)

Arguments

x

a vector

Value

a logical indicating whether all vector entries are the same

Examples

x <- 1:5
check_identical(x)
y <- rep(1, 5)
check_identical(y)

Check for missingness in phenotypes or covariates

Description

We use 'is.finite' from base R to identify those subjects that have one or more missing values in 'input_matrix'. We then return a character vector of subjects that have no missingness in 'input_matrix'.

Usage

check_missingness(input_matrix)

Arguments

input_matrix

phenotypes or covariates matrix

Value

character vector of subjects that have no missingness


Convert 'scan_multi_oneqtl' output of 'qtl2::scan1' output

Description

We convert output of 'scan_multi_oneqtl' into format outputted by 'qtl2::scan1'.

Usage

convert_to_scan1_output(sm_output, trait_name)

Arguments

sm_output

tibble output from scan_multi_oneqtl for one chromosome only

trait_name

character vector (of length one) specifying the trait names

Value

object of class 'scan1'

Examples

# read data
iron <- qtl2::read_cross2(system.file("extdata", "iron.zip", package="qtl2"))


# insert pseudomarkers into map
map <- qtl2::insert_pseudomarkers(iron$gmap, step=1)

# calculate genotype probabilities
probs <- qtl2::calc_genoprob(iron, map, error_prob=0.002)

# grab phenotypes and covariates; ensure that covariates have names attribute
pheno <- iron$pheno
covar <- match(iron$covar$sex, c("f", "m")) # make numeric
names(covar) <- rownames(iron$covar)
Xcovar <- qtl2::get_x_covar(iron)

aprobs <- qtl2::genoprob_to_alleleprob(probs)
sm_out <- scan_multi_oneqtl(probs = aprobs, pheno = pheno)
sm_to_s1 <- convert_to_scan1_output(sm_out[[1]], trait_name = "tr1and2")

# 95% Bayes credible interval for QTL on chr 7, first phenotype
qtl2::bayes_int(sm_to_s1, map)

Find the marker index corresponding to the peak of the pleiotropy trace in a tibble where the last column contains log likelihood values and the first d columns contain marker ids

Description

Find the marker index corresponding to the peak of the pleiotropy trace in a tibble where the last column contains log likelihood values and the first d columns contain marker ids

Usage

find_pleio_peak_tib(tib, start_snp)

Arguments

tib

a (d+1) column tibble with first d columns containing marker ids and the last containing log likelihood values. Typically this is the output from 'scan_pvl'.

start_snp

positive integer, from the two-dimensional scan, that indicates where the scan started on the chromosome

Value

positive integer indicating marker index for maximum value of log lik under pleiotropy

Examples

marker1 <- rep(paste0('SNP', 1:3), times = 3)
marker2 <- rep(paste0('SNP', 1:3), each = 3)
loglik <- runif(9, -5, 0)
tibble::tibble(marker1, marker2, loglik) -> tib
find_pleio_peak_tib(tib, start_snp = 1)

Fit a model for a specified d-tuple of markers

Description

'fit1_pvl' uses several functions in the package qtl2pleio to fit the linear mixed effects model for a single d-tuple of markers. Creation of 'fit1_pvl' - from code that originally resided in 'scan_pvl', enabled parallelization via the 'parallel' R package.

Usage

fit1_pvl(indices, start_snp, probs, addcovar, inv_S, S, pheno)

Arguments

indices

a vector of indices for extracting elements of 'probs' array

start_snp

an integer to specify the index of the marker where the scan - in call to scan_pvl - starts. This argument is needed because 'mytab' has only relative indices (relative to the 'start_snp' marker)

probs

founder allele probabilities array

addcovar

additive covariates matrix

inv_S

inverse covariance matrix for the vectorized phenotype

S

covariance matrix for the vectorized phenotype, ie, the inverse of inv_S. By making this a function input, we avoid inverting the matrix many many times.

pheno

a n by d phenotypes matrix

Value

a number, the log-likelihood for the specified model

Examples

n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
Vg <- diag(2)
Ve <- diag(2)
Sigma <- calc_Sigma(Vg, Ve, diag(n))
Sigma_inv <- solve(Sigma)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
mytab <- prep_mytab(d_size = 2, n_snp = 5)
fit1_pvl(mytab[1, ], start_snp = 1,
probs = probs, addcovar = NULL, inv_S = Sigma_inv,
S = Sigma,
pheno = pheno
)

Extract founder allele effects at a single marker from output of qtl2::scan1coef

Description

Extract founder allele effects at a single marker from output of qtl2::scan1coef

Usage

get_effects(marker_index, allele_effects_matrix, map, columns = 1:8)

Arguments

marker_index

an integer indicating where in the 'map' object the peak position (or position of interest) is located

allele_effects_matrix

output of 'qtl2::scan1coef' for a single chromosome

map

a map object for the chromosome of interest

columns

which columns to choose within the 'allele_effects_matrix'. Default is 1:8 to reflect 8 founder alleles of Diversity Outbred mice

Value

a vector of 8 founder allele effects at a single marker

a vector of founder allele effects at a single marker

Examples

# set up allele effects matrix
ae <- matrix(dat = rnorm(100 * 8), ncol = 8, nrow = 100)
ae[, 8] <- - rowSums(ae[, 1:7])
colnames(ae) <- LETTERS[1:8]
rownames(ae) <- paste0(1, "_", 1:100)
# set up map
map <- 1:100
names(map) <- rownames(ae)
# call get_effects
get_effects(marker_index = 15, allele_effects_matrix = ae, map = map)

Identify shared subject ids among all inputs: covariates, allele probabilities array, kinship, and phenotypes

Description

We consider only those inputs that are not NULL. We then use 'intersect' on pairs of inputs' rownames to identify those subjects are shared among all non-NULL inputs.

Usage

make_id2keep(probs, pheno, addcovar = NULL, kinship = NULL)

Arguments

probs

an allele probabilities array

pheno

a phenotypes matrix

addcovar

a covariates matrix

kinship

a kinship matrix

Value

a character vector of subject IDs common to all (non-null) inputs


Plot tidied results of a pvl scan

Description

Plot tidied results of a pvl scan

Usage

plot_pvl(
  dat,
  units = "Mb",
  palette = c("#999999", "#E69F00", "#56B4E9"),
  linetype = c("solid", "longdash", "dotted")
)

Arguments

dat

a profile lod tibble

units

a character vector of length one to indicate units for physical or genetic map

palette

a character vector of length 3 containing strings for colors

linetype

a character vector of length 3 specifying the linetype values for the 3 traces

Value

a ggplot object with profile LODs


Prepare mytab object for use within scan_pvl R code

Description

Prepare mytab object for use within scan_pvl R code

Usage

prep_mytab(d_size, n_snp, pvl = TRUE)

Arguments

d_size

an integer, the number of traits

n_snp

an integer, the number of markers

pvl

logical indicating whether to output dataframe with all d-tuples for a d-QTL scan, or only those models that examine one marker at a time.

Value

a data.frame with d_size + 1 columns and (n_snp)^d_size rows. Last column is NA and named loglik.

Examples

prep_mytab(2, 10)

Create a list of component X matrices for input to stagger_mats, to ultimately create design matrix

Description

Create a list of component X matrices for input to stagger_mats, to ultimately create design matrix

Usage

prep_X_list(indices, start_snp, probs, covariates)

Arguments

indices

a vector of integers

start_snp

an integer denoting the index (within genotype probabilities array) where the scan should start

probs

a three-dimensional array of genotype probabilities for a single chromosome

covariates

a matrix of covariates

Value

a list of design matrices, ultimately useful when constructing the (multi-locus) design matrix

Examples

pp <- array(rbinom(n = 200, size = 1, prob = 0.5), dim = c(10, 2, 10))
prep_X_list(1:3, 1, probs = pp, covariates = NULL)

Process inputs to scan functions

Description

Process inputs to scan functions

Usage

process_inputs(
  probs,
  pheno,
  addcovar,
  kinship,
  n_snp = dim(probs)[3],
  start_snp = 1,
  max_iter = 10^4,
  max_prec = 1/10^8
)

Arguments

probs

a three-dimensional array of founder allele probabilities

pheno

a matrix of d trait values

addcovar

a matrix of covariates

kinship

a kinship matrix

n_snp

number of markers

start_snp

index number of start position in the probs object.

max_iter

max number of iterations for EM

max_prec

max precision for stopping EM


qtl2pleio.

Description

Testing pleiotropy vs. separate QTL in multiparental populations


Estimate allele effects matrix, B hat, with Rcpp functions

Description

Estimate allele effects matrix, B hat, with Rcpp functions

Usage

rcpp_calc_Bhat(X, Sigma_inv, Y)

Arguments

X

dn by df block-diagonal design matrix that incorporates genetic info for two markers. Note that we can use the same marker data twice.

Sigma_inv

dn by dn inverse covariance matrix, where its inverse, ie, Sigma, is often composed as KVg+InVeK \otimes V_g + I_n \otimes V_e

Y

dn by 1 matrix, ie, a column vector, of d phenotypes' measurements

Value

a df by 1 matrix of GLS-estimated allele effects

Examples

X1 <- as.matrix(rbinom(n = 100, size = 1, prob = 1 / 2))
X <- gemma2::stagger_mats(X1, X1)
Sigma_inv <- diag(200)
Y <- runif(200)
rcpp_calc_Bhat(X = X, Sigma_inv = Sigma_inv, Y = Y)

Estimate allele effects matrix, B hat, with Rcpp functions

Description

Estimate allele effects matrix, B hat, with Rcpp functions

Usage

rcpp_calc_Bhat2(X, Y, Sigma_inv)

Arguments

X

dn by df block-diagonal design matrix that incorporates genetic info for two markers. Note that we can use the same marker data twice.

Y

dn by 1 matrix, ie, a column vector, of d phenotypes' measurements

Sigma_inv

dn by dn inverse covariance matrix, often composed as inverse of KVg+InVgK \otimes V_g + I_n \otimes V_g

Value

a df by 1 matrix of GLS-estimated allele effects

Examples

X1 <- as.matrix(rbinom(n = 100, size = 1, prob = 1 / 2))
X <- gemma2::stagger_mats(X1, X1)
Sigma_inv <- diag(200)
Y <- runif(200)
rcpp_calc_Bhat2(X = X, Y = Y, Sigma_inv = Sigma_inv)

Calculate log likelihood for a multivariate normal

Description

Calculate log likelihood for a multivariate normal

Usage

rcpp_log_dmvnorm2(inv_S, mu, x, S)

Arguments

inv_S

inverse covariance matrix

mu

mean vector

x

data vector

S

covariance matrix, ie, the inverse of inv_S


Perform multivariate, one-QTL model fitting for markers on one chromosome

Description

'scan_multi_onechr' calculates log likelihood for d-variate phenotype model fits. Inputted parameter 'start_snp' indicates where in the 'probs' object to start the scan.

Usage

scan_multi_onechr(
  probs,
  pheno,
  kinship = NULL,
  addcovar = NULL,
  start_snp = 1,
  n_snp = dim(probs)[3],
  max_iter = 10000,
  max_prec = 1/1e+08,
  cores = 1
)

Arguments

probs

an array of founder allele probabilities for a single chromosome

pheno

a matrix of phenotypes

kinship

a kinship matrix for one chromosome

addcovar

a matrix, n subjects by c additive covariates

start_snp

index of where to start the scan within probs

n_snp

the number of (consecutive) markers to include in the scan

max_iter

maximum number of iterations for EM algorithm

max_prec

stepwise precision for EM algorithm. EM stops once incremental difference in log likelihood is less than max_prec

cores

number of cores for parallelization

Value

a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood

References

Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.

Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.

Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.

Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.

Examples

# read data
n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
scan_multi_onechr(probs = probs, pheno = pheno, kinship = NULL, cores = 1)

Perform multivariate, one-QTL model fitting for markers on all chromosomes

Description

The function first discards individuals with one or more missing phenotypes or missing covariates. It then infers variance components, Vg and Ve. Both Vg and Ve are d by d covariance matrices. It uses an expectation maximization algorithm, as implemented in the 'gemma2' R package. 'gemma2' R package is an R implementation of the GEMMA algorithm for multivariate variance component estimation (Zhou & Stephens 2014 Nature methods). Note that variance components are fitted on a model that uses the d-variate phenotype but contains no genetic information. This model does, however, use the specified covariates (after dropping dependent columns in the covariates matrix). These inferred covariance matrices, Vg^\hat{Vg} and Ve^\hat{Ve}, are then used in subsequent model fitting via generalized least squares. Generalized least squares model fitting is applied to every marker on every chromosome. For a single marker, we fit the model:

vec(Y)=Xvec(B)+vec(G)+vec(E)vec(Y) = Xvec(B) + vec(G) + vec(E)

where

GMN(0,K,Vg^)G \sim MN(0, K, \hat{Vg})

and

EMN(0,I,Ve^)E \sim MN(0, I, \hat{Ve})

where MNMN denotes the matrix-variate normal distribution with three parameters: mean matrix, covariance among rows, and covariance among columns. vecvec denotes the vectorization operation, ie, stacking by columns. KK is a kinship matrix, typically calculated by leave-one-chromosome-out methods. YY is the n by d phenotypes matrix. XX is a block-diagonal nd by fd matrix consisting of d blocks each of dimension n by f. Each n by f block (on the diagonal) contains a matrix of founder allele probabilities for the n subjects at a single marker. The off-diagonal blocks have only zero entries. The log-likelihood is returned for each model. The outputted object is a tibble with d + 1 columns. The first d columns specify the markers used in the corresponding model fit, while the last column specifies the log-likelihood value at that d-tuple of markers.

Usage

scan_multi_oneqtl(
  probs_list,
  pheno,
  kinship_list = NULL,
  addcovar = NULL,
  cores = 1
)

Arguments

probs_list

an list of arrays of founder allele probabilities

pheno

a matrix of phenotypes

kinship_list

a list of kinship matrices, one for each chromosome

addcovar

a matrix, n subjects by c additive covariates

cores

number of cores for parallelization via parallel::mclapply()

Value

a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood

References

Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.

Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.

Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.

Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.

Examples

# read data
n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
scan_multi_oneqtl(probs_list = list(probs, probs), pheno = pheno, cores = 1)

Permute the phenotypes matrix and then scan the genome. Record the genomewide greatest LOD score for each permuted data set.

Description

Permute the phenotypes matrix and then scan the genome. Record the genomewide greatest LOD score for each permuted data set.

Usage

scan_multi_oneqtl_perm(
  probs_list,
  pheno,
  kinship_list = NULL,
  addcovar = NULL,
  n_perm = 1,
  cores = 1
)

Arguments

probs_list

a list of founder allele probabilities, one array per chromosome

pheno

a matrix of trait values

kinship_list

a list of kinship matrices, one per chromosome

addcovar

a matrix of covariate values

n_perm

positive integer for the number of permuted data sets to scan.

cores

number of cores for parallelization

Value

a vector of 'n_perm' max lod statistics


Perform model fitting for all ordered pairs of markers in a genomic region of interest

Description

'scan_pvl' calculates log likelihood for d-variate phenotype model fits. Inputted parameter 'start_snp' indicates where in the 'probs' object to start the scan.

Usage

scan_pvl(
  probs,
  pheno,
  kinship = NULL,
  addcovar = NULL,
  start_snp = 1,
  n_snp,
  max_iter = 10000,
  max_prec = 1/1e+08,
  cores = 1
)

Arguments

probs

an array of founder allele probabilities for a single chromosome

pheno

a matrix of phenotypes

kinship

a kinship matrix for one chromosome

addcovar

a matrix, n subjects by c additive covariates

start_snp

index of where to start the scan within probs

n_snp

the number of (consecutive) markers to include in the scan

max_iter

maximum number of iterations for EM algorithm

max_prec

stepwise precision for EM algorithm. EM stops once incremental difference in log likelihood is less than max_prec

cores

number of cores to use when parallelizing via parallel::mclapply. Set to 1 for no parallelization.

Details

The function first discards individuals with one or more missing phenotypes or missing covariates. It then infers variance components, Vg and Ve. Both Vg and Ve are d by d covariance matrices. It uses an expectation maximization algorithm, as implemented in the 'gemma2' R package. 'gemma2' R package is an R implementation of the GEMMA algorithm for multivariate variance component estimation (Zhou & Stephens 2014 Nature methods). Note that variance components are fitted on a model that uses the d-variate phenotype but contains no genetic information. This model does, however, use the specified covariates (after dropping dependent columns in the covariates matrix). These inferred covariance matrices, Vg^\hat{Vg} and Ve^\hat{Ve}, are then used in subsequent model fitting via generalized least squares. Generalized least squares model fitting is applied to every d-tuple of markers within the specified genomic region for 'scan_pvl'. For a single d-tuple of markers, we fit the model:

vec(Y)=Xvec(B)+vec(G)+vec(E)vec(Y) = Xvec(B) + vec(G) + vec(E)

where

GMN(0,K,Vg^)G \sim MN(0, K, \hat{Vg})

and

EMN(0,I,Ve^)E \sim MN(0, I, \hat{Ve})

where MNMN denotes the matrix-variate normal distribution with three parameters: mean matrix, covariance among rows, and covariance among columns. vecvec denotes the vectorization operation, ie, stacking by columns. KK is a kinship matrix, typically calculated by leave-one-chromosome-out methods. YY is the n by d phenotypes matrix. XX is a block-diagonal nd by fd matrix consisting of d blocks each of dimension n by f. Each n by f block (on the diagonal) contains a matrix of founder allele probabilities for the n subjects at a single marker. The off-diagonal blocks have only zero entries. The log-likelihood is returned for each model. The outputted object is a tibble with d + 1 columns. The first d columns specify the markers used in the corresponding model fit, while the last column specifies the log-likelihood value at that d-tuple of markers.

Value

a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood

References

Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.

Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.

Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.

Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.

Examples

# read data
n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
scan_pvl(probs = probs, pheno = pheno, kinship = NULL,
start_snp = 1, n_snp = 5, cores = 1)

Simulate a single multivariate data set consisting of n subjects and d phenotypes for each

Description

Simulate a single multivariate data set consisting of n subjects and d phenotypes for each

Usage

sim1(X, B, Sigma)

Arguments

X

design matrix (incorporating genotype probabilities from two loci), dn by df

B

a matrix of allele effects, f rows by d columns

Sigma

dn by dn covariance matrix

Value

a vector of length dn. The first n entries are for trait 1, the second n for trait 2, etc.

Examples

n_mouse <- 20
geno <- rbinom(n = n_mouse, size = 1, prob = 1 / 2)
X <- gemma2::stagger_mats(geno, geno)
B <- matrix(c(1, 2), ncol = 2, nrow = 1)
sim1(X, B, Sigma = diag(2 * n_mouse))

Subset an input object - allele probabilities array or phenotypes matrix or covariates matrix. Kinship has its own subset function

Description

An inputted matrix or 3-dimensional array is first subsetted - by rownames - to remove those subjects who are not in ‘id2keep'. After that, the object’s rows are ordered to match the ordering of subject ids in the vector 'id2keep'. This (possibly reordered) object is returned.

Usage

subset_input(input, id2keep)

Arguments

input

a matrix of either phenotypes or covariates or array of allele probabilities

id2keep

a character vector of subject ids to identify those subjects that are shared by all inputs

Value

an object resulting from subsetting of 'input'. Its rows are ordered per 'id2keep'

Examples

# define s_id
s_id <- paste0("s", 1:10)
# set up input matrix
foo <- matrix(data = rnorm(10 * 3), nrow = 10, ncol = 3)
rownames(foo) <- s_id
subset_input(input = foo, id2keep = s_id)

Subset a kinship matrix to include only those subjects present in all inputs

Description

Since a kinship matrix has subject ids in both rownames and colnames, so we need to remove rows and columns according to names in 'id2keep'. We first remove rows and columns of subjects that are not in 'id2keep'. We then order rows and columns of the resulting matrix by the ordering in 'id2keep'.

Usage

subset_kinship(kinship, id2keep)

Arguments

kinship

a kinship matrix

id2keep

a character vector of subject ids to identify those subjects that are shared by all inputs