Calculate the standard error of genome uniqueness

Part of Genetic Value Analysis

Usage

calcGUSE(alleles, threshold = 1L, byID = FALSE, pop = NULL)

Arguments

alleles: dataframe containing an AlleleTable (the same input calcGU takes): one integer column per gene-drop iteration, followed by an id column and a parent column. Produced by geneDrop().
threshold: an integer indicating the maximum number of copies of an allele that can be present in the population for it to be considered rare. Default is 1.
byID: logical variable of length 1 that is passed through to eventually be used by alleleFreq(), which calculates the count of each allele in the provided vector. If byID is TRUE and ids are provided, the function will only count the unique alleles for an individual (homozygous alleles will be counted as 1).
pop: character vector with animal IDs to consider as the population of interest, otherwise all animals will be considered. The default is NULL.

Value

Dataframe rows: id, col: guSE A single-column table of genome-uniqueness standard errors as percentages. Rownames are set to 'id' values that are part of the population.

Details

Genome uniqueness (calcGU) is a Monte Carlo estimate: it is the average, over the gene-drop iterations, of the proportion of an animal's two allele copies that are population-rare. Because it is an average over independent simulated iterations, it carries sampling error that shrinks as the number of iterations grows.

For animal i let m_ik = rare[i, k] / 2 be the per-iteration value in iteration k (so the mean of m_ik over the K iterations equals gu_i / 100). This function returns the exact Monte Carlo standard error of that mean, on the same percentage scale as calcGU:

$$guSE_i = 100 \times \sqrt{\frac{var(m_{i\cdot})}{K}}$$

The standard error is computed from the same per-iteration rare-allele matrix (calcA) that calcGU averages, so it is correct for any threshold / byID without a closed-form approximation. An animal whose rare-allele count does not vary across iterations has a standard error of 0.

Examples

library(nprcgenekeepr)
ped1Alleles <- nprcgenekeepr::ped1Alleles
guSE <- calcGUSE(ped1Alleles, threshold = 3, byID = FALSE, pop = NULL)