
Gene-Drop Iteration Convergence
nprcgenekeepr: an R Package for the Genetic Management of Colonies
R. Mark Sharp, Ph.D.
2026-06-25
Source:vignettes/gvaConvergence.Rmd
gvaConvergence.RmdHow many gene-drop iterations does a pedigree need?
The Genetic Value Analysis ranks animals by mean kinship and
genome uniqueness. Mean kinship is deterministic, but
genome uniqueness (gu) is estimated by a gene-drop
simulation: alleles are dropped down the pedigree a chosen number of
times, and gu is the average result. Like any estimate from
a finite number of draws, it carries sampling noise, so the precision of
gu – and, more importantly, the selection
order it produces – depends on how many iterations you run.
There is no single iteration count that is right for every colony. A
pedigree with many animals whose true genome uniqueness sits near a
ranking boundary needs more iterations before the chosen set of animals
stops changing run to run; a pedigree whose ranked animals are well
separated settles almost immediately. gvaConvergence()
answers the question for the pedigree in hand: it reports how
reproducible the selection order is at a range of iteration counts, and
the smallest count at which it has settled.
What it measures
Because the gene-drop iteration columns are independent replicates,
the whole convergence picture is recoverable from a
single gene drop. gvaConvergence() runs
one gene drop at a budget nMax, then for each candidate
iteration count N it splits the columns into two disjoint
halves of N columns each – two genuinely independent
N-iteration runs – ranks each half through the same
ordering pipeline the report uses, and compares the two orderings on two
criteria:
-
top-
koverlap – do the two runs choose the same top animals? (defaultk = 20, thresholdoMin = 0.90) -
Kendall rank agreement – do the chosen animals come
out in the same order? (threshold
rhoMin = 0.95)
The run is reproducible at N when both
hold, and the recommended iteration count is the smallest N
meeting both. The issue #76 “Undetermined” animals (both parents
unknown, no recorded origin) are a policy constant with rank
NA; they are excluded from the order and reported
separately as nUndetermined.
This is distinct from seed reproducibility. A fixed seed already
makes gu bit-identical run to run – that is reproducibility
of the process. gvaConvergence() reports the
sampling reproducibility of the estimate: would a fresh,
independent run lead to the same animals being chosen, in the same
order?
A pedigree where the iteration count matters
No bundled pedigree exercises the diagnostic well – after the issue
#76 de-inflation the shipped pedigrees have no gu signal
left to rank on, so their selection order is settled at the smallest
iteration count. To show a pedigree where the count does
matter, we build a small half-sib web in which the founders (the sources
of the private alleles) are excluded from the analyzed population, so
their alleles survive among the offspring only through descendants.
Overlapping sire/dam mating windows give the offspring distinct true
genome-uniqueness values that straddle the ranking boundary, so at low
iteration counts the gene-drop estimate randomly crosses the boundary
and the selection order churns.
## A deterministic dense-mid-range pedigree: 14 founder sires, each mated to a
## wrapping window of 5 of 15 founder dams (one offspring per pair). Windows
## overlap, so dam fan sizes vary and the 70 offspring get distinct genome
## uniqueness straddling the 10% ranking cutoff. Founders are excluded from the
## analyzed population (`pop`); the offspring are the probands.
makeConvergenceFixture <- function() {
w <- rep(5L, 14L) # sire window widths
b <- 15L # number of founder dams
a <- length(w)
sids <- sprintf("S%03d", seq_len(a))
dids <- sprintf("D%03d", seq_len(b))
id <- c(sids, dids)
sire <- rep(NA_character_, a + b)
dam <- rep(NA_character_, a + b)
sex <- c(rep("M", a), rep("F", b))
off <- character(0L)
ocount <- 0L
start <- 1L
for (i in seq_len(a)) {
for (j in seq_len(w[i])) {
dj <- ((start + j - 2L) %% b) + 1L
ocount <- ocount + 1L
o <- sprintf("O%04d", ocount)
id <- c(id, o)
sire <- c(sire, sids[i])
dam <- c(dam, dids[dj])
sex <- c(sex, if (ocount %% 2L == 0L) "F" else "M")
off <- c(off, o)
}
start <- start + max(1L, w[i] - 1L)
}
ped <- data.frame(id = id, sire = sire, dam = dam, sex = sex,
stringsAsFactors = FALSE)
ped$gen <- findGeneration(ped$id, ped$sire, ped$dam)
list(ped = ped, pop = off)
}
fx <- makeConvergenceFixture()
nrow(fx$ped) # total animals (founders + offspring)
#> [1] 99
length(fx$pop) # offspring (the analyzed population)
#> [1] 70We assess iteration counts from 25 up to 1500 (a fixed
seed makes the curve reproducible):
conv <- gvaConvergence(
fx$ped, pop = fx$pop, nMax = 3000L,
grid = c(25L, 50L, 100L, 200L, 400L, 800L, 1500L), seed = 11L
)
kable(
conv$convergence,
digits = c(0L, 3L, 3L),
col.names = c("Iterations (N)", "Top-20 overlap", "Kendall agreement"),
caption = "Selection-order reproducibility vs. iteration count (half-sib web)."
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Iterations (N) | Top-20 overlap | Kendall agreement |
|---|---|---|
| 25 | 0.75 | 0.701 |
| 50 | 0.80 | 0.781 |
| 100 | 0.85 | 0.853 |
| 200 | 1.00 | 0.916 |
| 400 | 1.00 | 0.945 |
| 800 | 1.00 | 0.953 |
| 1500 | 1.00 | 0.971 |
conv$recommendedIter # smallest N meeting both criteria
#> [1] 800
conv$converged
#> [1] TRUE
conv$nRankable
#> [1] 70
conv$nUndetermined
#> [1] 0At the smallest counts the two independent half-runs disagree on both
which animals are chosen and their order; both measures climb as
N grows, and the selection order has settled by the higher
counts and stays settled there. recommendedIter (printed
below the table) reports the smallest count meeting both criteria. For
this pedigree the default of 1000 iterations is in the right range, and
gvaConvergence() is how you would confirm that rather than
guess it.
A pedigree that converges immediately
Contrast the bundled qcPed. After the issue #76
de-inflation none of its ranked animals carry a non-zero genome
uniqueness, so the selection order is driven by deterministic mean
kinship and does not move with the iteration count.
convQc <- gvaConvergence(
nprcgenekeepr::qcPed, nMax = 400L,
grid = c(25L, 50L, 100L, 200L), seed = 11L
)
kable(
convQc$convergence,
digits = c(0L, 3L, 3L),
col.names = c("Iterations (N)", "Top-20 overlap", "Kendall agreement"),
caption = "Selection-order reproducibility vs. iteration count (qcPed)."
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Iterations (N) | Top-20 overlap | Kendall agreement |
|---|---|---|
| 25 | 1 | 1 |
| 50 | 1 | 1 |
| 100 | 1 | 1 |
| 200 | 1 | 1 |
convQc$recommendedIter # converges at the grid floor
#> [1] 25
convQc$nRankable
#> [1] 156
convQc$nUndetermined # the excluded issue #76 set
#> [1] 124Overlap and agreement are at their maximum from the smallest count,
so the recommended count is the floor of the grid: this pedigree needs
almost no iterations for the order to be reproducible. The
nUndetermined count reports how many animals were set aside
without a rank.
Choosing an iteration count
Run gvaConvergence() on your own pedigree and read
recommendedIter. If converged is
FALSE, no count in the grid settled the order – raise
nMax and extend the grid (each candidate
N needs 2 * N <= nMax), or accept that this
pedigree is intrinsically near a ranking boundary and interpret the
ranking with care. The thresholds k, oMin, and
rhoMin are arguments, so you can make the definition of
“reproducible” stricter or looser for your purpose.
Keep two ideas separate. The per-animal sampling standard error
(guSE, reported beside gu in
reportGV()) tells you how precise the genome uniqueness
number is; it shrinks like one over the square root of the
iteration count. The selection order – what
gvaConvergence() measures – is what actually decides which
animals are chosen for breeding. A small guSE does not by
itself mean the order has settled, which is exactly why the convergence
check exists.
See also reportGV() (the analysis whose iterations this
advises on), calcGUSE() (the per-animal standard error),
and the Colony Manager Tutorial vignette.