
Building a Focal-Animal Pedigree Offline
Source:vignettes/articles/offline-focal-animal-workflow.qmd
Overview
Often you do not want a whole colony pedigree – you want the pedigree of a short list of focal animals: a few breeders, the candidates for a study, or the residents of a proposed group, together with everyone they are genetically linked to. nprcgenekeepr builds that focal pedigree two ways:
-
online –
getFocalAnimalPed()reads the focal IDs from a file and pulls the surrounding pedigree from a LabKey / EHR database; -
offline –
getFocalAnimalPedFromFile()does the same thing from a second file you supply, so the workflow needs no database connection at all.
This article covers the offline path. Use it when you have a focal-animal ID list and a pedigree file on disk and either no LabKey access or no need for it – on a laptop, behind a firewall, or with a studbook exported from another system. The two functions are siblings by design: getFocalAnimalPedFromFile() reads the focal IDs exactly as getFocalAnimalPed() does, then builds the same connected pedigree component from your file instead of from the database.
The Shiny app (runModularApp()) drives this same offline path from its Input tab (see the In the Shiny app section below); here we script it directly.
Setup
getFocalAnimalPedFromFile() runs no random simulation – it is deterministic – so, unlike the genetic-value and breeding-group analyses, this article needs no set_seed() call to be reproducible.
The two inputs
The offline workflow takes two separate files:
- a focal-animal ID file (
fileName) – a CSV, delimited text, or Excel file whose first column is the list of focal animal IDs. The column may have any header; only the first column is read. Any other columns are ignored. - a pedigree file (
pedigreeFileName) – a CSV, delimited text, or Excel file that must contain at least the columnsid,sire, anddam. Any additional columns (sex,birth,exit, …) are carried through to the result unchanged.
Both files are read with a header row. The sep argument sets the field separator for delimited text (default ","); it is ignored for Excel files, which are detected automatically.
A self-contained example
The clearest way to see the workflow is to build both files inline. Here is a four-animal pedigree – two founders (A, B) and their two offspring (C, D) – written to a temporary CSV, with a one-line focal file naming a single focal animal, C:
ped <- data.frame(
id = c("A", "B", "C", "D"),
sire = c(NA, NA, "A", "A"),
dam = c(NA, NA, "B", "B"),
stringsAsFactors = FALSE
)
pedFile <- tempfile(fileext = ".csv")
write.csv(ped, pedFile, row.names = FALSE)
focalFile <- tempfile(fileext = ".csv")
write.csv(data.frame(id = "C"), focalFile, row.names = FALSE)
getFocalAnimalPedFromFile(focalFile, pedFile)
#> id sire dam
#> 1 A <NA> <NA>
#> 2 B <NA> <NA>
#> 3 C A B
#> 4 D A BAlthough we asked only for C, the result has four rows. The function returns C’s full connected pedigree component: its parents A and B (ancestors), and its full sibling D – a collateral relative pulled in because it shares both parents. Building the connected component, not just the direct ancestors, is what makes the result usable for kinship and breeding-group work, where an animal’s relatives matter as much as its ancestry.
Using the bundled colony data
The package ships a realistic pair you can use the same way: a five-ID focal list and the example colony pedigree it resolves into. Reference them with system.file(), which returns the path inside the installed package.
focalListFile <- system.file("extdata", "focalAnimalsShortList.csv",
package = "nprcgenekeepr")
pedigreeFile <- system.file("extdata", "ExamplePedigree.csv",
package = "nprcgenekeepr")
colonyPed <- getFocalAnimalPedFromFile(focalListFile, pedigreeFile)
dim(colonyPed)
#> [1] 2922 11
names(colonyPed)
#> [1] "id" "sire" "dam" "sex" "gen" "birth"
#> [7] "exit" "age" "ancestry" "origin" "status"Five focal animals expand to a connected component of several thousand – a reminder that in a real colony a handful of animals can be linked, through shared ancestors and descendants, to a large fraction of the studbook. The returned columns are exactly those of the pedigree file (here id, sire, dam, sex, gen, birth, exit, age, ancestry, origin, status).
head(colonyPed[, c("id", "sire", "dam", "sex")])
#> id sire dam sex
#> 2 VJ08BW <NA> <NA> F
#> 22 SUFWJI <NA> <NA> F
#> 23 9747LE <NA> <NA> F
#> 32 RMJKJ4 <NA> <NA> F
#> 40 WTE53B <NA> <NA> F
#> 42 HHFN1E <NA> <NA> F
all(read.csv(focalListFile)[[1]] %in% colonyPed$id) # the focal IDs are included
#> [1] TRUEThe result is returned in pedigree-file order, so head() shows founder records (no known sire or dam); the focal animals themselves sit deeper in the table. A quality-control pass with qcStudbook() – see the Studbook Quality Control article – is the usual next step before any analysis.
When a file cannot be read
getFocalAnimalPedFromFile() is the application boundary, so it is fail-soft: it never throws. On any failure it returns a classed nprcgenekeeprFileErr object – a list with a message element naming why the read failed – which you test for with inherits():
result <- getFocalAnimalPedFromFile(focalListFile, tempfile(fileext = ".csv"))
inherits(result, "nprcgenekeeprFileErr")
#> [1] TRUE
result$message
#> [1] "Pedigree file not found."
strangers <- tempfile(fileext = ".csv")
write.csv(data.frame(id = "NOSUCHID"), strangers, row.names = FALSE)
getFocalAnimalPedFromFile(strangers, pedigreeFile)$message
#> [1] "None of the focal IDs were found in the pedigree file."The full set of reasons:
message |
Cause |
|---|---|
The focal animal ID list file could not be read. |
the focal-id file is missing or unreadable |
A pedigree file must be supplied to build the focal pedigree offline. |
pedigreeFileName was omitted (NULL) |
Pedigree file not found. |
the pedigree file path does not exist |
The pedigree file must contain columns id, sire, and dam. |
the pedigree file is missing a required column |
The pedigree file could not be read. |
any other pedigree read failure |
None of the focal IDs were found in the pedigree file. |
the files read, but no focal ID matched any pedigree id
|
This fail-soft contract is specific to the offline file path. The online getFocalAnimalPed() instead returns an nprcgenekeeprErr on a database failure, so code that handles both paths should check for each class. In the Shiny app, an nprcgenekeeprFileErr surfaces as a File Read Error row on the Errors tab, with its message as the detail.
In the Shiny app
To do the same thing interactively, launch runModularApp() and, on the Input tab:
- under File Content, choose “Focal animals only; pedigree built from database”;
- upload your focal-animal ID file under Select Focal Animals File;
- upload your pedigree file under Optional: Pedigree File (build offline; no database) – supplying this second file is what runs the workflow offline (leave it empty and the app falls back to the LabKey/database path);
- click Read and Check Pedigree.
The app then calls getFocalAnimalPedFromFile() for you and routes any nprcgenekeeprFileErr to the Errors tab.
Key arguments
| Argument | Default | Meaning |
|---|---|---|
fileName |
– | path to the focal-animal ID file (CSV, delimited text, or Excel); the first column holds the IDs |
pedigreeFileName |
NULL |
path to the pedigree file; must contain at least id, sire, and dam columns |
sep |
"," |
field separator for delimited text files; ignored for Excel |
See also
- The Studbook Quality Control article –
qcStudbook()validates and standardizes a pedigree, the usual next step once you have a focal pedigree. - The Genetic Value Analysis article – rank a quality-controlled pedigree by mean kinship and genome uniqueness with
reportGV(). - The Forming Breeding Groups article – assemble genetically diverse breeding groups with
groupAddAssign(). - The Age-Sex Pyramid Plots article – picture the colony’s age and sex structure with
getPyramidPlot(). -
getFocalAnimalPed()– the online sibling that pulls the surrounding pedigree from a LabKey / EHR database instead of a file. -
getFileDirectRelatives()– the lower-level function that walks the pedigree file to build the connected component (and exposes theunrelatedParentsoption, whichgetFocalAnimalPedFromFile()leaves at its default). -
runModularApp()– the Shiny app, whose Input tab drives this same offline workflow interactively.
Reference.
Vinson A, Raboin MJ (2015). “A Practical Approach for Designing Breeding Groups to Maximize Genetic Diversity in a Large Colony of Captive Rhesus Macaques (Macaca mulatta).” Journal of the American Association for Laboratory Animal Science 54(6):700-707.