Main pedigree curation function that performs basic quality control on pedigree information
Arguments
- sb
A dataframe containing a table of pedigree and demographic information.
The function recognizes the following columns (optional columns will be used if present, but are not required):
{id} {— Character vector with a unique identifier for each individual}
{sire} {— Character vector with unique identifier for the father of the current id}
{dam} {— Character vector with unique identifier for the mother of the current id}
{sex} {— Factor {levels: "M", "F", "U"} Sex specifier for an individual}
{birth} {— Date or NA (optional) with the individual's birth date}
{departure} {— Date or NA (optional) on which an individual was sold or shipped from the colony}
{death} {— Date or NA (optional) Date of death, if applicable}
{status} {— Factor {levels: ALIVE, DEAD, SHIPPED} (optional) Status of an individual}
{origin} {— Character or NA (optional) Facility an individual originated from, if other than ONPRC}
{ancestry} {— Character or NA (optional) Geographic population to which the individual belongs}
{spf} {— Character or NA (optional) Specific pathogen-free status of an individual}
{vasxOvx} {— Character or NA (optional) Indicator of the vasectomy/ovariectomy status of an animal; NA if the animal is intact, and all other values are assumed to indicate surgical alteration}
{condition} {— Character or NA (optional) Indicator of the restricted status of an animal. "Nonrestricted" animals are generally assumed to be naive.}
- minParentAge
Numeric value giving the minimum age in years for an animal to have an offspring. Defaults to 2 years. The check is not performed for animals with missing birth dates.
- reportChanges
Logical value. If TRUE, the errorLst contains the list of changes made to the column names. Default is FALSE.
- reportErrors
Logical value. If TRUE, the function scans the entire file and reports back the changes made to the input and the errors found, as a list of lists where each sublist is one type of change or error. Changes include column names, case of categorical values (male, female, unknown), etc. Errors include missing columns, invalid date rows, male dams, female sires, and records with one or more parents below the minimum parent age.

The following changes are made to the column names:
{Column names are converted to all lower case}
{Periods (".") within column names are collapsed to no space ""}
{egoid is converted to id}
{sireid is converted to sire}
{damid is converted to dam}
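The column-name changes listed above can be sketched in a few lines of R. This is only an illustration: normalizeColNames is a hypothetical helper name, not a function documented here, and the package's own implementation may differ.

```r
# Hypothetical helper (an assumption for illustration, not package code).
normalizeColNames <- function(cols) {
  cols <- tolower(cols)                      # all lower case
  cols <- gsub(".", "", cols, fixed = TRUE)  # remove literal periods
  # Map the legacy names to the standard ones.
  renames <- c(egoid = "id", sireid = "sire", damid = "dam")
  hit <- cols %in% names(renames)
  cols[hit] <- unname(renames[cols[hit]])
  cols
}

normalizeColNames(c("Ego.Id", "SIRE.ID", "damid", "Sex", "Birth"))
```

Lowercasing and removing periods first means a single rename table covers variants such as Ego.Id and EGOID.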
If the dataframe (sb) does not contain the five required columns (id, sire, dam, sex, and birth), the function throws an error by calling stop().
If the id field has the string UNKNOWN (any case), or both the sire and dam fields have NA or UNKNOWN (any case), the record is removed. If either of the fields sire or dam has the string UNKNOWN (any case), it is replaced with a unique identifier of the form Unnnn, where nnnn is one of a series of sequential integers counting the missing sires and dams, right-justified in a pattern of 0000. See the addUIds function.
The function addParents is used to add records for parents missing their own record in the pedigree.
The function convertSexCodes is used with ignoreHerm == TRUE to convert sex codes according to the following factors of standardized codes:
{F} {– replacing "FEMALE" or "2"}
{M} {– replacing "MALE" or "1"}
{H} {– replacing "HERMAPHRODITE" or "4", if ignoreHerm == FALSE}
{U} {– replacing "HERMAPHRODITE" or "4", if ignoreHerm == TRUE}
{U} {– replacing "UNKNOWN" or "3"}
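The sex-code mapping above can be illustrated with a small stand-in. recodeSex is a hypothetical name used only for this sketch, and upper-casing the input before matching is an assumption; the real convertSexCodes may handle case and unmatched values differently.

```r
# Hypothetical stand-in for the mapping described above (not package code).
recodeSex <- function(sex, ignoreHerm = TRUE) {
  sex <- toupper(as.character(sex))  # assumption: matching ignores case
  out <- sex                         # values already coded F/M/U pass through
  out[sex %in% c("FEMALE", "2")] <- "F"
  out[sex %in% c("MALE", "1")] <- "M"
  out[sex %in% c("HERMAPHRODITE", "4")] <- if (ignoreHerm) "U" else "H"
  out[sex %in% c("UNKNOWN", "3")] <- "U"
  factor(out)
}

recodeSex(c("female", "1", "4", "3", "M"))
recodeSex("hermaphrodite", ignoreHerm = FALSE)
```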
The function correctParentSex is used to ensure no parent is both a sire and a dam. If this error is detected, the function throws an error and halts the program.
The function convertStatusCodes converts status indicators to the following factors of standardized codes. Case of the original status value is ignored.
{"ALIVE"} {— replacing "alive", "A" and "1"}
{"DECEASED"} {— replacing "deceased", "DEAD", "D", "2"}
{"SHIPPED"} {— replacing "shipped", "sold", "sale", "s", "3"}
{"UNKNOWN"} {— replacing is.na(status)}
{"UNKNOWN"} {— replacing "unknown", "U", "4"}
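The status mapping above might be sketched as follows. recodeStatus is a hypothetical name, and leaving unrecognized values as NA is an assumption of this sketch, since the text does not say how convertStatusCodes treats them.

```r
# Hypothetical stand-in for the status mapping described above (not package code).
recodeStatus <- function(status) {
  s <- toupper(as.character(status))  # case of the original value is ignored
  out <- rep(NA_character_, length(s))
  out[s %in% c("ALIVE", "A", "1")] <- "ALIVE"
  out[s %in% c("DECEASED", "DEAD", "D", "2")] <- "DECEASED"
  out[s %in% c("SHIPPED", "SOLD", "SALE", "S", "3")] <- "SHIPPED"
  out[is.na(status) | s %in% c("UNKNOWN", "U", "4")] <- "UNKNOWN"
  factor(out, levels = c("ALIVE", "DECEASED", "SHIPPED", "UNKNOWN"))
}

recodeStatus(c("alive", "D", "sold", NA, "u"))
```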
The function convertAncestry converts ancestry indicators using regular expressions, so that character strings matching the selected substrings are converted to the following factors.
{"INDIAN"} {— replacing "ind" and not "chin"}
{"CHINESE"} {— replacing "chin" and not "ind"}
{"HYBRID"} {— replacing "hyb" or "chin" and "ind"}
{"JAPANESE"} {— replacing "jap"}
{"UNKNOWN"} {— replacing NA}
{"OTHER"} {— replacing values not matching any of the above}
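A rough sketch of the substring rules above is shown here. recodeAncestry is a hypothetical name, and both the case-insensitive matching and the order in which overlapping rules are applied are assumptions; convertAncestry itself may resolve ties differently.

```r
# Hypothetical stand-in for the ancestry rules described above (not package code).
recodeAncestry <- function(ancestry) {
  a <- tolower(as.character(ancestry))  # assumption: matching ignores case
  ind <- grepl("ind", a, fixed = TRUE)
  chin <- grepl("chin", a, fixed = TRUE)
  hyb <- grepl("hyb", a, fixed = TRUE)
  jap <- grepl("jap", a, fixed = TRUE)
  out <- rep("OTHER", length(a))        # default when nothing matches
  out[jap] <- "JAPANESE"
  out[ind & !chin] <- "INDIAN"
  out[chin & !ind] <- "CHINESE"
  out[hyb | (chin & ind)] <- "HYBRID"
  out[is.na(ancestry)] <- "UNKNOWN"
  factor(out)
}

recodeAncestry(c("Indian", "chinese", "Chinese-Indian", "hybrid",
                 "Japanese", NA, "Burmese"))
```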
The function convertDate converts character representations of dates in the columns birth, death, departure, and exit to dates using the as.Date function.
The function setExit uses heuristics and the columns death and departure to set exit if it is not already defined.
The function calcAge uses the birth and exit columns to define the age column. The numerical value is rounded to the nearest 0.1 of a year. If exit is not defined, the current system date (Sys.Date()) is used.
The function findGeneration is used to define the generation number for each animal in the pedigree.
The function removeDuplicates checks for any duplicated records and removes the duplicates. It also throws an error and stops the program if an ID appears in more than one record where one or more of the other columns differ.
Columns that cannot be used subsequently are removed and the rows are ordered by generation number and then ID.
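The duplicate handling described for removeDuplicates can be sketched as below. dropDuplicateRecords is a hypothetical name and the error message is invented for illustration; only the behavior (drop exact duplicates, stop on conflicting records for one ID) comes from the description above.

```r
# Hypothetical stand-in for the duplicate check described above (not package code).
dropDuplicateRecords <- function(ped) {
  ped <- unique(ped)  # remove rows that are exact duplicates
  # Any ID still repeated now differs in at least one other column.
  dupIds <- unique(ped$id[duplicated(ped$id)])
  if (length(dupIds) > 0) {
    stop("IDs with conflicting records: ", paste(dupIds, collapse = ", "))
  }
  ped
}

ped <- data.frame(id = c("A", "B", "B"), sex = c("M", "F", "F"),
                  stringsAsFactors = FALSE)
dropDuplicateRecords(ped)
```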
Finally, the columns id, sire, and dam are coerced to character.
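As an illustration of the age calculation attributed to calcAge above, the sketch below computes ages from birth and exit dates, falling back to a reference date when exit is missing. ageInYears and its today argument are hypothetical, and dividing by 365.25 days per year is an assumption; the package may define a year differently.

```r
# Hypothetical stand-in for the age calculation described above (not package code).
ageInYears <- function(birth, exit, today = Sys.Date()) {
  exit[is.na(exit)] <- today  # undefined exit falls back to the reference date
  days <- as.numeric(difftime(exit, birth, units = "days"))
  round(days / 365.25, 1)     # assumption: 365.25 days per year
}

birth <- as.Date(c("2000-01-01", "2010-06-15"))
exit <- as.Date(c("2005-07-01", NA))
ageInYears(birth, exit, today = as.Date("2020-06-15"))
```

Passing the reference date as an argument keeps the sketch reproducible, whereas a bare call to Sys.Date() would return different ages on different days.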
Examples
examplePedigree <- nprcgenekeepr::examplePedigree
ped <- qcStudbook(examplePedigree, minParentAge = 2, reportChanges = FALSE,
                  reportErrors = FALSE)
names(ped)
#> [1] "id" "sire" "dam" "sex" "gen"
#> [6] "birth" "exit" "age" "ancestry" "origin"
#> [11] "status" "recordStatus"