vignettes/a2interactive.Rmd
a2interactive.Rmd
This tutorial demonstrates the major functions used within the Shiny application provided by the nprcgenekeepr package and provides sufficient insight into those functions that they may be used independently.
This tutorial is primarily directed toward someone with experience using R who wants to better understand how the Shiny application works or to perform some actions not directly supported by the Shiny application.
Please provide any comments, questions, or bug reports through the GitHub issue tracker at .
You can install nprcgenekeepr from GitHub with the following code.
install.packages(nprcgenekeepr)
## Use the following code to get the development version
# install.packages("devtools")
# devtools::install_github("rmsharp/nprcgenekeepr")
All missing dependencies should be automatically installed.
You can get help from the R console with
?nprcgenekeepr
The help provided by this (nprcgenekeepr.R) needs to be more complete and include links to the tutorials.
A pedigrees can be imported using either Excel worksheets or text files that contain all of the pedigree information or using either Excel worksheets or text files that contain a list of focal animals with the remainder of the pedigree information is pulled in through the LabKey API.
This tutorial will use a pedigree file that can be created using the makeExamplePedigreeFile function as shown below. The function makeExamplePedigreeFile both saves a file and returns the full path name to the saved file, which we are saving into the variable pedigreeFile. Note: the user will select where to store the file.
library(nprcgenekeepr)
pedigreeFile <- makeExamplePedigreeFile()
This writes ExamplePedigree.csv to a place you select within your file system.
You use the file name provided by the makeExamplePedigreeFile function to tell read.table what file to read.
breederPedCsv <- read.table(pedigreeFile, sep = ",", header = TRUE,
stringsAsFactors = FALSE)
Note the number of rows read. Each row represents an individual within the pedigree.
nrow(breederPedCsv)
## [1] 3694
The next step is to put the information read from the file into a pedigree object. This is done with the qcStudbook function, which examines the file contents and tests for common pedigree errors.
You can see the errors that can be detected by qcStudbook by returning the empty error list with getEmptyErrorLst(). We are not showing the output of the function call now because later in this tutorial we will explore errors in more depth.
qcStudbook can take four arguments sb, minParentAge (in years), reportChanges, and reportErrors. However, all but sb have default values and only the sb argument is required.
It is prudent to ensure that parents are at least of breeding age, which is species specific. I have used a minParentAge of 2 years.1
breederPed <- qcStudbook(breederPedCsv, minParentAge = 2)
If qcStudbook reports an error, change the call by adding the reportErrors argument set to TRUE and examine the returned object. More on this is presented in the Pedigree Errors section.
You may want to focus your work on a focal group of animals. This can be done by reading in a list of animal IDs that make up the focal group and use that list to update the pedigree. Alternatively you can created a list of animal IDs based on criteria you have selected.
For example, to select living animals at the facility with at least one parent, the following can be used.
focalAnimals <- breederPed$id[!(is.na(breederPed$sire) &
is.na(breederPed$dam)) &
is.na(breederPed$exit)]
print(stri_c("There are ", length(focalAnimals),
" animals in the vector _focalAnimals_."))
[1] “There are 327 animals in the vector focalAnimals.”
As can be seen, these animals have at least one parent and have not left the facility.
breederPed[breederPed$id %in%
focalAnimals, c("id", "sire", "dam", "exit")][1:10, ]
## id sire dam exit
## 1669 01QRQ4 VDBGDP TH7HTY <NA>
## 1743 CLSVU6 ULV9M7 SUFWJI <NA>
## 1887 1SPLS8 U9APLW 142GKP <NA>
## 1934 5IAFMK U4YSS5 WVE6Y4 <NA>
## 2072 HLQ9SY UI3RFL VEWC1E <NA>
## 2234 XFWVVX U3MFSZ L4LM1F <NA>
## 2337 6X6BG9 ENI6HX IUF0HC <NA>
## 2377 B228Q6 UEUIRJ CBSIAA <NA>
## 2378 B2CKHA ENI6HX WBFBR5 <NA>
## 2383 BCJJKN UA379T JPVAT3 <NA>
We indicate that these are the animals of interest by using the setPopulation function. This function simply sets a column named population2 to the logical value of TRUE if the row represents an animal in the list and FALSE otherwise.
The first line of code below sets the population column and the second counts the number of rows where the value was set to TRUE.
breederPed <- setPopulation(ped = breederPed, ids = focalAnimals)
nrow(breederPed[breederPed$population, ])
## [1] 327
The IDs used to populate the population flag can be used to trim the pedigree so that it contains only those individuals who are in the ID list or are ancestors of those individuals.
trimmedPed <- trimPedigree(focalAnimals, breederPed)
nrow(breederPed); nrow(trimmedPed)
## [1] 3694
## [1] 704
The trimPedigree function has the ability to remove those ancestors that do not contribute genetic information. Uninformative founders are those individuals who are parents of only one individual and who have no parental information. (Currently genotypic information is ignored by trimPedigree).
trimmedPedInformative <- trimPedigree(focalAnimals, breederPed,
removeUninformative = TRUE)
nrow(trimmedPedInformative)
## [1] 509
We can find all of the animals that are in the trimmed pedigree but are not focal animals.
nonfocalInTrimmedPed <- trimmedPed$id[!trimmedPed$id %in% focalAnimals]
length(nonfocalInTrimmedPed)
## [1] 377
We can see which of these 377 are and are not parents. We will first make sure we have all of the parents by getting our list of parents from the entire pedigree. We then demonstrate that they are all in the trimmed pedigree.
allFocalParents <- c(breederPed$sire[breederPed$id %in% focalAnimals],
breederPed$dam[breederPed$id %in% focalAnimals])
trimmedFocalParents <- c(trimmedPed$sire[trimmedPed$id %in% focalAnimals],
trimmedPed$dam[trimmedPed$id %in% focalAnimals])
all.equal(allFocalParents, trimmedFocalParents) # Are the IDs the same?
## [1] TRUE
However, not all of the animals in the trimmed pedigree are either the focal animals or their parents. They are more distant ancestors as we will show.
notFocalNotParent <- trimmedPed$id[!trimmedPed$id %in% c(focalAnimals, allFocalParents)]
length(notFocalNotParent)
## [1] 187
Since the trimming process is supposed to retain the focal animals and their ancestors, we will leave it as an exercise for you to demonstrate that at least some of the remaining animals are grandparents of the focal animals. Hint: there are 490 grandparents in both the trimmed and the complete pedigree.
As you can see from the number of rows in the full pedigree (3694) versus the trimmed pedigree (704), trimmed pedigrees can be much smaller. Of the additional 377 animals, 182 provide genetic information while the others (195) are genetically uninformative.
As is shown below only 4 (0ZX29Q, 1QBKW9, 5PWJ0G, and Y3CJ5A) living animals are still in the colony but are not in the trimmed pedigree.3
unknownBirth <- breederPed$id[is.na(breederPed$birth)]
knownExit <- breederPed$id[ !is.na(breederPed$exit)]
unknownBirthKnownExit <- breederPed$id[is.na(breederPed$birth) | !is.na(breederPed$exit)]
knownPed <- breederPed[!breederPed$id %in% unknownBirthKnownExit, ]
otherIds <- knownPed$id[!knownPed$id %in% trimmedPed$id[is.na(trimmedPed$exit)]]
print(stri_c("The living animals in the pedigree that are not in the trimmed ",
"pedigree are ", get_and_or_list(otherIds), "."))
[1] “The living animals in the pedigree that are not in the trimmed pedigree are 0ZX29Q, 1QBKW9, 5PWJ0G, and Y3CJ5A.”
You can examine the population structure using an age-sex pyramid plot with a single function. We will limit our view to just the focal animals and their living relatives. This is appropriate for colony management because in addition to the genetic diversity we seek, we have to remain cognizant of the age and sex distributions within the colonies we manage.
getPyramidPlot(ped = trimmedPed[is.na(trimmedPed$exit), ])
## 45 45
## [1] 5.1 4.1 4.1 2.1
Your genetic value analysis must be carefully performed. The next three commands set up the entire pedigree for analysis. The first of these three commands set all of the pedigree members to be part of the population of interest by setting the population column to TRUE for all individuals.
ped <- setPopulation(breederPed, NULL)
Note that a new pedigree object (ped) is being created.
probands <- ped$id[ped$population]
ped <- trimPedigree(probands, ped, removeUninformative = FALSE,
addBackParents = FALSE)
The arguments to reportGV are all optional except for ped, but you may often want to non-default values.
ped Pedigree information in data.frame format
guIter Integer indicating the number of iterations for the gene-drop analysis. Default is 5000 iterations
guThresh Integer indicating the threshold number of animals for defining a unique allele. Default considers an allele “unique” if it is found in only 1 animal.
pop Character vector with animal IDs to consider as the population of interest. The default is NULL.
byID Logical variable of length 1 that is passed through to eventually be used by alleleFreq(), which calculates the count of each allele in the provided vector. If byID is TRUE and ids are provided, the function will only count the unique alleles for an individual (homozygous alleles will be counted as 1).
geneticValue <- reportGV(ped, guIter = 50,
guThresh = 3,
byID = TRUE,
updateProgress = NULL)
summary(geneticValue)
## The genetic value report
## Individuals in Pedigree: 3694
## Male Founders: 141
## Female Founders: 122
## Total Founders: 263
## Founder Equivalents: 241.84
## Founder Genome Equivalents: 163.62
## Live Offspring: 4052
## High Value Individuals: 2957
## Low Value Individuals: 737
What happens if we limit our analysis to the trimmed pedigree? Remember that the trimmed pedigree still contains all of the genetic information that the full pedigree has for the focal animals.
trimmedGeneticValue <- reportGV(trimmedPed, guIter = 50,
guThresh = 3,
byID = TRUE,
updateProgress = NULL)
summary(trimmedGeneticValue)
## The genetic value report
## Individuals in Pedigree: 327
## Male Founders: 3
## Female Founders: 17
## Total Founders: 20
## Founder Equivalents: 109.67
## Founder Genome Equivalents: 47.44
## Live Offspring: 321
## High Value Individuals: 223
## Low Value Individuals: 104
It is clear that limiting your analysis to the animals of interest reduces the effort required to examine the animals of interest.
The names of the object within the genetic value report object (trimmedGeneticValue) can be listed as shown in the next line of code.
names(trimmedGeneticValue)
## [1] "report" "kinship" "gu" "fe"
## [5] "fg" "maleFounders" "femaleFounders" "nMaleFounders"
## [9] "nFemaleFounders" "total"
The report object (an R dataframe) can in-turn be examined.
names(trimmedGeneticValue$report) ## column names
## [1] "id" "sex" "age" "birth"
## [5] "exit" "population" "origin" "indivMeanKin"
## [9] "zScores" "gu" "totalOffspring" "livingOffspring"
## [13] "value" "rank"
nrow(trimmedGeneticValue$report) ## Number of rows
## [1] 327
The report is more conveniently used as a separate object. The next section of code rounds some of the numerical values and converts all columns to characters for display as a table where only the first 10 lines are included.
rpt <- trimmedGeneticValue[["report"]]
rpt$indivMeanKin <- round(rpt$indivMeanKin, 5)
rpt$zScores <- round(rpt$zScores, 2)
rpt$gu <- round(rpt$gu, 5)
rpt <- toCharacter(rpt)
names(rpt) <- headerDisplayNames(names(rpt))
knitr::kable(rpt[1:10, ]) # needs more work for display purposes.
Ego ID | Sex | Age (in years) | Birth Date | Exit Date | Breeding Colony Member | Origin | Individual Mean Kinship | Z-score (Mean Kinship) | Genome Uniqueness (%) | Total Offspring | Living Offspring | Value Designation | Rank |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
KZM9RB | M | 30.1 | 1989-05-03 | NA | TRUE | 0.00329 | -1.90 | 90 | 0 | 0 | High Value | 1 | |
CLSVU6 | F | 23.9 | 1995-08-02 | NA | TRUE | 0.00287 | -1.97 | 87 | 1 | 1 | High Value | 2 | |
1SPLS8 | F | 7.9 | 2011-07-26 | NA | TRUE | 0.00373 | -1.83 | 83 | 0 | 0 | High Value | 3 | |
WK89I9 | F | 21.1 | 1998-05-26 | NA | TRUE | 0.00582 | -1.49 | 78 | 0 | 0 | High Value | 4 | |
8YP6PA | M | 5.0 | 2014-07-04 | NA | TRUE | 0.00485 | -1.65 | 76 | 0 | 0 | High Value | 5 | |
01QRQ4 | F | 18.2 | 2001-04-04 | NA | TRUE | 0.00373 | -1.83 | 75 | 0 | 0 | High Value | 6 | |
IZDV8K | M | 7.7 | 2011-09-29 | NA | TRUE | 0.00480 | -1.66 | 74 | 0 | 0 | High Value | 7 | |
3MMZD4 | M | 12.2 | 2007-03-24 | NA | TRUE | 0.00536 | -1.57 | 73 | 0 | 0 | High Value | 8 | |
G2GYST | M | 19.1 | 2000-05-16 | NA | TRUE | 0.00488 | -1.64 | 71 | 2 | 2 | High Value | 9 | |
CHK1ZX | F | 7.8 | 2011-09-06 | NA | TRUE | 0.00481 | -1.66 | 70 | 0 | 0 | High Value | 10 |
We start the next lines of code by getting a fresh copy of the genetic value report since we changed all of the numeric values to characters in the last section to print the table. These lines demonstrate one way of extracting the component objects from the genetic value object.
rpt <- trimmedGeneticValue[["report"]]
kmat <- trimmedGeneticValue[["kinship"]]
f <- trimmedGeneticValue[["total"]]
mf <- trimmedGeneticValue[["maleFounders"]]
ff <- trimmedGeneticValue[["femaleFounders"]]
nmf <- trimmedGeneticValue[["nMaleFounders"]]
nff <- trimmedGeneticValue[["nFemaleFounders"]]
fe <- trimmedGeneticValue[["fe"]]
fg <- trimmedGeneticValue[["fg"]]
It is informative to examine the distribution of genetic uniqueness, mean kinship, and z-scores (normalized mean kinship values).
Creation of the boxplot for the genetic uniqueness values is shown below.
gu <- rpt[, "gu"]
guBox <- ggplot(data.frame(gu = gu), aes(x = "", y = gu)) +
geom_boxplot(
color = "darkblue",
fill = "lightblue",
notch = TRUE,
outlier.color = "red",
outlier.shape = 1
) +
theme_classic() + geom_jitter(width = 0.2) + coord_flip() +
ylab("Score") + ggtitle("Genetic Uniqueness")
print(guBox)
Extraction of the individual mean kinship values and their corresponding z-scores is shown in the next code chunk.
mk <- rpt[, "indivMeanKin"]
zs <- rpt[, "zScores"]
Creation of boxplots for the mean kinship and z-scores is left as an exercise.
The primary purpose of nprcgenekeepr is to form breeding groups according to our best information regarding maintaining the genetic characteristics we desire and the realities associated with other animal husbandry needs.
There are several options you must consider when forming groups using nprcgenekeepr, which we will examine using code below.
You decisions regarding each of the above options are expressed in a call to the function groupAddAssign. A complete description of the function and its arguments is available using the code shown below.
?groupAddAssign
Below is the descriptions of the function parameters extracted from the documentation near the time this tutorial was prepared.
candidates Character vector of IDs of the animals available for use in forming the groups. The animals that may be present in currentGroups are not included within candidates.
currentGroups List of character vectors of IDs of animals currently assigned to groups. Defaults to a list with character(0) in each sub-list element (one for each group being formed) assuming no groups are pre-populated.
kmat Numeric matrix of pairwise kinship values. Rows and columns are named with animal IDs.
ped Dataframe that is the ‘Pedigree’. It contains pedigree information including the IDs listed in candidates.
threshold Numeric value indicating the minimum kinship level to be considered in group formation. Pairwise kinship below this level will be ignored. The default values is 0.015625.
ignore List of character vectors representing the sex combinations to be ignored. If provided, the vectors in the list specify if pairwise kinship should be ignored between certain sexes. Default is to ignore all pairwise kinship between females.
minAge Integer value indicating the minimum age to consider in group formation. Pairwise kinships involving an animal of this age or younger will be ignored. Default is 1 year.
iter Integer indicating the number of times to perform the random group formation process. Default value is 1000 iterations.
numGp Integer value indicating the number of groups that should be formed from the list of IDs. Default is 1.
updateProgress Function or NULL. If this function is defined, it will be called during each iteration to update a shiny::Progress object.
harem Logical variable when set to TRUE, the formed groups have a single male at least minAge old.
sexRatio Numeric value indicating the ratio of females to males x from 0.5 to 20 by increments of 0.5.
withKin Logical variable when set to TRUE, the kinship matrix for the group is returned along with the group and score. Defaults to not return the kinship matrix. This maintains compatibility with earlier versions.
We will use the trimmedPed pedigree in our code.
For illustration purposes we are going to keep the numbers of candidates, groups, and iterations fairly small.
We will get first some animal IDs to use for our candidates by selecting animals at least 2 years old at the time this pedigree was sampled (01-01-2015).
candidates <- trimmedPed$id[trimmedPed$birth < as.Date("2013-01-01") &
!is.na(trimmedPed$birth) &
is.na(trimmedPed$exit)]
table(trimmedPed$sex[trimmedPed$id %in% candidates])
##
## F M H U
## 184 96 0 0
Our candidates are made up of 184 females and 96 males. The parameters currentGroups, threshold, ignore, minAge, sexRatio, withKin, and updateProgress are allowed to take their default values. The setting of the sexRatio parameter to 0 is ignored in the following call of the groupAddAssign function. This is consistent with the a value of 0 making little since in a breeding colony.
The empty seventh group at the bottom is evidence that all of the candidate animals could be placed in a group without exceeding the default value of 0.015625.
The following group assignments will be forming harem groups. This is done by setting harem to . Setting iter
to 100 or more will increase optimal composition of breeding groups
haremGrp <- groupAddAssign(candidates = candidates,
kmat = trimmedGeneticValue[["kinship"]],
ped = trimmedPed,
iter = 10,
numGp = 6,
harem = TRUE)
haremGrp$group
## [[1]]
## [1] "G2GYST" "XEC0M5" "RJ4JPC" "72LYDE" "SCFSBF" "13B1QL" "1QVS67" "0HYZ23"
## [9] "BKWE4D" "321LLB" "NN3GDQ" "DCJJYS" "XYRDKV" "7NE2UT" "PYPM1W" "ESUIAF"
## [17] "AFZKBS" "3DTD2N" "YTJ2UL" "2F6J3U" "0IIAEN" "PBAFJF" "W5WIRP" "N5QBWD"
## [25] "PVY432" "W0GUKI" "D9P18Y" "G8MCV7"
##
## [[2]]
## [1] "1E8KD1" "S222R3" "X694YR" "CMMUKU" "WNEAS6" "DI4AHD" "Q8U9LB" "GCBYDW"
## [9] "WTE53B" "QW2Z3R" "LS184H" "B228Q6" "TQEMY6" "LYSLPP" "BS3RLE" "1SSCJC"
## [17] "B1WVCN" "5ERY5Z" "EZ2F8A" "N4NV8B" "CHK1ZX" "S056D5" "Y6DB6L" "D4B0RM"
## [25] "QCA36T" "83HQBN" "E5Q33K" "TYEWF1" "9MG040" "MPIQ4N" "EMV4P6" "WLMGS1"
## [33] "30J3CQ"
##
## [[3]]
## [1] "WNKKW3" "T3QPW5" "S5H1GC" "FJS7RQ" "KX0RJ3" "3GECJJ" "1SPLS8" "DPXEQE"
## [9] "3SKITJ" "QQMBT1" "46ZHKN" "VWC5ZH" "I5CI33" "ZH3YG1" "5KWNMZ" "H2J6UA"
## [17] "5W621W" "N79QXB" "G58RGY" "AIHJ8Z" "G25E3F" "F45799" "8JUUJ9" "JLFKV8"
## [25] "TEACA3" "3YJIMV" "PI4VHT" "KEA4QG" "WKY2SZ" "MKY9TK"
##
## [[4]]
## [1] "CHJ9D2" "GAS52W" "HE0SCR" "ZQXZYB" "DHNQ1W" "1CZM30" "87AQLF" "WK89I9"
## [9] "IH1KPA" "XFWVVX" "MH88T6" "ZPS15A" "YLRNIK" "QWKFBH" "967Y3D" "MYUMMX"
## [17] "C18V6I" "J3F6PD" "7ZNY75" "EX5K0S" "1VP3UC" "AR17R5" "6F9FB8" "Y0TCYX"
## [25] "D33J06" "WI38KZ" "MX4J7G" "BCJJKN" "Q7U139" "ILVQVB" "QRZK48" "38K2SR"
## [33] "2Z4YLY" "MB6NYQ"
##
## [[5]]
## [1] "T38W6H" "R5AYJK" "0SGJ12" "SHG3RB" "AP1YLW" "GIIEUD" "5IAFMK" "CLSVU6"
## [9] "5BPBUI" "S3EBGZ" "FG0SFA" "NK802Y" "92UG4N" "FB5L3N" "ZATMEE" "1GF3GM"
## [17] "KZY6PD" "Z904TJ" "TXZUKC" "0XTZQ1" "AZ3L0D" "465ERA" "6X6BG9" "5EDIEE"
## [25] "50D77I" "1KJ2MG" "7RA57Q" "CS23RV" "I8ABC7" "1CIRC9" "0X4W26" "QCENKM"
## [33] "01QRQ4" "MTCAIG" "M9PVG5" "PJ72W1"
##
## [[6]]
## [1] "AZ4D19" "7B9CA6" "K3TNHP" "S7IWWA" "CRPXY7" "MFKT9C" "PU7RSG" "W6MDVK"
## [9] "LVYYNY" "FL170P" "9P0DES" "WJXIH9" "DH9WJQ" "DKIM6U" "GTLA8R" "AR5U44"
## [17] "Q17CG3" "6KWVRI" "SH3FB7" "B134XZ" "414N7M" "DRXMW4" "99BMJW" "1FAZ0K"
## [25] "YFCIHJ" "AW400C" "RVHVTZ" "F7I2ED" "5EDLL7"
##
## [[7]]
## [1] NA
We can identify and list the males in each group with the following code.
sapply(haremGrp$group, function(ids) {ids[ids %in% trimmedPed$id[trimmedPed$sex == "M"]]})
## [[1]]
## [1] "G2GYST"
##
## [[2]]
## [1] "1E8KD1"
##
## [[3]]
## [1] "WNKKW3"
##
## [[4]]
## [1] "CHJ9D2"
##
## [[5]]
## [1] "T38W6H"
##
## [[6]]
## [1] "AZ4D19"
##
## [[7]]
## logical(0)
It is easy to notice that the male is listed first within each breeding group.
We can also see the number of animals and the sex ratios created in each group. Since these are harem groups the sex ratios are determined by the number of animals in the individual groups.
lines <- sapply(haremGrp$group, function(ids) {
paste0("Count: ", length(ids), " Sex Ratio: ",
round(calculateSexRatio(ids, trimmedPed), 2))})
for (line in lines) print(line)
## [1] "Count: 28 Sex Ratio: 27"
## [1] "Count: 33 Sex Ratio: 32"
## [1] "Count: 30 Sex Ratio: 29"
## [1] "Count: 34 Sex Ratio: 33"
## [1] "Count: 36 Sex Ratio: 35"
## [1] "Count: 29 Sex Ratio: 28"
## [1] "Count: 1 Sex Ratio: Inf"
Examination of this table shows that of the 184 females 157 are included.
The following group assignments will be forming harem groups. This is done by setting harem to .
sexRatioGrp <- groupAddAssign(candidates = candidates,
kmat = trimmedGeneticValue[["kinship"]],
ped = trimmedPed,
iter = 10,
numGp = 6,
sexRatio = 9)
sexRatioGrp$group
## [[1]]
## [1] "SH3FB7" "EEGLWY" "N5QBWD" "WKY2SZ" "9MG040" "Y6DB6L" "DH9WJQ" "QCA36T"
## [9] "R5AYJK" "GIIEUD" "RJ4JPC" "Q17CG3" "PI4VHT" "A98D7P" "7RA57Q" "GCBYDW"
## [17] "PBAFJF" "DI4AHD" "EMV4P6" "92UG4N" "VWC5ZH" "1KJ2MG" "BCJJKN" "5EDIEE"
## [25] "A6A1M1"
##
## [[2]]
## [1] "6F9FB8" "0V4SAC" "5EDLL7" "1FAZ0K" "5BPBUI" "01QRQ4" "0X4W26" "W0GUKI"
## [9] "CRPXY7" "1QVS67" "FJS7RQ" "HE0SCR" "F45799" "KXHGRH" "RVHVTZ" "XEC0M5"
## [17] "3YJIMV" "99BMJW" "YLRNIK" "K3TNHP" "2Z4YLY" "LVYYNY" "EZ2F8A" "B134XZ"
## [25] "R6HV9A"
##
## [[3]]
## [1] "83HQBN" "MEUZ85" "BKWE4D" "Q7U139" "1SSCJC" "PU7RSG" "IH1KPA" "LYSLPP"
## [9] "50D77I" "FL170P" "465ERA" "FB5L3N" "EX5K0S" "GDXWJ1" "KX0RJ3" "BS3RLE"
## [17] "T3QPW5" "YFCIHJ" "TQEMY6" "PJ72W1" "G8MCV7" "46ZHKN" "8JUUJ9" "GAS52W"
## [25] "5KFB90"
##
## [[4]]
## [1] "1CZM30" "80F2MI" "1VP3UC" "G58RGY" "W5WIRP" "ZPS15A" "3SKITJ" "MPIQ4N"
## [9] "XYRDKV" "B1WVCN" "W6MDVK" "1SPLS8" "AR17R5" "ZW2X4N" "XFWVVX" "6X6BG9"
## [17] "S3EBGZ" "414N7M" "C18V6I" "Y0TCYX" "S222R3" "72LYDE" "E5Q33K" "ILVQVB"
## [25] "KZM9RB"
##
## [[5]]
## [1] "87AQLF" "3QHAFI" "0SGJ12" "MTCAIG" "AFZKBS" "WNEAS6" "38K2SR" "DPXEQE"
## [9] "321LLB" "WTE53B" "S5H1GC" "MKY9TK" "I5CI33" "IZDV8K" "S056D5" "AP1YLW"
## [17] "5ERY5Z" "DKIM6U" "PVY432" "ZH3YG1" "3GECJJ" "I8ABC7" "NN3GDQ" "ZATMEE"
## [25] "G2GYST"
##
## [[6]]
## [1] "MB6NYQ" "CHSCFG" "ESUIAF" "7NE2UT" "WI38KZ" "7B9CA6" "TYEWF1" "FG0SFA"
## [9] "B228Q6" "CS23RV" "QCENKM" "QRZK48" "MX4J7G" "K7900I" "KZY6PD" "MH88T6"
## [17] "967Y3D" "M9PVG5" "MFKT9C" "QWKFBH" "AZ3L0D" "D9P18Y" "5KWNMZ" "1CIRC9"
## [25] "LN1DLY" "AW400C" "MYUMMX" "X694YR" "N79QXB" "TXZUKC" "S7IWWA" "J3F6PD"
## [33] "LS184H" "Q8U9LB" "JSAP3H"
##
## [[7]]
## [1] "CLSVU6" "5IAFMK" "HLQ9SY" "B2CKHA" "DCJJYS" "TR5L57" "WK89I9" "XC304E"
## [9] "Z7NBA2" "1E8KD1" "5PW7WT" "AEP5EG" "BW10CL" "CFD12A" "CHJ9D2" "D33J06"
## [17] "D4B0RM" "FTVE03" "IRFJ09" "LMJWTN" "N4NV8B" "Q9LWGX" "RNQU14" "SCFSBF"
## [25] "TEACA3" "09LFE4" "13B1QL" "1GF3GM" "2F6J3U" "3DTD2N" "3MMZD4" "55BPSE"
## [33] "5XVTVH" "6KWVRI" "8IG767" "9FRCIE" "ER464J" "F7I2ED" "FFGPS4" "FG6L7S"
## [41] "G25E3F" "GTLA8R" "H2J6UA" "NHWTJ9" "P7RBPI" "TBCE78" "YI16QD" "0HYZ23"
## [49] "2F1IV1" "30J3CQ" "4LHK19" "59NYZE" "5IYDXN" "6KLWVC" "8TV4MT" "AZ4D19"
## [57] "BTTHAJ" "CHK1ZX" "DHNQ1W" "DRXMW4" "FX9E4X" "G91ZM6" "J1R2EW" "JLFKV8"
## [65] "LDND6J" "MQT080" "NK802Y" "NSIC4I" "PHB6TE" "PYPM1W" "QRWYQZ" "QW2Z3R"
## [73] "RY1AZM" "SHG3RB" "WHQLH5" "WLMGS1" "WQUN84" "XL658N" "XX0GYV" "YHHVC7"
## [81] "YP910X" "YTJ2UL" "Z904TJ" "ZQXZYB" "0IIAEN" "0X1RZ9" "0XTZQ1" "3YHBC1"
## [89] "55VDSQ" "5W621W" "653J82" "6MEP2C" "76DIT4" "7ZNY75" "80KACX" "9P0DES"
## [97] "AIHJ8Z" "B2YJJP" "CMMUKU" "E3JP0C" "FLIZQI" "GM371F" "KEA4QG" "PA9F3J"
## [105] "QQMBT1" "SXSVEH" "T38W6H" "TJN1AD" "WJXIH9" "WNKKW3" "XY2CK7" "XZH41H"
## [113] "YDRD81" "Z25D52" "ZDRSG0" "3P9BX6" "7D09WH" "AR5U44" "DGZLV3" "S63QDN"
Again we can identify and list the males in each group with the following code.
sapply(sexRatioGrp$group, function(ids) {ids[ids %in% trimmedPed$id[trimmedPed$sex == "M"]]})
## [[1]]
## [1] "EEGLWY" "A98D7P" "A6A1M1"
##
## [[2]]
## [1] "0V4SAC" "KXHGRH" "R6HV9A"
##
## [[3]]
## [1] "MEUZ85" "GDXWJ1" "5KFB90"
##
## [[4]]
## [1] "80F2MI" "ZW2X4N" "KZM9RB"
##
## [[5]]
## [1] "3QHAFI" "IZDV8K" "G2GYST"
##
## [[6]]
## [1] "CHSCFG" "K7900I" "LN1DLY" "JSAP3H"
##
## [[7]]
## [1] "HLQ9SY" "B2CKHA" "TR5L57" "XC304E" "Z7NBA2" "1E8KD1" "5PW7WT" "AEP5EG"
## [9] "BW10CL" "CFD12A" "CHJ9D2" "FTVE03" "IRFJ09" "LMJWTN" "Q9LWGX" "RNQU14"
## [17] "09LFE4" "3MMZD4" "55BPSE" "5XVTVH" "8IG767" "9FRCIE" "ER464J" "FFGPS4"
## [25] "FG6L7S" "NHWTJ9" "P7RBPI" "TBCE78" "YI16QD" "2F1IV1" "4LHK19" "59NYZE"
## [33] "5IYDXN" "6KLWVC" "8TV4MT" "AZ4D19" "BTTHAJ" "FX9E4X" "G91ZM6" "J1R2EW"
## [41] "LDND6J" "MQT080" "NSIC4I" "PHB6TE" "QRWYQZ" "RY1AZM" "WHQLH5" "WQUN84"
## [49] "XL658N" "XX0GYV" "YHHVC7" "YP910X" "0X1RZ9" "3YHBC1" "55VDSQ" "653J82"
## [57] "6MEP2C" "76DIT4" "80KACX" "B2YJJP" "E3JP0C" "FLIZQI" "GM371F" "PA9F3J"
## [65] "SXSVEH" "T38W6H" "TJN1AD" "WNKKW3" "XY2CK7" "XZH41H" "YDRD81" "Z25D52"
## [73] "ZDRSG0" "3P9BX6" "7D09WH" "DGZLV3" "S63QDN"
We can also see the number of animals and the sex ratios created in each group.
lines <- sapply(sexRatioGrp$group, function(ids) {
paste0("Count: ", length(ids), " Sex Ratio: ",
round(calculateSexRatio(ids, trimmedPed), 2))})
for (line in lines) print(line)
## [1] "Count: 25 Sex Ratio: 7.33"
## [1] "Count: 25 Sex Ratio: 7.33"
## [1] "Count: 25 Sex Ratio: 7.33"
## [1] "Count: 25 Sex Ratio: 7.33"
## [1] "Count: 25 Sex Ratio: 7.33"
## [1] "Count: 35 Sex Ratio: 7.75"
## [1] "Count: 120 Sex Ratio: 0.56"
Examination of this table shows that of the 184 females 249 are included.
As stated earlier you can see which types of errors are detected by qcStudbook by looking at names returned by getEmptyErrorLst() as shown below.
## [1] "failedDatabaseConnection" "missingColumns"
## [3] "invalidDateRows" "suspiciousParents"
## [5] "femaleSires" "maleDams"
## [7] "sireAndDam" "duplicateIds"
## [9] "fatalError" "changedCols"
Each is defined below.
Error | Definition |
---|---|
failedDatabaseConnection | Database connection failed: configuration or permissions are invalid |
missingColumns | Columns that must be within the pedigree file are missing. |
invalidDateRows | Values, which are supposed to be dates, cannot be interpreted as a date. |
suspiciousParents | Parents were too young on the date of birth of to have been the parent. |
femaleSires | Individuals listed as female or hermaphroditic and as a sire. |
maleDams | Individuals are listed as male and as a dam. |
sireAndDam | Individuals who are listed as both a sire and a dam. |
duplicateIds | IDs listed more than once. |
fatalError | Fatal Errors. |
changedCols | Columns that have been changed to conform to internal naming conventions and what they were changed to. |
We are going to use the small imaginary pedigree listed below that has multiple errors to discuss pedigree error detection and reporting. First note the birth dates of ego_id o4 (2006-04-13) and the purported sire s2 (2006-06-19). Note also the purported birth date of the d2 and the birth dates of her offspring. Obviously dates or IDs may be incorrect.
This is the pedigree. (We will discuss the column names shortly.)
ego_id | si re | dam_id | sex | birth_date |
---|---|---|---|---|
s1 | NA | NA | F | 2000-07-18 |
d1 | NA | NA | M | 2003-04-13 |
s2 | NA | NA | M | 2006-06-19 |
d2 | NA | NA | F | 2015-09-16 |
o1 | s1 | d1 | F | 2015-02-04 |
o2 | s1 | d2 | F | 2009-03-17 |
o3 | s2 | d2 | F | 2012-04-11 |
o4 | s2 | d2 | M | 2006-04-13 |
If we try to convert this pedigree file into the standardized studbook format, we are going to get an error message and the creation of a file in the R sessions temporary directory that lists the records that have generated the errors.
pedOne <- nprcgenekeepr::pedOne # put it in the local environment
ped <- qcStudbook(pedOne, minParentAge = 0)
## Error in qcStudbook(pedOne, minParentAge = 0): Parents with low age at birth of offspring are listed in /var/folders/y3/5z1skj6s5tq0pktmsq6x80v80000gn/T//RtmpH9VBLT/lowParentAge.csv.
The contents of lowParentAge.csv is shown below.
dam | sire | id | sex | birth | recordStatus | exit | sireBirth | damBirth | sireAge | damAge |
---|---|---|---|---|---|---|---|---|---|---|
d2 | s1 | o2 | F | 2009-03-17 | original | NA | 2000-07-18 | 2015-09-16 | 8.66 | -6.50 |
d2 | s2 | o3 | F | 2012-04-11 | original | NA | 2006-06-19 | 2015-09-16 | 5.81 | -3.43 |
d2 | s2 | o4 | M | 2006-04-13 | original | NA | 2006-06-19 | 2015-09-16 | -0.18 | -9.43 |
Examination of the ages of the parents reveals the issues being reported.
We can remove the errors by changing the birth dates of o4 from 2006-04-13 to 2015-09-16 and of d2 from 2015-09-16 to 2006-04-13.
pedOne$birth_date[pedOne$ego_id == "o4"] <- as.Date("2015-09-16")
pedOne$birth_date[pedOne$ego_id == "d2"] <- as.Date("2006-04-13")
Note the changes made to the column names between the original pedOne pedigree and the pedigree (ped) we get from qcStudbook. We have chosen to limit the displayed pedigree by selecting the ego_id’s and id’s in pedOne and ped respectively.
ped <- qcStudbook(pedOne, minParentAge = 0)
ped[ped$id %in% c("s2", "d2", "o3", "o4"), ]
## id sire dam sex gen birth exit age recordStatus
## 2 d2 <NA> <NA> F 0 2006-04-13 <NA> 14.9 original
## 4 s2 <NA> <NA> M 0 2006-06-19 <NA> 14.8 original
## 7 o3 s2 d2 F 1 2012-04-11 <NA> 8.9 original
## 8 o4 s2 d2 M 1 2015-09-16 <NA> 5.5 original
However, the preferred method of creating the standardized studbook format with qcStudbook is to examine all errors found and correcting them before proceeding. This is done by explicitly requesting that all aspects inconsistent with the studbook format be identified by setting reportChanges and reportErrors to .
errorList <- qcStudbook(pedOne, minParentAge = 0, reportChanges = TRUE,
reportErrors = TRUE)
summary(errorList)
## Error: The animal listed as a sire and also listed as a female is: s1.
## Error: The animal listed as a dam and also listed as a male is: d1.
## Change: The column where space was removed is: si re to sire.
## Change: The columns where underscore was removed are: ego_id, dam_id, and birth_date to egoid, damid, and birthdate.
## Change: The column changed from: egoid to id.
## Change: The column changed from: damid to dam.
## Change: The column changed from: birthdate to birth.
##
## Please check and correct the pedigree file.
##
We will discuss each of these newly identified errors in a moment, however, let’s look at shortening this report, because often you will not be interested in the more trivial changes to the column names made by qcStudbook and in those cases you simply choose not to report changes to the column names as is shown here by setting reportChanges to . For this illustration, we are going to bring back the original copy of pedOne to see how the suspicious parents are reported by the summary function.
pedOne <- nprcgenekeepr::pedOne
errorList <- qcStudbook(pedOne, minParentAge = 0, reportChanges = FALSE,
reportErrors = TRUE)
options(width = 90)
summary(errorList)
## Error: The animal listed as a sire and also listed as a female is: s1.
## Error: The animal listed as a dam and also listed as a male is: d1.
##
## Please check and correct the pedigree file.
##
## Animal records where parent records are suspicous because of dates.
## One or more parents appear too young at time of birth.
## dam sire id sex birth recordStatus exit sireBirth damBirth sireAge damAge
## 2 d2 s1 o2 F 2009-03-17 original <NA> 2000-07-18 2015-09-16 8.66 -6.5
## 3 d2 s2 o3 F 2012-04-11 original <NA> 2006-06-19 2015-09-16 5.81 -3.4
## 4 d2 s2 o4 M 2006-04-13 original <NA> 2006-06-19 2015-09-16 -0.18 -9.4
The first two errors mentioned are of particular interest. Currently qcStudbook automatically changes the sex of dams to F (female) and sires to M (male) when reportErrors is set to .
This feature is not supported within the Shiny application and is not fully implemented.
To use the findLoops function run the following code and select a pedigree as your input file that has a loop in it. We are continuing to use the example pedigree that comes with the software Example_Pedigree.csv.
exampleTree <- createPedTree(breederPed)
exampleLoops <- findLoops(exampleTree)
You can count how many loops you have with the following code.
length(exampleLoops[exampleLoops == TRUE])
## [1] 145
nLoops <- countLoops(exampleLoops, exampleTree)
sum(unlist(nLoops[nLoops > 0]))
## [1] 258
You can list the first 10 sets of ids, sires and dams in loops with the following line of code:
examplePedigree[exampleLoops == TRUE, c("id", "sire", "dam")][1:10, ]
## id sire dam
## 2519 V49H3Y UFI88T 9T7Y2Z
## 2572 61FUGE UDQ5WC GL88CF
## 2695 LWJ3A5 KZM9RB GCBYDW
## 2722 RNQU14 H2RDE2 DKIM6U
## 2752 L9M1DC 3PU50K WFQENR
## 2755 Q8U9LB 3PU50K CLSVU6
## 2905 FVJ14K UXC40T L5VC2M
## 2922 531HAC UMV4BE 5DIPZN
## 2924 85ESBB UQFY9C Q2RK1E
## 2941 0VLW56 6KPKH7 MMEHXV
elapsed_time <- get_elapsed_time_str(start_time)
The current date and time is 2021-03-22 08:59:21. The processing time for this document was 19 seconds..
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## Random number generation:
## RNG: Mersenne-Twister
## Normal: Inversion
## Sample: Rounding
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] nprcgenekeepr_1.0.5 knitr_1.31 ggplot2_3.3.3 stringi_1.5.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 lubridate_1.7.10 lattice_0.20-41 assertthat_0.2.1
## [5] rprojroot_2.0.2 digest_0.6.27 utf8_1.2.1 mime_0.10
## [9] cellranger_1.1.0 R6_2.5.0 backports_1.2.1 WriteXLS_6.2.0
## [13] futile.options_1.0.1 evaluate_0.14 highr_0.8 httr_1.4.2
## [17] pillar_1.5.1 rlang_0.4.10 readxl_1.3.1 rstudioapi_0.13
## [21] jquerylib_0.1.3 Matrix_1.3-2 checkmate_2.0.0 rmarkdown_2.7
## [25] pkgdown_1.6.1 labeling_0.4.2 textshaping_0.3.3 desc_1.3.0
## [29] stringr_1.4.0 htmlwidgets_1.5.3 munsell_0.5.0 shiny_1.6.0
## [33] anytime_0.3.9 compiler_4.0.4 httpuv_1.5.5 xfun_0.22
## [37] pkgconfig_2.0.3 systemfonts_1.0.1 htmltools_0.5.1.1 tidyselect_1.1.0
## [41] htmlTable_2.1.0 tibble_3.1.0 fansi_0.4.2 crayon_1.4.1
## [45] dplyr_1.0.5 withr_2.4.1 later_1.1.0.1 grid_4.0.4
## [49] jsonlite_1.7.2 xtable_1.8-4 gtable_0.3.0 lifecycle_1.0.0
## [53] DBI_1.1.1 magrittr_2.0.1 formatR_1.8 scales_1.1.1
## [57] debugme_1.1.0 cachem_1.0.4 farver_2.1.0 fs_1.5.0
## [61] promises_1.2.0.1 futile.logger_1.4.4 bslib_0.2.4 ellipsis_0.3.1
## [65] ragg_1.1.1 generics_0.1.0 vctrs_0.3.6 lambda.r_1.2.4
## [69] tools_4.0.4 Rlabkey_2.6.0 glue_1.4.2 purrr_0.3.4
## [73] plotrix_3.8-1 fastmap_1.1.0 yaml_2.2.1 colorspace_2.0-0
## [77] memoise_2.0.0 sass_0.3.1
Setting the minParentAge to 3.5 and above will cause an error along with the creation of a file ~/lowParentAge.csv that will list the parents with low age at the birth of an offspring.↩︎
The population column is created and added to the pedigree object if it does not already exist.↩︎
All animals within the colony have a known birth date.↩︎