# Simulation of plot errors and phenotypes in a plant breeding field trial

Source:`vignettes/spatial_variation_demo.Rmd`

`spatial_variation_demo.Rmd`

###
**Background**

The **R package ‘FieldSimR’**
enables simulation of multi-environment plant breeding trial phenotypes
through the simulation of plot errors and their subsequent combination
with (simulated) genetic values. Its **core function generates
plot errors** comprising 1) a spatially correlated error term, 2)
a random error term, and 3) an extraneous error term. Spatially
correlated errors are simulated using either bivariate interpolation, or
a two-dimensional autoregressive process of order one (AR1:AR1).The
three error terms are combined at a user-deﬁned ratio.

**This document demonstrates how to:**

- Simulate plot errors for multiple traits tested in multiple environments, and
- Simulate phenotypes through the combination of the plot errors with simulated genetic values.

The **simulation of plot errors** requires specification
of various simulation parameters that define:

- The field trial layout
- The total error variance
- The spatial error
- Extraneous variation

**Phenotypes** are then simulated through the
combination of the plot errors with the genetic values stored in the
package’s example data frame `gv_df_unstr`

. The data frame
contains simulated genetic values for two traits in three environments
based on an unstructured model for genotype-by-environment (GxE)
interaction. The simulation of the genetic values is shown in the
vignette on the Simulation
of genetic values based on an unstructured model for
genotype-by-environment (GxE) interaction.

###
**1. Simulation of plot errors**

####
**1.1 Field trial layout**

We conceive a scenario in which 100 maize hybrids are measured for grain yield (t/ha) and plant height (cm) in three environments. The first and the third environment include two blocks, and the second environment includes three blocks. Each block comprises 20 rows and 5 columns. The blocks are arranged in column direction (“side-by-side”). The plot length (column direction) is 8 meters, and the plot width (row direction) is 2 meters.

```
ntraits <- 2 # Number of traits
nenvs <- 3 # Number of environments
nblocks <- c(2, 2, 3) # Number of blocks per environment
block_dir <- "col" # Arrangement of blocks ("side-by-side")
ncols <- c(10, 10, 15) # Number of columns per environment
nrows <- 20 # Number of rows per environment
plot_length <- 8 # Plot length; here in meters (column direction)
plot_width <- 2 # Plot width; here in meters (row direction)
```

**Note:** `plot_length`

and
`plot_width`

are only required when
`spatial_model = "Bivariate"`

. The two arguments are used to
set the x-coordinates and y-coordinates required for the bivariate
interpolation algorithm to model a spatial correlation between plots.
Therefore, the assumed unit of length (meters here) has no actual
meaning, and the ratio between `plot_length`

and
`plot_width`

is important rather than the absolute values. We
recommend to use realistic, absolute values for `plot_length`

and `plot_width`

regardless of the unit of length.

When spatially correlated errors are simulated based on a
**two-dimensional autoregressive process (AR1:AR1)**,
`col_cor`

and `row_cor`

have to be defined
instead.

####
**1.2 Total error**

To obtain pre-defined target heritabilities at the plot-level, we
need to define the total error variances for the two simulated traits
relative to their genetic variances. We assume broad-sense
heritabilities at the plot level of
*H ^{2}*

**= 0.3 for grain yield**and

*H*^{2}**= 0.5 for plant height**at all three environments. The heritabilities for the six trait x environment combinations are stored in a single vector.

`H2 <- c(0.3, 0.3, 0.3, 0.5, 0.5, 0.5) # c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)`

The total genetic variances of the six trait x environment combinations (environments nested within traits) can be extracted from the description of the simulation of the genetic values in FieldSimR::unstructured_GxE_demo.

`var <- c(0.086, 0.12, 0.06, 15.1, 8.5, 11.7) # c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)`

We now create a simple function to calculate the total error
variances based on our pre-defined target heritabilities in vector
`H2`

and the total genetic variances in `var`

.

```
# Calculation of error variances based on the genetic variance and target heritability vectors.
calc_varR <- function(var, H2) {
varR <- (var / H2) - var
return(varR)
}
varR <- calc_varR(var, H2)
round(varR, 2) # Vector of error variances: c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)
#> [1] 0.20 0.28 0.14 15.10 8.50 11.70
```

####
**1.3 Spatial error**

We simulate a spatial error term using bivariate interpolation and assume the proportion of spatial error variance to total error variance to be 0.4 in all three environments. Additionally, we assume a correlation between the spatial error of the two traits. However, since we have no information on the magnitude of the correlation, we randomly sample a value between 0 and 0.5.

```
spatial_model <- "Bivariate" # Spatial error model.
prop_spatial <- 0.4 # Proportion of spatial trend.
ScorR <- rand_cor_mat(ntraits, min.cor = 0, max.cor = 0.5, pos.def = TRUE)
#> 'cor_mat' is already positive (semi)-definite, matrix was not altered
round(ScorR, 2)
#> 1 2
#> 1 1.00 0.04
#> 2 0.04 1.00
```

####
**1.4 Extraneous variation**

Extraneous effects can, for example, result from trial management procedures in row and/or column direction, such as soil tillage, sowing or harvesting. We want to simulate extraneous variation in row direction and assume the proportion of extraneous error variance to total error variance to be 0.2. We assume a correlation between the error of the two traits because of extraneous variation and randomly sample a value between 0 and 0.5.

```
ext_ord <- "zig-zag"
ext_dir <- "row"
prop_ext <- 0.2
EcorR <- rand_cor_mat(ntraits, min.cor = 0, max.cor = 0.5, pos.def = TRUE)
#> 'cor_mat' is already positive (semi)-definite, matrix was not altered
round(EcorR, 2)
#> 1 2
#> 1 1.00 0.42
#> 2 0.42 1.00
```

**Note:** the **proportion of the random error
variance to total error variance** is defined as 1 -
(prop_spatial + prop_ext). Hence, `prop_spatial`

and
`prop_ext`

are set with reference to the random error, and
the sum of the two proportions must not be greater than 1.

####
**1.5 Simulation of plot errors**

Finally, we use all the parameters defined above with the function
`field_trial_error()`

to simulate plot errors for grain yield
and plant height in the three test environments:

```
error_ls <- field_trial_error(
ntraits = ntraits,
nenvs = nenvs,
nblocks = nblocks,
block.dir = block_dir,
ncols = ncols,
nrows = nrows,
plot.length = plot_length,
plot.width = plot_width,
varR = varR,
ScorR = ScorR,
EcorR = EcorR,
RcorR = NULL,
spatial.model = spatial_model,
prop.spatial = prop_spatial,
ext.ord = ext_ord,
ext.dir = ext_dir,
prop.ext = prop_ext,
return.effects = TRUE
)
```

**Note:** by default, the function
`field_trial_error()`

generates a data frame with the
following columns: ** environment id**
(environment),

**,**

*block id***,**

*column id***, and the**

*row id***. When**

*total plot error for each trait*`return_effects = TRUE`

, ‘FieldSimR’
returns a list with an additional entry for each trait containing the
spatial error, the extraneous effect and the random error.We will now plot the **total error for grain yield (“e.Trait1”)
in Environment 1**, as well as the **spatial error
component**, the **random error component**, and the
**extraneous variation**. Therefore, we first extract the
required data from `error_df`

and then create an individual
graphic for each of the four simulated error terms.

```
e_total_env1 <- error_ls$error.df[error_ls$error.df$env == 1, ]
e_terms_env1 <- error_ls$Trait1[error_ls$Trait1$env == 1, ]
```

**Total plot error**

`plot_effects(e_total_env1, effect = "e.Trait1", labels = TRUE)`

**Spatial error** simulated using bivariate
interpolation

`plot_effects(e_terms_env1, effect = "e.spat", labels = TRUE)`

**Random error**

`plot_effects(e_terms_env1, effect = "e.rand", labels = TRUE)`

**Extraneous variation **in the row direction

`plot_effects(e_terms_env1, effect = "e.ext.row")`

###
**2. Simulation of plot-level phenotypes**

To simulate **grain yield and plant height phenotypes**
for our multi-environment maize experiment, we now combine the simulated
plot errors with the genetic values stored in the example data frame
`gv_df_unstr`

. This is done using the function
`make_phenotypes()`

, which allocates genotypes to plots
within blocks according to a randomized complete block design
(RCBD).

**Note:** more complex field designs require allocation
of genotypes to plots using an external R package or software.

```
gv_df <- gv_df_unstr
pheno_df <- make_phenotypes(
gv_df,
error_ls$error.df,
randomise = TRUE
)
pheno_env1 <- pheno_df[pheno_df$env == 1, ] # Extract phenotypes in environment 1.
```

**Simulated phenotypes**

`plot_effects(pheno_env1, effect = "y.Trait1")`

**Histogram showing the phenotypes of the 100 maize hybrids for
grain yield in the two blocks in Environment 1**