Estimates the Dissimilarity Index (DI) and the Area of Applicability
(AOA) for new data given the training data and a fitted model. This
function is designed to be applied directly to the output of
h3sdm_predict(), so that both the predicted values and the AOA
are available in a single sf object ready for mapping.
Arguments
- newdata
An sf object, typically the direct output of h3sdm_predict(), containing the predictor variables and a prediction column.
- train
An sf object, typically the output of h3sdm_data(), containing the training observations with the same predictor variables used in fit_object.
- fit_object
The list returned by h3sdm_fit_model(), used to extract the model formula (predictor variable names) and variable importances when available.
- cv
An rset object returned by h3sdm_spatial_cv(), used to extract cross-validation fold assignments. If NULL (default), Leave-One-Out (LOO) cross-validation is used.
- verbose
Logical. Should progress messages be printed? Default TRUE.
Value
The input newdata sf object with two additional columns:
- DI
Numeric. Dissimilarity Index for each observation. Values near 0 indicate high similarity to the training data; larger values indicate increasing dissimilarity.
- AOA
Integer. 1 = inside the AOA; 0 = outside the AOA.
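One common use of the two columns is to mask predictions that fall outside the AOA before mapping. The following is a minimal sketch on a plain data frame standing in for the returned sf object; the values and the prediction_aoa column name are hypothetical and are not produced by the package.

```r
# Toy stand-in for the object returned by h3sdm_aoa() (hypothetical values).
result <- data.frame(
  prediction = c(0.82, 0.10, 0.55),
  DI         = c(0.05, 0.40, 2.30),
  AOA        = c(1L, 1L, 0L)
)

# Keep predictions inside the AOA; set the rest to NA so they can be
# drawn in a neutral colour on a map.
result$prediction_aoa <- ifelse(result$AOA == 1L, result$prediction, NA_real_)

# Number of locations that fall outside the AOA.
n_outside <- sum(result$AOA == 0L)
```

The same indexing should work on an sf object, because sf objects are data frames with an extra geometry column.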
Details
The algorithm follows Meyer & Pebesma (2021). Predictor variables are
extracted automatically from the model formula inside
fit_object. They are standardized using z-score scaling
computed from train, then optionally weighted by variable
importance. The mean nearest-neighbor distance among training points
(trainDist_avrgmean) is used to normalize DI values. The AOA
threshold is the maximum cross-validated training DI after removing
outliers with Tukey's rule (Q3 + 1.5 * IQR). Locations with
DI <= threshold are inside the AOA; predictions at locations above
the threshold should be interpreted with caution.
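The steps above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the variable names temp and rain are invented, weights are equal, LOO is used in place of fold assignments, and the normalizer is taken to be the mean of all pairwise training distances as in Meyer & Pebesma (2021), which may differ from how the package computes trainDist_avrgmean.

```r
set.seed(42)

# Toy training data; the first new point copies a training row and the
# second lies far outside the training range (all values hypothetical).
train <- data.frame(temp = rnorm(40, 20, 2), rain = rnorm(40, 1500, 200))
new   <- rbind(train[1, ], data.frame(temp = 40, rain = 6000))
w     <- c(temp = 1, rain = 1)  # equal weights (assumption)

# 1. z-score scaling with the training mean and sd, then weighting.
mu  <- colMeans(train)
sdv <- apply(train, 2, sd)
prep <- function(x) {
  z <- sweep(sweep(as.matrix(x), 2, mu, "-"), 2, sdv, "/")
  sweep(z, 2, w, "*")
}
tr <- prep(train)
nw <- prep(new)

# 2. Normalizer: mean of all pairwise training distances (assumption).
d_tt      <- as.matrix(dist(tr))
norm_dist <- mean(d_tt[upper.tri(d_tt)])

# 3. LOO training DI: distance to the nearest *other* training point.
diag(d_tt) <- NA
di_train   <- apply(d_tt, 1, min, na.rm = TRUE) / norm_dist

# 4. Threshold: largest training DI not flagged by Tukey's rule.
fence     <- quantile(di_train, 0.75) + 1.5 * IQR(di_train)
threshold <- max(di_train[di_train <= fence])

# 5. DI of the new points: nearest training point, same normalizer.
n_new <- nrow(nw)
d_nt  <- as.matrix(dist(rbind(nw, tr)))[seq_len(n_new), -seq_len(n_new), drop = FALSE]
DI    <- apply(d_nt, 1, min) / norm_dist
AOA   <- as.integer(DI <= threshold)
```

With spatial folds instead of LOO, step 3 would take the minimum only over training points assigned to a different fold, which generally yields larger training DI values and therefore a larger threshold.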
Variable importance is extracted automatically for ranger and
xgboost models via vip::vi(). For GAM models, or when
importance cannot be extracted, all variables receive equal weight.
References
Meyer, H., Pebesma, E. (2021): Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution 12: 1620–1633. doi:10.1111/2041-210X.13650
Examples
if (FALSE) { # \dontrun{
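# Assumes `dat` (training data), `workflow` (a model specification) and
# `h7` (the new data to predict on) already exist in the session.
cv <- h3sdm_spatial_cv(dat, method = "block")
fit <- h3sdm_fit_model(workflow, cv)
pred <- h3sdm_predict(fit, new_data = h7)
# Adds the DI and AOA columns to the prediction output.
result <- h3sdm_aoa(pred, train = dat, fit_object = fit, cv = cv)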
} # }