
Generates sampling points using the maxvol algorithm, which selects locations by maximizing the determinant (volume) of a feature submatrix. This is a D-optimal design approach that selects points with the most diverse feature characteristics.

Usage

ss_maxvol(
  x,
  n,
  features = NULL,
  coords = NULL,
  min_dist = NULL,
  normalize = TRUE,
  add_coords = TRUE,
  tol = 1.1,
  max_iters = 100,
  verbose = FALSE,
  seed = NULL
)

Arguments

x

An sf object representing the study area with feature attributes, OR a numeric matrix of features (rows = locations, columns = features).

n

Integer, the number of sampling points to select.

features

Character vector of feature names to use from x when x is an sf object. If NULL, all numeric attributes are used.

coords

Matrix (n_locations x 2) of coordinates. Required if x is a matrix. If x is sf, coordinates are extracted automatically.

min_dist

Numeric, minimum distance between sampling points. If NULL, no distance constraint is applied.

normalize

Logical, whether to normalize features before applying maxvol (default TRUE). Recommended when features have different scales.

add_coords

Logical, whether to add coordinates as features (default TRUE). This helps balance feature space and geographic space.

tol

Numeric, convergence tolerance for the maxvol algorithm (default 1.1).

max_iters

Integer, maximum iterations for maxvol (default 100).

verbose

Logical, whether to print progress messages (default FALSE).

seed

Optional integer seed for reproducibility.
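As a rough illustration of what the normalize and add_coords arguments describe, the base-R sketch below appends coordinates to a feature matrix and standardizes every column. The data are made up, and z-score scaling with scale() is an assumption; the exact normalization ss_maxvol applies is not stated here.

```r
# Hypothetical data: 5 locations, 2 features on very different scales
features <- cbind(elevation = c(120, 135, 150, 110, 160),
                  slope     = c(2.0, 5.5, 3.1, 8.4, 1.2))
coords <- cbind(x = c(0, 10, 20, 30, 40),
                y = c(0, 5, 10, 15, 20))

# add_coords-like step: append the coordinates as extra feature columns
design <- cbind(features, coords)

# normalize-like step (assumed z-score): centre each column, scale to unit sd
design_scaled <- scale(design)

colMeans(design_scaled)      # column means after centring
apply(design_scaled, 2, sd)  # column standard deviations after scaling
```

Without a step like this, a feature measured in large units (such as elevation in metres) would dominate the determinant and the selection would largely ignore the others.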

Value

An object of class ss_samples containing:

samples

An sf object with the sampling points.

method

Character, "maxvol".

n_samples

Integer, number of samples.

features_used

Character vector of feature names used.

converged

Logical, whether maxvol converged.

iterations

Integer, number of maxvol iterations.

crs

The coordinate reference system.

Details

The maxvol algorithm selects sampling locations by finding a submatrix of the feature matrix whose determinant is approximately maximal in absolute value (its volume). Geometrically, this maximizes the volume of the parallelepiped spanned by the selected feature vectors, so the sampled locations are as diverse as possible in feature space.
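The idea can be sketched in base R with the classical square variant of maxvol, which picks as many rows as there are feature columns. This is a minimal sketch for intuition, not the implementation behind ss_maxvol: the function name maxvol_sketch is hypothetical, the stopping rule on tol is the one from the classical algorithm, and how the package handles n different from the number of features is not covered here.

```r
# Minimal square maxvol sketch (assumption: not the package's implementation).
# A: numeric matrix, rows = candidate locations, columns = features (N > r).
maxvol_sketch <- function(A, tol = 1.1, max_iters = 100) {
  r <- ncol(A)
  # Start from r linearly independent rows, taken from a QR of t(A)
  idx <- qr(t(A))$pivot[seq_len(r)]
  converged <- FALSE
  iters <- 0
  while (iters < max_iters) {
    # Coefficients that express every row in terms of the selected rows
    B <- A %*% solve(A[idx, , drop = FALSE])
    pos <- arrayInd(which.max(abs(B)), dim(B))
    i <- pos[1]
    j <- pos[2]
    # Stop when no single swap can grow the volume by more than a factor tol
    if (abs(B[i, j]) <= tol) {
      converged <- TRUE
      break
    }
    # Swapping selected row j for row i multiplies |det| by |B[i, j]|
    idx[j] <- i
    iters <- iters + 1
  }
  list(rows = idx, converged = converged, iterations = iters)
}

set.seed(1)
A <- matrix(rnorm(200 * 3), ncol = 3)   # 200 synthetic locations, 3 features
res <- maxvol_sketch(A)
vol_start <- abs(det(A[1:3, ]))         # volume of the first three rows
vol_final <- abs(det(A[res$rows, ]))    # volume of the selected rows
```

Each accepted swap increases the absolute determinant by a factor greater than tol, which is why a value slightly above 1 works as a stopping threshold.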

The algorithm is based on D-optimal experimental design and is particularly effective when:

  • Features explain the main soil-forming factors

  • You want deterministic (non-random) point selection

  • You need optimal coverage with few samples

Feature Selection: Terrain features typically used include:

  • Elevation

  • Slope

  • Aspect

  • Topographic Wetness Index (TWI)

  • Closed depressions

  • Flow accumulation

Distance Constraint: The min_dist parameter prevents spatial clustering. Two points closer together than this distance are never both selected. The value should be chosen based on:

  • Study area size

  • Terrain ruggedness (more rugged terrain calls for a smaller min_dist)

  • Typical size of soil mapping units
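One way to picture the distance constraint is a greedy filter over ranked candidates: a candidate is accepted only if it lies at least min_dist from every point already accepted. The sketch below is an assumption for illustration; the helper filter_min_dist and the ranking it takes are hypothetical, and whether ss_maxvol enforces min_dist in this way is not stated here.

```r
# Hypothetical helper: greedy minimum-distance filter over ranked candidates.
# coords: matrix (n_locations x 2); order: candidate row indices, best first.
filter_min_dist <- function(coords, order, n, min_dist) {
  keep <- integer(0)
  for (i in order) {
    if (length(keep) == n) break
    if (length(keep) == 0) {
      keep <- i
      next
    }
    # Euclidean distances from candidate i to every accepted point
    d <- sqrt((coords[keep, 1] - coords[i, 1])^2 +
              (coords[keep, 2] - coords[i, 2])^2)
    if (all(d >= min_dist)) keep <- c(keep, i)
  }
  keep
}

# Five candidates on a line; points 2 and 4 sit 1 unit from points 1 and 3
coords <- cbind(x = c(0, 1, 10, 11, 20), y = c(0, 0, 0, 0, 0))
selected <- filter_min_dist(coords, order = 1:5, n = 3, min_dist = 5)
min(dist(coords[selected, ]))  # smallest pairwise distance among kept points
```

A larger min_dist spreads the points further apart geographically but may leave fewer than n feasible candidates, so the value has to be balanced against the size of the study area.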

References

Petrovskaia, A., Ryzhakov, G., & Oseledets, I. (2021). Optimal soil sampling design based on the maxvol algorithm. Geoderma, 383, 114733. doi:10.1016/j.geoderma.2020.114733

Goreinov, S. A., Oseledets, I. V., Savostyanov, D. V., Tyrtyshnikov, E. E., & Zamarashkin, N. L. (2010). How to find a good submatrix. In Matrix Methods: Theory, Algorithms and Applications (pp. 247-256). World Scientific.

Examples

if (FALSE) { # \dontrun{
library(sf)

# Create a study area with feature attributes
poly <- st_polygon(list(rbind(c(0, 0), c(100, 0), c(100, 50), c(0, 50), c(0, 0))))
study_area <- st_sf(geometry = st_sfc(poly))

# In practice, you would load terrain features from a DEM
# For this example, we'll use stratification cells with computed features
strata <- ss_stratify(study_area, n_strata = 100, n_try = 3)

# Add terrain-like features to cells
cells_sf <- strata$cells
coords <- st_coordinates(cells_sf)
cells_sf$elevation <- coords[, 2] + rnorm(nrow(coords), mean = 0, sd = 5)
cells_sf$slope <- abs(rnorm(nrow(coords), mean = 5, sd = 2))

# Select 20 sampling points using maxvol
samples <- ss_maxvol(
  cells_sf,
  n = 20,
  features = c("elevation", "slope"),
  min_dist = 5,
  normalize = TRUE,
  add_coords = TRUE
)

# Plot results
ss_plot_samples(samples)
} # }