Generates plots for BenchmarkAggr objects; all plot types assume multiple, independent tasks. The choice of plot depends on the type argument:

  • "mean" (default): Assumes there are at least two independent tasks. Plots the sample mean of the measure for all learners with error bars computed with the standard error of the mean.

  • "box": Boxplots for each learner calculated over all tasks for a given measure.

  • "fn": Plots post-hoc Friedman-Nemenyi by first calling BenchmarkAggr$friedman_posthoc and plotting significant pairs in coloured squares and leaving non-significant pairs blank, useful for simply visualising pair-wise comparisons.

  • "cd": Critical difference plots (Demsar, 2006). Learners are drawn on the x-axis according to their average rank with the best performing on the left and decreasing performance going right. Any learners not connected by a horizontal bar are significantly different in performance. Critical differences are calculated as: $$CD = q_{\alpha} \sqrt{\left(\frac{k(k+1)}{6N}\right)}$$ Where \(q_\alpha\) is based on the studentized range statistic. See references for further details. It's recommended to crop white space using external tools, or function image_trim() from package magick.

# S3 method for BenchmarkAggr
autoplot(
  obj,
  type = c("mean", "box", "fn", "cd"),
  meas = NULL,
  level = 0.95,
  p.value = 0.05,
  minimize = TRUE,
  test = "nem",
  baseline = NULL,
  style = 1L,
  ratio = 1/7,
  col = "red",
  ...
)

Arguments

obj

BenchmarkAggr

type

(character(1))
Type of plot, see description.

meas

(character(1))
Measure to plot; should be in obj$measures. Can be NULL if obj contains only one measure.

level

(numeric(1))
Confidence level for the error bars when type = "mean".

p.value

(numeric(1))
The p-value below which differences are considered significant for type = "cd" and type = "fn".

minimize

(logical(1))
For type = "cd", indicates if the measure is optimally minimized. Default is TRUE.

test

(character(1))
For type = "cd", critical differences are either computed between all learners (test = "nem", the Nemenyi test) or against a single baseline (test = "bd", the Bonferroni-Dunn test). Bonferroni-Dunn usually yields higher power than Nemenyi as it only compares algorithms against one baseline. Default is "nem".

baseline

(character(1))
For type = "cd" and test = "bd" a baseline learner to compare the other learners to, should be in $learners, if NULL then differences are compared to the best performing learner.

style

(integer(1))
For type = "cd" two ggplot styles are shipped with the package (style = 1 or style = 2), otherwise the data can be accessed via the returned ggplot.

ratio

(numeric(1))
For type = "cd" and style = 1, passed to ggplot2::coord_fixed(), useful for quickly specifying the aspect ratio of the plot, best used with ggsave().

col

(character(1))
For type = "fn", specifies color to fill significant tiles, default is "red".

...

ANY
Additional arguments, currently unused.
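
A hedged usage sketch for the test, baseline, and ratio arguments described above, assuming an existing BenchmarkAggr stored in obj whose learner ids include "featureless" (the id and file name below are illustrative only):

library(ggplot2)

# Compare all learners against one named baseline (Bonferroni-Dunn test)
p <- autoplot(obj, type = "cd", test = "bd", baseline = "featureless",
  style = 1, ratio = 1/5)

# ratio is passed to ggplot2::coord_fixed(); pair the plot with ggsave()
ggsave("cd_plot.png", plot = p, width = 8, height = 3)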

References

Demšar J (2006). “Statistical Comparisons of Classifiers over Multiple Data Sets.” Journal of Machine Learning Research, 7(1), 1-30. https://jmlr.org/papers/v7/demsar06a.html.

Examples

if (requireNamespaces(c("mlr3learners", "mlr3", "rpart", "xgboost"))) { library(mlr3) library(mlr3learners) library(ggplot2) set.seed(1) task = tsks(c("iris", "sonar", "wine", "zoo")) learns = lrns(c("classif.featureless", "classif.rpart", "classif.xgboost")) bm = benchmark(benchmark_grid(task, learns, rsmp("cv", folds = 3))) obj = as.BenchmarkAggr(bm) # mean and error bars autoplot(obj, type = "mean", level = 0.95) if (requireNamespace("PMCMR", quietly = TRUE)) { # critical differences autoplot(obj, type = "cd",style = 1) autoplot(obj, type = "cd",style = 2) # post-hoc friedman-nemenyi autoplot(obj, type = "fn") } }