Generates plots for a BenchmarkAggr object; all plot types assume that there are multiple, independent tasks.

The available plots, selected via the `type` argument:
`"mean"` (default): Requires at least two independent tasks. Plots the sample mean of the measure for each learner, with error bars computed from the standard error of the mean.
`"box"`: Boxplots of the measure for each learner, computed over all tasks.
`"fn"`: Plots the post-hoc Friedman-Nemenyi test by first calling BenchmarkAggr$friedman_posthoc, drawing significant pairs as coloured squares and leaving non-significant pairs blank; useful for quickly visualising pairwise comparisons.
`"cd"`: Critical difference plots (Demšar, 2006). Learners are drawn on the x-axis by average rank, with the best-performing learner on the left and performance decreasing to the right. Learners not connected by a horizontal bar differ significantly in performance. The critical difference is calculated as:
$$CD = q_{\alpha} \sqrt{\left(\frac{k(k+1)}{6N}\right)}$$
where \(q_\alpha\) is based on the studentized range statistic, \(k\) is the number of learners, and \(N\) is the number of tasks. See the references for further details.
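As an illustration of the formula, the critical difference for hypothetical values of \(k\) and \(N\) can be computed directly in base R; following Demšar (2006), \(q_\alpha\) is the studentized range quantile (available via `qtukey()`) divided by \(\sqrt{2}\):

```r
# Sketch: critical difference for k = 3 learners over N = 4 tasks (values chosen
# for illustration) at alpha = 0.05.
k <- 3
N <- 4
alpha <- 0.05

# Studentized range quantile divided by sqrt(2), as in Demsar (2006)
q_alpha <- qtukey(1 - alpha, nmeans = k, df = Inf) / sqrt(2)
cd <- q_alpha * sqrt(k * (k + 1) / (6 * N))
cd  # approx. 1.66: mean ranks differing by more than this are significantly different
```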
It is recommended to crop surrounding white space using external tools, or the function `image_trim()` from the magick package.
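For example, a saved plot can be trimmed with magick as follows (the file names and the base-graphics stand-in plot are illustrative; in practice the file would come from saving an `autoplot()` result):

```r
library(magick)

# Create a small example image with wide margins, standing in for a saved plot
png("plot.png", width = 300, height = 300)
plot(1:10)
dev.off()

img <- magick::image_read("plot.png")
img <- magick::image_trim(img)            # crop surrounding white space
magick::image_write(img, "plot_trimmed.png")
```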
# S3 method for BenchmarkAggr
autoplot(
  obj,
  type = c("mean", "box", "fn", "cd"),
  meas = NULL,
  level = 0.95,
  p.value = 0.05,
  minimize = TRUE,
  test = "nem",
  baseline = NULL,
  style = 1L,
  ratio = 1/7,
  col = "red",
  ...
)
| Argument | Description |
|---|---|
| `obj` | (BenchmarkAggr) The benchmark aggregation object to plot. |
| `type` | (`character(1)`) Type of plot, one of `"mean"`, `"box"`, `"fn"`, `"cd"`; see Description. Default `"mean"`. |
| `meas` | (`character(1)`) Measure to plot. |
| `level` | (`numeric(1)`) Confidence level for the error bars when `type = "mean"`. Default `0.95`. |
| `p.value` | (`numeric(1)`) Significance level for the `"fn"` and `"cd"` plots. Default `0.05`. |
| `minimize` | (`logical(1)`) Whether the measure is minimized, which determines the ranking direction. Default `TRUE`. |
| `test` | (`character(1)`) Post-hoc test to use. Default `"nem"` (Nemenyi). |
| `baseline` | (`character(1)`) Baseline learner. Default `NULL`. |
| `style` | (`integer(1)`) Style of the critical difference plot, see examples. Default `1L`. |
| `ratio` | (`numeric(1)`) Ratio controlling the dimensions of the `"cd"` plot. Default `1/7`. |
| `col` | (`character(1)`) Colour used to highlight significant pairs when `type = "fn"`. Default `"red"`. |
| `...` | Additional arguments. |
Demšar J (2006). “Statistical Comparisons of Classifiers over Multiple Data Sets.” Journal of Machine Learning Research, 7(1), 1-30. https://jmlr.org/papers/v7/demsar06a.html.
if (requireNamespaces(c("mlr3learners", "mlr3", "rpart", "xgboost"))) {
  library(mlr3)
  library(mlr3learners)
  library(ggplot2)

  set.seed(1)
  task = tsks(c("iris", "sonar", "wine", "zoo"))
  learns = lrns(c("classif.featureless", "classif.rpart", "classif.xgboost"))
  bm = benchmark(benchmark_grid(task, learns, rsmp("cv", folds = 3)))
  obj = as.BenchmarkAggr(bm)

  # mean and error bars
  autoplot(obj, type = "mean", level = 0.95)

  if (requireNamespace("PMCMR", quietly = TRUE)) {
    # critical difference plots
    autoplot(obj, type = "cd", style = 1)
    autoplot(obj, type = "cd", style = 2)

    # post-hoc Friedman-Nemenyi
    autoplot(obj, type = "fn")
  }
}