CGAL 5.6 - Classification
CGAL::Classification::OpenCV::Random_forest_classifier Class Reference

#include <CGAL/Classification/OpenCV/Random_forest_classifier.h>

Definition

Classifier based on the OpenCV version of the random forest algorithm.

Note
This class requires the OpenCV library.
Is Model Of:
CGAL::Classification::Classifier

Constructor

 Random_forest_classifier (const Label_set &labels, const Feature_set &features, int max_depth=20, int min_sample_count=5, int max_categories=15, int max_number_of_trees_in_the_forest=100, float forest_accuracy=0.01f)
 instantiates the classifier using the sets of labels and features.
 

Parameters

void set_max_depth (int max_depth)
 
void set_min_sample_count (int min_sample_count)
 
void set_max_categories (int max_categories)
 
void set_max_number_of_trees_in_the_forest (int max_number_of_trees_in_the_forest)
 
void set_forest_accuracy (float forest_accuracy)
 

Training

template<typename LabelIndexRange >
void train (const LabelIndexRange &ground_truth)
 runs the training algorithm.
 

Input/Output

void save_configuration (const char *filename)
 saves the current configuration in the file named filename.
 
void load_configuration (const char *filename)
 loads a configuration from the file named filename.
 

Constructor & Destructor Documentation

◆ Random_forest_classifier()

CGAL::Classification::OpenCV::Random_forest_classifier::Random_forest_classifier ( const Label_set &  labels,
const Feature_set &  features,
int  max_depth = 20,
int  min_sample_count = 5,
int  max_categories = 15,
int  max_number_of_trees_in_the_forest = 100,
float  forest_accuracy = 0.01f 
)

instantiates the classifier using the sets of labels and features.

The parameter documentation below is taken from the official OpenCV documentation. Please refer to it for more details.

Parameters
labels: label set used.
features: feature set used.
max_depth: the maximum depth of each tree. A low value will likely underfit and, conversely, a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.
min_sample_count: minimum number of samples required at a leaf node for it to be split. A reasonable value is a small percentage of the total data, e.g. 1%.
max_categories: clusters the possible values of a categorical variable into \( K \leq max\_categories \) clusters to find a suboptimal split. If a discrete variable on which the training procedure tries to split takes more than max_categories values, estimating the precise best subset may take a very long time because the algorithm is exponential. Instead, many decision tree engines (including OpenCV's ML module) try to find a suboptimal split in this case by clustering all the samples into max_categories clusters, that is, some categories are merged together. The clustering is applied only in \( n \)-class classification problems with \( n > 2 \), for categorical variables with \( N > max\_categories \) possible values. In regression and 2-class classification, the optimal split can be found efficiently without clustering, so the parameter is not used in these cases.
max_number_of_trees_in_the_forest: the maximum number of trees in the forest. Typically, the more trees there are, the better the accuracy. However, the improvement generally diminishes and levels off past a certain number of trees. Also keep in mind that the prediction time increases linearly with the number of trees.
forest_accuracy: sufficient accuracy, measured as the out-of-bag (OOB) error.
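
The "small percentage of the total data" guideline for min_sample_count can be turned into a number with a few lines of standard C++. The following is only a sketch: suggest_min_sample_count is a hypothetical helper that is not part of CGAL or OpenCV, and the 1% default simply mirrors the example value quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of CGAL or OpenCV): derives a candidate
// value for min_sample_count as a percentage of the training set size.
inline int suggest_min_sample_count(std::size_t nb_training_items,
                                    int percent = 1)
{
  assert(percent > 0 && percent <= 100);
  // Integer arithmetic avoids floating-point rounding surprises.
  const std::size_t count = nb_training_items * static_cast<std::size_t>(percent) / 100;
  // A split needs at least one sample, so never return less than 1.
  return std::max(1, static_cast<int>(count));
}
```

The returned value could then be passed to the constructor or to set_min_sample_count(). Whether 1% is appropriate depends on the data set, so it is best treated as a starting point for validation rather than a rule.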

Member Function Documentation

◆ load_configuration()

void CGAL::Classification::OpenCV::Random_forest_classifier::load_configuration ( const char *  filename)

loads a configuration from the file named filename.

The input file should be in the XML format written by the save_configuration() method. The feature set of the classifier should contain the exact same features in the exact same order as the ones present when the file was generated using save_configuration().

◆ save_configuration()

void CGAL::Classification::OpenCV::Random_forest_classifier::save_configuration ( const char *  filename)

saves the current configuration in the file named filename.

This makes it easy to save and recover a specific classification configuration.

The output file is written in an XML format that is readable by the load_configuration() method.

◆ train()

template<typename LabelIndexRange >
void CGAL::Classification::OpenCV::Random_forest_classifier::train ( const LabelIndexRange &  ground_truth)

runs the training algorithm.

From the provided ground truth, this algorithm sets up the random trees that produce the most accurate result with respect to this ground truth.

Precondition
At least one ground truth item should be assigned to each label.
Parameters
ground_truth: vector of label indices. For each input item, in the same order as the input set, it should contain the index of the corresponding label in the Label_set provided in the constructor. Input items that have no ground truth information should be given the value -1.
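
The layout of ground_truth and the precondition above can be illustrated with the standard library alone. This is a minimal sketch, not CGAL code: make_ground_truth and every_label_has_ground_truth are hypothetical helpers, and the annotations are assumed to be given as (item index, label index) pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: builds a ground truth vector with one entry per input
// item. Items without an annotation keep the value -1.
inline std::vector<int> make_ground_truth(
    std::size_t nb_items,
    const std::vector<std::pair<std::size_t, int>>& annotations)
{
  std::vector<int> ground_truth(nb_items, -1);
  for (const auto& annotation : annotations)
  {
    assert(annotation.first < nb_items); // item index must be valid
    ground_truth[annotation.first] = annotation.second;
  }
  return ground_truth;
}

// Hypothetical helper: checks that each of the nb_labels labels is assigned
// to at least one item, which is the precondition stated for train().
inline bool every_label_has_ground_truth(const std::vector<int>& ground_truth,
                                         std::size_t nb_labels)
{
  std::vector<bool> seen(nb_labels, false);
  for (int label : ground_truth)
    if (label >= 0 && static_cast<std::size_t>(label) < nb_labels)
      seen[static_cast<std::size_t>(label)] = true;
  return std::all_of(seen.begin(), seen.end(), [](bool b) { return b; });
}
```

For instance, with five input items and two labels, make_ground_truth(5, {{0, 0}, {3, 1}}) yields a vector in which items 1, 2 and 4 are marked -1. Running every_label_has_ground_truth on the result before calling train() is one way to catch a label that was never annotated.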