Authors: Simon Giraudot, Florent Lafarge

This component implements the algorithm described in [4] (section 2), generalized to handle different types of data, multiple features and multiple labels. It classifies a data set into a user-defined set of labels, such as ground, vegetation and buildings. A flexible API is provided so that users can classify any type of data which they can index and for which they can compute relevant features, compute their own local features on the input data set and define their own labels.

Package Organization

Classification of data sets is achieved as follows (see Figure Figure 83.1):

some analysis is performed on the input data set;
features are computed based on this analysis;
a set of labels (for example: ground, building, vegetation) is defined by the user;
a classifier is defined and trained: from the set of values taken by the features at an input item, it measures the likelihood of this item to belong to one label or another;
classification is computed itemwise using the classifier;
additional regularization can be used by smoothing either locally or globally through a graph cut [1] approach.

Figure 83.1 Organization of the package.

This package is designed to be easily extended by users: more specifically, features and labels can be defined by users to handle any data they need to classify.

Currently, CGAL provides data structures to handle classification of point sets, surface meshes and clusters.

Common Data Structures

Label Set

A label represents how an item should be classified, for example: vegetation, building, road, etc. In CGAL, a label has a name, an index (in a label set), a standard index (for example, the index of the label in the ASPRS standard) and a color. It is simply identified by a Label_handle. Note that names, standard indices and colors are not used for identification: two labels in the same set can have the same name, standard index and color but not the same handle.

If labels are initialized with their names only, standard indices and colors can be deduced in some cases (see Label_set::add()).

The following code snippet shows how to add labels to the classification object:

  Label_set labels;
  // Init name only
  Label_handle ground = labels.add ("ground");
  // Init name and color
  Label_handle vegetation = labels.add ("vegetation", CGAL::IO::Color(0,255,0));
  // Init name, Color and standard index (here, ASPRS building index)
  Label_handle roof = labels.add ("roof", CGAL::IO::Color (255, 0, 0), 6);

Feature Set

Features are defined as scalar fields that associate each input item with a specific value. Note that in order to limit memory consumption, we use the type float for these scalar values (as well as for every floating point value in this package). A feature has a name and is identified by a Feature_handle.

The computation of features and their addition to the feature set is done in a single step using the Feature_set::add<Feature>() method. If CGAL was linked with Intel TBB, features can be computed in parallel (see below).

CGAL provides some predefined features (see Point Set Classification for example). In the following code snippet, a subset of these predefined features are instantiated (in parallel if Intel TBB is available). Note that all the predefined features can also be automatically generated in multiple scales (see Point Set Classification for example).

  float radius_neighbors = 1.7f;
  float radius_dtm = 15.0f;
  std::cerr << "Computing features" << std::endl;
  Feature_set features;
  features.begin_parallel_additions(); // No effect in sequential mode
  Feature_handle distance_to_plane = features.add<Distance_to_plane> (pts, Pmap(), eigen);
  Feature_handle dispersion = features.add<Dispersion> (pts, Pmap(), grid,
                                                        radius_neighbors);
  Feature_handle elevation = features.add<Elevation> (pts, Pmap(), grid,
                                                      radius_dtm);
  features.end_parallel_additions(); // No effect in sequential mode

Users may want to define their own features, especially if the input data set comes with additional properties that were not anticipated by CGAL. A user-defined feature must inherit from Feature_base and provide a method value() that associates a scalar value to each input item.

The following example shows how to define a feature that discriminates points that lie inside a 2D box from the others:

// User-defined feature that identifies a specific area of the 3D
// space. This feature takes value 1 for points that lie inside the
// area and 0 for the others.
class My_feature : public CGAL::Classification::Feature_base
{
  const Point_range& range;
  double xmin, xmax, ymin, ymax;
public:
  My_feature (const Point_range& range,
              double xmin, double xmax, double ymin, double ymax)
    : range (range), xmin(xmin), xmax(xmax), ymin(ymin), ymax(ymax)
  {
    this->set_name ("my_feature");
  }
  float value (std::size_t pt_index)
  {
    if (xmin < range[pt_index].x() && range[pt_index].x() < xmax &&
        ymin < range[pt_index].y() && range[pt_index].y() < ymax)
      return 1.f;
    else
      return 0.f;
  }
};

This feature can then be instantiated from the feature set the same way as the others:

  std::cerr << "Computing features" << std::endl;
  Feature_set features;
  // Feature that identifies points whose x coordinate is between -20
  // and 20 and whose y coordinate is between -15 and 15
  Feature_handle my_feature = features.add<My_feature> (pts, -20., 20., -15., 15.);

Specialized Data Structures

Classification is based on the computation of local features. These features can take advantage of shared data structures that are precomputed and stored separately. Both these features and the underlying data structures depend on the type of data that needs to be classified. CGAL provides data structures to classify point sets, surface meshes and clusters.

Point Set Classification

CGAL provides the following structures:

Point_set_neighborhood stores spatial searching structures and provides adapted queries for points;
Local_eigen_analysis precomputes covariance matrices on local neighborhoods of points and stores the associated eigenvectors and eigenvalues;
Planimetric_grid is a 2D grid used for digital terrain modeling.

Most of these data structures depend on a scale parameter. CGAL provides a method to estimate the average spacing based on a number of neighbors (see CGAL::compute_average_spacing()), which usually provides satisfying results in the absence of noise. In the presence of noise, CGAL::estimate_global_range_scale() provides an estimation of the smallest scale such that the point set has the local dimension of a surface (this method is both robust to noise and outliers, see Result).

The eigen analysis can be used to estimate normals. Note however that this analysis (based on Principal Component Analysis) might not be robust to a high level of noise. CGAL also provides more robust normal estimation functions (see for example CGAL::jet_estimate_normals()).

The following code snippet shows how to instantiate such data structures from an input PLY point set (the full example is given at the end of the manual).

int main (int argc, char** argv)
{
  const std::string filename = (argc > 1) ? argv[1] : CGAL::data_file_path("meshes/b9.ply");
  std::cerr << "Reading input" << std::endl;
  std::vector<Point> pts;
  if (!(CGAL::IO::read_points(filename, std::back_inserter(pts),
                              // the PLY reader expects a binary file by default
                              CGAL::parameters::use_binary_mode(false))))
  {
    std::cerr << "Error: cannot read " << filename << std::endl;
    return EXIT_FAILURE;
  }
  float grid_resolution = 0.34f;
  unsigned int number_of_neighbors = 6;
  std::cerr << "Computing useful structures" << std::endl;
  Iso_cuboid_3 bbox = CGAL::bounding_box (pts.begin(), pts.end());
  Planimetric_grid grid (pts, Pmap(), bbox, grid_resolution);
  Neighborhood neighborhood (pts, Pmap());
  Local_eigen_analysis eigen
    = Local_eigen_analysis::create_from_point_set
    (pts, Pmap(), neighborhood.k_neighbor_query(number_of_neighbors));

CGAL provides some predefined features:

Distance_to_plane measures how far away a point is from a locally estimated plane;
Eigenvalue measures one of the three local eigenvalues;
Elevation computes the local distance to an estimation of the ground;
Height_above computes the distance between the local highest point and the point;
Height_below computes the distance between the point and the local lowest point;
Vertical_dispersion computes how noisy the point set is on a local Z-cylinder;
Vertical_range computes the distance between the local highest and lowest points;
Verticality compares the local normal vector to the vertical vector.

These features are designed for point sets but can easily be used with surface meshes as well (see Mesh Classification). For more details about how these different features can help to identify one label or the other, please refer to their associated reference manual pages.

In addition, if the input data set has additional properties, these can also be used as features. For example, CGAL provides the following features:

Color_channel uses input color information if available;
Echo_scatter uses the number of returns (echo) provided by most LIDAR scanners if available;
Simple_feature uses any property map applicable to the input range and whose value type is castable to float (useful if an additional property of the input set should be used as is, for example an intensity measurement).

Users commonly want to use all predefined features to get the best result possible. CGAL provides a class Point_set_feature_generator that performs the following operations:

it estimates the smallest relevant scale;
it generates all needed analysis structures and provides access to them;
it generates all possible features (among all the CGAL predefined ones) based on which property maps are available (they use colors if available, etc.).

Multiple scales that are sequentially larger can be used to increase the quality of the results [3].

Note that using this class in order to generate features is not mandatory, as features and data structures can all be handled by hand. It is mainly provided to make the specific case of point sets simpler to handle. Users can still add their own features within their feature set.

Some data structure instantiated by the generator will be used by feature: for this reason, the generator should be instantiated within the same scope as the feature set and should not be deleted before the feature set.

The following snippet shows how to use the point set feature generator:

  Feature_set features;
  std::size_t number_of_scales = 5;
  Feature_generator generator (pts, pts.point_map(), number_of_scales);
  features.begin_parallel_additions();
  generator.generate_point_based_features (features);
  features.end_parallel_additions();

Figure 83.2 Example of point set classification (left: input, right: output). Ground is grey, roofs are orange, vegetation is green.

Mesh Classification

Classification of mesh is performed by considering the face of a mesh as an atomic element that should be assign one label or another. Some structures such as neighborhood or Eigen analysis are significantly different from their equivalent for point sets; other data structures from point sets can be directly used by viewing the mesh as a point set through the use of a property map that associates each face with a representative point.

Hereafter, a mesh refers to a model of FaceListGraph.

CGAL provides the following structures:

Face_descriptor_to_center_of_mass_map is a property map that takes the face of a mesh as key type and returns the CGAL::Point_3 located at the center of mass of the face. It is useful to apply point set structures and features to surface meshes;
Face_descriptor_to_face_descriptor_with_bbox_map is a property map that takes the face of a mesh as key type and returns the same face along with its bounding box as a value type. It is useful to call CGAL::Classification::classify_with_graphcut() which requires objects that provide a bbox() method (see Classification Functions);
Mesh_neighborhood provides adapted queries for surface meshes;
Local_eigen_analysis precomputes covariance matrices on local neighborhoods of triangles and stores the associated eigenvectors and eigenvalues (it is the same object as for point sets called with a different named constructor);
Planimetric_grid is the 2D grid used for digital terrain modeling of point sets, it can be used with meshes through Face_descriptor_to_center_of_mass_map.

The point set features can be used for mesh classification as well:

Eigenvalue takes the Local_eigen_analysis object computed on faces as input;
the other ones are computed by considering the mesh as a point set.

Similarly to Point_set_feature_generator, CGAL provides a class Mesh_feature_generator that estimates the smallest scale automatically and computes all predefined features on several scales. As for point sets, using this class in order to generate features is not mandatory, as features and data structures can all be handled by hand. It is mainly provided to make the specific case of meshes simpler to handle. Users can still add their own features within their feature set.

The following snippet shows how to use the mesh feature generator:

  Feature_set features;
  Face_point_map face_point_map (&mesh); // Associates each face to its center of mass
  std::size_t number_of_scales = 5;
  Feature_generator generator (mesh, face_point_map, number_of_scales);
  features.begin_parallel_additions();
  generator.generate_point_based_features (features); // Features that consider the mesh as a point set
  generator.generate_face_based_features (features);  // Features computed directly on mesh faces
  features.end_parallel_additions();

The full example is given at the end of the manual.

Figure 83.3 Example of mesh classification (left: input, right: output). Ground is grey, roofs are orange, vegetation is green.

Cluster Classification

Classifying clusters of items instead of raw sets of items can have several advantages:

if the data set is very large, using clusters can drastically decrease the complexity and thus the need for computation time and memory;
clusters are more complex objects than raw items (isolated points or triangles of a mesh) and thus provide additional information (size of cluster, spatial consistency, etc.);
by construction, the output classification is less noisy (if all points of a facade are in the same cluster, then they are guaranteed to all be classified in the same label).

For example, when dealing with urban scenes that typically contain large planar sections, it may be more efficient to first detect planes (with CGAL::Shape_detection::Region_growing or CGAL::Shape_detection::Efficient_RANSAC for example) and then to classify each subset of points belonging to a specific plane as clusters.

CGAL provides some tools to classify clusters:

the class CGAL::Classification::Cluster can be used to represent a cluster of items (points, for example);
the function CGAL::Classification::create_clusters_from_indices() takes indices over items as input (for example, shape indices taken from shape detection) and generates one cluster per index.

The following snippet shows how to create classification clusters from a shape detection algorithm (Region Growing):

  std::cerr << "Detecting planes and creating clusters" << std::endl;
  t.start();
  const double search_sphere_radius  = 1.0;
  const double max_distance_to_plane = 1.0;
  const double max_accepted_angle    = 25.0;
  const std::size_t min_region_size  = 10;
  Neighbor_query neighbor_query (
    pts,
    search_sphere_radius,
    pts.point_map());
  Region_type region_type (
    pts,
    max_distance_to_plane, max_accepted_angle, min_region_size,
    pts.point_map(), pts.normal_map());
  Region_growing region_growing (
    pts, neighbor_query, region_type);
  std::vector<Cluster> clusters;
  region_growing.detect
    (boost::make_function_output_iterator
     ([&](const std::vector<std::size_t>& region) -> void {
        // Create a new cluster.
        Classification::Cluster<Point_set, Pmap> cluster (pts, pts.point_map());
        for (const std::size_t idx : region)
          cluster.insert(idx);
        clusters.push_back(cluster);
      }));
  t.stop();
  std::cerr << clusters.size() << " clusters created in "
            << t.time() << " second(s)" << std::endl;
  t.reset();

The class Local_eigen_analysis can also take point clusters as input:

  Local_eigen_analysis eigen = Local_eigen_analysis::create_from_point_clusters (clusters);

As clusters are based on simple items, users can compute cluster features based on statistics over the itemwise features:

Cluster_mean_of_feature computes, for a cluster, the mean value of an itemwise feature over the inliers of this cluster;
Cluster_variance_of_feature computes, for a cluster, the variance of an itemwise feature over the inliers of this cluster.

Some additional features are provided specifically for clusters:

Cluster_size uses the number of items in a cluster;
Cluster_vertical_extent computes the length of the smallest interval on the Z axis that contains all the items of a cluster;
Eigenvalue, similarly to point sets and meshes, can use the Local_eigen_analysis object computed on clusters.

The following snippet shows, from a pointwise feature set, how to generate the statistical features from a pointwise feature set (along with these latest cluster features):

  Feature_set features;
  // First, compute means of features.
  features.begin_parallel_additions();
  for (std::size_t i = 0; i < pointwise_features.size(); ++ i)
    features.add<Feature::Cluster_mean_of_feature> (clusters, pointwise_features[i]);
  features.end_parallel_additions();
  // Then, compute variances of features (and remaining cluster features).
  features.begin_parallel_additions();
  for (std::size_t i = 0; i < pointwise_features.size(); ++ i)
    features.add<Feature::Cluster_variance_of_feature> (clusters,
                                                        pointwise_features[i], // i^th feature
                                                        features[i]);          // mean of i^th feature
  features.add<Feature::Cluster_size> (clusters);
  features.add<Feature::Cluster_vertical_extent> (clusters);
  for (std::size_t i = 0; i < 3; ++ i)
    features.add<Feature::Eigenvalue> (clusters, eigen, (unsigned int)(i));
  features.end_parallel_additions();

The full example is given at the end of the manual.

Figure 83.4 Example of cluster classification mesh (left: input, middle: clusters computed from region growing, right: output). Ground is grey, roofs are orange, vegetation is green, points not assigned to a cluster are black.

Classifiers

Classification relies on a classifier: this classifier is an object that, from the set of values taken by the features at an input item, computes the probability that an input item belongs to one label or another. A model of the concept CGAL::Classification::Classifier must take the index of an input item and store the probability associated to each label in a vector. If a classifier returns the value 1 for a pair of label and input item, it means that this item belongs to this label with certainty; values close to 0 mean that this item is not likely to belong to this label.

CGAL provides three models for this concept, ETHZ::Random_forest_classifier, OpenCV::Random_forest_classifier, and Sum_of_weighted_features_classifier.

Note: Currently, ETHZ::Random_forest_classifier is the best classifier available in CGAL and we strongly advise users to use it.

To perform classification based on four classifiers, please refer to Classification Functions.

ETHZ Random Forest

CGAL provides ETHZ::Random_forest_classifier, a classifier based on the Random Forest Template Library developed by Stefan Walk at ETH Zurich [2] (the library is distributed under the MIT license and is included with the CGAL release, the user does not have to install anything more). This classifier uses a ground truth training set to construct several decision trees that are then used to assign a label to each input item.

This classifier cannot be set up by hand and requires a ground truth training set. The training algorithm is fast but usually requires a high number of inliers. The training algorithm uses more memory at runtime and the configuration files are larger than those produced by Sum_of_weighted_features_classifier, but the output quality is usually significantly better, especially in the cases where many labels are used (more than five).

An example shows how to use this classifier. For more details about the algorithm, please refer to README provided in the ETH Zurich's code archive.

Deprecated IO

The IO functions of this classifier were changed in CGAL 5.2. Configurations generated from previous versions are not valid anymore and should be converted first as shown in the following example:

File Classification/example_deprecated_conversion.cpp

#include <CGAL/Classification/ETHZ/Random_forest_classifier.h>
#include <cstdlib>
#include <fstream>
int main (int argc, char** argv)
{
  if (argc != 3)
    std::cerr << "Usage: " << argv[0] << " input.gz output.bin" << std::endl;
  else
  {
    std::ifstream ifile (argv[1], std::ios_base::binary);
    std::ofstream ofile (argv[2], std::ios_base::binary);
    CGAL::Classification::ETHZ::Random_forest_classifier::
      convert_deprecated_configuration_to_new_format(ifile, ofile);
  }
  return EXIT_SUCCESS;
}

OpenCV Random Forest

The second classifier is OpenCV::Random_forest_classifier. It uses the OpenCV library, more specifically the Random Trees package.

Note that this classifier usually produces results with a lower quality than ETHZ::Random_forest_classifier. It is provided for the sake of completeness and for testing purposes, but if you are not sure what to use, we advise using the ETHZ Random Forest instead.

An example shows how to use this classifier. For more details about the algorithm, please refer to the official documentation of OpenCV.

Sum of Weighted Features

This latest classifier defines the following attributes:

a weight applied to each feature;
an effect applied to each pair of feature and label.

For each label, the classifier computes an energy as a sum of features normalized with both their weight and the effect they have on this specific label.

The main advantage of this classifier is that it can be set up by hand. Nevertheless, it also embeds a training algorithm.

Weights and Effects

Each feature is assigned a weight that measures its strength with respect to the other features.

Each pair of feature and label is assigned an effect that can either be:

FAVORING: the label is favored by high values of the feature;
NEUTRAL: the label is not affected by the feature;
PENALIZING: the label is favored by low values of the feature.

For example, vegetation is expected to have a high distance to plane and have a color close to green (if colors are available); facades have a low distance to plane and a low verticality; etc.

Let \(x=(x_i)_{i=1..N}\) be a potential classification result with \(N\) the number of input items and \(x_i\) the class of the \(i^{th}\) item (for example: vegetation, ground, etc.). Let \(f_j(i)\) be the raw value of the \(j^{th}\) feature at the \(i^{th}\) item and \(w_j\) be the weight of this feature. We define the normalized value \(F_j(x_i) \in [0:1]\) of the \(j^{th}\) feature at the \(i^{th}\) item as follows:

\begin{eqnarray*} F_j(x_i) = & (1 - \min(\max(0,\frac{f_j(i)}{w_j}), 1)) & \mbox{if } f_j \mbox{ favors } x_i \\ & 0.5 & \mbox{if } f_j \mbox{ is neutral for } x_i \\ & \min(\max(0,\frac{f_j(i)}{w_j}), 1) & \mbox{if } f_j \mbox{ penalizes } x_i \end{eqnarray*}

The itemwise energy measures the coherence of the label \(x_i\) at the \(i^{th}\) item and is defined as:

\[ E_{di}(x_i) = \sum_{j = 1..N_f} F_j(x_i) \]

The following code snippet shows how to define the weights and effects of features and labels:

  std::cerr << "Setting weights" << std::endl;
  Classifier classifier (labels, features);
  classifier.set_weight (distance_to_plane, 6.75e-2f);
  classifier.set_weight (dispersion, 5.45e-1f);
  classifier.set_weight (elevation, 1.47e1f);
  std::cerr << "Setting effects" << std::endl;
  classifier.set_effect (ground, distance_to_plane, Classifier::NEUTRAL);
  classifier.set_effect (ground, dispersion, Classifier::NEUTRAL);
  classifier.set_effect (ground, elevation, Classifier::PENALIZING);
  classifier.set_effect (vegetation, distance_to_plane,  Classifier::FAVORING);
  classifier.set_effect (vegetation, dispersion, Classifier::FAVORING);
  classifier.set_effect (vegetation, elevation, Classifier::NEUTRAL);
  classifier.set_effect (roof, distance_to_plane,  Classifier::NEUTRAL);
  classifier.set_effect (roof, dispersion, Classifier::NEUTRAL);
  classifier.set_effect (roof, elevation, Classifier::FAVORING);

Training

Each feature has a specific weight and each pair of feature-label has a specific effect. This means that the number of parameters to set up can quickly explode: if 6 features are used to classify between 4 labels, 30 parameters have to be set up (6 weights + 6x4 feature-label relationships).

Though it is possible to set them up one by one, CGAL also provides a method train() that requires a small set of ground truth items provided by users. More specifically, users must provide, for each label they want to classify, a set of known inliers among the input data set (for example, selecting one roof, one tree and one section of the ground). The training algorithm works as follows:

for each feature, a range of weights is tested: the effect each feature has on each label is estimated. For a given weight, if a feature has the same effect on each label, it is non-relevant for classification. The range of weights such that the feature is relevant is estimated;
for each feature, uniformly picked weight values are tested and their effects are estimated;
each inlier provided by the user is classified using this set of weights and effects;
the mean intersection-over-union (see Evaluation) is used to evaluate the quality of this set of weights and effects;
the same mechanism is repeated until all features' ranges have been tested. Weights are only changed one by one, the other ones are kept to the values that gave the latest best score.

This usually converges to a satisfying solution (see Figure Figure 83.5). The number of trials is user defined, set to 300 by default. Using at least 10 times the number of features is advised (for example, at least 300 iterations if 30 features are used). If the solution is not satisfying, more inliers can be selected, for example, in a region that the user identifies as misclassified with the current configuration. The training algorithm keeps, as initialization, the best weights found at the previous round and carries on trying new weights by taking new inliers into account.

Figure 83.5 Example of evolution of the mean intersection-over-union. The purple curve is the score computed at the current iteration, green curve is the best score found so far.

Result

Figure Figure 83.6 shows an example of output on a defect-laden point set. The accuracy on this example is 0.97 with a mean intersection-over-union of 0.85 (see section Evaluation).

Figure 83.6 Example of classification on a point set with medium noise and outliers (left: input, right: output). Ground is orange, roofs are pink, vegetation is green. Outliers are classified with an additional label outlier in black.

Classification Functions

Classification is performed by minimizing an energy over the input data set that may include regularization. CGAL provides three different methods for classification, ranging from high speed / low quality to low speed / high quality:

On a point set of 3 millions of points, the first method takes about 4 seconds, the second about 40 seconds and the third about 2 minutes.

Figure 83.7 Top-Left: input point set. Top-Right: raw output classification represented by a set of colors (ground is orange, facades are blue, roofs are pink and vegetation is green). Bottom-Left: output classification using local smoothing. Bottom-Right: output classification using graphcut.

Mathematical details are provided hereafter.

Raw Classification

CGAL::Classification::classify(): this is the fastest method that provides acceptable but usually noisy results (see Figure Figure 83.7, top-right).

Let \(x=(x_i)_{i=1..N}\) be a potential classification result with \(N\) the number of input items and \(x_i\) the label of the \(i^{th}\) item (for example: vegetation, ground, etc.). The classification is performed by minimizing the following energy:

\[ E(x) = \sum_{i = 1..N} E_{di}(x_i) \]

This energy is a sum of itemwise energies provided by the classifier and involves no regularization.

The following snippet shows how to classify points based on a label set and a classifier. The result is stored in label_indices, following the same order as the input set and providing for each point the index (in the label set) of its assigned label.

  std::vector<int> label_indices (pts.size(), -1);
  CGAL::Real_timer t;
  t.start();
  Classification::classify<CGAL::Parallel_if_available_tag> (pts, labels, classifier, label_indices);
  t.stop();
  std::cerr << "Raw classification performed in " << t.time() << " second(s)" << std::endl;
  t.reset();

Local Regularization

CGAL::Classification::classify_with_local_smoothing(): this method is a tradeoff between quality and efficiency (see Figure Figure 83.7, bottom-left). The minimized energy is defined as follows:

\[ E(x) = \sum_{i = 1..N} E_{si}(x_i) \]

The energy \(E_{si}(x_i)\) is defined on a small local neighborhood \(Nb(i)\) of the \(i^{th}\) item (including itself):

\[ E_{si}(x_i) = \frac{\sum_{k \in Nb(i)} E_{di}(x_k)}{\left| Nb(i) \right|} \]

This allows to eliminate local noisy variations of assigned labels. Increasing the size of the neighborhood increases the noise reduction at the cost of higher computation times.

The following snippet shows how to classify points using local smoothing by providing a model of CGAL::Classification::NeighborQuery.

  t.start();
  Classification::classify_with_local_smoothing<CGAL::Parallel_if_available_tag>
    (pts, Pmap(), labels, classifier,
     neighborhood.sphere_neighbor_query(radius_neighbors),
     label_indices);
  t.stop();
  std::cerr << "Classification with local smoothing performed in " << t.time() << " second(s)" << std::endl;
  t.reset();

Global Regularization (Graph Cut)

CGAL::Classification::classify_with_graphcut(): this method offers the best quality but requires longer computation time (see Figure Figure 83.7, bottom-right). The total energy that is minimized is the sum of the partial data term \(E_{di}(x_i)\) and of a pairwise interaction energy defined by the standard Potts model [5] :

\[ E(x) = \sum_{i = 1..N} E_{di}(x_i) + \gamma \sum_{i \sim j} \mathbf{1}_{x_i \neq x_j} \]

where \(\gamma>0\) is the parameter of the Potts model that quantifies the strengh of the regularization, \(i \sim j\) represents the pairs of neighboring items and \(\mathbf{1}_{\{.\}}\) the characteristic function.

A graph cut based algorithm (alpha expansion) is used to quickly reach an approximate solution close to the global optimum of this energy.

This method allows to consistently segment the input data set in piecewise constant parts and to correct large wrongly classified clusters. Increasing \(\gamma\) produces more regular result with a constant computation time.

To speed up computations, the input domain can be subdivided into smaller subsets such that several smaller graph cuts are applied instead of a big one. The computation of these smaller graph cuts can be done in parallel. Increasing the number of subsets allows for faster computation times but can also reduce the quality of the results.

The following snippet shows how to classify points using a graph cut regularization providing a model of CGAL::Classification::NeighborQuery, a strengh parameter \(\gamma\) and a number of subdivisions.

  t.start();
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (pts, Pmap(), labels, classifier,
     neighborhood.k_neighbor_query(12),
     0.2f, 4, label_indices);
  t.stop();
  std::cerr << "Classification with graphcut performed in " << t.time() << " second(s)" << std::endl;

Evaluation

The class Evaluation allows users to evaluate the reliability of the classification with respect to a provided ground truth. The following measurements are available:

precision() computes, for one label, the ratio of true positives over the total number of detected positives;
recall() computes, for one label, the ratio of true positives over the total number of provided inliers of this label;
f1_score() is the harmonic mean of precision and recall;
intersection_over_union() computes the ratio of true positives over the union of the detected positives and of the provided inliers;
accuracy() computes the ratio of all true positives over the total number of provided inliers;
mean_f1_score();
mean_intersection_over_union().

All these values range from 0 (poor quality) to 1 (perfect quality).

Full Examples

Simple Point Set Classification

The following example:

reads an input file (LIDAR point set in PLY format);
computes useful structures from this input;
computes features from the input and the precomputed structures;
defines 3 labels (vegetation, ground and roof);
sets up the classification classifier Sum_of_weighted_features_classifier;
classifies the point set with the 3 different methods (this is for the sake of the example: each method overwrites the previous result, users should only call one of the methods);
saves the result in a colored PLY format.

File Classification/example_classification.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Classification.h>
#include <CGAL/bounding_box.h>
#include <CGAL/tags.h>
#include <CGAL/IO/read_points.h>
#include <CGAL/IO/write_ply_points.h>
#include <CGAL/Real_timer.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
typedef CGAL::Parallel_if_available_tag Concurrency_tag;
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3 Point;
typedef Kernel::Iso_cuboid_3 Iso_cuboid_3;
typedef std::vector<Point> Point_range;
typedef CGAL::Identity_property_map<Point> Pmap;
namespace Classification = CGAL::Classification;
typedef Classification::Sum_of_weighted_features_classifier Classifier;
typedef Classification::Planimetric_grid<Kernel, Point_range, Pmap>             Planimetric_grid;
typedef Classification::Point_set_neighborhood<Kernel, Point_range, Pmap>       Neighborhood;
typedef Classification::Local_eigen_analysis                                    Local_eigen_analysis;
typedef Classification::Label_handle                                            Label_handle;
typedef Classification::Feature_handle                                          Feature_handle;
typedef Classification::Label_set                                               Label_set;
typedef Classification::Feature_set                                             Feature_set;
typedef Classification::Feature::Distance_to_plane<Point_range, Pmap>           Distance_to_plane;
typedef Classification::Feature::Elevation<Kernel, Point_range, Pmap>           Elevation;
typedef Classification::Feature::Vertical_dispersion<Kernel, Point_range, Pmap> Dispersion;
int main (int argc, char** argv)
{
  const std::string filename = (argc > 1) ? argv[1] : CGAL::data_file_path("meshes/b9.ply");
  std::cerr << "Reading input" << std::endl;
  std::vector<Point> pts;
  if (!(CGAL::IO::read_points(filename, std::back_inserter(pts),
                              // the PLY reader expects a binary file by default
                              CGAL::parameters::use_binary_mode(false))))
  {
    std::cerr << "Error: cannot read " << filename << std::endl;
    return EXIT_FAILURE;
  }
  float grid_resolution = 0.34f;
  unsigned int number_of_neighbors = 6;
  std::cerr << "Computing useful structures" << std::endl;
  Iso_cuboid_3 bbox = CGAL::bounding_box (pts.begin(), pts.end());
  Planimetric_grid grid (pts, Pmap(), bbox, grid_resolution);
  Neighborhood neighborhood (pts, Pmap());
  Local_eigen_analysis eigen
    = Local_eigen_analysis::create_from_point_set
    (pts, Pmap(), neighborhood.k_neighbor_query(number_of_neighbors));
  float radius_neighbors = 1.7f;
  float radius_dtm = 15.0f;
  std::cerr << "Computing features" << std::endl;
  Feature_set features;
  features.begin_parallel_additions(); // No effect in sequential mode
  Feature_handle distance_to_plane = features.add<Distance_to_plane> (pts, Pmap(), eigen);
  Feature_handle dispersion = features.add<Dispersion> (pts, Pmap(), grid,
                                                        radius_neighbors);
  Feature_handle elevation = features.add<Elevation> (pts, Pmap(), grid,
                                                      radius_dtm);
  features.end_parallel_additions(); // No effect in sequential mode
  Label_set labels;
  // Init name only
  Label_handle ground = labels.add ("ground");
  // Init name and color
  Label_handle vegetation = labels.add ("vegetation", CGAL::IO::Color(0,255,0));
  // Init name, Color and standard index (here, ASPRS building index)
  Label_handle roof = labels.add ("roof", CGAL::IO::Color (255, 0, 0), 6);
  std::cerr << "Setting weights" << std::endl;
  Classifier classifier (labels, features);
  classifier.set_weight (distance_to_plane, 6.75e-2f);
  classifier.set_weight (dispersion, 5.45e-1f);
  classifier.set_weight (elevation, 1.47e1f);
  std::cerr << "Setting effects" << std::endl;
  classifier.set_effect (ground, distance_to_plane, Classifier::NEUTRAL);
  classifier.set_effect (ground, dispersion, Classifier::NEUTRAL);
  classifier.set_effect (ground, elevation, Classifier::PENALIZING);
  classifier.set_effect (vegetation, distance_to_plane,  Classifier::FAVORING);
  classifier.set_effect (vegetation, dispersion, Classifier::FAVORING);
  classifier.set_effect (vegetation, elevation, Classifier::NEUTRAL);
  classifier.set_effect (roof, distance_to_plane,  Classifier::NEUTRAL);
  classifier.set_effect (roof, dispersion, Classifier::NEUTRAL);
  classifier.set_effect (roof, elevation, Classifier::FAVORING);
  // Run classification
  std::cerr << "Classifying" << std::endl;
  std::vector<int> label_indices (pts.size(), -1);
  CGAL::Real_timer t;
  t.start();
  Classification::classify<CGAL::Parallel_if_available_tag> (pts, labels, classifier, label_indices);
  t.stop();
  std::cerr << "Raw classification performed in " << t.time() << " second(s)" << std::endl;
  t.reset();
  t.start();
  Classification::classify_with_local_smoothing<CGAL::Parallel_if_available_tag>
    (pts, Pmap(), labels, classifier,
     neighborhood.sphere_neighbor_query(radius_neighbors),
     label_indices);
  t.stop();
  std::cerr << "Classification with local smoothing performed in " << t.time() << " second(s)" << std::endl;
  t.reset();
  t.start();
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (pts, Pmap(), labels, classifier,
     neighborhood.k_neighbor_query(12),
     0.2f, 4, label_indices);
  t.stop();
  std::cerr << "Classification with graphcut performed in " << t.time() << " second(s)" << std::endl;
  // Save the output in a colored PLY format
  std::vector<unsigned char> red, green, blue;
  red.reserve(pts.size());
  green.reserve(pts.size());
  blue.reserve(pts.size());
  for (std::size_t i = 0; i < pts.size(); ++ i)
  {
    Label_handle label = labels[std::size_t(label_indices[i])];
    unsigned r = 0, g = 0, b = 0;
    if (label == ground)
    {
      r = 245; g = 180; b = 0;
    }
    else if (label == vegetation)
    {
      r = 0; g = 255; b = 27;
    }
    else if (label == roof)
    {
      r = 255; g = 0; b = 170;
    }
    red.push_back(r);
    green.push_back(g);
    blue.push_back(b);
  }
  std::ofstream f ("classification.ply");
  CGAL::IO::write_PLY_with_properties
    (f, CGAL::make_range (boost::counting_iterator<std::size_t>(0),
                          boost::counting_iterator<std::size_t>(pts.size())),
     CGAL::make_ply_point_writer (CGAL::make_property_map(pts)),
     std::make_pair(CGAL::make_property_map(red), CGAL::PLY_property<unsigned char>("red")),
     std::make_pair(CGAL::make_property_map(green), CGAL::PLY_property<unsigned char>("green")),
     std::make_pair(CGAL::make_property_map(blue), CGAL::PLY_property<unsigned char>("blue")));
  std::cerr << "All done" << std::endl;
  return EXIT_SUCCESS;
}

Defining a Custom Feature

The following example shows how to define a custom feature and how to integrate it in the CGAL classification framework.

File Classification/example_feature.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Classification.h>
#include <CGAL/IO/read_points.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3 Point;
typedef Kernel::Iso_cuboid_3 Iso_cuboid_3;
typedef std::vector<Point> Point_range;
typedef CGAL::Identity_property_map<Point> Pmap;
namespace Classification = CGAL::Classification;
typedef Classification::Sum_of_weighted_features_classifier                      Classifier;
typedef Classification::Point_set_neighborhood<Kernel, Point_range, Pmap>       Neighborhood;
typedef Classification::Local_eigen_analysis                                    Local_eigen_analysis;
typedef Classification::Label_handle                                            Label_handle;
typedef Classification::Feature_handle                                          Feature_handle;
typedef Classification::Label_set                                               Label_set;
typedef Classification::Feature_set                                             Feature_set;
typedef Classification::Feature::Verticality<Kernel>                            Verticality;
// User-defined feature that identifies a specific area of the 3D
// space. This feature takes value 1 for points that lie inside the
// area and 0 for the others.
class My_feature : public CGAL::Classification::Feature_base
{
  const Point_range& range;
  double xmin, xmax, ymin, ymax;
public:
  My_feature (const Point_range& range,
              double xmin, double xmax, double ymin, double ymax)
    : range (range), xmin(xmin), xmax(xmax), ymin(ymin), ymax(ymax)
  {
    this->set_name ("my_feature");
  }
  float value (std::size_t pt_index)
  {
    if (xmin < range[pt_index].x() && range[pt_index].x() < xmax &&
        ymin < range[pt_index].y() && range[pt_index].y() < ymax)
      return 1.f;
    else
      return 0.f;
  }
};
int main (int argc, char** argv)
{
  std::string filename (argc > 1 ? argv[1] : CGAL::data_file_path("meshes/b9.ply"));
  std::vector<Point> pts;
  std::cerr << "Reading input" << std::endl;
  if (!(CGAL::IO::read_points(filename, std::back_inserter(pts),
                              // the PLY reader expects a binary file by default
                              CGAL::parameters::use_binary_mode(false))))
  {
    std::cerr << "Error: cannot read " << filename << std::endl;
    return EXIT_FAILURE;
  }
  Neighborhood neighborhood (pts, Pmap());
  Local_eigen_analysis eigen
    = Local_eigen_analysis::create_from_point_set (pts, Pmap(), neighborhood.k_neighbor_query(6));
  Label_set labels;
  Label_handle a = labels.add ("label_A");
  Label_handle b = labels.add ("label_B");
  std::cerr << "Computing features" << std::endl;
  Feature_set features;
  // Feature that identifies points whose x coordinate is between -20
  // and 20 and whose y coordinate is between -15 and 15
  Feature_handle my_feature = features.add<My_feature> (pts, -20., 20., -15., 15.);
  Feature_handle verticality = features.add<Verticality> (pts, eigen);
  Classifier classifier (labels, features);
  std::cerr << "Setting weights" << std::endl;
  classifier.set_weight(verticality, 0.5);
  classifier.set_weight(my_feature, 0.25);
  std::cerr << "Setting up labels" << std::endl;
  classifier.set_effect (a, verticality, Classifier::FAVORING);
  classifier.set_effect (a, my_feature, Classifier::FAVORING);
  classifier.set_effect (b, verticality, Classifier::PENALIZING);
  classifier.set_effect (b, my_feature, Classifier::PENALIZING);
  std::cerr << "Classifying" << std::endl;
  std::vector<int> label_indices(pts.size(), -1);
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (pts, Pmap(), labels, classifier,
     neighborhood.k_neighbor_query(12),
     0.5, 1, label_indices);
  std::cerr << "All done" << std::endl;
  return EXIT_SUCCESS;
}

Feature Generation and Training

The following example:

reads a point set with a training set (embedded as a PLY feature label);
automatically generates features on 5 scales;
trains the classifier Sum_of_weighted_features_classifier using 800 trials;
runs the algorithm using the graphcut regularization;
prints some evaluation measurements;
saves the configuration of the classifier for further use.

File Classification/example_generation_and_training.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Classification.h>
#include <CGAL/Point_set_3.h>
#include <CGAL/Point_set_3/IO.h>
#include <CGAL/Real_timer.h>
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3 Point;
typedef CGAL::Point_set_3<Point> Point_set;
typedef Kernel::Iso_cuboid_3 Iso_cuboid_3;
typedef Point_set::Point_map Pmap;
typedef Point_set::Property_map<int> Imap;
namespace Classification = CGAL::Classification;
typedef Classification::Label_handle                                            Label_handle;
typedef Classification::Feature_handle                                          Feature_handle;
typedef Classification::Label_set                                               Label_set;
typedef Classification::Feature_set                                             Feature_set;
typedef Classification::Sum_of_weighted_features_classifier                     Classifier;
typedef Classification::Point_set_feature_generator<Kernel, Point_set, Pmap>    Feature_generator;
int main (int argc, char** argv)
{
  std::string filename (argc > 1 ? argv[1] : CGAL::data_file_path("points_3/b9_training.ply"));
  std::ifstream in (filename.c_str(), std::ios::binary);
  Point_set pts;
  std::cerr << "Reading input" << std::endl;
  in >> pts;
  Imap label_map;
  bool lm_found = false;
  std::tie (label_map, lm_found) = pts.property_map<int> ("label");
  if (!lm_found)
  {
    std::cerr << "Error: \"label\" property not found in input file." << std::endl;
    return EXIT_FAILURE;
  }
  std::cerr << "Generating features" << std::endl;
  CGAL::Real_timer t;
  t.start();
  Feature_set features;
  std::size_t number_of_scales = 5;
  Feature_generator generator (pts, pts.point_map(), number_of_scales);
  features.begin_parallel_additions();
  generator.generate_point_based_features (features);
  features.end_parallel_additions();
  t.stop();
  std::cerr << features.size() << " feature(s) generated in " << t.time() << " second(s)" << std::endl;
  Label_set labels = { "ground", "vegetation", "roof" };
  Classifier classifier (labels, features);
  std::cerr << "Training" << std::endl;
  t.reset();
  t.start();
  classifier.train<CGAL::Parallel_if_available_tag> (pts.range(label_map), 800);
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  t.reset();
  t.start();
  std::vector<int> label_indices(pts.size(), -1);
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (pts, pts.point_map(), labels, classifier,
     generator.neighborhood().k_neighbor_query(12),
     0.2f, 10, label_indices);
  t.stop();
  std::cerr << "Classification with graphcut done in " << t.time() << " second(s)" << std::endl;
  std::cerr << "Precision, recall, F1 scores and IoU:" << std::endl;
  Classification::Evaluation evaluation (labels, pts.range(label_map), label_indices);
  for (Label_handle l : labels)
  {
    std::cerr << " * " << l->name() << ": "
              << evaluation.precision(l) << " ; "
              << evaluation.recall(l) << " ; "
              << evaluation.f1_score(l) << " ; "
              << evaluation.intersection_over_union(l) << std::endl;
  }
  std::cerr << "Accuracy = " << evaluation.accuracy() << std::endl
            << "Mean F1 score = " << evaluation.mean_f1_score() << std::endl
            << "Mean IoU = " << evaluation.mean_intersection_over_union() << std::endl;
  std::ofstream fconfig ("config.xml");
  classifier.save_configuration (fconfig);
  fconfig.close();
  std::cerr << "All done" << std::endl;
  return EXIT_SUCCESS;
}

ETHZ Random Forest

The following example shows how to use the classifier ETHZ::Random_forest_classifier using an input training set.

File Classification/example_ethz_random_forest.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Classification.h>
#include <CGAL/Point_set_3.h>
#include <CGAL/Point_set_3/IO.h>
#include <CGAL/Real_timer.h>
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3 Point;
typedef CGAL::Point_set_3<Point> Point_set;
typedef Kernel::Iso_cuboid_3 Iso_cuboid_3;
typedef Point_set::Point_map Pmap;
typedef Point_set::Property_map<int> Imap;
typedef Point_set::Property_map<unsigned char> UCmap;
namespace Classification = CGAL::Classification;
typedef Classification::Label_handle                                            Label_handle;
typedef Classification::Feature_handle                                          Feature_handle;
typedef Classification::Label_set                                               Label_set;
typedef Classification::Feature_set                                             Feature_set;
typedef Classification::Point_set_feature_generator<Kernel, Point_set, Pmap>    Feature_generator;
int main (int argc, char** argv)
{
  std::string filename = CGAL::data_file_path("points_3/b9_training.ply");
  if (argc > 1)
    filename = argv[1];
  std::ifstream in (filename.c_str(), std::ios::binary);
  Point_set pts;
  std::cerr << "Reading input" << std::endl;
  in >> pts;
  Imap label_map;
  bool lm_found = false;
  std::tie (label_map, lm_found) = pts.property_map<int> ("label");
  if (!lm_found)
  {
    std::cerr << "Error: \"label\" property not found in input file." << std::endl;
    return EXIT_FAILURE;
  }
  Feature_set features;
  std::cerr << "Generating features" << std::endl;
  CGAL::Real_timer t;
  t.start();
  Feature_generator generator (pts, pts.point_map(),
                               5);  // using 5 scales
  features.begin_parallel_additions();
  generator.generate_point_based_features (features);
  features.end_parallel_additions();
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  // Add labels
  Label_set labels;
  Label_handle ground = labels.add ("ground");
  Label_handle vegetation = labels.add ("vegetation");
  Label_handle roof = labels.add ("roof");
  // Check if ground truth is valid for this label set
  if (!labels.is_valid_ground_truth (pts.range(label_map), true))
    return EXIT_FAILURE;
  std::vector<int> label_indices(pts.size(), -1);
  std::cerr << "Using ETHZ Random Forest Classifier" << std::endl;
  Classification::ETHZ::Random_forest_classifier classifier (labels, features);
  std::cerr << "Training" << std::endl;
  t.reset();
  t.start();
  classifier.train (pts.range(label_map));
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  t.reset();
  t.start();
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (pts, pts.point_map(), labels, classifier,
     generator.neighborhood().k_neighbor_query(12),
     0.2f, 1, label_indices);
  t.stop();
  std::cerr << "Classification with graphcut done in " << t.time() << " second(s)" << std::endl;
  std::cerr << "Precision, recall, F1 scores and IoU:" << std::endl;
  Classification::Evaluation evaluation (labels, pts.range(label_map), label_indices);
  for (Label_handle l : labels)
  {
    std::cerr << " * " << l->name() << ": "
              << evaluation.precision(l) << " ; "
              << evaluation.recall(l) << " ; "
              << evaluation.f1_score(l) << " ; "
              << evaluation.intersection_over_union(l) << std::endl;
  }
  std::cerr << "Accuracy = " << evaluation.accuracy() << std::endl
            << "Mean F1 score = " << evaluation.mean_f1_score() << std::endl
            << "Mean IoU = " << evaluation.mean_intersection_over_union() << std::endl;
  // Color point set according to class
  UCmap red = pts.add_property_map<unsigned char>("red", 0).first;
  UCmap green = pts.add_property_map<unsigned char>("green", 0).first;
  UCmap blue = pts.add_property_map<unsigned char>("blue", 0).first;
  for (std::size_t i = 0; i < label_indices.size(); ++ i)
  {
    label_map[i] = label_indices[i]; // update label map with computed classification
    Label_handle label = labels[label_indices[i]];
    const CGAL::IO::Color& color = label->color();
    red[i] = color.red();
    green[i] = color.green();
    blue[i] = color.blue();
  }
  // Save configuration for later use
  std::ofstream fconfig ("ethz_random_forest.bin", std::ios_base::binary);
  classifier.save_configuration(fconfig);
  // Write result
  std::ofstream f ("classification.ply");
  f.precision(18);
  f << pts;
  std::cerr << "All done" << std::endl;
  return EXIT_SUCCESS;
}

OpenCV Random Forest

The following example shows how to use the classifier OpenCV::Random_forest_classifier using an input training set.

File Classification/example_opencv_random_forest.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Classification.h>
#include <CGAL/Point_set_3.h>
#include <CGAL/Point_set_3/IO.h>
#include <CGAL/Real_timer.h>
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3 Point;
typedef CGAL::Point_set_3<Point> Point_set;
typedef Kernel::Iso_cuboid_3 Iso_cuboid_3;
typedef Point_set::Point_map Pmap;
typedef Point_set::Property_map<int> Imap;
typedef Point_set::Property_map<unsigned char> UCmap;
namespace Classification = CGAL::Classification;
typedef Classification::Label_handle                                            Label_handle;
typedef Classification::Feature_handle                                          Feature_handle;
typedef Classification::Label_set                                               Label_set;
typedef Classification::Feature_set                                             Feature_set;
typedef Classification::Point_set_feature_generator<Kernel, Point_set, Pmap>    Feature_generator;
int main (int argc, char** argv)
{
  std::string filename = (argc > 1) ? argv[1] : CGAL::data_file_path("points_3/b9_training.ply");
  std::cerr << "Reading input" << std::endl;
  std::ifstream in (filename.c_str(), std::ios::binary);
  Point_set pts;
  in >> pts;
  Imap label_map;
  bool lm_found = false;
  std::tie (label_map, lm_found) = pts.property_map<int> ("label");
  if (!lm_found)
  {
    std::cerr << "Error: \"label\" property not found in input file." << std::endl;
    return EXIT_FAILURE;
  }
  Feature_set features;
  std::cerr << "Generating features" << std::endl;
  CGAL::Real_timer t;
  t.start();
  Feature_generator generator (pts, pts.point_map(),
                               5);  // using 5 scales
  features.begin_parallel_additions();
  generator.generate_point_based_features (features);
  features.end_parallel_additions();
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  // Add labels
  Label_set labels;
  Label_handle ground = labels.add ("ground");
  Label_handle vegetation = labels.add ("vegetation");
  Label_handle roof = labels.add ("roof");
  std::vector<int> label_indices(pts.size(), -1);
  std::cerr << "Using OpenCV Random Forest Classifier" << std::endl;
  Classification::OpenCV::Random_forest_classifier classifier (labels, features);
  std::cerr << "Training" << std::endl;
  t.reset();
  t.start();
  classifier.train (pts.range(label_map));
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  t.reset();
  t.start();
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (pts, pts.point_map(), labels, classifier,
     generator.neighborhood().k_neighbor_query(12),
     0.2f, 1, label_indices);
  t.stop();
  std::cerr << "Classification with graphcut done in " << t.time() << " second(s)" << std::endl;
  std::cerr << "Precision, recall, F1 scores and IoU:" << std::endl;
  Classification::Evaluation evaluation (labels, pts.range(label_map), label_indices);
  for (Label_handle l : labels)
  {
    std::cerr << " * " << l->name() << ": "
              << evaluation.precision(l) << " ; "
              << evaluation.recall(l) << " ; "
              << evaluation.f1_score(l) << " ; "
              << evaluation.intersection_over_union(l) << std::endl;
  }
  std::cerr << "Accuracy = " << evaluation.accuracy() << std::endl
            << "Mean F1 score = " << evaluation.mean_f1_score() << std::endl
            << "Mean IoU = " << evaluation.mean_intersection_over_union() << std::endl;
  // Color point set according to class
  UCmap red = pts.add_property_map<unsigned char>("red", 0).first;
  UCmap green = pts.add_property_map<unsigned char>("green", 0).first;
  UCmap blue = pts.add_property_map<unsigned char>("blue", 0).first;
  for (std::size_t i = 0; i < label_indices.size(); ++ i)
  {
    label_map[i] = label_indices[i]; // update label map with computed classification
    Label_handle label = labels[label_indices[i]];
    const CGAL::IO::Color& color = label->color();
    red[i] = color.red();
    green[i] = color.green();
    blue[i] = color.blue();
  }
  // Write result
  std::ofstream f ("classification.ply");
  f.precision(18);
  f << pts;
  std::cerr << "All done" << std::endl;
  return EXIT_SUCCESS;
}

Mesh Classification

The following example:

reads a mesh in OFF format;
automatically generates features on 5 scales;
loads a configuration file for classifier ETHZ::Random_forest_classifier;
runs the algorithm using the graphcut regularization.

File Classification/example_mesh_classification.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Surface_mesh.h>
#include <CGAL/Classification.h>
#include <CGAL/Real_timer.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3 Point;
typedef CGAL::Surface_mesh<Point> Mesh;
namespace Classification = CGAL::Classification;
typedef Classification::Label_handle                                            Label_handle;
typedef Classification::Feature_handle                                          Feature_handle;
typedef Classification::Label_set                                               Label_set;
typedef Classification::Feature_set                                             Feature_set;
typedef Classification::Face_descriptor_to_center_of_mass_map<Mesh>             Face_point_map;
typedef Classification::Face_descriptor_to_face_descriptor_with_bbox_map<Mesh>  Face_with_bbox_map;
typedef Classification::Mesh_feature_generator<Kernel, Mesh, Face_point_map>    Feature_generator;
int main (int argc, char** argv)
{
  std::string filename = CGAL::data_file_path("meshes/b9_mesh.off");
  std::string filename_config = "data/b9_mesh_config.bin";
  if (argc > 1)
    filename = argv[1];
  if (argc > 2)
    filename_config = argv[2];
  Mesh mesh;
  if(!CGAL::IO::read_polygon_mesh(filename, mesh,
                                  // the PLY reader expects a binary file by default
                                  CGAL::parameters::use_binary_mode(false)))
  {
    std::cerr << "Invalid input." << std::endl;
    return 1;
  }
  std::cerr << "Generating features" << std::endl;
  CGAL::Real_timer t;
  t.start();
  Feature_set features;
  Face_point_map face_point_map (&mesh); // Associates each face to its center of mass
  std::size_t number_of_scales = 5;
  Feature_generator generator (mesh, face_point_map, number_of_scales);
  features.begin_parallel_additions();
  generator.generate_point_based_features (features); // Features that consider the mesh as a point set
  generator.generate_face_based_features (features);  // Features computed directly on mesh faces
  features.end_parallel_additions();
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  Label_set labels = { "ground", "vegetation", "roof" };
  std::vector<int> label_indices(mesh.number_of_faces(), -1);
  std::cerr << "Using ETHZ Random Forest Classifier" << std::endl;
  Classification::ETHZ::Random_forest_classifier classifier (labels, features);
  std::cerr << "Loading configuration" << std::endl;
  std::ifstream in_config (filename_config, std::ios_base::in | std::ios_base::binary);
  classifier.load_configuration (in_config);
  std::cerr << "Classifying with graphcut" << std::endl;
  t.reset();
  t.start();
  Classification::classify_with_graphcut<CGAL::Parallel_if_available_tag>
    (mesh.faces(), Face_with_bbox_map(&mesh), labels, classifier,
     generator.neighborhood().n_ring_neighbor_query(2),
     0.2f, 1, label_indices);
  t.stop();
  std::cerr << "Classification with graphcut done in " << t.time() << " second(s)" << std::endl;
  return EXIT_SUCCESS;
}

Cluster Classification

The following example:

reads a point set in PLY format;
estimates the normal vectors of the point set;
automatically generates pointwise features on 5 scales;
detects plane using the algorithm CGAL::Shape_detection::Region_growing;
creates Cluster objects from these detected planes;
computes cluster features from the pointwise features;
loads a configuration file for classifier ETHZ::Random_forest_classifier;
runs the algorithm using the raw algorithm.

File Classification/example_cluster_classification.cpp

#if defined (_MSC_VER) && !defined (_WIN64)
#pragma warning(disable:4244) // boost::number_distance::distance()
                              // converts 64 to 32 bits integers
#endif
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <boost/iterator/function_output_iterator.hpp>
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Classification.h>
#include <CGAL/Point_set_3.h>
#include <CGAL/Point_set_3/IO.h>
#include <CGAL/jet_estimate_normals.h>
#include <CGAL/Shape_detection/Region_growing.h>
#include <CGAL/Real_timer.h>
typedef CGAL::Simple_cartesian<double> Kernel;
typedef Kernel::Point_3                Point;
typedef Kernel::Iso_cuboid_3           Iso_cuboid_3;
typedef CGAL::Point_set_3<Point> Point_set;
typedef Point_set::Point_map                   Pmap;
typedef Point_set::Vector_map                  Vmap;
typedef Point_set::Property_map<int>           Imap;
typedef Point_set::Property_map<unsigned char> UCmap;
typedef CGAL::Shape_detection::Point_set::Sphere_neighbor_query<Kernel, Point_set, Pmap>                Neighbor_query;
typedef CGAL::Shape_detection::Point_set::Least_squares_plane_fit_region<Kernel, Point_set, Pmap, Vmap> Region_type;
typedef CGAL::Shape_detection::Region_growing<Point_set, Neighbor_query, Region_type>                   Region_growing;
namespace Classification = CGAL::Classification;
namespace Feature = CGAL::Classification::Feature;
typedef Classification::Label_handle   Label_handle;
typedef Classification::Feature_handle Feature_handle;
typedef Classification::Label_set      Label_set;
typedef Classification::Feature_set    Feature_set;
typedef Classification::Local_eigen_analysis                                 Local_eigen_analysis;
typedef Classification::Point_set_feature_generator<Kernel, Point_set, Pmap> Feature_generator;
typedef Classification::Cluster<Point_set, Pmap>                             Cluster;
int main (int argc, char** argv)
{
  std::string filename        = CGAL::data_file_path("meshes/b9.ply");
  std::string filename_config = "data/b9_clusters_config.bin";
  if (argc > 1)
    filename = argv[1];
  if (argc > 2)
    filename_config = argv[2];
  std::cerr << "Reading input" << std::endl;
  Point_set pts;
  if(!(CGAL::IO::read_point_set(filename, pts,
                                // the PLY reader expects a binary file by default
                                CGAL::parameters::use_binary_mode(true))))
  {
    std::cerr << "Error: cannot read " << filename << std::endl;
    return EXIT_FAILURE;
  }
  std::cerr << "Estimating normals" << std::endl;
  CGAL::Real_timer t;
  t.start();
  pts.add_normal_map();
  CGAL::jet_estimate_normals<CGAL::Parallel_if_available_tag> (pts, 12);
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  t.reset();
  Feature_set pointwise_features;
  std::cerr << "Generating pointwise features" << std::endl;
  t.start();
  Feature_generator generator (pts, pts.point_map(), 5); // using 5 scales
#ifdef CGAL_LINKED_WITH_TBB
  pointwise_features.begin_parallel_additions();
#endif
  generator.generate_point_based_features (pointwise_features);
  // Generator should only be used with variables defined at the scope
  // of the generator object, thus we instantiate the normal map
  // outside of the function
  Vmap normal_map = pts.normal_map();
  generator.generate_normal_based_features (pointwise_features, normal_map);
#ifdef CGAL_LINKED_WITH_TBB
  pointwise_features.end_parallel_additions();
#endif
  t.stop();
  std::cerr << "Done in " << t.time() << " second(s)" << std::endl;
  std::cerr << "Detecting planes and creating clusters" << std::endl;
  t.start();
  const double search_sphere_radius  = 1.0;
  const double max_distance_to_plane = 1.0;
  const double max_accepted_angle    = 25.0;
  const std::size_t min_region_size  = 10;
  Neighbor_query neighbor_query (
    pts,
    search_sphere_radius,
    pts.point_map());
  Region_type region_type (
    pts,
    max_distance_to_plane, max_accepted_angle, min_region_size,
    pts.point_map(), pts.normal_map());
  Region_growing region_growing (
    pts, neighbor_query, region_type);
  std::vector<Cluster> clusters;
  region_growing.detect
    (boost::make_function_output_iterator
     ([&](const std::vector<std::size_t>& region) -> void {
        // Create a new cluster.
        Classification::Cluster<Point_set, Pmap> cluster (pts, pts.point_map());
        for (const std::size_t idx : region)
          cluster.insert(idx);
        clusters.push_back(cluster);
      }));
  t.stop();
  std::cerr << clusters.size() << " clusters created in "
            << t.time() << " second(s)" << std::endl;
  t.reset();
  std::cerr << "Computing cluster features" << std::endl;
  Local_eigen_analysis eigen = Local_eigen_analysis::create_from_point_clusters (clusters);
  t.start();
  Feature_set features;
  // First, compute means of features.
  features.begin_parallel_additions();
  for (std::size_t i = 0; i < pointwise_features.size(); ++ i)
    features.add<Feature::Cluster_mean_of_feature> (clusters, pointwise_features[i]);
  features.end_parallel_additions();
  // Then, compute variances of features (and remaining cluster features).
  features.begin_parallel_additions();
  for (std::size_t i = 0; i < pointwise_features.size(); ++ i)
    features.add<Feature::Cluster_variance_of_feature> (clusters,
                                                        pointwise_features[i], // i^th feature
                                                        features[i]);          // mean of i^th feature
  features.add<Feature::Cluster_size> (clusters);
  features.add<Feature::Cluster_vertical_extent> (clusters);
  for (std::size_t i = 0; i < 3; ++ i)
    features.add<Feature::Eigenvalue> (clusters, eigen, (unsigned int)(i));
  features.end_parallel_additions();
  t.stop();
  Label_set labels = { "ground", "vegetation", "roof" };
  std::vector<int> label_indices(clusters.size(), -1);
  std::cerr << "Using ETHZ Random Forest Classifier" << std::endl;
  Classification::ETHZ::Random_forest_classifier classifier (labels, features);
  std::cerr << "Loading configuration" << std::endl;
  std::ifstream in_config (filename_config, std::ios_base::in | std::ios_base::binary);
  classifier.load_configuration (in_config);
  std::cerr << "Classifying" << std::endl;
  t.reset();
  t.start();
  Classification::classify<CGAL::Parallel_if_available_tag> (clusters, labels, classifier, label_indices);
  t.stop();
  std::cerr << "Classification done in " << t.time() << " second(s)" << std::endl;
  return EXIT_SUCCESS;
}

History

This package is based on a research code by Florent Lafarge that was generalized, extended and packaged by Simon Giraudot in CGAL 4.12. Classification of surface meshes and of clusters were introduced in CGAL 4.13. The Neural Network classifier was introduced in CGAL 4.14.