Chapter 68
Principal Component Analysis

Pierre Alliez, Sylvain Pion and Ankit Gupta

Table of Contents

68.1 Definitions
68.2 Examples
   68.2.1   Bounding Box of 2D and 3D Point Sets
   68.2.2   Centroid of 2D and 3D Objects
   68.2.3   Barycenter of a Set of Weighted Points
   68.2.4   Least Squares Fitting of a Line to a 2D Point Set
   68.2.5   Least Squares Fitting of a Line and a Plane to a 3D Triangle Set

This package provides functions to analyze sets of objects in 2D and 3D. It provides the computation of axis-aligned bounding boxes, centers of mass and principal component analysis for all bounded objects, as well as barycenters for sets of weighted points. Note that unlike most of the Cgal packages, this package uses approximation methods (in particular for linear least squares fitting) and is not intended to provide an exact canonical result in any sense.

68.1   Definitions

A bounding box for a set of objects is a cuboid that contains the set. An axis-aligned bounding box captures the maximum extents of all objects from the set within their coordinate system, i.e., a bounding box aligned with the axes of the coordinate system. Axis-aligned bounding boxes are frequently used in geometric algorithms as an indication of the general location of a data set, for either display, first-approximation spatial query, or spatial indexing purposes.

A centroid of a set of objects is their center of mass, i.e., the point whose coordinates are computed by means of coordinates of all points composing the objects. Note that although the general definition of center of mass incorporates a density function (and hence weighted means), the current implementation assumes a uniform density (see barycenter below defined for weighted points). For a point set {X1,X2,...,XN} the centroid X is computed as

X =
1

N
N
i=1
Xi
.
For a set of segments {S1,S2,...,SN} the centroid X is computed as
X =
1

N
i=1
|Si|
N
i=1
|Si| S_i
,
where |Si| stands for the length of a segment and S_i stands for its uniform barycenter (midpoint). For a set of triangles {T1,T2,...,TN} the centroid X is computed as
X =
1

N
i=1
|Ti|
N
i=1
|Ti| T_i
,
where |Ti| stands for the area of a triangle and T_i stands for its uniform barycenter. Such definition still holds for more general objects where the Lebesgue measure (length in 1D, area in 2D, volume in 3D) is used for weighting the object barycenters. Centers of mass are used to summarize data sets for either approximation, spatial query or spatial indexing purposes.

A barycenter of a set of weighted points is the point whose coordinates are computed by means of weighted coordinates of all weighted points from the set. When all weights are equal the barycenter coincides with the centroid.

Given a set of objects, linear least squares fitting amounts to finding the linear sub-space which minimizes the sum of squared distances from all points composing the objects of the set, to their projection onto this linear sub-space. Such linear sub-space is obtained by so-called principal component analysis (PCA). PCA is defined as a transformation that transforms the objects to a new coordinate system such that the greatest variance by orthogonal projection of the objects comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PCA is often used in geometric applications to reduce data sets to lower dimensions for analysis or approximation. Figure below illustrates (least squares) fitting of a line to a 2D point set, fitting of a line and a plane to a 3D point set and fitting of a plane to a set of 3D triangles.

Figure 68.1:  Left: fitting a line to a 2D point set (centroid dotted in red). Middle: fitting a line and a plane to a 3D point set. Right: fitting a plane to a set of 3D triangles.

68.2   Examples

68.2.1   Bounding Box of 2D and 3D Point Sets

In the following example we use STL containers of 2D points and 3D points, and compute their axis-aligned bounding box.
File: examples/Principal_component_analysis/bounding_box.cpp
#include <CGAL/Cartesian.h>
#include <CGAL/bounding_box.h>

#include <list>
#include <iostream>

typedef double              FT;
typedef CGAL::Cartesian<FT> K;
typedef K::Point_2          Point_2;
typedef K::Point_3          Point_3;

int main()
{
  // axis-aligned bounding box of 2D points
  std::list<Point_2> points_2;
  points_2.push_back(Point_2(1.0, 0.0));
  points_2.push_back(Point_2(2.0, 2.0));
  points_2.push_back(Point_2(3.0, 5.0));

  K::Iso_rectangle_2 c2 = CGAL::bounding_box(points_2.begin(), points_2.end());
  std::cout << c2 << std::endl;

  // axis-aligned bounding box of 3D points
  std::list<Point_3> points_3;
  points_3.push_back(Point_3(1.0, 0.0, 0.5));
  points_3.push_back(Point_3(2.0, 2.0, 1.2));
  points_3.push_back(Point_3(3.0, 5.0, 4.5));

  K::Iso_cuboid_3 c3 = CGAL::bounding_box(points_3.begin(), points_3.end());
  std::cout << c3 << std::endl;

  return 0;
}

68.2.2   Centroid of 2D and 3D Objects

In the following example we use STL containers of 2D, 3D points and 3D triangles, and compute their centroid. The dimension tag indicates the dimension of the objects being fitted (0 for points, 2 for triangles). Note that it is also possible to compute the centroid of the triangle edges (3D segments) by specifying 1 as dimension tag, and to compute the centroid of the triangle vertices (3D points) by specifying 0 as dimension tag.
File: examples/Principal_component_analysis/centroid.cpp
#include <CGAL/Cartesian.h>
#include <CGAL/centroid.h>

#include <list>
#include <iostream>

typedef double               FT;
typedef CGAL::Cartesian<FT>  K;
typedef K::Point_2           Point_2;
typedef K::Point_3           Point_3;
typedef K::Triangle_3        Triangle_3;

int main()
{
  // centroid of 2D points
  std::list<Point_2> points_2;
  points_2.push_back(Point_2(1.0, 0.0));
  points_2.push_back(Point_2(2.0, 2.0));
  points_2.push_back(Point_2(3.0, 5.0));
  Point_2 c2 = CGAL::centroid(points_2.begin(), points_2.end(),CGAL::Dimension_tag<0>());
  std::cout << c2 << std::endl;

  // centroid of 3D points
  std::list<Point_3> points_3;
  points_3.push_back(Point_3(1.0, 0.0, 0.5));
  points_3.push_back(Point_3(2.0, 2.0, 1.2));
  points_3.push_back(Point_3(3.0, 5.0, 4.5));
  Point_3 c3 = CGAL::centroid(points_3.begin(), points_3.end(),CGAL::Dimension_tag<0>());
  std::cout << c3 << std::endl;

  // centroid of 3D triangles
  std::list<Triangle_3> triangles_3;
  Point_3 p(1.0, 0.0, 0.0);
  Point_3 q(1.0, 2.0, 0.0);
  Point_3 r(0.0, 1.0, 3.0);
  Point_3 s(0.0, 2.0, 5.0);
  triangles_3.push_back(Triangle_3(p,q,r));
  triangles_3.push_back(Triangle_3(p,q,s));
  c3 = CGAL::centroid(triangles_3.begin(), triangles_3.end(),CGAL::Dimension_tag<2>());
  std::cout << c3 << std::endl;

  return 0;
}

68.2.3   Barycenter of a Set of Weighted Points

In the following example we use STL containers of 2D and 3D weighted points, and compute their barycenter.
File: examples/Principal_component_analysis/barycenter.cpp
#include <CGAL/Cartesian.h>
#include <CGAL/barycenter.h>

#include <list>
#include <iostream>
#include <utility>

typedef double               FT;
typedef CGAL::Cartesian<FT>  K;
typedef K::Point_2           Point_2;
typedef K::Point_3           Point_3;

int main()
{
  // barycenter of 2D weighted points
  std::list<std::pair<Point_2, FT> > points_2;
  points_2.push_back(std::make_pair(Point_2(1.0, 0.0),  1.0));
  points_2.push_back(std::make_pair(Point_2(2.0, 2.0),  2.0));
  points_2.push_back(std::make_pair(Point_2(3.0, 5.0), -2.0));

  Point_2 c2 = CGAL::barycenter(points_2.begin(), points_2.end());
  std::cout << c2 << std::endl;

  // barycenter of 3D weighted points
  std::list<std::pair<Point_3, FT> > points_3;
  points_3.push_back(std::make_pair(Point_3(1.0, 0.0, 0.5),  1.0));
  points_3.push_back(std::make_pair(Point_3(2.0, 2.0, 1.2),  2.0));
  points_3.push_back(std::make_pair(Point_3(3.0, 5.0, 4.5), -5.0));

  Point_3 c3 = CGAL::barycenter(points_3.begin(), points_3.end());
  std::cout << c3 << std::endl;

  return 0;
}

68.2.4   Least Squares Fitting of a Line to a 2D Point Set

In the following example we use a STL container of 2D points, and compute the best fitting line in the least squares sense.
File: examples/Principal_component_analysis/linear_least_squares_fitting_points_2.cpp
#include <CGAL/Cartesian.h>
#include <CGAL/linear_least_squares_fitting_2.h>
#include <list>

typedef double               FT;
typedef CGAL::Cartesian<FT>  K;
typedef K::Line_2            Line;
typedef K::Point_2           Point;

int main()
{
  std::list<Point> points;
  points.push_back(Point(1.0,2.0));
  points.push_back(Point(3.0,4.0));
  points.push_back(Point(5.0,6.0));

  // fit line 
  Line line;
  linear_least_squares_fitting_2(points.begin(),points.end(),line,CGAL::Dimension_tag<0>());

  return 0;
}

68.2.5   Least Squares Fitting of a Line and a Plane to a 3D Triangle Set

In the following example we use a STL container of 3D triangles, and compute the best fitting line and plane in the least squares sense. Note that we can fit either the whole triangles, the triangle edges or the triangle vertices.
File: examples/Principal_component_analysis/linear_least_squares_fitting_triangles_3.cpp
#include <CGAL/Cartesian.h>
#include <CGAL/linear_least_squares_fitting_3.h>
#include <list>

typedef double               FT;
typedef CGAL::Cartesian<FT>  K;
typedef K::Line_3            Line;
typedef K::Plane_3           Plane;
typedef K::Point_3           Point;
typedef K::Triangle_3        Triangle;

int main(void)
{
  std::list<Triangle> triangles;
  Point a(1.0,2.0,3.0);
  Point b(4.0,0.0,6.0);
  Point c(7.0,8.0,9.0);
  Point d(8.0,7.0,6.0);
  Point e(5.0,3.0,4.0);
  triangles.push_back(Triangle(a,b,c));
  triangles.push_back(Triangle(a,b,d));
  triangles.push_back(Triangle(d,e,c));

  Line line;
  Plane plane;

  // fit plane to whole triangles
  linear_least_squares_fitting_3(triangles.begin(),triangles.end(),plane,CGAL::Dimension_tag<2>());
  
  // fit line to triangle vertices
  linear_least_squares_fitting_3(triangles.begin(),triangles.end(),line, CGAL::Dimension_tag<0>());
  
  return 0;
}