Authors: Hans Tangelder and Andreas Fabri

Introduction

The spatial searching package implements exact and approximate distance browsing by providing implementations of algorithms supporting

both nearest and furthest neighbor searching
both exact and approximate searching
(approximate) range searching
(approximate) k-nearest and k-furthest neighbor searching
(approximate) incremental nearest and incremental furthest neighbor searching
query items representing points and spatial objects.

In these searching problems a set P of data points in d-dimensional space is given. The points can be represented by Cartesian coordinates or homogeneous coordinates. These points are preprocessed into a tree data structure, so that given any query item q the points of P can be browsed efficiently. The approximate spatial searching package is designed for data sets that are small enough to store the search structure in main memory (in contrast to approaches from databases that assume that the data reside in secondary storage).

Neighbor Searching

Spatial searching supports browsing through a collection of d-dimensional spatial objects stored in a spatial data structure on the basis of their distances to a query object. The query object may be a point or an arbitrary spatial object, e.g, a d-dimensional sphere. The objects in the spatial data structure are d-dimensional points.

Often the number of the neighbors to be computed is not know beforehand, e.g., because the number may depend on some properties of the neighbors (for example when querying for the nearest city to Paris with population greater than a million) or the distance to the query point. The conventional approach is k-nearest neighbor searching that makes use of a k-nearest neighbor algorithm, where k is known prior to the invocation of the algorithm. Hence, the number of nearest neighbors has to be guessed. If the guess is too large redundant computations are performed. If the number is too small the computation has to be re-invoked for a larger number of neighbors, thereby performing redundant computations. Therefore, Hjaltason and Samet [5] introduced incremental nearest neighbor searching in the sense that having obtained the k nearest neighbors, the k + 1st neighbor can be obtained without having to calculate the k + 1 nearest neighbor from scratch.

Spatial searching typically consists of a preprocessing phase and a searching phase. In the preprocessing phase one builds a search structure and in the searching phase one makes the queries. In the preprocessing phase the user builds a tree data structure storing the spatial data. In the searching phase the user invokes a searching method to browse the spatial data.

With relatively minor modifications, nearest neighbor searching algorithms can be used to find the furthest object from the query object. Therefore, furthest neighbor searching is also supported by the spatial searching package.

The execution time for exact neighbor searching can be reduced by relaxing the requirement that the neighbors should be computed exactly. If the distances of two objects to the query object are approximately the same, instead of computing the nearest/furthest neighbor exactly, one of these objects may be returned as the approximate nearest/furthest neighbor. I.e., given some non-negative constant \( \epsilon\) the distance of an object returned as an approximate k-nearest neighbor must not be larger than \( (1+\epsilon)r\), where \( r\) denotes the distance to the real kth nearest neighbor. Similar the distance of an approximate k-furthest neighbor must not be smaller than \( r/(1+\epsilon)\). Obviously, for \( \epsilon=0\) we get the exact result, and the larger \( \epsilon\) is, the less exact the result.

While searching the nearest neighbor the algorithm descends the kd-tree and has to decide two things for each node : Which child node should be visited first and could there be possible nearest neighbors in the other child. This basically comes down to computing the distance to the further child, because the distance to the closer child is the same as the one to the parent. There are two options now:

In general, we compute the distance with the given metric. This is the k-neighbor search with a general distance class.
For point queries we can "update" the distance, because it is only changed in one dimension at a time. This is the orthogonal k-neighbor search with an orthogonal distance class. The following example shows the orthogonal distance computation in detail:

Figure 75.1 Orthogonal distance computation technique

Assume we are searching the nearest neighbor, descending the kd-tree, with \( R_{p} \) as the parent rectangle and \( R_{lo} \) and \( R_{hi}\) as its childs in the current step. Further assume \( R_{lo} \) is closer to query point \(q\). Let \(cd\) denote the cutting dimension and let \(cv\) denote the cutting value. At this point we already know the distance \(rd_{p}\) to the parent rectangle and need to check if \(R_{hi}\) could contain nearest neighbors. Because \(R_{lo}\) is the closer rectangle, its distance to \(q\), \(rd_{lo}\), is the same as \(rd_{p}\). Notice that for each dimension \(i \neq cd \), \( \mathrm{dists}_{lo}[i] = \mathrm{dists}_{hi}[i]\), since these coordinates are not affected by the current cut. So the new distance along the cutting dimension is \( \mathrm{dists}_{hi}[cd] = cv - q[cd]\). Now we can compute \(rd_{hi}\) in constant time (independent of dimension) with \(rd_{hi} = rd_{p} - \mathrm{dists}_{lo}[cd]^2 + (cv - q[cd])^2\).
This strategy can be used if and only if the distance changes only in one dimension at a time, which is the case for point queries.

The following two classes implement the standard search strategy for orthogonal distances like the weighted Minkowski distance. The second one is a specialization for incremental neighbor searching and distance browsing. Both require extendes nodes.

Orthogonal_k_neighbor_search<Traits, OrthogonalDistance, Splitter, SpatialTree>

Orthogonal_incremental_neighbor_search<Traits, OrthogonalDistance, Splitter, SpatialTree>

The other two classes implement the standard search strategy for general distances like the Manhattan distance for iso-rectangle queries. Again, the second one is a specialization for incremental neighbor searching and distance browsing .

K_neighbor_search<Traits, GeneralDistance, Splitter, SpatialTree>

Incremental_neighbor_search<Traits, GeneralDistance, Splitter, SpatialTree>

Range Searching

Exact range searching and approximate range searching are supported using exact or fuzzy d-dimensional objects enclosing a region. The fuzziness of the query object is specified by a parameter \( \epsilon\) used to define inner and outer approximations of the query object. Points in the interior of the inner approximation are always reported and points that are not in the closure of the outer approximation are never reported. Other points may or may not be reported. For exact range searching, the fuzziness parameter \( \epsilon\) is set to zero.

The class Kd_tree implements range searching in the method search, which is a template method with an output iterator and a model of the concept FuzzyQueryItem as Fuzzy_iso_box or Fuzzy_sphere. For range searching of large data sets, the user may set the parameter bucket_size used in building the kd tree to a large value (e.g. 100), because in general the query time will be less than using the default value.

Splitting Rules

Instead of using the default splitting rule Sliding_midpoint described below, a user may, depending upon the data, select one from the following splitting rules, which determine how a separating hyperplane is computed. Every splitter has degenerated worst cases, which may lead to a linear tree and a stack overflow. Switching the splitting rule to one of a different kind will solve the problem.

Midpoint_of_rectangle

This splitting rule cuts a rectangle through its midpoint orthogonal to the longest side.

Midpoint_of_max_spread

This splitting rule cuts a rectangle through \( (\mathrm{Mind}+\mathrm{Maxd})/2\) orthogonal to the dimension with the maximum point spread \( [\mathrm{Mind},\mathrm{Maxd}]\).

Sliding_midpoint

This is a modification of the midpoint of rectangle splitting rule. It first attempts to perform a midpoint of rectangle split as described above. If data points lie on both sides of the separating plane the sliding midpoint rule computes the same separator as the midpoint of rectangle rule. If the data points lie only on one side it avoids this by sliding the separator, computed by the midpoint of rectangle rule, to the nearest data point.

As all the midpoint rules cut the bounding box in the middle of the longest side, the tree will become linear for a dataset with exponential increasing distances in one dimension.

Figure 75.2 Midpoint worst case point set in 2d.

Median_of_rectangle

The splitting dimension is the dimension of the longest side of the rectangle. The splitting value is defined by the median of the coordinates of the data points along this dimension.

Median_of_max_spread

The splitting dimension is the dimension of the longest side of the rectangle. The splitting value is defined by the median of the coordinates of the data points along this dimension.

The tree can become linear for the median rules, if many points are collinear in a dimension which is not the cutting dimension.

Figure 75.3 Median worst case point set in 2d.
a is longer than b, so this will be the cutting dimension.

Fair

This splitting rule is a compromise between the median of rectangle splitting rule and the midpoint of rectangle splitting rule. This splitting rule maintains an upper bound on the maximal allowed ratio of the longest and shortest side of a rectangle (the value of this upper bound is set in the constructor of the fair splitting rule). Among the splits that satisfy this bound, it selects the one in which the points have the largest spread. It then splits the points in the most even manner possible, subject to maintaining the bound on the ratio of the resulting rectangles.

Sliding_fair

This splitting rule is a compromise between the fair splitting rule and the sliding midpoint rule. Sliding fair-split is based on the theory that there are two types of splits that are good: balanced splits that produce fat rectangles, and unbalanced splits provided the rectangle with fewer points is fat.

Also, this splitting rule maintains an upper bound on the maximal allowed ratio of the longest and shortest side of a rectangle (the value of this upper bound is set in the constructor of the fair splitting rule). Among the splits that satisfy this bound, it selects the one one in which the points have the largest spread. It then considers the most extreme cuts that would be allowed by the aspect ratio bound. This is done by dividing the longest side of the rectangle by the aspect ratio bound. If the median cut lies between these extreme cuts, then we use the median cut. If not, then consider the extreme cut that is closer to the median. If all the points lie to one side of this cut, then we slide the cut until it hits the first point. This may violate the aspect ratio bound, but will never generate empty cells.

Example Programs

We give seven examples. The first example illustrates k nearest neighbor searching, and the second example incremental neighbor searching. The third is an example of approximate furthest neighbor searching using a d-dimensional iso-rectangle as an query object. Approximate range searching is illustrated by the fourth example. The fifth example illustrates k neighbor searching for a user defined point class. The sixth example shows how to choose another splitting rule in the kd tree that is used as search tree. The last example shows two worst-case scenarios for different splitter types.

Example for K Neighbor Searching

The first example illustrates k neighbor searching with an Euclidean distance and 2-dimensional points. The generated random data points are inserted in a search tree. We then initialize the k neighbor search object with the origin as query. Finally, we obtain the result of the computation in the form of an iterator range. The value of the iterator is a pair of a point and its square distance to the query point. We use square distances, or transformed distances for other distance classes, as they are computationally cheaper.

File Spatial_searching/nearest_neighbor_searching.cpp

#include <CGAL/Simple_cartesian.h>
#include <CGAL/point_generators_2.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <CGAL/Search_traits_2.h>
#include <list>
#include <cmath>
typedef CGAL::Simple_cartesian<double> K;
typedef K::Point_2 Point_d;
typedef CGAL::Search_traits_2<K> TreeTraits;
typedef CGAL::Orthogonal_k_neighbor_search<TreeTraits> Neighbor_search;
typedef Neighbor_search::Tree Tree;
int main() {
  const unsigned int N = 1;
  std::list<Point_d> points;
  points.push_back(Point_d(0,0));
  Tree tree(points.begin(), points.end());
  Point_d query(0,0);
  // Initialize the search structure, and search all N points
  Neighbor_search search(tree, query, N);
   // report the N nearest neighbors and their distance
  // This should sort all N points by increasing distance from origin
  for(Neighbor_search::iterator it = search.begin(); it != search.end(); ++it){
    std::cout << it->first << " "<< std::sqrt(it->second) << std::endl;
  }
  return 0;
}

Example for Incremental Searching

This example program illustrates incremental searching for the closest point with a positive first coordinate. We can use the orthogonal incremental neighbor search class, as the query is also a point and as the distance is the Euclidean distance.

As for the k neighbor search, we first initialize the search tree with the data. We then create the search object, and finally obtain the iterator with the begin() method. Note that the iterator is of the input iterator category, that is one can make only one pass over the data.

File Spatial_searching/distance_browsing.cpp

#include <CGAL/Simple_cartesian.h>
#include <CGAL/Orthogonal_incremental_neighbor_search.h>
#include <CGAL/Search_traits_2.h>
typedef CGAL::Simple_cartesian<double> K;
typedef K::Point_2 Point_d;
typedef CGAL::Search_traits_2<K> TreeTraits;
typedef CGAL::Orthogonal_incremental_neighbor_search<TreeTraits> NN_incremental_search;
typedef NN_incremental_search::iterator NN_iterator;
typedef NN_incremental_search::Tree Tree;
// A functor that returns true, iff the x-coordinate of a dD point is not positive
struct X_not_positive {
  bool operator()(const NN_iterator& it) { return ((*it).first)[0]<0;  }
};
// An iterator that only enumerates dD points with positive x-coordinate
typedef CGAL::Filter_iterator<NN_iterator, X_not_positive> NN_positive_x_iterator;
int main() {
  Tree tree;
  tree.insert(Point_d(0,0));
  tree.insert(Point_d(1,1));
  tree.insert(Point_d(0,1));
  tree.insert(Point_d(10,110));
  tree.insert(Point_d(45,0));
  tree.insert(Point_d(0,2340));
  tree.insert(Point_d(0,30));
  Point_d query(0,0);
  NN_incremental_search NN(tree, query);
  NN_positive_x_iterator it(NN.end(), X_not_positive(), NN.begin()), end(NN.end(), X_not_positive());
  std::cout <<  "The first 5 nearest neighbours with positive x-coord are: " << std::endl;
  for (int j=0; (j < 5)&&(it!=end); ++j,++it)
    std::cout <<   (*it).first << "  at squared distance = " << (*it).second << std::endl;
  return 0;
}

Example for General Neighbor Searching

This example program illustrates approximate nearest and furthest neighbor searching using 4-dimensional Cartesian coordinates. Five approximate furthest neighbors of the query rectangle \( [0.1,0.2]^4\) are computed. Because the query object is a rectangle we cannot use the orthogonal neighbor search. As in the previous examples we first initialize a search tree, create the search object with the query, and obtain the result of the search as iterator range.

File Spatial_searching/general_neighbor_searching.cpp

#include <CGAL/Epick_d.h>
#include <CGAL/point_generators_d.h>
#include <CGAL/Manhattan_distance_iso_box_point.h>
#include <CGAL/K_neighbor_search.h>
#include <CGAL/Search_traits_d.h>
typedef CGAL::Epick_d<CGAL::Dimension_tag<4> > Kernel;
typedef Kernel::Point_d Point_d;
typedef CGAL::Random_points_in_cube_d<Point_d> Random_points_iterator;
typedef Kernel::Iso_box_d Iso_box_d;
typedef Kernel TreeTraits;
typedef CGAL::Manhattan_distance_iso_box_point<TreeTraits> Distance;
typedef CGAL::K_neighbor_search<TreeTraits, Distance> Neighbor_search;
typedef Neighbor_search::Tree Tree;
int  main() {
  const int N = 1000;
  const unsigned int K = 10;
  Tree tree;
  Random_points_iterator rpit(4,1000.0);
  for(int i = 0; i < N; i++){
    tree.insert(*rpit++);
  }
  Point_d pp(0.1,0.1,0.1,0.1);
  Point_d qq(0.2,0.2,0.2,0.2);
  Iso_box_d query(pp,qq);
  Distance tr_dist;
  Neighbor_search N1(tree, query, 5, 10.0, false); // eps=10.0, nearest=false
  std::cout << "For query rectangle = [0.1, 0.2]^4 " << std::endl
        <<  "the " << K << " approximate furthest neighbors are: " << std::endl;
  for (Neighbor_search::iterator it = N1.begin();it != N1.end();it++) {
    std::cout << " Point " << it->first << " at distance  " << tr_dist.inverse_of_transformed_distance(it->second) << std::endl;
  }
  return 0;
}

Example for a Range Query

This example program illustrates approximate range querying for 4-dimensional fuzzy iso-rectangles and spheres using the higher dimensional kernel Epick_d. The range queries are member functions of the kd tree class.

File Spatial_searching/fuzzy_range_query.cpp

#include <CGAL/Epick_d.h>
#include <CGAL/point_generators_d.h>
#include <CGAL/Kd_tree.h>
#include <CGAL/Fuzzy_sphere.h>
#include <CGAL/Fuzzy_iso_box.h>
#include <CGAL/Search_traits_d.h>
const int D = 4;
typedef CGAL::Epick_d<CGAL::Dimension_tag<D> > K;
typedef K::Point_d Point_d;
typedef CGAL::Search_traits_d<K,CGAL::Dimension_tag<D> >  Traits;
typedef CGAL::Random_points_in_cube_d<Point_d>       Random_points_iterator;
typedef CGAL::Counting_iterator<Random_points_iterator> N_Random_points_iterator;
typedef CGAL::Kd_tree<Traits> Tree;
typedef CGAL::Fuzzy_sphere<Traits> Fuzzy_sphere;
typedef CGAL::Fuzzy_iso_box<Traits> Fuzzy_iso_box;
int main() {
  const int N = 1000;
  // generator for random data points in the square ( (-1000,-1000), (1000,1000) )
  Random_points_iterator rpit(4, 1000.0);
  // Insert N points in the tree
  Tree tree(N_Random_points_iterator(rpit,0),
        N_Random_points_iterator(rpit,N));
  // define range query objects
  double  pcoord[D] = { 300, 300, 300, 300 };
  double  qcoord[D] = { 900.0, 900.0, 900.0, 900.0 };
  Point_d p(D, pcoord+0, pcoord+D);
  Point_d q(D, qcoord+0, qcoord+D);
  Fuzzy_sphere fs(p, 700.0, 100.0);
  Fuzzy_iso_box fib(p, q, 100.0);
  std::cout << "points approximately in fuzzy spherical range query" << std::endl;
  std::cout << "with center (300, 300, 300, 300)" << std::endl;
  std::cout << "and fuzzy radius [600, 800] are:" << std::endl;
  tree.search(std::ostream_iterator<Point_d>(std::cout, "\n"), fs);
  std::cout << "points approximately in fuzzy rectangular range query ";
  std::cout << "[[200, 400], [800,1000]]^4 are:" << std::endl;
  tree.search(std::ostream_iterator<Point_d>(std::cout, "\n"), fib);
  return 0;
}

Example for User Defined Point and Distance Class

The neighbor searching works with all CGAL kernels, as well as with user defined points and distance classes. In this example we assume that the user provides the following 3-dimensional points class.

File Spatial_searching/Point.h

struct Point {
  double vec[3];
  Point() { vec[0]= vec[1] = vec[2] = 0; }
  Point (double x, double y, double z) { vec[0]=x; vec[1]=y; vec[2]=z;  }
  double x() const { return vec[ 0 ]; }
  double y() const { return vec[ 1 ]; }
  double z() const { return vec[ 2 ]; }
  double& x() { return vec[ 0 ]; }
  double& y() { return vec[ 1 ]; }
  double& z() { return vec[ 2 ]; }
  bool operator==(const Point& p) const
  {
    return (x() == p.x()) && (y() == p.y()) && (z() == p.z())  ;
  }
  bool  operator!=(const Point& p) const { return ! (*this == p); }
}; //end of class
struct Construct_coord_iterator {
  typedef  const double* result_type;
  const double* operator()(const Point& p) const
  { return static_cast<const double*>(p.vec); }
  const double* operator()(const Point& p, int)  const
  { return static_cast<const double*>(p.vec+3); }
};

We have put the glue layer in this file as well, that is a class that allows to iterate over the Cartesian coordinates of the point, and a class to construct such an iterator for a point. We next need a distance class
File Spatial_searching/Distance.h

struct Distance {
  typedef Point Query_item;
  typedef double FT;
  typedef CGAL::Dimension_tag<3> D;
  double transformed_distance(const Point& p1, const Point& p2) const {
    double distx= p1.x()-p2.x();
    double disty= p1.y()-p2.y();
    double distz= p1.z()-p2.z();
    return distx*distx+disty*disty+distz*distz;
  }
  double min_distance_to_rectangle(const Point& p,
                   const CGAL::Kd_tree_rectangle<FT,D>& b) const {
    double distance(0.0), h = p.x();
    if (h < b.min_coord(0)) distance += (b.min_coord(0)-h)*(b.min_coord(0)-h);
    if (h > b.max_coord(0)) distance += (h-b.max_coord(0))*(h-b.max_coord(0));
    h=p.y();
    if (h < b.min_coord(1)) distance += (b.min_coord(1)-h)*(b.min_coord(1)-h);
    if (h > b.max_coord(1)) distance += (h-b.max_coord(1))*(h-b.min_coord(1));
    h=p.z();
    if (h < b.min_coord(2)) distance += (b.min_coord(2)-h)*(b.min_coord(2)-h);
    if (h > b.max_coord(2)) distance += (h-b.max_coord(2))*(h-b.max_coord(2));
    return distance;
  }
  double min_distance_to_rectangle(const Point& p,
                   const CGAL::Kd_tree_rectangle<FT,D>& b,std::vector<double>& dists){   
    double distance(0.0), h = p.x();
    if (h < b.min_coord(0)){
      dists[0] = (b.min_coord(0)-h);
      distance += dists[0]*dists[0];
    }
    if (h > b.max_coord(0)){
      dists[0] = (h-b.max_coord(0));
      distance += dists[0]*dists[0];
    }
    h=p.y();
    if (h < b.min_coord(1)){
      dists[1] = (b.min_coord(1)-h);
      distance += dists[1]*dists[1];
    }
    if (h > b.max_coord(1)){
      dists[1] = (h-b.max_coord(1));
      distance += dists[1]*dists[1];
    }
    h=p.z();
    if (h < b.min_coord(2)){
      dists[2] = (b.min_coord(2)-h);
      distance += dists[2]*dists[2];
    }
    if (h > b.max_coord(2)){
      dists[2] = (h-b.max_coord(2));
      distance += dists[2]*dists[2];
    }
    return distance;
  }
  double max_distance_to_rectangle(const Point& p,
                   const CGAL::Kd_tree_rectangle<FT,D>& b) const {
    double h = p.x();
    double d0 = (h >= (b.min_coord(0)+b.max_coord(0))/2.0) ?
                (h-b.min_coord(0))*(h-b.min_coord(0)) : (b.max_coord(0)-h)*(b.max_coord(0)-h);
    h=p.y();
    double d1 = (h >= (b.min_coord(1)+b.max_coord(1))/2.0) ?
                (h-b.min_coord(1))*(h-b.min_coord(1)) : (b.max_coord(1)-h)*(b.max_coord(1)-h);
    h=p.z();
    double d2 = (h >= (b.min_coord(2)+b.max_coord(2))/2.0) ?
                (h-b.min_coord(2))*(h-b.min_coord(2)) : (b.max_coord(2)-h)*(b.max_coord(2)-h);
    return d0 + d1 + d2;
  }
  double max_distance_to_rectangle(const Point& p,
                   const CGAL::Kd_tree_rectangle<FT,D>& b,std::vector<double>& dists){   
    double h = p.x();
    dists[0] = (h >= (b.min_coord(0)+b.max_coord(0))/2.0) ?
                (h-b.min_coord(0)) : (b.max_coord(0)-h);
    
    h=p.y();
    dists[1] = (h >= (b.min_coord(1)+b.max_coord(1))/2.0) ?
                (h-b.min_coord(1)) : (b.max_coord(1)-h);
    h=p.z();
    dists[2] = (h >= (b.min_coord(2)+b.max_coord(2))/2.0) ?
                (h-b.min_coord(2)) : (b.max_coord(2)-h);
    return dists[0] * dists[0] + dists[1] * dists[1] + dists[2] * dists[2];
  }
  double new_distance(double& dist, double old_off, double new_off,
              int /* cutting_dimension */)  const {
    return dist + new_off*new_off - old_off*old_off;
  }
  double transformed_distance(double d) const { return d*d; }
  double inverse_of_transformed_distance(double d) { return std::sqrt(d); }
}; // end of struct Distance

We are ready to put the pieces together. The class Search_traits<..> ,which you see in the next file, is a mere wrapper for all our defined types. The searching itself works exactly as for CGAL kernels.

File Spatial_searching/user_defined_point_and_distance.cpp

#include <CGAL/Search_traits.h>
#include <CGAL/point_generators_3.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include "Point.h"  // defines types Point, Construct_coord_iterator
#include "Distance.h"
typedef CGAL::Creator_uniform_3<double,Point> Point_creator;
typedef CGAL::Random_points_in_cube_3<Point, Point_creator> Random_points_iterator;
typedef CGAL::Counting_iterator<Random_points_iterator> N_Random_points_iterator;
typedef CGAL::Dimension_tag<3> D;
typedef CGAL::Search_traits<double, Point, const double*, Construct_coord_iterator, D> Traits;
typedef CGAL::Orthogonal_k_neighbor_search<Traits, Distance> K_neighbor_search;
typedef K_neighbor_search::Tree Tree;
int main() {
  const int N = 1000;
  const unsigned int K = 5;
  // generator for random data points in the cube ( (-1,-1,-1), (1,1,1) )
  Random_points_iterator rpit( 1.0);
  // Insert number_of_data_points in the tree
  Tree tree(N_Random_points_iterator(rpit,0),
        N_Random_points_iterator(N));
  Point query(0.0, 0.0, 0.0);
  Distance tr_dist;
  // search K nearest neighbours
  K_neighbor_search search(tree, query, K);
  for(K_neighbor_search::iterator it = search.begin(); it != search.end(); it++){
    std::cout << " d(q, nearest neighbor)=  "
          << tr_dist.inverse_of_transformed_distance(it->second) << std::endl;
  }
  // search K furthest neighbour searching, with eps=0, search_nearest=false
  K_neighbor_search search2(tree, query, K, 0.0, false);
  for(K_neighbor_search::iterator it = search2.begin(); it != search2.end(); it++){
    std::cout << " d(q, furthest neighbor)=  "
          << tr_dist.inverse_of_transformed_distance(it->second) << std::endl;
  }
  return 0;
}

Examples for Using an Arbitrary Point Type with Point Property Maps

The following four example programs illustrate how to use the classes Search_traits_adapter<Key,PointPropertyMap,BaseTraits> and Distance_adapter<Key,PointPropertyMap,Base_distance> to store in the kd-tree objects of an arbitrary key type. Points are accessed through a point property map. This enables to associate information to a point or to reduce the size of the search structure.

Using a Point and an Integer as Key Type

In this example program, the search tree stores tuples of point and integer. The value type of the iterator of the neighbor searching algorithm is this tuple type.

File Spatial_searching/searching_with_point_with_info.cpp

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Search_traits_3.h>
#include <CGAL/Search_traits_adapter.h>
#include <CGAL/point_generators_3.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <CGAL/property_map.h>
#include <boost/iterator/zip_iterator.hpp>
#include <utility>
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef Kernel::Point_3                                     Point_3;
typedef boost::tuple<Point_3,int>                           Point_and_int;
typedef CGAL::Random_points_in_cube_3<Point_3>              Random_points_iterator;
typedef CGAL::Search_traits_3<Kernel>                       Traits_base;
typedef CGAL::Search_traits_adapter<Point_and_int,
  CGAL::Nth_of_tuple_property_map<0, Point_and_int>,
  Traits_base>                                              Traits;
typedef CGAL::Orthogonal_k_neighbor_search<Traits>          K_neighbor_search;
typedef K_neighbor_search::Tree                             Tree;
typedef K_neighbor_search::Distance                         Distance;
int main() {
  const unsigned int K = 5;
  // generator for random data points in the cube ( (-1,-1,-1), (1,1,1) )
  Random_points_iterator rpit( 1.0);
  std::vector<Point_3> points;
  std::vector<int>     indices;
  
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  indices.push_back(0);
  indices.push_back(1);
  indices.push_back(2);
  indices.push_back(3);
  indices.push_back(4);
  indices.push_back(5);
  indices.push_back(6);
  // Insert number_of_data_points in the tree
  Tree tree(
    boost::make_zip_iterator(boost::make_tuple( points.begin(),indices.begin() )),
    boost::make_zip_iterator(boost::make_tuple( points.end(),indices.end() ) )  
  );
  Point_3 query(0.0, 0.0, 0.0);
  Distance tr_dist;
  // search K nearest neighbours
  K_neighbor_search search(tree, query, K);
  for(K_neighbor_search::iterator it = search.begin(); it != search.end(); it++){
    std::cout << " d(q, nearest neighbor)=  "
          << tr_dist.inverse_of_transformed_distance(it->second) << " " 
              << boost::get<0>(it->first)<< " " << boost::get<1>(it->first) << std::endl;
  }
  return 0;
}

Using an Integer as Key Type

In this example program, the search tree stores only integers that refer to points stored within a user vector. The point type of the search traits is std::size_t.

File Spatial_searching/searching_with_point_with_info_inplace.cpp

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Search_traits_3.h>
#include <CGAL/Search_traits_adapter.h>
#include <CGAL/point_generators_3.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <boost/iterator/counting_iterator.hpp>
#include <utility>
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef Kernel::Point_3 Point_3;
//definition of a non-mutable lvalue property map,
//with the get function as a friend function to give it
//access to the private member
class My_point_property_map{
  const std::vector<Point_3>& points;
public:
  typedef Point_3 value_type;
  typedef const value_type& reference;
  typedef std::size_t key_type;
  typedef boost::lvalue_property_map_tag category;  
  My_point_property_map(const std::vector<Point_3>& pts):points(pts){}
  reference operator[](key_type k) const {return points[k];}
  friend reference get(const My_point_property_map& ppmap,key_type i) 
  {return ppmap[i];}
};
typedef CGAL::Random_points_in_cube_3<Point_3>                               Random_points_iterator;
typedef CGAL::Search_traits_3<Kernel>                                        Traits_base;
typedef CGAL::Search_traits_adapter<std::size_t,My_point_property_map,Traits_base> Traits;
typedef CGAL::Orthogonal_k_neighbor_search<Traits>                      K_neighbor_search;
typedef K_neighbor_search::Tree                                         Tree;
typedef Tree::Splitter                                                  Splitter;
typedef K_neighbor_search::Distance                                     Distance;
int main() {
  const unsigned int K = 5;
  // generator for random data points in the cube ( (-1,-1,-1), (1,1,1) )
  Random_points_iterator rpit( 1.0);
  std::vector<Point_3> points;
  
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  points.push_back(Point_3(*rpit++));
  My_point_property_map ppmap(points);
  // Insert number_of_data_points in the tree
  Tree tree(
    boost::counting_iterator<std::size_t>(0),
    boost::counting_iterator<std::size_t>(points.size()),
    Splitter(),
    Traits(ppmap)
  );
  Point_3 query(0.0, 0.0, 0.0);
  Distance tr_dist(ppmap);
  // search K nearest neighbours
  K_neighbor_search search(tree, query, K,0,true,tr_dist);
  for(K_neighbor_search::iterator it = search.begin(); it != search.end(); it++){
    std::cout << " d(q, nearest neighbor)=  "
          << tr_dist.inverse_of_transformed_distance(it->second) << " " 
              << points[it->first] << " " << it->first << std::endl;
  }
  return 0;
}

Using a Model of L-value Property Map Concept

This example programs uses a model of LvaluePropertyMap. Points are read from a std::map. The search tree stores integers of type std::size_t. The value type of the iterator of the neighbor searching algorithm is std::size_t.

File Spatial_searching/searching_with_point_with_info_pmap.cpp

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Search_traits_3.h>
#include <CGAL/Search_traits_adapter.h>
#include <CGAL/point_generators_3.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <boost/iterator/counting_iterator.hpp>
#include <utility>
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef Kernel::Point_3 Point_3;
typedef boost::const_associative_property_map<std::map<std::size_t,Point_3> >           My_point_property_map;
  
typedef CGAL::Random_points_in_cube_3<Point_3>                                          Random_points_iterator;
typedef CGAL::Search_traits_3<Kernel>                                                   Traits_base;
typedef CGAL::Search_traits_adapter<std::size_t,My_point_property_map,Traits_base>      Traits;
typedef CGAL::Orthogonal_k_neighbor_search<Traits>                      K_neighbor_search;
typedef K_neighbor_search::Tree                                         Tree;
typedef Tree::Splitter                                                  Splitter;
typedef K_neighbor_search::Distance                                     Distance;
int main() {
  const unsigned int K = 5;
  // generator for random data points in the cube ( (-1,-1,-1), (1,1,1) )
  Random_points_iterator rpit( 1.0);
  std::map<std::size_t,Point_3> points;
  
  points[0]=Point_3(*rpit++);
  points[1]=Point_3(*rpit++);
  points[2]=Point_3(*rpit++);
  points[3]=Point_3(*rpit++);
  points[4]=Point_3(*rpit++);
  points[5]=Point_3(*rpit++);
  points[6]=Point_3(*rpit++);
  My_point_property_map ppmap(points);
  // Insert number_of_data_points in the tree
  Tree tree(
    boost::counting_iterator<std::size_t>(0),
    boost::counting_iterator<std::size_t>(points.size()),
    Splitter(),
    Traits(ppmap)
  );
  Point_3 query(0.0, 0.0, 0.0);
  Distance tr_dist(ppmap);
  // search K nearest neighbours
  K_neighbor_search search(tree, query, K,0,true,tr_dist);
  for(K_neighbor_search::iterator it = search.begin(); it != search.end(); it++){
    std::cout << " d(q, nearest neighbor)=  "
          << tr_dist.inverse_of_transformed_distance(it->second) << " "
              << points[it->first] << " " << it->first << std::endl;
  }
  return 0;
}

Using a Point Property Map of a Polygonal Mesh

This example programs shows how to search the closest vertices of a Surface_mesh or, quite similar, of a Polyhedron_3. Points are stored in the polygonal mesh. The search tree stores vertex descriptors. The value type of the iterator of the neighbor searching algorithm is boost::graph_traits<Surface_mesh>::vertex_descriptor .

File Spatial_searching/searching_surface_mesh_vertices.cpp

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Search_traits_3.h>
#include <CGAL/Search_traits_adapter.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <CGAL/Surface_mesh.h>
#include <fstream>
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef Kernel::Point_3 Point_3;
typedef CGAL::Surface_mesh<Point_3> Mesh;
typedef boost::graph_traits<Mesh>::vertex_descriptor Point;
typedef boost::graph_traits<Mesh>::vertices_size_type size_type;
typedef boost::property_map<Mesh,CGAL::vertex_point_t>::type Vertex_point_pmap;
typedef CGAL::Search_traits_3<Kernel>                                    Traits_base;
typedef CGAL::Search_traits_adapter<Point,Vertex_point_pmap,Traits_base> Traits;
typedef CGAL::Orthogonal_k_neighbor_search<Traits>                      K_neighbor_search;
typedef K_neighbor_search::Tree                                         Tree;
typedef Tree::Splitter                                                  Splitter;
typedef K_neighbor_search::Distance                                     Distance;
int main(int argc, char* argv[]) {
  Mesh mesh;
  std::cerr << argc <<std::endl;
  std::ifstream in((argc>1)?argv[1]:"data/tripod.off");
  in  >> mesh;
  const unsigned int K = 5;
  Vertex_point_pmap vppmap = get(CGAL::vertex_point,mesh);
  // Insert number_of_data_points in the tree
  Tree tree(
            vertices(mesh).begin(),
            vertices(mesh).end(),
            Splitter(),
            Traits(vppmap)
  );
  Point_3 query(0.0, 0.0, 0.0);
  Distance tr_dist(vppmap);
  // search K nearest neighbours
  K_neighbor_search search(tree, query, K,0,true,tr_dist);
  std::cout <<"The "<< K << " nearest vertices to the query point at (0,0,0) are:" << std::endl;
  for(K_neighbor_search::iterator it = search.begin(); it != search.end(); it++){
    std::cout << "vertex " << it->first << " : " << vppmap[it->first] << " at distance " 
          << tr_dist.inverse_of_transformed_distance(it->second) << std::endl;
  }
  return 0;
}

Example for Selecting a Splitting Rule and Setting the Bucket Size

This example program illustrates selecting a splitting rule and setting the maximal allowed bucket size. The only differences with the first example are the declaration of the Fair splitting rule, needed to set the maximal allowed bucket size.

File Spatial_searching/using_fair_splitting_rule.cpp

#include <CGAL/Simple_cartesian.h>
#include <CGAL/point_generators_2.h>
#include <CGAL/Search_traits_2.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <cmath>
typedef CGAL::Simple_cartesian<double> R;
typedef R::Point_2 Point_d;
typedef CGAL::Random_points_in_square_2<Point_d> Random_points_iterator;
typedef CGAL::Counting_iterator<Random_points_iterator> N_Random_points_iterator;
typedef CGAL::Search_traits_2<R> Traits;
typedef CGAL::Euclidean_distance<Traits> Distance;
typedef CGAL::Fair<Traits> Fair;
typedef CGAL::Orthogonal_k_neighbor_search<Traits,Distance,Fair> Neighbor_search;
typedef Neighbor_search::Tree Tree;
int main() {
  const unsigned int N = 1000;
  // generator for random data points in the square ( (-1,-1), (1,1) )
  Random_points_iterator rpit( 1.0);
  Fair fair(5); // bucket size=5
  // Insert number_of_data_points in the tree
  Tree tree(N_Random_points_iterator(rpit,0),
        N_Random_points_iterator(N),
        fair);
  Point_d query(0,0);
  // Initialize the search structure, and search all N points
  Neighbor_search search(tree, query, N);
  // report the N nearest neighbors and their distance
  // This should sort all N points by increasing distance from origin
  for(Neighbor_search::iterator it = search.begin(); it != search.end(); ++it){
    std::cout << it->first << " "<< std::sqrt(it->second) << std::endl;
  }
  return 0;
}

Example for Worst-Case Scenarios for Different Splitters

This example program has two 2-dimensional data sets: The first one containing collinear points with exponential increasing distances and the second one with collinear points in the firstdimension and one point with a distance exceeding the spread of the other points in the second dimension. These are the worst cases for the midpoint/median rules and can also occur in higher dimensions.

File Spatial_searching/splitter_worst_cases.cpp

#include <CGAL/Simple_cartesian.h>
#include <CGAL/Orthogonal_k_neighbor_search.h>
#include <CGAL/Search_traits_2.h>
typedef CGAL::Simple_cartesian<double>                Kernel;
typedef CGAL::Search_traits_2<Kernel>                 Traits_2;
typedef Kernel::Point_2                               Point_2;
typedef CGAL::Sliding_midpoint<Traits_2>              Sliding_midpoint;
typedef CGAL::Median_of_rectangle<Traits_2>           Median_of_rectangle;
typedef CGAL::Euclidean_distance<Traits_2>            Distance;
typedef CGAL::Orthogonal_k_neighbor_search<Traits_2,Distance,Sliding_midpoint>     Neighbor_search_sliding;
typedef CGAL::Orthogonal_k_neighbor_search<Traits_2,Distance,Median_of_rectangle>  Neighbor_search_median;
typedef Neighbor_search_sliding::Tree                 Tree_sliding;
typedef Neighbor_search_median::Tree                  Tree_median;
typedef std::vector<Point_2>                          Points;
int main()
{
  Points sliding_worst_case;
  for (int i = 0 ,j = 1; i < 10 ; ++i , j *= 2){
    sliding_worst_case.push_back(Point_2(((double)i)/10 , 0));
    sliding_worst_case.push_back(Point_2( (double)j , 0));    
  }
  Sliding_midpoint sliding(10);
  Median_of_rectangle median(10);
  Tree_sliding tree1(sliding_worst_case.begin(), sliding_worst_case.end() , sliding);
  tree1.build();
  std::cout << "Worst case tree for Sliding midpoint and Midpoint of max spread : "<<std::endl;
  tree1.statistics(std::cout);
  tree1.clear();
  std::cout<<std::endl<<"Same data with median splitter:"<<std::endl;
  Tree_median tree2(sliding_worst_case.begin(), sliding_worst_case.end() , median );
  tree2.statistics(std::cout);
  tree2.clear();
  Points median_worst_case;
  for(int i = 0 ; i < 19 ; ++i){
    median_worst_case.push_back(Point_2( 0 , i));
  }
  median_worst_case.push_back(Point_2(20,0));
 
  Tree_median tree3(median_worst_case.begin() , median_worst_case.end() , median);
  tree3.build();
  std::cout <<std::endl<< "Worst case tree for Median of rectangle, Median of max spread : "<<std::endl;
  tree3.statistics(std::cout);
  tree3.clear();
  std::cout<<std::endl<<"Same data with midpoint splitter:"<<std::endl;
  Tree_sliding tree4(median_worst_case.begin() , median_worst_case.end() , sliding);
  tree4.build();
  tree4.statistics(std::cout);
  tree4.clear();
  return 0;
}

Performance

Performance of the Orthogonal Search

We took the gargoyle data set (Surface) from aim@shape, and generated the same number of random points in the bbox of the gargoyle (Random). We then consider three scenarios as data/queries. The data set contains 800K points. For each query point we compute the K=10,20,30 closest points, with the default splitter and for the bucket size 10 (default) and 20.

The results were produced with the release 4.6 of CGAL, on an Intel i7 2.7 Ghz laptop with 16 GB RAM, compiled with Visual C++ 2012 with the /O2 option.

The values are the average of ten tests each.

k	bucket size	Surface/Surface	Surface/Random	Random/Random
10	10	0.89	11.48	2.63
10	20	0.89	9.80	2.25
20	10	1.60	13.41	4.06
20	20	1.59	11.62	3.46
30	10	2.35	15.52	5.42
30	20	2.33	13.50	4.61

Figure 75.4 Benchmark dataset (downsampled for visualization).
Blue: Gargoyle surface. Green: Gargoyle bbox random.

Software Design

The kd Tree

Bentley [3] introduced the kd tree as a generalization of the binary search tree in higher dimensions. kd trees hierarchically decompose space into a relatively small number of rectangles such that no rectangle contains too many input objects. For our purposes, a rectangle in real d dimensional space, \( \mathbb{R}^d\), is the product of d closed intervals on the coordinate axes. kd trees are obtained by partitioning point sets in \( \mathbb{R}^d\) using (d-1)-dimensional hyperplanes. Each node in the tree is split into two children by one such separating hyperplane. Several splitting rules (see Section Splitting Rules can be used to compute a separating (d-1)-dimensional hyperplane.

Each internal node of the kd tree is associated with a rectangle and a hyperplane orthogonal to one of the coordinate axis, which splits the rectangle into two parts. Therefore, such a hyperplane, defined by a splitting dimension and a splitting value, is called a separator. These two parts are then associated with the two child nodes in the tree. The process of partitioning space continues until the number of data points in the rectangle falls below some given threshold. The rectangles associated with the leaf nodes are called buckets, and they define a subdivision of the space into rectangles. Data points are only stored in the leaf nodes of the tree, not in the internal nodes.

Friedmann, Bentley and Finkel [4] described the standard search algorithm to find the kth nearest neighbor by searching a kd tree recursively.

When encountering a node of the tree, the algorithm first visits the child that is closest to the query point. On return, if the rectangle containing the other child lies within 1/ (1+ \( \epsilon\)) times the distance to the kth nearest neighbors so far, then the other child is visited recursively. Priority search [2] visits the nodes in increasing order of distance from the queue with help of a priority queue. The search stops when the distance of the query point to the nearest nodes exceeds the distance to the nearest point found with a factor 1/ (1+ \( \epsilon\)). Priority search supports next neighbor search, standard search does not.

In order to speed-up the internal distance computations in nearest neighbor searching in high dimensional space, the approximate searching package supports orthogonal distance computation. Orthogonal distance computation implements the efficient incremental distance computation technique introduced by Arya and Mount [1]. This technique works only for neighbor queries with query items represented as points and with a quadratic form distance, defined by \( d_A(x,y)= (x-y)A(x-y)^T\), where the matrix \( A\) is positive definite, i.e. \( d_A(x,y) \geq 0\). An important class of quadratic form distances are weighted Minkowski distances. Given a parameter \( p>0\) and parameters \( w_i \geq 0, 0 < i \leq d\), the weighted Minkowski distance is defined by \( l_p(w)(r,q)= ({\Sigma_{i=1}^{i=d} \, w_i(r_i-q_i)^p})^{1/p}\) for \( 0 < p <\infty\) and defined by \( l_{\infty}(w)(r,q)=max \{w_i |r_i-q_i| \mid 1 \leq i \leq d\}\). The Manhattan distance ( \( p=1\), \( w_i=1\)) and the Euclidean distance ( \( p=2\), \( w_i=1\)) are examples of a weighted Minkowski metric.

To speed up distance computations also transformed distances are used instead of the distance itself. For instance for the Euclidean distance, to avoid the expensive computation of square roots, squared distances are used instead of the Euclidean distance itself.

Implementation History

The initial implementation of this package was done by Hans Tangelder and Andreas Fabri. It was optimized in speed and memory consumption by Markus Overtheil during an internship at GeometryFactory in 2014.