Evaluation and Optimization: IoU, Non-max Suppression, Anchor Boxes

Intersection over Union (IoU)
Non-max Suppression (NMS)
Anchor Boxes
Summary

Intersection over Union (IoU)

Intersection over Union (IoU) is a metric used to evaluate the accuracy of an object detector on a particular dataset. It measures the overlap between two bounding boxes:

The predicted bounding box
The ground-truth bounding box

Mathematical Definition

If $B_{p}$ is the predicted bounding box and $B_{gt}$ is the ground-truth bounding box:

$\mathrm{IoU} = \frac{\mathrm{Area}(B_{p} \cap B_{gt})}{\mathrm{Area}(B_{p} \cup B_{gt})}$

$\mathrm{IoU} = 1.0$ : perfect overlap
$\mathrm{IoU} = 0.0$ : no overlap

Example

Suppose:

Predicted box: top-left = (50, 50), bottom-right = (150, 150)
Ground truth: top-left = (100, 100), bottom-right = (200, 200)

The overlap is the square from (100, 100) to (150, 150), with area $50 \times 50 = 2{,}500$.

Areas:

Predicted: $100 \times 100 = 10{,}000$
GT: $100 \times 100 = 10{,}000$
Union: $10{,}000 + 10{,}000 - 2{,}500 = 17{,}500$

So,

$\mathrm{IoU} = \frac{2500}{17500} \approx 0.143$
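The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples; the function name `iou` and the tuple layout are choices made for this example, not part of any particular library.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Corners of the intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
print(round(iou(predicted, ground_truth), 3))  # 0.143
```

Clamping the intersection width and height at zero is what makes non-overlapping boxes return 0.0 instead of a spurious positive area.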

Use in Training and Evaluation

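In training, a predicted or anchor box is commonly treated as a positive match only if its IoU with a ground-truth box reaches a threshold (e.g., 0.5); boxes with lower overlap are treated as background or ignored
For evaluation, mAP (mean average precision) counts a detection as correct only if its IoU with a ground-truth box meets a threshold (e.g., 0.5 or 0.75)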
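To illustrate how the IoU threshold changes what counts as a correct detection, here is a rough sketch of the matching step only. It assumes a simple greedy rule (each detection, in descending score order, claims the best still-unmatched ground-truth box); benchmark toolkits differ in the details, and a full mAP computation additionally sweeps the confidence threshold and averages over classes. The names `precision_recall`, `detections`, and `ground_truths` are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truths, iou_threshold=0.5):
    """detections: list of (score, box); ground_truths: list of boxes."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the best still-unmatched ground-truth box for this detection.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # The detection is a true positive only if the overlap is large enough.
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(detections) - tp
    fn = len(ground_truths) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall


gts = [(100, 100, 200, 200)]
dets = [(0.9, (110, 110, 210, 210)), (0.6, (50, 50, 150, 150))]
print(precision_recall(dets, gts, iou_threshold=0.5))   # (0.5, 1.0)
print(precision_recall(dets, gts, iou_threshold=0.75))  # (0.0, 0.0)
```

The first detection overlaps the ground truth with an IoU of about 0.68, so it counts as correct at a threshold of 0.5 but not at 0.75.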

Non-max Suppression (NMS)

Why Do We Need It?

Object detectors often output multiple overlapping boxes for a single object. NMS filters out the redundant ones: within each group of heavily overlapping boxes, it keeps the box with the highest confidence score and discards the rest.

Algorithm Steps

Sort all bounding boxes by confidence score, highest first.
Select the box with the highest confidence, add it to the output, and remove it from the list.
Compute the IoU between this box and every remaining box.
Remove the remaining boxes whose IoU with it is above a threshold (e.g., 0.5).
Repeat from the second step until no boxes remain.
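The steps above can be sketched as a short greedy loop. This is a minimal, unoptimized illustration in plain Python, assuming `(x1, y1, x2, y2)` boxes and a single class; the function names and the default threshold are choices made for this example. Production code would normally use a vectorized library routine instead.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    # Step 1: indices sorted by descending confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: take the most confident remaining box.
        best = order.pop(0)
        keep.append(best)
        # Steps 3-4: drop every remaining box that overlaps it too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(50, 50, 150, 150), (55, 55, 155, 155), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box is far away and survives.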

Mathematical Intuition

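Let $B_{i}$ be a box with score $s_{i}$, with the boxes indexed in order of decreasing score. Iterating over the boxes in that order, apply:

$\text{Keep } B_{i} \text{ if } \mathrm{IoU}(B_{i}, B_{j}) \le T \text{ for every kept } B_{j},\ j < i$

where $T$ is the suppression threshold.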

Anchor Boxes

What are Anchor Boxes?

Anchor boxes (also called prior boxes) are predefined bounding boxes with different shapes and sizes. They allow object detectors to:

Detect multiple objects in the same grid cell
Handle aspect ratio and scale variation

Why Are They Needed?

Without anchor boxes, a grid-based detector that makes one prediction per cell can detect at most one object in each grid cell. But real-world scenes often contain overlapping or closely spaced objects whose centers fall in the same cell.

Anchor Box Design

You predefine $k$ anchor boxes per cell. Each one is defined by:

Width $w$
Height $h$
Aspect ratio $r = \frac{w}{h}$
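As an illustration, one common way to build the $k$ shapes is to combine a few scales with a few aspect ratios. The sketch below assumes each scale $s$ fixes the anchor area at $s^2$, so that $w = s\sqrt{r}$ and $h = s/\sqrt{r}$; the function name `anchor_shapes` and the example scales are hypothetical, and real detectors choose these values differently (some learn them from the dataset).

```python
import math


def anchor_shapes(scales, aspect_ratios):
    """Return (w, h) pairs, one per (scale, aspect ratio) combination."""
    shapes = []
    for s in scales:
        for r in aspect_ratios:
            # Keep the area at s**2 while setting w / h = r.
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            shapes.append((w, h))
    return shapes


shapes = anchor_shapes(scales=[64, 128], aspect_ratios=[0.5, 1.0, 2.0])
print(len(shapes))  # 6, i.e. k = 6 anchors per cell
for w, h in shapes:
    print(f"w={w:.1f}, h={h:.1f}, r={w / h:.2f}")
```

With two scales and three aspect ratios, every cell gets $k = 6$ anchors, each centered on that cell.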

For example, in SSD300 (SSD with a 300×300 input):

6 feature maps, from 38×38 down to 1×1
4 or 6 anchors per feature-map cell, depending on the layer
$\Rightarrow$ 8732 total anchor boxes
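The total of 8732 can be reproduced with a quick calculation. The sketch below assumes the standard SSD300 layout: six square feature maps of side 38, 19, 10, 5, 3, and 1, with 4, 6, 6, 6, 4, and 4 anchors per cell respectively. The variable names are chosen for this example.

```python
# Assumed SSD300 layout: side length of each square feature map
# and the number of anchors predicted at every cell of that map.
feature_map_sizes = [38, 19, 10, 5, 3, 1]
anchors_per_cell = [4, 6, 6, 6, 4, 4]

total = sum(size * size * k for size, k in zip(feature_map_sizes, anchors_per_cell))
print(total)  # 8732
```

Most of the anchors (5776 of them) come from the finest 38×38 map, which is the one responsible for small objects.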

Output Format with Anchors

For each anchor box, the network predicts:

$\Delta x, \Delta y$ : offsets of the box center from the anchor center, in units of the anchor's width and height
$\Delta w, \Delta h$ : log-scale changes to the anchor's width and height
Confidence score
Class probabilities

This transforms anchor box $(x_{a}, y_{a}, w_{a}, h_{a})$ to the predicted box $(x_{p}, y_{p}, w_{p}, h_{p})$ :

$x_{p} = x_{a} + w_{a} \cdot \Delta x$

$y_{p} = y_{a} + h_{a} \cdot \Delta y$

$w_{p} = w_{a} \cdot e^{\Delta w}$

$h_{p} = h_{a} \cdot e^{\Delta h}$
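The transformation can be written directly as a small decoding function. This is a sketch under the parameterization given here, assuming boxes are stored as center-size tuples `(x, y, w, h)`; the function name `decode` and the example numbers are illustrative only, and some detectors additionally scale the offsets by fixed constants.

```python
import math


def decode(anchor, deltas):
    """Apply predicted offsets to an anchor; both boxes are (x_center, y_center, w, h)."""
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = deltas

    # Center shifts are expressed relative to the anchor's size.
    xp = xa + wa * dx
    yp = ya + ha * dy

    # Size changes are predicted in log space, so the result is always positive.
    wp = wa * math.exp(dw)
    hp = ha * math.exp(dh)
    return xp, yp, wp, hp


anchor = (100.0, 100.0, 50.0, 80.0)
deltas = (0.1, -0.2, 0.0, math.log(2.0))
print(decode(anchor, deltas))  # center moves to about (105, 84); height roughly doubles
```

With all offsets equal to zero, the predicted box is the anchor itself, which gives the network a sensible starting point for each anchor.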

Summary

IoU measures the overlap between two boxes and is used both for matching during training and for evaluation.
Non-max suppression removes redundant boxes based on IoU.
Anchor boxes allow detection of multiple objects at different scales/aspect ratios.

Together, these techniques form the foundation of modern object detection pipelines like YOLO, SSD, and Faster R-CNN.