Face Recognition and Neural Style Transfer

What is Face Recognition?

Face recognition is the task of identifying or verifying a person’s identity using their facial features. It can be broken down into three main categories:

  • Face Detection: Locate faces in an image (bounding box).
  • Face Verification: Check if two faces are of the same person (1:1 comparison).
  • Face Recognition/Identification: Identify a person from a database (1:N comparison).

Real-World Applications

  • Smartphone unlock (Face ID)
  • Security surveillance
  • Online proctoring
  • Social media tagging (e.g., Facebook)


One-Shot Learning

Traditional classification algorithms require many training examples per class. However, in face recognition:

  • We might only have one image per person.
  • The task becomes: Can the model recognize a face it has seen only once?

This is known as One-Shot Learning.

Problem Setup

  • Instead of learning to classify, the model learns similarity between pairs of images.
  • A distance function is trained to return a small value for two images of the same person and a large value for images of different people.
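
As a minimal sketch of this setup, assume the embeddings have already been produced by some trained model. Identification then reduces to a nearest-neighbor search over one stored embedding per person. The names `identify` and `database`, the toy vectors, and the threshold value are all hypothetical and chosen only for illustration.

```python
import numpy as np

def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.dot(diff, diff))

def identify(query, database, threshold):
    """Return the name of the closest stored identity, or None if no
    stored embedding is within `threshold` of the query (unknown face)."""
    best_name, best_dist = None, float("inf")
    for name, reference in database.items():
        dist = squared_distance(query, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Toy 3-D embeddings: one reference image per person (one-shot).
database = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}
print(identify(np.array([0.9, 0.1, 0.0]), database, threshold=0.5))  # alice
print(identify(np.array([0.0, 0.0, 1.0]), database, threshold=0.5))  # None
```

Adding a new person only requires storing one more embedding; no retraining of a classifier is needed, which is the practical appeal of the similarity formulation.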


Siamese Network

A Siamese Network consists of two identical ConvNets (with shared weights) that compare two inputs.


Architecture Overview

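  • Two inputs: $x^{(1)}$ and $x^{(2)}$
  • The same CNN maps both to feature vectors $f(x^{(1)})$ and $f(x^{(2)})$
  • A distance metric (e.g., the squared L2 norm of the difference) is applied:

$$d(x^{(1)}, x^{(2)}) = \| f(x^{(1)}) - f(x^{(2)}) \|_2^2$$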
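
A minimal NumPy sketch of the idea follows. A single random linear map with a ReLU stands in for the shared ConvNet (an assumption made purely to keep the example small), and the squared L2 norm is used as one possible choice of distance. The point is that both inputs pass through the same function `f` with the same weights `W`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the shared ConvNet: one weight matrix used for BOTH inputs.
W = rng.normal(size=(8, 16))  # maps a 16-dim input to an 8-dim embedding

def f(x):
    """Toy embedding function: linear map followed by ReLU."""
    return np.maximum(W @ x, 0.0)

def d(x1, x2):
    """Squared L2 distance between the embeddings of the two inputs."""
    diff = f(x1) - f(x2)
    return float(diff @ diff)

x1 = rng.normal(size=16)
x2 = rng.normal(size=16)
print(d(x1, x1))  # 0.0: identical inputs give identical embeddings
print(d(x1, x2) == d(x2, x1))  # True: shared weights make d symmetric
```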

Loss Function

A contrastive loss or triplet loss is used to train the network to minimize the distance between images of the same identity and maximize the distance between images of different identities.



Triplet Loss

Triplet Loss is a powerful loss function for learning embeddings. It relies on triplets:

  • Anchor (A): A known image
  • Positive (P): Image of the same identity
  • Negative (N): Image of a different identity

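We want:

$$\| f(A) - f(P) \|_2^2 + \alpha \le \| f(A) - f(N) \|_2^2$$

Where:

  • $f$ is the embedding function (ConvNet output)
  • $\alpha$ is a margin to separate positive and negative pairs

Loss Function

The Triplet Loss is:

$$\mathcal{L}(A, P, N) = \max\left( \| f(A) - f(P) \|_2^2 - \| f(A) - f(N) \|_2^2 + \alpha,\ 0 \right)$$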
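
A minimal sketch of the loss for a single triplet: a hinge on the anchor-positive squared distance minus the anchor-negative squared distance, plus the margin. The margin value `alpha=0.2` and the toy 2-D embeddings are assumptions for illustration, not values taken from this text.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) embedding triple."""
    pos_dist = np.sum((f_a - f_p) ** 2)  # ||f(A) - f(P)||^2
    neg_dist = np.sum((f_a - f_n) ** 2)  # ||f(A) - f(N)||^2
    return float(max(pos_dist - neg_dist + alpha, 0.0))

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
far_negative = np.array([-1.0, 0.0])
near_negative = np.array([0.8, 0.0])

print(triplet_loss(anchor, positive, far_negative))   # 0.0, constraint satisfied
print(triplet_loss(anchor, positive, near_negative))  # positive, constraint violated
```

The far negative already satisfies the margin, so it contributes no gradient; only the near negative would push the embeddings apart during training.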


Important Notes

  • Semi-hard negative mining improves convergence (choose negatives that are hard but not too hard).
  • Embeddings are often normalized to unit length.
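
Both notes can be sketched in a few lines. Here "semi-hard" is taken to mean negatives that are farther from the anchor than the positive but still inside the margin, which is one common definition; the helper names, the margin `alpha=0.2`, and the toy embeddings are hypothetical.

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)

def semi_hard_negatives(f_a, f_p, negatives, alpha=0.2):
    """Indices of negatives with d(A,P) < d(A,N) < d(A,P) + alpha."""
    pos_dist = np.sum((f_a - f_p) ** 2)
    neg_dists = np.sum((negatives - f_a) ** 2, axis=1)
    mask = (neg_dists > pos_dist) & (neg_dists < pos_dist + alpha)
    return np.flatnonzero(mask)

emb = l2_normalize(np.array([
    [1.0, 0.0],    # anchor
    [0.98, 0.2],   # positive
    [0.9, 0.45],   # negative 0: somewhat close (semi-hard)
    [-1.0, 0.0],   # negative 1: far away (easy)
    [0.995, 0.1],  # negative 2: closer than the positive (hard)
]))
anchor, positive, negatives = emb[0], emb[1], emb[2:]
print(semi_hard_negatives(anchor, positive, negatives))  # [0]
```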


Face Verification and Binary Classification

Once we have embeddings from a trained network (e.g., using triplet loss), we can perform face verification as a binary classification task.


Verification Pipeline

  1. Encode both face images to embeddings.
  2. Compute Euclidean distance or cosine similarity.
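  3. If distance < threshold, predict the same person; otherwise, predict different people. (With cosine similarity the test is reversed: similarity > threshold.)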

The threshold is selected by trading off the false positive rate against the true positive rate, using an ROC curve computed on a validation set.
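
The pipeline and the threshold selection can be sketched as follows, using Euclidean distance. The validation distances and labels are made-up numbers, and `pick_threshold` implements just one possible selection rule (highest true positive rate subject to a cap on the false positive rate); other operating points are equally valid.

```python
import numpy as np

def verify(emb1, emb2, threshold):
    """Declare 'same person' when the Euclidean distance is below threshold."""
    return bool(np.linalg.norm(emb1 - emb2) < threshold)

def roc_points(distances, same_person, thresholds):
    """(threshold, FPR, TPR) for each candidate threshold.

    distances: pairwise distances on a validation set
    same_person: boolean labels, True if the pair shares an identity
    """
    distances = np.asarray(distances, dtype=float)
    same_person = np.asarray(same_person, dtype=bool)
    points = []
    for t in thresholds:
        predicted_same = distances < t
        tpr = np.mean(predicted_same[same_person])
        fpr = np.mean(predicted_same[~same_person])
        points.append((float(t), float(fpr), float(tpr)))
    return points

def pick_threshold(points, max_fpr):
    """Highest-TPR threshold whose FPR does not exceed max_fpr."""
    allowed = [p for p in points if p[1] <= max_fpr]
    return max(allowed, key=lambda p: p[2])[0]

# Hypothetical validation distances and labels.
val_dist = [0.3, 0.4, 0.6, 0.9, 0.7, 1.1, 1.3, 1.5]
val_same = [True, True, True, True, False, False, False, False]
points = roc_points(val_dist, val_same, thresholds=[0.5, 0.65, 0.8, 1.0, 1.2])
best = pick_threshold(points, max_fpr=0.0)
print(best)  # 0.65
```

A security-critical system such as device unlock would cap the false positive rate tightly, as here, while a photo-tagging feature could tolerate a looser cap in exchange for a higher true positive rate.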



What is Neural Style Transfer?

Neural Style Transfer is the task of synthesizing an image that:

  • Preserves the content of a content image
  • Adopts the style of a style image

It leverages a pre-trained ConvNet (such as VGG19) to extract content and style representations.


Let:

  • $C$ be the content image
  • $S$ be the style image
  • $G$ be the generated image

Then we optimize the pixels of $G$ to minimize a cost function $J(G)$.


What are Deep ConvNets Learning?

Deep ConvNets learn hierarchical representations:

  • Early layers: edges, colors, textures
  • Mid layers: shapes, motifs
  • Later layers: object-level concepts

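In NST, content is typically taken from a deeper layer, while style is captured by feature correlations, often summed over several layers starting from the shallower ones.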



Cost Function

The total cost is:

$$J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)$$

Where:

  • $\alpha$: weight for content preservation
  • $\beta$: weight for style transfer
  • Typically: , to

Content Cost Function

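Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations at layer $l$ for the content and generated images.

Then the content cost is (the normalization constant varies between implementations):

$$J_{content}(C, G) = \frac{1}{2} \left\| a^{[l](C)} - a^{[l](G)} \right\|_2^2$$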

Use a deeper layer (e.g., conv4_2) for this.
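
A minimal sketch of the content cost, assuming the layer activations have already been extracted as arrays of shape `(n_H, n_W, n_C)`. Random arrays stand in for real ConvNet activations here, and the factor of 0.5 is one common normalization rather than the only choice.

```python
import numpy as np

def content_cost(a_C, a_G):
    """0.5 * squared L2 distance between two activation volumes.

    a_C, a_G: activations of shape (n_H, n_W, n_C) from the same layer.
    """
    return 0.5 * float(np.sum((a_C - a_G) ** 2))

rng = np.random.default_rng(0)
a_C = rng.normal(size=(4, 4, 3))   # stand-in for content-image activations
a_G = a_C + 0.1                    # generated-image activations, shifted
print(content_cost(a_C, a_C))  # 0.0
print(content_cost(a_C, a_G))  # about 0.24
```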


Style Cost Function

Style is captured by correlations between feature maps using a Gram matrix.

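Let $a^{[l](S)}$ be the activations at layer $l$ for the style image, with shape $n_H \times n_W \times n_C$. Compute the Gram matrix:

$$G^{[l](S)}_{kk'} = \sum_{i=1}^{n_H} \sum_{j=1}^{n_W} a^{[l](S)}_{ijk} \, a^{[l](S)}_{ijk'}$$

With $G^{[l](G)}$ defined the same way for the generated image, the style cost at layer $l$ is (using one common normalization):

$$J^{[l]}_{style}(S, G) = \frac{1}{(2 n_H n_W n_C)^2} \sum_{k} \sum_{k'} \left( G^{[l](S)}_{kk'} - G^{[l](G)}_{kk'} \right)^2$$

Then sum over multiple layers, with per-layer weights $\lambda^{[l]}$:

$$J_{style}(S, G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S, G)$$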
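
The style cost and the total cost can be sketched in NumPy as below. Random arrays stand in for activations from two layers of a real network, the normalization `(2 * n_H * n_W * n_C)^2` is one common convention, and the layer weights and the `alpha`/`beta` values are arbitrary placeholders rather than recommended settings.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix of activations with shape (n_H, n_W, n_C).

    Entry (k, k') is the sum over positions of channel k times channel k'.
    """
    n_H, n_W, n_C = a.shape
    flat = a.reshape(n_H * n_W, n_C)
    return flat.T @ flat  # shape (n_C, n_C)

def layer_style_cost(a_S, a_G):
    """Normalized squared Frobenius distance between the two Gram matrices."""
    n_H, n_W, n_C = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    return float(np.sum(diff ** 2)) / (2 * n_H * n_W * n_C) ** 2

def style_cost(acts_S, acts_G, layer_weights):
    """Weighted sum of per-layer style costs."""
    return sum(w * layer_style_cost(s, g)
               for w, s, g in zip(layer_weights, acts_S, acts_G))

def total_cost(j_content, j_style, alpha, beta):
    """J(G) = alpha * J_content + beta * J_style."""
    return alpha * j_content + beta * j_style

rng = np.random.default_rng(0)
acts_S = [rng.normal(size=(8, 8, 4)), rng.normal(size=(4, 4, 6))]
acts_G = [rng.normal(size=(8, 8, 4)), rng.normal(size=(4, 4, 6))]
j_style = style_cost(acts_S, acts_G, layer_weights=[0.5, 0.5])
print(gram_matrix(acts_S[0]).shape)            # (4, 4)
print(style_cost(acts_S, acts_S, [0.5, 0.5]))  # 0.0
# Placeholder weights and content cost, only to show how the terms combine.
print(total_cost(j_content=1.0, j_style=j_style, alpha=1.0, beta=1.0))
```

Because the Gram matrix sums over all spatial positions, it discards where features occur and keeps only which features co-occur, which is why it captures texture-like style rather than layout.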



1D and 3D Generalizations

1D Generalization

Neural style transfer principles can be applied to audio signals:

  • 1D convolution over waveform
  • Preserve temporal content, apply style of another sound
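
As a rough sketch of the 1D case, random filters stand in for a trained 1D ConvNet layer and a sine wave stands in for an audio waveform (both are assumptions for illustration). The Gram matrix is computed exactly as for images, except that correlations are summed over the single time axis.

```python
import numpy as np

def gram_matrix_1d(a):
    """Gram matrix of 1D feature maps with shape (n_T, n_C).

    The single temporal axis plays the role of the two spatial axes in
    the image case: correlations are summed over time steps.
    """
    return a.T @ a  # shape (n_C, n_C)

def conv1d_features(signal, filters):
    """Valid 1D cross-correlation of a waveform with each filter, then ReLU.

    signal: shape (n_samples,); filters: shape (n_C, kernel_size).
    Returns activations of shape (n_T, n_C).
    """
    maps = [np.correlate(signal, k, mode="valid") for k in filters]
    return np.maximum(np.stack(maps, axis=1), 0.0)

rng = np.random.default_rng(0)
waveform = np.sin(np.linspace(0.0, 20.0, 200))  # toy "audio" signal
filters = rng.normal(size=(5, 9))               # 5 random filters of width 9
features = conv1d_features(waveform, filters)
print(features.shape)                  # (192, 5)
print(gram_matrix_1d(features).shape)  # (5, 5)
```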

3D Generalization

Applied to volumetric data such as:

  • 3D MRI scans
  • 3D point clouds

The goal is to transfer spatial styles across 3D volumes. This requires 3D convolutional layers and Gram matrices computed over all three spatial axes.
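
One way to adapt the Gram matrix to volumes is to flatten every spatial axis before correlating channels. The sketch below assumes activations laid out as `(depth, height, width, channels)` and uses random arrays in place of real 3D ConvNet features; the function names and the normalization are illustrative choices.

```python
import numpy as np

def gram_matrix_nd(a):
    """Gram matrix for activations with shape (*spatial_dims, n_C).

    All spatial axes (depth, height, width for a 3D volume) are flattened
    into one axis before correlating channels.
    """
    n_C = a.shape[-1]
    flat = a.reshape(-1, n_C)
    return flat.T @ flat

def style_cost_nd(a_S, a_G):
    """Squared Gram-matrix difference, normalized by (2 * positions * channels)^2.

    Assumes a_S and a_G have the same shape.
    """
    n_positions = int(np.prod(a_S.shape[:-1]))
    n_C = a_S.shape[-1]
    diff = gram_matrix_nd(a_S) - gram_matrix_nd(a_G)
    return float(np.sum(diff ** 2)) / (2 * n_positions * n_C) ** 2

rng = np.random.default_rng(0)
vol_S = rng.normal(size=(6, 8, 8, 4))  # (depth, height, width, channels)
vol_G = rng.normal(size=(6, 8, 8, 4))
print(gram_matrix_nd(vol_S).shape)  # (4, 4)
print(style_cost_nd(vol_S, vol_S))  # 0.0
```

The same function also covers the 1D and 2D cases, since only the number of flattened spatial axes changes.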


Summary

  • Face Recognition uses embedding learning (Triplet loss, Siamese networks).
  • One-shot learning enables models to generalize with limited data.
  • Neural Style Transfer uses a pre-trained CNN to blend content and style images using a combination of content/style loss.
  • Both applications showcase the expressive power of deep convolutional networks beyond classic classification.