Focal Loss vs Binary Cross-Entropy: A Practical Guide for Imbalanced Classification

Binary cross-entropy (BCE) is the default loss function for binary classification—but it breaks down badly on imbalanced datasets. The reason is subtle but important: BCE weighs mistakes from both classes equally, even when one class is extremely rare. 

Imagine two predictions: a minority-class sample with true label 1 predicted at 0.3, and a majority-class sample with true label 0 predicted at 0.7. Both produce the same BCE value: −log(0.3). But should these two errors be treated equally? In an imbalanced dataset, definitely not—the mistake on the minority sample is far more costly. 
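
A quick numeric check (not part of the tutorial's code) confirms that the two errors are indistinguishable to BCE:

import numpy as np

# Minority sample: true label 1, predicted probability 0.3
bce_minority = -np.log(0.3)          # ≈ 1.204
# Majority sample: true label 0, predicted probability 0.7
bce_majority = -np.log(1 - 0.7)      # ≈ 1.204

print(bce_minority, bce_majority)    # identical: BCE cannot tell them apart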

This is exactly where Focal Loss comes in. It reduces the contribution of easy, confident predictions and amplifies the impact of difficult, minority-class examples. As a result, the model focuses less on the overwhelmingly easy majority class and more on the patterns that actually matter.
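
To see the down-weighting mechanism in isolation, here is a minimal sketch of the modulating factor (1 − pt)^γ with γ = 2, the value used later in this tutorial:

gamma = 2
for pt in (0.95, 0.6, 0.3):          # pt = predicted probability of the true class
    factor = (1 - pt) ** gamma
    print(f"pt={pt:.2f} -> modulating factor={factor:.4f}")
# pt=0.95 -> 0.0025  (easy, confident prediction is almost silenced)
# pt=0.60 -> 0.1600
# pt=0.30 -> 0.4900  (hard example keeps most of its weight)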

In this tutorial, we demonstrate this effect by training two identical neural networks on a dataset with a 99:1 imbalance ratio—one using BCE and the other using Focal Loss—and comparing their behavior, decision regions, and confusion matrices.

Installing the Dependencies

pip install numpy pandas matplotlib scikit-learn torch

Creating an Imbalanced Dataset

We create a synthetic binary classification dataset of 6,000 samples with a 99:1 class imbalance using make_classification. Because almost all samples belong to the majority class, this is an ideal setup for demonstrating why BCE struggles and how Focal Loss helps.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim

# Generate imbalanced dataset
X, y = make_classification(
    n_samples=6000,
    n_features=2,
    n_redundant=0,
    n_clusters_per_class=1,
    weights=[0.99, 0.01],   
    class_sep=1.5,
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test  = torch.tensor(X_test,  dtype=torch.float32)
y_test  = torch.tensor(y_test,  dtype=torch.float32).unsqueeze(1)
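
As a sanity check (not in the original code), we can confirm the imbalance survived the split; with weights=[0.99, 0.01] the counts should be roughly 5,940 vs 60:

# Verify the class imbalance in the raw labels and in each split
print("Class counts:", np.bincount(y))                  # roughly [5940, 60]
print("Minority in train:", int(y_train.sum().item()))
print("Minority in test:",  int(y_test.sum().item()))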

Creating the Neural Network

We define a simple neural network with two hidden layers to keep the experiment lightweight and focused on the loss functions. This small architecture is sufficient to learn the decision boundary in our 2D dataset while clearly highlighting the differences between BCE and Focal Loss.

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.layers(x)
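
A quick smoke test (an illustrative addition, not part of the tutorial) confirms the network maps a batch of 2D points to per-sample probabilities:

probe = SimpleNN()                    # throwaway instance just for shape-checking
with torch.no_grad():
    out = probe(X_train[:4])
print(out.shape)                              # torch.Size([4, 1])
print(out.min().item(), out.max().item())     # both strictly between 0 and 1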

Focal Loss Implementation

This class implements the Focal Loss function, which modifies binary cross-entropy by down-weighting easy examples and focusing training on hard, misclassified samples. The gamma term controls how aggressively easy samples are suppressed. Note that in this simplified implementation, alpha is a single scalar that rescales the loss for every sample; in the canonical formulation (Lin et al., the RetinaNet paper), alpha is applied asymmetrically, weighting the positive class by alpha and the negative class by 1 − alpha. Together, the two terms help the model learn better on imbalanced datasets.

class FocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, preds, targets):
        eps = 1e-7
        preds = torch.clamp(preds, eps, 1 - eps)  # avoid log(0)

        # pt: the model's predicted probability for the true class
        pt = torch.where(targets == 1, preds, 1 - preds)
        # (1 - pt)^gamma down-weights easy examples; alpha rescales the loss
        loss = -self.alpha * (1 - pt) ** self.gamma * torch.log(pt)
        return loss.mean()
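
One useful correctness check, assuming this simplified implementation: with gamma=0 and alpha=1, Focal Loss should collapse to plain BCE (up to the eps clamp), while gamma=2 shrinks the contribution of the easy example:

# Hypothetical sanity check comparing FocalLoss against nn.BCELoss
preds   = torch.tensor([[0.95], [0.30]])   # one easy, one hard positive
targets = torch.tensor([[1.0],  [1.0]])

bce    = nn.BCELoss()(preds, targets)
focal0 = FocalLoss(alpha=1.0, gamma=0)(preds, targets)
focal2 = FocalLoss(alpha=1.0, gamma=2)(preds, targets)

print(bce.item(), focal0.item())   # should match: gamma=0 recovers BCE
print(focal2.item())               # much smaller: the easy example is down-weighted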

Training the Model

We define a simple training loop that optimizes the model using the chosen loss function and evaluates accuracy on the test set. We then train two identical neural networks — one with standard BCE loss and the other with Focal Loss — allowing us to directly compare how each loss function performs on the same imbalanced dataset. The printed accuracies highlight the performance gap between BCE and Focal Loss.

Although BCE shows a very high accuracy (98%), this is misleading because the dataset is heavily imbalanced — predicting almost everything as the majority class still yields high accuracy. Focal Loss, on the other hand, improves minority-class detection, which is why its slightly higher accuracy (99%) is far more meaningful in this context.

def train(model, loss_fn, lr=0.01, epochs=30):
    opt = optim.Adam(model.parameters(), lr=lr)

    # Full-batch training: the dataset is small enough to process in one step
    for _ in range(epochs):
        preds = model(X_train)
        loss = loss_fn(preds, y_train)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Evaluate accuracy on the held-out test set
    with torch.no_grad():
        test_preds = model(X_test)
        test_acc = ((test_preds > 0.5).float() == y_test).float().mean().item()
    return test_acc, test_preds.squeeze().numpy()

# Models
model_bce = SimpleNN()
model_focal = SimpleNN()

acc_bce, preds_bce = train(model_bce, nn.BCELoss())
acc_focal, preds_focal = train(model_focal, FocalLoss(alpha=0.25, gamma=2))

print("Test Accuracy (BCE):", acc_bce)
print("Test Accuracy (Focal Loss):", acc_focal)

Plotting the Decision Boundary

The BCE model assigns nearly the entire plane to the majority class, producing a decision region that all but ignores the minority samples. This happens because, on an imbalanced dataset, the BCE objective is dominated by majority-class examples, so the cheapest solution is to classify nearly everything as that class. In contrast, the Focal Loss model learns a much more refined and meaningful decision boundary, carving out minority-class regions and capturing patterns that BCE fails to learn.

def plot_decision_boundary(model, title):
    # Create a grid
    x_min, x_max = X[:,0].min()-1, X[:,0].max()+1
    y_min, y_max = X[:,1].min()-1, X[:,1].max()+1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 300),
        np.linspace(y_min, y_max, 300)
    )
    grid = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
    with torch.no_grad():
        Z = model(grid).reshape(xx.shape).numpy()  # to NumPy for matplotlib

    # Plot
    plt.contourf(xx, yy, Z, levels=[0,0.5,1], alpha=0.4)
    plt.scatter(X[:,0], X[:,1], c=y, cmap='coolwarm', s=10)
    plt.title(title)
    plt.show()

plot_decision_boundary(model_bce, "Decision Boundary -- BCE Loss")
plot_decision_boundary(model_focal, "Decision Boundary -- Focal Loss")

Plotting the Confusion Matrix

In the BCE model’s confusion matrix, the network correctly identifies only 1 minority-class sample, while misclassifying 27 of them as majority class. This shows that BCE collapses toward predicting almost everything as the majority class due to the imbalance. In contrast, the Focal Loss model correctly predicts 14 minority samples and reduces misclassifications from 27 down to 14. This demonstrates how Focal Loss places more emphasis on hard, minority-class examples, enabling the model to learn a decision boundary that actually captures the rare class instead of ignoring it.

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def plot_conf_matrix(y_true, y_pred, title):
    cm = confusion_matrix(y_true, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm)
    disp.plot(cmap="Blues", values_format="d")
    plt.title(title)
    plt.show()

# Convert torch tensors to numpy
y_test_np = y_test.numpy().astype(int).ravel()  # flatten (N, 1) -> (N,) for sklearn

preds_bce_label   = (preds_bce > 0.5).astype(int)
preds_focal_label = (preds_focal > 0.5).astype(int)

plot_conf_matrix(y_test_np, preds_bce_label, "Confusion Matrix -- BCE Loss")
plot_conf_matrix(y_test_np, preds_focal_label, "Confusion Matrix -- Focal Loss")
