Sunday, December 22, 2024

Building a Neural Network from the Ground Up – Part 2


Enhancing Our Neural Network Library with Convolutional Neural Networks

In the realm of deep learning, convolutional neural networks (ConvNets or CNNs) have emerged as the gold standard for image classification and computer vision tasks. Their ability to automatically and adaptively learn spatial hierarchies of features from images has made them indispensable in various applications, from facial recognition to autonomous driving. In this article, we will explore how to enhance our neural network library by integrating a convolutional neural network structure, leveraging the power of GPUs and OpenCL to handle the increased computational demands.

Understanding the Structure of Convolutional Neural Networks

At their core, convolutional neural networks consist of three primary components: convolutional layers, pooling layers, and fully connected layers. While we have already implemented the fully connected layer in our library, our focus will now shift to the convolutional and pooling layers.

The Convolutional Layer

The convolutional layer is where the magic begins. It applies a set of learnable filters (or kernels) to the input image, producing a feature map that highlights various aspects of the image. Each filter is designed to capture specific features, such as edges, textures, or patterns.

Here’s a simplified version of the kernel code for the convolution operation:

kernel void convolve(global float* image, global Filter* filters, global float* featMap, int filterWidth, int inWidth, int featmapdim) {
    // One work-item per feature-map pixel per filter (z indexes the filter).
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z = get_global_id(2);
    float sum = 0;

    // "Valid" convolution: slide the filter window anchored at (xIn, yIn).
    // Weights are stored row-major: row r, column c.
    for (int r = 0; r < filterWidth; r++) {
        for (int c = 0; c < filterWidth; c++) {
            sum += filters[z].weights[r * filterWidth + c] * image[(xIn + c) + inWidth * (yIn + r)];
        }
    }
    sum += filters[z].bias;
    // relu() is a small helper defined elsewhere in the kernel source.
    featMap[xIn + yIn * featmapdim + z * featmapdim * featmapdim] = relu(sum);
}

In this code, each pixel of the feature map is computed by an independent work-item, which is what makes the operation parallelize so well. For a 28×28 input image and a 5×5 kernel, the feature map is 24×24 (28 − 5 + 1 = 24), so 576 work-items compute each feature map simultaneously.
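To make the indexing concrete, here is a plain CPU sketch of the same "valid" convolution for a single-channel input and one filter. The function and parameter names are illustrative, not part of the library:

```cpp
#include <vector>
#include <algorithm>

// CPU reference of the convolution kernel above, assuming a single-channel
// input image, one filter, and row-major weight storage.
std::vector<float> convolveCPU(const std::vector<float>& image, int inWidth,
                               const std::vector<float>& weights, int filterWidth,
                               float bias) {
    const int featDim = inWidth - filterWidth + 1;   // "valid" convolution output size
    std::vector<float> featMap(featDim * featDim);
    for (int y = 0; y < featDim; ++y) {
        for (int x = 0; x < featDim; ++x) {
            float sum = bias;
            for (int r = 0; r < filterWidth; ++r)
                for (int c = 0; c < filterWidth; ++c)
                    sum += weights[r * filterWidth + c] *
                           image[(x + c) + inWidth * (y + r)];
            featMap[x + y * featDim] = std::max(0.0f, sum);  // ReLU activation
        }
    }
    return featMap;
}
```

With a 28×28 input and a 5×5 filter, this returns a 24×24 map of 576 values, matching the work-item count above.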

Backpropagation in Convolutional Layers

Backpropagation in convolutional layers can be more complex than in fully connected layers due to the nature of convolution operations. The gradients must be calculated with respect to the filters and the input image. The equations governing this process can be intricate, but they fundamentally rely on the chain rule of calculus.
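In symbols (a sketch for a single input channel and a "valid" convolution): if $\delta_{i,j}$ is the error at feature-map position $(i,j)$ and $x$ is the input image, the gradient of the filter weight at row $a$, column $b$ is itself a convolution of the input with the deltas, followed by a gradient-descent step with learning rate $\alpha$:

$$\frac{\partial L}{\partial w_{a,b}} = \sum_{i}\sum_{j} \delta_{i,j}\, x_{i+a,\, j+b}, \qquad w_{a,b} \leftarrow w_{a,b} - \alpha\, \frac{\partial L}{\partial w_{a,b}}$$

Here the deltas are assumed to already include the activation derivative from the layer above.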

Here’s a simplified version of the backpropagation kernel:

kernel void backpropcnn(global float* featMap, global float* deltas, global Filter* filters, int featmapdim, int imagedim, int filterdim, float a, global float* Image) {
    // One work-item per filter weight (column xIn, row yIn) per filter z.
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z = get_global_id(2);
    float sum = 0;

    // The gradient of a filter weight is a convolution of the input with the
    // deltas; the deltas are assumed to already include the activation derivative.
    for (int r = 0; r < featmapdim; r++) {
        for (int c = 0; c < featmapdim; c++) {
            sum += deltas[c + r * featmapdim + z * featmapdim * featmapdim] * Image[(xIn + c) + imagedim * (yIn + r)];
        }
    }
    // Gradient-descent update with learning rate a.
    filters[z].weights[xIn + filterdim * yIn] -= a * sum;
}

The Pooling Layer

Pooling layers serve to downsample the feature maps, reducing their dimensionality while retaining essential information. The most common types of pooling are max pooling and average pooling, with max pooling being the preferred choice in most applications. In max pooling, a filter (typically 2×2) is applied to the feature map to extract the maximum value within the filter window.

Here’s a simple implementation of the max pooling operation:

kernel void pooling(global float* prevfeatMap, global float* poolMap, global int* indexes, int Width, int pooldim) {
    // One work-item per pooled pixel per feature map (z). The 2x2 windows do
    // not overlap, so the window origin in the input is (2 * xIn, 2 * yIn).
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z = get_global_id(2);
    float max = -FLT_MAX; // do not assume the activations are non-negative
    int index = 0;

    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 2; c++) {
            float val = prevfeatMap[(2 * yIn + c) * Width + (2 * xIn + r) + z * Width * Width];
            if (val > max) {
                max = val;
                index = c * 2 + r; // remember the winner for backpropagation
            }
        }
    }
    poolMap[xIn + yIn * pooldim + z * pooldim * pooldim] = max;
    indexes[xIn + yIn * pooldim + z * pooldim * pooldim] = index;
}

Backpropagation in Pooling Layers

Unlike convolutional layers, pooling layers do not require complex gradient calculations. Instead, we simply need to propagate the gradient to the "winning unit" identified during the forward pass. This is achieved using an index matrix that tracks the positions of the maximum values during max pooling.
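The routing step can be sketched on the CPU as follows, for a single feature map and non-overlapping 2×2 windows. The function and parameter names are illustrative; the index encoding (`c * 2 + r`) matches the pooling kernel above:

```cpp
#include <vector>

// Sketch of the pooling backward pass: each pooled delta is routed back to
// the position that won the 2x2 max during the forward pass; every other
// position receives zero gradient.
std::vector<float> poolBackwardCPU(const std::vector<float>& poolDeltas,
                                   const std::vector<int>& indexes,
                                   int pooldim) {
    const int width = pooldim * 2;                  // pre-pooling feature-map width
    std::vector<float> deltas(width * width, 0.0f); // non-winners get zero gradient
    for (int y = 0; y < pooldim; ++y) {
        for (int x = 0; x < pooldim; ++x) {
            const int idx = indexes[x + y * pooldim];
            const int r = idx % 2;                  // column offset within the 2x2 window
            const int c = idx / 2;                  // row offset within the 2x2 window
            deltas[(2 * y + c) * width + (2 * x + r)] = poolDeltas[x + y * pooldim];
        }
    }
    return deltas;
}
```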

Running the Kernels

To execute the kernels, we follow these steps:

  1. Prepare the Data: Convert the input data into matrix format, which can be done using libraries like OpenCV.
  2. Define Buffers and Kernels: Set up the necessary OpenCL buffers and kernels.
  3. Execute the Kernels: Use the following code to run the kernels:
convKern.setArg(0, d_InputBuffer);
convKern.setArg(1, d_FiltersBuffer);
convKern.setArg(2, d_FeatMapBuffer);
convKern.setArg(3, filterdim);
convKern.setArg(4, inputdim);
convKern.setArg(5, featmapdim);
err = (OpenCL::clqueue).enqueueNDRangeKernel(convKern, cl::NullRange, cl::NDRange(featmapdim, featmapdim, convLayer.numOfFilters), cl::NullRange);

poolKern.setArg(0, d_FeatMapBuffer);
poolKern.setArg(1, d_PoolBuffer);
poolKern.setArg(2, d_PoolIndexBuffer);
poolKern.setArg(3, featmapdim);
poolKern.setArg(4, pooldim);
err = (OpenCL::clqueue).enqueueNDRangeKernel(poolKern, cl::NullRange, cl::NDRange(pooldim, pooldim, convLayer.numOfFilters), cl::NullRange);
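The global work sizes passed to enqueueNDRangeKernel follow directly from the layer geometry. A small helper (hypothetical names, not part of the library) makes the relationship explicit, assuming a "valid" convolution and non-overlapping 2×2 pooling:

```cpp
// Hypothetical helper: derive the NDRange dimensions used above from the
// input and filter sizes, assuming "valid" convolution and 2x2 max pooling.
struct LayerDims {
    int featmapdim;  // convolution output width/height
    int pooldim;     // pooling output width/height
};

LayerDims computeDims(int inputdim, int filterdim) {
    LayerDims d;
    d.featmapdim = inputdim - filterdim + 1;  // e.g. 28 - 5 + 1 = 24
    d.pooldim = d.featmapdim / 2;             // e.g. 24 / 2 = 12
    return d;
}
```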

Conclusion

Building a neural network library from scratch and implementing convolutional neural networks is a challenging yet rewarding endeavor. By understanding the underlying principles of matrix operations and how GPUs function, we can create powerful models capable of tackling complex tasks in image classification and beyond.

For those interested in exploring the complete code, it is available on my GitHub repository: Neural Network Library.

Further Reading

If you’re eager to dive deeper into the world of deep learning, consider checking out the book "Deep Learning in Production," which covers building, training, deploying, and maintaining deep learning models with practical examples.

Disclosure: Some links in this article may be affiliate links, and at no additional cost to you, I may earn a commission if you decide to make a purchase after clicking through.
