Enhancing Our Neural Network Library with Convolutional Neural Networks

Convolutional neural networks (ConvNets or CNNs) are the standard choice for image classification and many other computer vision tasks. Because they learn spatial hierarchies of features directly from images, they are used in applications ranging from facial recognition to autonomous driving. In this article, we will extend our neural network library with a convolutional neural network, using GPUs and OpenCL to handle the increased computational demands.

Understanding the Structure of Convolutional Neural Networks

At their core, convolutional neural networks consist of three primary components: convolutional layers, pooling layers, and fully connected layers. While we have already implemented the fully connected layer in our library, our focus will now shift to the convolutional and pooling layers.

The Convolutional Layer

The convolutional layer is where the magic begins. It slides a set of learnable filters (or kernels) over the input image, producing one feature map per filter. Each feature map highlights a different aspect of the image, because each filter learns to respond to a specific feature such as an edge, a texture, or a pattern.

Here’s a simplified version of the kernel code for the convolution operation:

kernel void convolve(global float *image, global Filter* filters, global float *featMap, int filterWidth, int inWidth, int featmapdim) {
    // One work-item per output pixel (xIn, yIn) of feature map z.
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z = get_global_id(2);
    float sum = 0.0f;

    // r walks the filter rows (y offset), c walks the filter columns (x offset).
    // Weights are read row-major so their layout matches the image indexing.
    for (int r = 0; r < filterWidth; r++) {
        for (int c = 0; c < filterWidth; c++) {
            sum += filters[z].weights[r * filterWidth + c] * image[(xIn + c) + inWidth * (yIn + r)];
        }
    }
    sum += filters[z].bias;
    featMap[(xIn + yIn * featmapdim + z * featmapdim * featmapdim)] = relu(sum);
}

In this code, each pixel of the feature map is computed independently by its own work-item, which is what allows the work to run in parallel. For a 28×28 input image and a 5×5 kernel, the feature map is 24×24 (28 - 5 + 1 = 24), so we need 576 threads per filter to compute it simultaneously.
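
To sanity-check the GPU result, it can help to have a plain CPU version of the same computation. The following is a minimal sketch for a single filter, assuming a square single-channel image, no padding, stride 1, and row-major weights; the function name convolveReference is hypothetical and not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for ONE filter (illustrative, not the library's API).
// Valid convolution (no padding, stride 1) followed by ReLU.
//   image:   inWidth x inWidth, row-major
//   weights: filterWidth x filterWidth, row-major (assumed layout)
//   returns: outDim x outDim feature map, outDim = inWidth - filterWidth + 1
std::vector<float> convolveReference(const std::vector<float>& image,
                                     const std::vector<float>& weights,
                                     float bias, int inWidth, int filterWidth) {
    assert(filterWidth >= 1 && inWidth >= filterWidth);
    assert(image.size() == static_cast<std::size_t>(inWidth) * inWidth);
    assert(weights.size() == static_cast<std::size_t>(filterWidth) * filterWidth);

    const int outDim = inWidth - filterWidth + 1;
    std::vector<float> featMap(static_cast<std::size_t>(outDim) * outDim);

    for (int y = 0; y < outDim; ++y) {
        for (int x = 0; x < outDim; ++x) {
            float sum = bias;
            for (int r = 0; r < filterWidth; ++r) {
                for (int c = 0; c < filterWidth; ++c) {
                    sum += weights[r * filterWidth + c] *
                           image[(y + r) * inWidth + (x + c)];
                }
            }
            featMap[y * outDim + x] = std::max(sum, 0.0f);  // ReLU
        }
    }
    return featMap;
}
```

Each iteration of the two outer loops corresponds to one GPU work-item, so a 28×28 image with a 5×5 filter yields 576 output values, matching the thread count above.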

Backpropagation in Convolutional Layers

Backpropagation in convolutional layers is more involved than in fully connected layers because each filter weight is shared across every position of the feature map. Gradients must be computed with respect to both the filters and the input image. Both follow from the chain rule: the gradient of a filter weight is the sum, over all feature-map positions, of the incoming delta multiplied by the input pixel that the weight touched at that position.
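
Written out, the chain rule gives the expressions below. This is a sketch under stated assumptions rather than the library's exact formulation: L (the loss), F (the filter width), the delta symbol, and the learning rate eta (called a in the kernel) are notation introduced here, the delta is assumed to already include the ReLU derivative, and the pairing of r with rows and c with columns is a convention that may differ from the index order used in the kernels.

```latex
\begin{align*}
z_{x,y} &= b + \sum_{r=0}^{F-1}\sum_{c=0}^{F-1} w_{r,c}\, I_{x+c,\,y+r},
  & f_{x,y} &= \mathrm{relu}(z_{x,y}) \\
\frac{\partial L}{\partial w_{r,c}} &= \sum_{x,y} \delta_{x,y}\, I_{x+c,\,y+r},
  & \delta_{x,y} &= \frac{\partial L}{\partial z_{x,y}} \\
\frac{\partial L}{\partial b} &= \sum_{x,y} \delta_{x,y} \\
\frac{\partial L}{\partial I_{u,v}} &= \sum_{r=0}^{F-1}\sum_{c=0}^{F-1} \delta_{u-c,\,v-r}\, w_{r,c}
  && \text{(out-of-range terms are zero)} \\
w_{r,c} &\leftarrow w_{r,c} - \eta\, \frac{\partial L}{\partial w_{r,c}}
\end{align*}
```

The second line is the quantity a weight-update kernel accumulates for each weight, and the last line is the gradient-descent step applied to it.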

Here’s a simplified version of the backpropagation kernel:

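kernel void backpropcnn(global float* featMap, global float* deltas, global Filter* filters, int featmapdim, int imagedim, int filterdim, float a, global float* Image) {
    // One work-item per filter weight (xIn, yIn) of filter z; a is the learning rate.
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z = get_global_id(2);
    float sum = 0.0f;

    // Accumulate delta * input over every feature-map position (row r, column c).
    // The column offset c goes with x and the row offset r with y, as in the image layout.
    for (int r = 0; r < featmapdim; r++) {
        for (int c = 0; c < featmapdim; c++) {
            sum += deltas[(c + r * featmapdim + z * featmapdim * featmapdim)] * Image[(xIn + c) + imagedim * (yIn + r)];
        }
    }
    // Gradient-descent update of this weight.
    filters[z].weights[(xIn + filterdim * yIn)] -= a * sum;
}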
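
A CPU counterpart of the weight update is useful for checking the kernel on small inputs. The sketch below assumes a single filter, row-major buffers, and deltas that already include the activation derivative; filterGradient and sgdStep are hypothetical names, not functions from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for ONE filter (illustrative, not the library's API).
// grad[r][c] = sum over (y, x) of deltas[y][x] * image[y + r][x + c]
//   deltas: featDim x featDim, row-major
//   image:  imageDim x imageDim, row-major
std::vector<float> filterGradient(const std::vector<float>& deltas,
                                  const std::vector<float>& image,
                                  int featDim, int imageDim) {
    assert(featDim >= 1 && imageDim >= featDim);
    assert(deltas.size() == static_cast<std::size_t>(featDim) * featDim);
    assert(image.size() == static_cast<std::size_t>(imageDim) * imageDim);

    const int filterDim = imageDim - featDim + 1;
    std::vector<float> grad(static_cast<std::size_t>(filterDim) * filterDim, 0.0f);

    for (int r = 0; r < filterDim; ++r) {
        for (int c = 0; c < filterDim; ++c) {
            for (int y = 0; y < featDim; ++y) {
                for (int x = 0; x < featDim; ++x) {
                    grad[r * filterDim + c] +=
                        deltas[y * featDim + x] * image[(y + r) * imageDim + (x + c)];
                }
            }
        }
    }
    return grad;
}

// Plain gradient-descent step: weights -= learningRate * grad.
void sgdStep(std::vector<float>& weights, const std::vector<float>& grad,
             float learningRate) {
    assert(weights.size() == grad.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] -= learningRate * grad[i];
    }
}
```

Comparing the output of filterGradient against the weight change produced by the GPU kernel on a 3×3 or 4×4 image is a quick way to catch swapped row and column indices.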

The Pooling Layer

Pooling layers serve to downsample the feature maps, reducing their dimensionality while retaining essential information. The most common types of pooling are max pooling and average pooling, with max pooling being the preferred choice in most applications. In max pooling, a filter (typically 2×2) is applied to the feature map to extract the maximum value within the filter window.

Here’s a simple implementation of the max pooling operation:

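kernel void pooling(global float* prevfeatMap, global float* poolMap, global int* indexes, int Width, int pooldim) {
    // One work-item per pooled pixel (xIn, yIn) of feature map z.
    // Assumes non-overlapping 2x2 windows (stride 2), i.e. pooldim == Width / 2.
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z = get_global_id(2);
    const int base = z * Width * Width;  // start of feature map z
    float maxVal = prevfeatMap[base + (2 * yIn) * Width + (2 * xIn)];
    int index = 0;

    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 2; c++) {
            const float v = prevfeatMap[base + (2 * yIn + c) * Width + (2 * xIn + r)];
            if (v > maxVal) {
                maxVal = v;
                index = c * 2 + r;  // position inside the window: row * 2 + column
            }
        }
    }
    poolMap[(xIn + yIn * pooldim + z * pooldim * pooldim)] = maxVal;
    indexes[(xIn + yIn * pooldim + z * pooldim * pooldim)] = index;
}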
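
For comparison, here is a minimal CPU sketch of max pooling with winner tracking for a single feature map. It assumes a 2×2 window with stride 2 and an even map width, and it encodes the winner as row * 2 + column inside the window; PoolResult and maxPool2x2 are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (illustrative, not the library's API).
struct PoolResult {
    std::vector<float> values;  // pooled map, poolDim x poolDim, row-major
    std::vector<int> indexes;   // winner inside each 2x2 window: row * 2 + column
};

// 2x2 max pooling with stride 2 over ONE width x width feature map (assumed even width).
PoolResult maxPool2x2(const std::vector<float>& featMap, int width) {
    assert(width > 0 && width % 2 == 0);
    assert(featMap.size() == static_cast<std::size_t>(width) * width);

    const int poolDim = width / 2;
    const std::size_t n = static_cast<std::size_t>(poolDim) * poolDim;
    PoolResult out{std::vector<float>(n), std::vector<int>(n)};

    for (int y = 0; y < poolDim; ++y) {
        for (int x = 0; x < poolDim; ++x) {
            // Start from the top-left element so negative inputs are handled too.
            float bestVal = featMap[(2 * y) * width + 2 * x];
            int best = 0;
            for (int r = 0; r < 2; ++r) {
                for (int c = 0; c < 2; ++c) {
                    const float v = featMap[(2 * y + r) * width + (2 * x + c)];
                    if (v > bestVal) {
                        bestVal = v;
                        best = r * 2 + c;
                    }
                }
            }
            out.values[y * poolDim + x] = bestVal;
            out.indexes[y * poolDim + x] = best;
        }
    }
    return out;
}
```

Seeding the maximum with the first element of the window, rather than with zero, keeps the result correct even when the pooled layer is not preceded by a ReLU.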

Backpropagation in Pooling Layers

Unlike convolutional layers, pooling layers have no weights to update, so no complex gradient calculations are required. We simply propagate each gradient to the "winning unit" identified during the forward pass; the other units in the window receive a gradient of zero. This is achieved using the index matrix that recorded the positions of the maximum values during max pooling.
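
The routing step can be sketched on the CPU as follows. This is an illustration under assumptions, not the library's code: it handles a single feature map, assumes 2×2 windows with stride 2, and reads each stored index as row * 2 + column inside its window; maxPoolBackward2x2 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (illustrative, not the library's API).
// Routes each pooled gradient back to the element that won the forward pass.
//   poolGrad: poolDim x poolDim gradients arriving from the next layer
//   indexes:  winner inside each 2x2 window, encoded as row * 2 + column
//   returns:  (2 * poolDim) x (2 * poolDim) gradient for the feature map
std::vector<float> maxPoolBackward2x2(const std::vector<float>& poolGrad,
                                      const std::vector<int>& indexes,
                                      int poolDim) {
    assert(poolDim >= 1);
    assert(poolGrad.size() == static_cast<std::size_t>(poolDim) * poolDim);
    assert(indexes.size() == poolGrad.size());

    const int width = 2 * poolDim;
    // Every non-winning position keeps a gradient of zero.
    std::vector<float> featGrad(static_cast<std::size_t>(width) * width, 0.0f);

    for (int y = 0; y < poolDim; ++y) {
        for (int x = 0; x < poolDim; ++x) {
            const int i = y * poolDim + x;
            assert(indexes[i] >= 0 && indexes[i] < 4);
            const int r = indexes[i] / 2;  // row inside the window
            const int c = indexes[i] % 2;  // column inside the window
            featGrad[(2 * y + r) * width + (2 * x + c)] = poolGrad[i];
        }
    }
    return featGrad;
}
```

Because the windows do not overlap, each input position receives at most one gradient, so a plain assignment is enough and no accumulation is needed.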

Running the Kernels

To execute the kernels, we follow these steps:

1. Prepare the Data: Convert the input data into matrix format, which can be done using libraries like OpenCV.
2. Define Buffers and Kernels: Set up the necessary OpenCL buffers and kernels.
3. Execute the Kernels: Use the following code to run the kernels:

// Convolution: one work-item per feature-map pixel, per filter.
convKern.setArg(0, d_InputBuffer);
convKern.setArg(1, d_FiltersBuffer);
convKern.setArg(2, d_FeatMapBuffer);
convKern.setArg(3, filterdim);
convKern.setArg(4, inputdim);
convKern.setArg(5, featmapdim);
err = (OpenCL::clqueue).enqueueNDRangeKernel(convKern, cl::NullRange, cl::NDRange(featmapdim, featmapdim, convLayer.numOfFilters), cl::NullRange);

// Max pooling: one work-item per pooled pixel, per filter.
poolKern.setArg(0, d_FeatMapBuffer);
poolKern.setArg(1, d_PoolBuffer);
poolKern.setArg(2, d_PoolIndexBuffer);
poolKern.setArg(3, featmapdim);
poolKern.setArg(4, pooldim);
err = (OpenCL::clqueue).enqueueNDRangeKernel(poolKern, cl::NullRange, cl::NDRange(pooldim, pooldim, convLayer.numOfFilters), cl::NullRange);
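
The host code above uses featmapdim and pooldim without showing where they come from. One plausible way to derive them is sketched below, assuming a valid convolution with stride 1 and 2×2 pooling with stride 2; LayerDims, computeDims, and workItems are hypothetical helpers, not part of the library or of OpenCL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types (illustrative, not the library's API).
struct LayerDims {
    int featmapdim;  // side length of each feature map
    int pooldim;     // side length of each pooled map
};

// Derives the layer sizes used as the global work sizes of the two kernels.
LayerDims computeDims(int inputdim, int filterdim) {
    assert(filterdim >= 1 && inputdim >= filterdim);
    LayerDims d;
    d.featmapdim = inputdim - filterdim + 1;  // valid convolution, stride 1
    d.pooldim = d.featmapdim / 2;             // 2x2 pooling, stride 2 (assumed)
    return d;
}

// Total number of work-items launched for a dim x dim x numOfFilters range.
std::size_t workItems(int dim, int numOfFilters) {
    assert(dim >= 0 && numOfFilters >= 0);
    return static_cast<std::size_t>(dim) * dim * numOfFilters;
}
```

For a 28×28 input and a 5×5 filter this gives featmapdim = 24 and pooldim = 12, so the convolution launches 576 work-items per filter and the pooling launches 144.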

Conclusion

Building a neural network library from scratch and implementing convolutional neural networks is a challenging yet rewarding endeavor. By understanding the underlying principles of matrix operations and how GPUs function, we can create powerful models capable of tackling complex tasks in image classification and beyond.

For those interested in exploring the complete code, it is available on my GitHub repository: Neural Network Library.

