1

Image Pre-processing for the Neural Network

[Unlike other chapters, this chapter is not directly related to the neural network itself. This chapter pertains to implementation details of the particular application discussed in this article.]

In the last chapter we discussed the input layer of the neural network. In this chapter we discuss the modifications we make to the input image so that it becomes a consistent input for the neural network. Take a look at the following input image:

Notice that the input image is very large. Representing a character of the English alphabet does not require such a big image. A 256*256 image means that the input layer of the neural network will have 65536 neurons, and such a huge number of neurons makes the training phase very difficult. We therefore have to reduce the size of the image before giving it to the neural network as input. You can also see that the character is not exactly centred in the image. Such small deviations in the placement of the character on the canvas greatly reduce the accuracy of the results, so we have to centre the character on the canvas to keep the input to the neural network fairly consistent.
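
To put numbers on this, the short C++ sketch below counts the input neurons for a square image, assuming one neuron per pixel as described above. The helper name inputNeurons is hypothetical; the reduction factor of 32 is the value used by the source code later in this chapter.

```cpp
// Sketch only: one input neuron per pixel of a square image.
// inputNeurons is a hypothetical helper, not part of the article's code.
constexpr int inputNeurons(int side) {
    return side * side;
}

// Original canvas: 256 x 256 pixels.
constexpr int originalInputs = inputNeurons(256);      // 65536
// Canvas reduced by a factor of 32 along each side: 8 x 8 pixels.
constexpr int reducedInputs  = inputNeurons(256 / 32); // 64
```

Under these assumptions the input layer shrinks by a factor of 1024.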

One advantage of reducing the size of the input image is that it removes a lot of the noise in the original image. Imagine a small stray pixel in the original image. This pixel will not be there in the reduced image. The reduced image shows only an approximation of the outline of the original, thereby doing away with a lot of noise and jitter.

Another reason for image pre-processing is that there is a lot of blank space on all four sides of the image. This space is just unwanted noise for us, and we would like to remove it by cropping the image to the appropriate dimensions. A side effect of this cropping is that the cropped image is (by definition) centrally aligned.
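
The idea of cropping to the drawn character can be sketched in portable C++ as follows. This is a minimal illustration, not the article's implementation: the image is assumed to be a list of equal-length text rows in which '#' marks an inked pixel, and the names Box, boundingBox and crop are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper type: inclusive bounds of the inked region.
struct Box {
    int x1, y1, x2, y2;
    bool empty() const { return x2 < x1 || y2 < y1; }
};

// Scan every cell and grow the box around each inked ('#') cell.
// rows[y][x] is the pixel at column x, row y; all rows have the same length.
Box boundingBox(const std::vector<std::string>& rows) {
    const int h = static_cast<int>(rows.size());
    const int w = h > 0 ? static_cast<int>(rows[0].size()) : 0;
    Box b{w, h, -1, -1};  // starts "inverted", so a blank image stays empty
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (rows[y][x] == '#') {
                b.x1 = std::min(b.x1, x);
                b.x2 = std::max(b.x2, x);
                b.y1 = std::min(b.y1, y);
                b.y2 = std::max(b.y2, y);
            }
        }
    }
    return b;
}

// Copy only the cells inside the box; the character then touches all four edges.
std::vector<std::string> crop(const std::vector<std::string>& rows, const Box& b) {
    std::vector<std::string> out;
    if (b.empty()) return out;
    for (int y = b.y1; y <= b.y2; ++y)
        out.push_back(rows[y].substr(b.x1, b.x2 - b.x1 + 1));
    return out;
}
```

Because the cropped image is exactly as large as the character, no separate centring step is needed afterwards.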

Our original input image is drawn on a white background with a black pencil. After the image is reduced, many pixels may hold color values other than pure black or pure white; there will be many pixels with greyish values. So we introduce another operation, called cleaning the image, in which we make the reduced image pure black and white.
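
The sketch below shows, in portable C++, why reduction produces grey values and how cleaning removes them. It makes several assumptions that are not in the article: pixels are plain 8-bit grey levels (0 = black, 255 = white) rather than COLORREF values, the image side is a multiple of the reduction factor, and the cutoff of 128 is only an illustrative choice. The names Gray, reduce and clean are hypothetical.

```cpp
#include <vector>

using Gray = std::vector<std::vector<int>>;  // 0 = black ... 255 = white

// Shrink a square image by averaging each factor x factor block into one pixel.
// Assumes the side length is a multiple of factor.
Gray reduce(const Gray& src, int factor) {
    const int n = static_cast<int>(src.size()) / factor;
    Gray out(n, std::vector<int>(n, 0));
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            int sum = 0;
            for (int dy = 0; dy < factor; ++dy)
                for (int dx = 0; dx < factor; ++dx)
                    sum += src[y * factor + dy][x * factor + dx];
            out[y][x] = sum / (factor * factor);  // may be a grey value
        }
    }
    return out;
}

// Snap every pixel to pure black or pure white.
// The cutoff of 128 is an illustrative choice, not the article's threshold.
void clean(Gray& img, int cutoff = 128) {
    for (auto& row : img)
        for (int& p : row)
            p = (p < cutoff) ? 0 : 255;
}
```

With a mid-grey cutoff like this, a block that is mostly white with one stray black pixel averages to a light grey and is cleaned to white, while a mostly black block is cleaned to black. A stricter cutoff changes that behaviour.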

In effect, these are the manipulations we are going to perform on the image:

  1. Crop the image
  2. Resize the image
  3. Clean the image

Below you can see the implementation details of these image manipulations. You can go through them if you want; if you can implement these things yourself, you can skip to the next chapter. The source code given below uses libraries from Microsoft Visual C++ 6.0 and is not complete, so copying and pasting it won't work. The sections below are meant to give a general idea of some techniques that can be used to do the image manipulations.

Source code

Cropping the image

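const int inputWindowSize=256;
CClientDC d(this);	// device context of the window the character is drawn in
COLORREF  INPUT_CLR[inputWindowSize][inputWindowSize] = {0};
int i, j;	// loop counters shared by the snippets below

//	Get Pixels from the picture to INPUT_CLR.
//	(inputWindowX1,inputWindowY1)-(inputWindowX2,inputWindowY2) is the drawing area;
//	it is defined elsewhere and assumed to be inputWindowSize pixels wide and high.
for(i=inputWindowX1;i<inputWindowX2;i++){
	for(j=inputWindowY1;j<inputWindowY2;j++){
		INPUT_CLR[i-inputWindowX1][j-inputWindowY1] = d.GetPixel(i,j);
	}
}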

// (x1,y1) and (x2,y2) will contain the top-left and bottom-right corners of the cropped section
int x1 = inputWindowSize, x2 = 0, y1 = inputWindowSize, y2 = 0;

// Make every pixel pure black or pure white and get the coordinates to crop the image
for(i=0;i<inputWindowSize;i++){
	for(j=0;j<inputWindowSize;j++){
		if(INPUT_CLR[i][j] < 0x00FFEEEE){
			INPUT_CLR[i][j] = 0x00000000;
			if(i<x1)
				x1=i;
			if(i>x2)
				x2=i;
			if(j<y1)
				y1=j;
			if(j>y2)
				y2=j;
		}else{
			INPUT_CLR[i][j]=0x00FFFFFF;
		}
	}
}

// Find the width and height of the cropped image
int dx = x2-x1+1;
int dy = y2-y1+1;

Resizing the image

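const int rFactor=32;	// reduction factor along each side
const int scaledWindowSize=inputWindowSize/rFactor;	// 256/32 = 8
COLORREF  CLR[scaledWindowSize][scaledWindowSize] = {0};	// reduced image
COLORREF  PROCESSED_CLR[inputWindowSize][inputWindowSize] = {0};	// cropped image stretched to the full canvas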

if(dx>0)
{
	// First enlarge the cropped image to fit the original canvas
	float W_ratio = inputWindowSize / (float)dx; // Width ratio
	float H_ratio = inputWindowSize / (float)dy; // Height ratio

	for(i=0;i<inputWindowSize;i++){
		for(j=0;j<inputWindowSize;j++){
			PROCESSED_CLR[i][j] = INPUT_CLR[x1 + (int)(i / W_ratio)][y1 + (int)(j / H_ratio)];
		}
	}

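	// Reduce the image: each rFactor x rFactor block is averaged into one pixel.
	// Every pixel is pure black or pure white at this point, so the packed
	// COLORREF values can be averaged as plain integers.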
	for(i=0;i<inputWindowSize;i++){
		for(j=0;j<inputWindowSize;j++){
			CLR[i/rFactor][j/rFactor] += PROCESSED_CLR[i][j] / (rFactor*rFactor);
		}
	}
}
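
The enlarging step above is a nearest-neighbour stretch: every pixel of the full canvas looks up the pixel of the cropped region that it falls on. A minimal portable sketch of the same idea follows, assuming the image is a list of equal-length text rows addressed as rows[y][x]. The function name stretch is hypothetical, and the sketch uses integer arithmetic as a counterpart to the float ratios in the code above.

```cpp
#include <string>
#include <vector>

// Stretch the region [x1, x1+dx) x [y1, y1+dy) of src onto a size x size canvas
// using nearest-neighbour sampling. Requires dx, dy and size to be positive.
std::vector<std::string> stretch(const std::vector<std::string>& src,
                                 int x1, int y1, int dx, int dy, int size) {
    std::vector<std::string> out(size, std::string(size, '.'));
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            // Integer counterpart of x1 + (int)(x / (size / (float)dx)).
            const int sx = x1 + x * dx / size;
            const int sy = y1 + y * dy / size;
            out[y][x] = src[sy][sx];
        }
    }
    return out;
}
```

Since x is always smaller than size, x * dx / size is always smaller than dx, so the lookup never leaves the cropped region.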

Cleaning the image

if(dx>0)
{
	for(i=0;i<scaledWindowSize;i++){
		for(j=0;j<scaledWindowSize;j++){
			if(CLR[i][j] < 0x00FFEEEE){
				CLR[i][j]=0x00000000;
			}else{
				CLR[i][j]=0x00FFFFFF;
			}
		}
	}
}
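
A COLORREF stores a colour as 0x00BBGGRR, so comparing the packed value against a constant, as the code above does, gives the blue channel the most weight. That is harmless for black-on-white drawings. For coloured input, one possible alternative is to compare a per-channel brightness instead. The sketch below is an illustration under assumptions, not the article's method: it uses std::uint32_t in place of COLORREF so that it compiles without the Windows headers, the names brightness and cleanPixel are hypothetical, and the cutoff of 128 is an arbitrary choice.

```cpp
#include <cstdint>

// Average of the red, green and blue channels of a 0x00BBGGRR colour value.
inline int brightness(std::uint32_t colorref) {
    const int r = colorref & 0xFF;          // red is the low byte
    const int g = (colorref >> 8) & 0xFF;   // green is the middle byte
    const int b = (colorref >> 16) & 0xFF;  // blue is the high byte
    return (r + g + b) / 3;
}

// Map a pixel to pure black or pure white; the cutoff is an illustrative choice.
inline std::uint32_t cleanPixel(std::uint32_t colorref, int cutoff = 128) {
    return brightness(colorref) < cutoff ? 0x00000000u : 0x00FFFFFFu;
}
```

For example, pure yellow (0x0000FFFF) is a small number as a packed value, so a packed comparison treats it as dark, while its channel average is well above the cutoff and cleanPixel maps it to white.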