NPB 261B - Lab #2

Due Monday, February 21

In this assignment you will investigate the power spectrum and higher-order statistics of natural images, and you will examine some models for how neurons in the visual system may have been adapted to this structure.

Power spectrum

The first step is to measure the power spectrum of natural images.  Get some images, either with a digital camera, or over the internet, etc.  Then, load an image into Matlab using the imread command, which can read images of  many different formats.  To see how to use the imread command and what types of images it can read, type help imread.  For example, to read the einstein image from the last assignment you type
It will be easiest for this exercise if the image is square, so if its not already square you should truncate the largest dimension.  Now, to compute the power spectrum of the image you use the Fourier transform, which converts a function in the space domain to a function in the frequency domain.  You can compute the Fourier transform of an image in Matlab using fft2 as follows:
The function fftshift is needed to put the DC component (frequency = 0) in the center.  The power spectrum is just the square of the modulus of the Fourier transform,  which is obtained as follows:
To display the power spectrum you will need to define some frequency coordinates.  If the image is of size N, then the frequencies run from -N/2 to N/2-1 (assuming N is even):
Then display the log power spectrum using imagesc:
imagesc(f,f,log10(impf)), axis xy
You will note that the power falls off with frequency as you move away from the center.  Ignore the vertical and horizontal streak for now - its an artifact due to the boundaries of the image.  Now, to get a better idea of how the power falls off, we can do a rotational average of the power spectrum:
Now define a frequency coordinate from 0 to N/2:
and plot the rotationally averaged spectrum on a log-log plot:
You should see that the power tends to fall as 1/f2 - approximately with a slope of -2 on a log-log plot.  You should try doing this for a couple different images. Try to find an image which deviates strongly from this trend and see if you can explain what's happening.


Now let's whiten the image by designing a filter in the frequency domain that will flatten the spectrum of a natural image (on average).  Since the power spectrum tends to fall as 1/f2, then the amplitude spectrum will fall as 1/f.  So we want to design the amplitude spectrum of our filter so that it rises linearly with frequency, to compensate for the 1/f amplitude spectrum of natural images.  To do this we need to first define a grid of frequency coordinates:
[fx fy] = meshgrid(f);
Then convert to polar coordinates:
[theta rho]=cart2pol(fx,fy);
The filter is then
filtf = rho;
Plot the filter using
imagesc(f,f,filtf), axis xy
and you will see that it is a function that simply rises with frequency, independent of orientation.  But remember that due to noise, we don't want to whiten all the way to the highest spatial-frequencies.  Also, because we are working in rectangular coordinates, there is a larger range of spatial frequencies along the diagonals, so we'd like to contrict the whitening function to lie within a constant radius just short of the Nyquist frequency.  To do this, we multiply by a circular, Gaussian filter as follows:
filtf = rho.*exp(-0.5*(rho/(0.7*N/2)).^2);
Now plot out the filter again and you will see that it rises linearly to a point and then begins to fall again.  Take a rotational average and plot as a function of one dimension and you will see more clearly how the filter is shaped.  To see what the filter looks like in the space domain, you need to calculate the inverse Fourier transform:
filt = fftshift(real(ifft2(fftshift(filtf))));
You can view the filter as follows:
imagesc(filt,[-1 1]*max(filt(:)))
axis image, colorbar

You will need to zoom into the center with the magnifying glass to see the filter.  Note that it is basically a center-surround type filter.  The inhibitory surround may be a bit hard to see because they are small negative numbers, but its basically analogous to what is found in the retina and LGN.  To get the full scoop on this story you should read the following paper:
Atick JJ (1992)  Could information theory provide an ecological theory of sensory processing?  Network, 3, 213-251
Now let's see what happens when we apply our whitening filter to the image.  Remember that convolution in the space domain is multiplication in the frequency domain, so we simply multiply the spectrum of the image with the spectrum of the filter and take the inverse fft as follows:
imwf = filtf.*imf;
imw = ifft2(fftshift(imwf));
View the image
imagesc(imw), axis image
colormap gray

And look at its spectrum
Note that the spectrum is now pretty much flat, with simply a lowpass roll-off at the high end of the spectrum.

There's one thing left to do: contrast normalization.  It has been shown that neurons in the retina and LGN already do some form of local contrast normalization.  One simple model of this process is to divide the output of a neuron by some measure of the total activity in the neighborhood - for example, standard deviation.  Here's one way of doing this using a Gaussian neighborhood with a diameter of 16 sample nodes:
[x y] = meshgrid(-D/2:D/2-1);
G = exp(-0.5*((x.^2+y.^2)/(D/2)^2));
G = G/sum(G(:));
imv = conv2(imw.^2,G,'same');
imn = imw./sqrt(imv);

Now look at the contrast normalized image imn.  You will note that it brings out many of the details in the image.

Sparse coding

You will see that the whitened image, constrast normalized image above that it is far from being structureless.  The reason is that whitening is simply a process for removing pairwise, linear correlations.  Structures such as lines and edges are characterized by their higher-order statistics - i.e., non-linear statistical dependencies among three or more pixels - and so these forms of structure will remain even after whitening.  So the idea is that the cortex focusses on representing these higher-order forms of structure that are contained in the images coming from the LGN.  One way of learning higher-order structure is through sparse coding - i.e., by finding a lossless representation of the image in which only a few neurons need to be active at any given point in time.  It is closely related to independent components analysis (ICA) and projection pursuit.  Here we will follow the connection to projection pursuit.  
The goal of projection pursuit is to find a low dimensional projection of the data that yields a non-Gaussian distribution.  In general, any random projection of the data will yield a Gaussian distribution.  But when the linear subspace defining the projection is aligned with the structure of the data, then the projection will be non-Gaussian.  Sparse coding can be thought of as a special case of projection pursuit in which the form of non-Gaussianity is sparse - i.e., a distribution peaked at zero with heavy tails with respect to a Gaussian.  This means that the value of the projection is typically around zero, but not in the trivial sense that it simply has low variance.  A population of neurons with such distributions would exhibit a sparse code, because only a small fraction of the neurons will be active.  
We can think of a cortical simple cell as computing a one-dimensional projection of the data if its instantaneous activity is essentially the result of computing the inner-product between its weight vector (receptive field) and the image.  So, let's try designing different receptive fields for a model cortical neuron and see what their projections look like.  For starters, let's make the receptive field an array of weight values with the same size D as the neighborhood defined above for contrast normalization.   We can simulate the output of such a neuron when the image is scanned over its receptive field by convolving the rf with the image.  For example, consider a neuron with initially random receptive field weights, w:
w = randn(D);
w = w/sqrt(sum(w(:).^2));  % normalizes the weight vector
w = w - mean(w(:));  % zero mean
You can do the convolution as follows:
imfilt = conv2(imn,w,'same');
Note that we use the contrast-normalized, whitened image, because we are simulating a weighted sum of LGN inputs to the cortex.  Look at the resulting image as viewed through the rf of this neuron using imagesc.  Then compute the histogram using the hist function:

Then compute a Gaussian of the same variance

and plot both on a semilog plot as follows:
(You may have to use the axis command to get the dynamic range right.)  You should see that with random weights the distribution is more or less Gaussian - i.e., an upside-down parabola.  
A random projection of the data (random receptive field) yields a Gaussian distribution, as predicted from projection pursuit.  So, what kind of receptive field yields a non-Gaussian distribution?  Let's try a Gabor filter model of a cortical simple cell.  A Gabor filter is simply a Gaussian modulated sinusoid.  We can make a vertically oriented Gabor as follows:
sigma_x = D/8;
sigma_y = D/8;
f_x = 1/(D/4);
g = exp(-0.5*((x/sigma_x).^2 + (y/sigma_y).^2)).*sin(2*pi*f_x*x);

g = g/sqrt(sum(g(:).^2));  % normalization
You can see what this looks like using imagesc.  Then convolve with the contrast normalized image:
imfilt = conv2(imn,g,'same');
Now look at the resulting image as viewed through the rf of this neuron, and plot the histogram and compare to a Gaussian as before.  You should see that it is now distinctly non-Gaussian, and the specific way in which it is non-Gaussian is such that it is peaked at zero with heavy tails.  Another way of saying this is that the neuron spends more time at zero than expected from a Gaussian distribution.  Try different parameter settings for the Gabor filter to see if you can either accentuate or degrade the degree of non-Gaussianity (sparsity) exhibited the model neuron.

For further reading on these issues see
Simoncelli EP, Olshausen BA  (2001)  Natural Image Statistics and Neural Representation.  Annual Review of Neuroscience, 24, 1193-1215.

What you should turn in

You should turn in a lab writeup that contains:
  1. A set of images and their corresponding power spectra, with at least one that demonstrates strong deviation from the 1/f2 trend, along with an explanation of why.
  2. The whitened, contrast-normalized image computed for a "good" 1/f2 image above.  Use this same image for parts 3-5 below.
  3. The resulting response histogram, plus super-imposed Gaussian fit, obtained using the random receptive field model.
  4. The resulting response histogram, plus super-imposed Gaussian fit, obtained using a Gabor filter for which you have optimized the parameters for sparsity.  Show the Gabor and report what parameters you used. 
  5. Same thing for Gabor filter parameters that minimize sparsity.
Make sure to use subplot to put multiple plots in one figure.  Don't turn in one plot per page.  Make your report legible.