0:00

In this section we'll be addressing how information-theoretic ideas can help us

to understand how the neural code may be specially adapted to the structure of

natural signals. We'll first briefly look at some of the

special properties of natural inputs, and then at some theories of how a code

should behave. Finally, we'll sum up with some

suggestions about the principles that may be at work in shaping the neural code.

So I'm going to show you some photos that were taken by one of our postdocs, Fred

Sue, as he was sitting in his apartment on one of our typical sunny Seattle

afternoons, looking out at the view. He tried to take a picture that both

encompassed his beautifully furnished apartment and the grand view outside.

You can see that he had to change his f-stop over a wide range in order to be

able to capture information both about the scene inside and about the world

outside. Now this is something that our eye does

effortlessly. If you were sitting here at this table,

you would be able to see both the inside and the outside with perfect fidelity.

So looking even at this familiar example, we can see two properties that are

characteristic of natural inputs. One is that there's a huge dynamic range.

There are variations in light level and contrast that range over orders of

magnitude. We can see signs of another property by

comparing these two boxes. Because of effects of depth and

perspective, there's similar structure, similarly well-defined shapes and objects

at very different length scales. This is reflected in the power spectrum

of natural images. If one computes the power in different

spatial frequency components, one finds that this function has a power-law

form. That is, it scales like the frequency to the power minus two.

This reflects the lack of any characteristic scale: there is similar

structure at every scale.
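
This scale-free, power-law behavior can be illustrated with a short numpy
sketch (not from the lecture; a synthetic one-dimensional signal stands in
for a measured image):

```python
import numpy as np

# Synthesize a signal whose power spectrum scales as f^-2, the power-law
# form described above (a stand-in for a real natural-image measurement).
rng = np.random.default_rng(0)
n = 4096
f = np.fft.rfftfreq(n)[1:]                    # positive frequencies
phases = rng.uniform(0, 2 * np.pi, f.size)
phases[-1] = 0.0                              # keep the Nyquist bin real
spectrum = np.concatenate(([0], (1.0 / f) * np.exp(1j * phases)))
signal = np.fft.irfft(spectrum, n)

# Measuring the power spectrum back and fitting a line in log-log
# coordinates recovers the exponent: power ~ f^-2, with no preferred scale.
power = np.abs(np.fft.rfft(signal))[1:] ** 2
slope = np.polyfit(np.log(f), np.log(power), 1)[0]
print(round(slope, 2))                        # -2.0
```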

Despite these scale differences and the very large variations in light and

contrast across the image, we'd like to be able to distinguish detail at every

point in it. Unlike this camera.

These basic issues arise for almost all of our senses.

Here's an audio track of a chunk of speech.

The signal is full of complex fluctuations that carry detailed

information about pitch and nuance. However, these fast variations are

modulated by the relatively huge variations in amplitude that make up the

envelope of speech. We're perfectly capable of understanding

all of these signal components regardless of the overall amplitude, even when there

are multiple speakers, or they're far away.

So how can a neural system, with a limited range of responses, manage to

convey the relevant information about details in the face of these huge

variations of scale? Recall that we found that the entropy of a binary code

reached its maximum when the two symbols were used equally often.

Now if we're thinking about maximizing

the mutual information. We also have to take into account this

noise term. But generally the amount of noise for a given stimulus may not be

something that's easily controlled, while the total response entropy is

something that's in the hands of the coder. Let's see how.

Let's imagine that the stimulus that a system needs to encode is varying in

time: this is s of t, and it has some distribution, p of s, over here.

Our job as an encoder is to map the stimulus onto the symbols that we have at

our disposal. Let's imagine that we're constrained to

use some maximal firing rate, so we have some limited range of possible symbols at

our disposal, say zero to 20 hertz. How should we organize that mapping so

that we end up with the most efficient code?

We'll get the most information by maximizing our output entropy.

That is, by using all of our symbols about equally often.

So what does that imply for the shape of this curve?

So what we should do is move along our stimulus distribution and encode equal

shares of that distribution with each symbol.

If we have 20 symbols, let's count up one twentieth of the total area under

this curve and assign that to symbol one. What this amounts to is a response curve

that's given by the cumulative integral of the stimulus distribution.

Another name for this is histogram equalization.
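
Here's a minimal numpy sketch of this construction (illustrative only, not
the lecture's code): a hypothetical Gaussian stimulus distribution is mapped
through its empirical cumulative distribution onto 20 firing-rate symbols,
and each symbol ends up used equally often:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.normal(size=100_000)            # stimulus samples drawn from p(s)

# The response curve is the cumulative integral of p(s), scaled to the
# available output range (0-20 Hz). Using each sample's rank implements
# the empirical cumulative distribution directly.
ranks = np.argsort(np.argsort(s))
rate = 20.0 * (ranks + 1) / s.size      # firing rate assigned to each sample

# Histogram equalization: every 1-Hz "symbol" now carries an equal share.
counts, _ = np.histogram(rate, bins=20, range=(0.0, 20.0))
print(counts.min(), counts.max())       # each bin holds ~1/20 of the samples
```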

4:26

So this implies that for a good coding system, its input-output function, this

function here, should be determined by the distribution of natural inputs.

So here's a classic study in which this idea was tested directly.

In the early 1980s, Simon Laughlin went out into the fields with a camera, and

measured the typical contrasts, that is deviations in the light level, divided by

the mean light level, that would be experienced in the natural world, for

example, by a fly. So, that's this distribution here.

If the response does indeed follow the distribution of natural inputs...

Then the response curve, here, should look like the cumulative probability

determined by integrating p of c. And in fact, that's a very good match to

what he actually observed in the response properties of the fly large

monopolar cells, the neurons that integrate signals from the fly's

photoreceptors. Now, a study like this poses a challenge.

While it makes sense that our sensory systems would, over evolution or

development, set up response codes that are adjusted to natural input

statistics, it seems that much more work is needed to handle the huge natural

variation that stimuli take on as one moves from indoors to outdoors, or even

moves one's eyes around a room. The contrast distribution varies widely.

Might sensory systems rather adjust themselves on much shorter timescales, to

take these statistical variations into account? So let's take a patch of the image, and

look at the variations in contrast in that image. Here, for example, the

contrast distribution might be narrow like this, whereas over here it might be

much broader. What our code should do is take the widths of these

distributions into account in setting up a local input-output curve that

accommodates the currently measured statistics of the input.

So that's the question that we tested

here, in the H1 neuron. In this experiment, we took a white-noise input of the

type that you used in the problem sets, some s of t that looks like that, and

we multiplied it by a slowly time-varying envelope. Call that sigma of t;

that's what you see here. This is a 90-second-long chunk of stimulus. We

repeated the same sigma of t in every trial, but we changed the specific

white-noise stimulus.

That allowed us to pick out spikes that occurred at different time points

throughout this presentation of sigma of t, where in every trial the cell

would have seen a different specific stimulus, and to calculate the

input-output function described by those spikes in those different windows of

time. Now, when one analyzes spikes across these different windows and pulls

out their input-output functions using the methods that we talked about in

week two, one finds, for example, that here in this window one gets a very

broad input-output curve.

Whereas when the stimulus is varying very little, one finds a very sharp

input-output curve. Now, it turns out that if one normalizes

the stimulus by its standard deviation, or by this envelope sigma of t, all of

these curves collapse onto the same curve.

What that says is that the code has the freedom to stretch its input axis

such that it accommodates these variations in the overall scale of the

stimulus, and it's able to do that in real time as this envelope is varying.
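
That stretching can be sketched in a couple of lines (a hypothetical
sigmoidal curve and made-up sigmas, not a fit to the H1 data):

```python
import numpy as np

def io_curve(s, sigma):
    # Hypothetical adaptive input-output curve: a sigmoid whose input axis
    # is stretched by the current stimulus standard deviation sigma.
    return 1.0 / (1.0 + np.exp(-s / sigma))

x = np.linspace(-5.0, 5.0, 101)             # stimulus in units of s / sigma

# Evaluate the curve in a low-variance and a high-variance epoch, reading
# each one out against the normalized stimulus s / sigma:
narrow = io_curve(x * 0.5, sigma=0.5)       # low-variance epoch
broad  = io_curve(x * 4.0, sigma=4.0)       # high-variance epoch

# After dividing the stimulus by its envelope, the curves are identical.
print(np.allclose(narrow, broad))           # True
```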

This has been seen in several other systems, including the retina and the

auditory system, but here's an example from rat barrel cortex: the part of the

rat's somatosensory cortex that encodes the vibrations of its whiskers. In

extracellular in vivo recordings of responses to whisker motion, the whiskers

were stimulated with a velocity signal, again some s of t, that looked like

this. So this is a slightly simpler experiment.

The standard deviation was varied between two different values. Now one can

pull out the spikes that were generated in these two epochs of the

presentation, the high-variance case and the low-variance case, and compute

input-output curves for the spikes that occurred under these two conditions.

9:00

So in the low-variance case, one sees this input-output curve; in the

high-variance case, one sees this input-output curve. And hopefully you won't

be surprised that if I now divide the stimulus by its standard deviation, we

see a common curve. So again this input-output curve has the freedom to

stretch itself such that it's able to encode stimuli in their natural dynamic

range.

So what I've shown you is that as one changes the characteristics of the

stimulus, in the cases we've talked about by changing its overall amplitude,

changes can occur in the input-output function. Here we've found that if a

stimulus took on, say, this dynamic range, it might be encoded with an

input-output curve like that. Now you should be able to see that if one

increased the range of the stimulus and stayed with that same input-output

curve, most of the time your stimuli would be giving responses at zero or at

the saturation point. Similarly, if you now decreased the range of the

stimulus, you'd be hovering at the central part of the curve. So, ideally, one

would like to use the entire dynamic range defined by this input-output curve,

and one would therefore like to match it to the

range of the stimulus. And that's exactly what we saw in the

experiments. Now this adaptive representation of information is not confined

to changes in the input-output function. The feature that's selected by a

neural system can also adapt to changes in the stimulus statistics, and

information theory has also been used to explain the way in which this occurs.

For example, it's been used to explain how the spatial filtering properties

of neurons in the retina and in the LGN change with light level.

Joe Atick and his colleagues posed the following question: If we consider that

the retina imposes a linear transfer function, or a filter on its inputs,

what's the shape of that filter that maximizes information transmission

through the retina? The solution turns out to depend on two

things: the power spectrum of natural images and the signal-to-noise ratio.

At high light levels, or high signal-to-noise, one would predict a filter

shape like the one we've seen already, the Mexican-hat shape.

This acts like a differentiator, looking

for edges of the stimulus, but at low light levels, the predicted optimal

filter is integrating, and simply averages its inputs to reduce noise.
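
The qualitative dependence on signal-to-noise can be sketched with one common
form of such a filter, a whitening term multiplied by Wiener-style noise
suppression (a simplified stand-in, not the exact expression from the study):

```python
import numpy as np

f = np.linspace(0.01, 10.0, 1000)     # spatial frequency
S = 1.0 / f**2                        # natural-image power spectrum ~ 1/f^2

def optimal_gain(S, N):
    # Sketch of the information-maximizing filter: a whitening term
    # 1/sqrt(S) multiplied by a Wiener-style suppression term S/(S+N).
    return (S / (S + N)) / np.sqrt(S)

high_snr = optimal_gain(S, N=1e-4)    # bright: gain rises with f (edge detector)
low_snr  = optimal_gain(S, N=1.0)     # dim: gain peaks at low f (averager)

# The filter's peak frequency drops as the noise grows, shifting it from a
# differentiating (Mexican-hat) shape toward a noise-averaging integrator.
print(f[np.argmax(low_snr)] < f[np.argmax(high_snr)])   # True
```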

And indeed, in retinal receptive fields it's seen that the surround becomes

weaker at low light levels and the center broader, which qualitatively matches

these predictions. We can also use information theory to

find out what it is about a stimulus that drives a neuron to fire.

We looked at this method in week two; it's called the method of maximally

informative dimensions.

One can choose a filter, extracting from the stimulus some component that

maximizes the Kullback-Leibler divergence between the spike-conditional and

the prior distributions. This turns out to be equivalent to

maximizing the information that the spike provides about the stimulus.
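
As a toy illustration with made-up numbers (not data from the lecture), the
Kullback-Leibler divergence between a discretized spike-conditional
distribution and the prior scores how informative a candidate filter is:

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Hypothetical distributions of the stimulus projected onto one filter:
prior      = np.array([0.25, 0.25, 0.25, 0.25])  # p(s . f), all stimuli
spike_cond = np.array([0.05, 0.15, 0.30, 0.50])  # p(s . f | spike)

# MID searches over filters f for the one that maximizes this quantity.
print(round(kl_divergence(spike_cond, prior), 3))   # 0.352
```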

One can use this method to search for the optimal feature that explains the

coding properties of a system when it's being presented with stimuli of a

particular distribution.

So, for example, if one initially starts with a Gaussian white-noise

distribution, a Gaussian in this representation, one might find a particular

feature. But if one now changes the distribution to, say, natural images,

which have a very different distribution, the filter that maximizes the

information between spike and stimulus may be different, and that's been shown

to be the case for cortical receptive fields, among other systems.

13:01

So let's finish up by briefly discussing an influential idea that Rajesh

mentioned in the first lecture, one that might explain why cortical receptive

fields have the shape that they do. Many years ago, Horace Barlow proposed

that, because spikes are expensive, neural systems should be trying to encode

stimuli as efficiently as possible.

What does this mean for a population of neurons?

If you consider the joint distribution of the responses of many neurons, here

let's just take two, maximizing their entropy should imply that they code

independently. That is, their joint distribution should factor into the

product of the two marginal distributions.

This is the strategy that maximizes their joint entropy.
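
One can check this claim numerically; here's a small sketch with a
hypothetical correlated joint distribution over two binary neurons:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A correlated joint distribution over two binary "neurons".
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
p_x = joint.sum(axis=1)                 # marginal of neuron 1
p_y = joint.sum(axis=0)                 # marginal of neuron 2

h_joint = entropy(joint.ravel())
h_sum = entropy(p_x) + entropy(p_y)
print(h_joint <= h_sum)                 # True; equal only when independent
```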

Why is that? Because the entropy of a joint distribution is always less than

or equal to the sum of the entropies of the marginals. This idea is known as

redundancy reduction: the neural system should be optimized so that its

neurons code as independently as possible. However, in recent years it's been

realized that correlations between neurons can have some advantages.

For one, having many neurons that encode the same thing may allow for error

correction and more robust coding. It's also been realized that correlations

can actually help discrimination, and indeed, neurons in the retina have been

observed to be redundant; that is, their joint distribution is very different

from the product of the independent marginal distributions.

More recently, Barlow proposed a new

idea, that neuron populations should be as sparse as possible.

That is that their coding properties should be organized so that as few

neurons as possible are firing at any time.

14:38

This idea was developed formally by Olshausen and Field, and also Bell and

Sejnowski. Here's the idea.

Let's say that one can write down a set of basis functions, phi i, with which to

reconstruct a natural scene. Then any image can be expressed as a

weighted sum, with coefficients ai over these basis functions with perhaps the

addition of some noise. Now these basis functions should be chosen so that,

in general, as few coefficients ai as possible are needed to represent an

image. This is carried out by minimizing a function that includes the

reconstruction error, here the root mean squared difference between the

reconstructed image and the image itself, so that one gets a good match to the

images, but that also includes a cost term whose role is to count how many

coefficients are needed. One simple choice of this cost function is just the

sum of the absolute values of these coefficients.

The coefficient lambda weights the

strength of that constraint. The job of this term is to penalize solutions

that require too many basis functions to represent an image, that is, too many

coefficients ai that are different from zero.
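
Here's a minimal sketch of minimizing this kind of objective with iterative
soft thresholding, for a fixed random basis (the basis-learning step is
omitted, and all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_basis = 64, 128
Phi = rng.normal(size=(n_pix, n_basis)) / np.sqrt(n_pix)   # fixed basis

# A toy "image" built from just three active basis functions.
a_true = np.zeros(n_basis)
a_true[[3, 40, 77]] = [1.0, -0.5, 0.8]
image = Phi @ a_true

lam, step = 0.05, 0.1                   # sparseness weight, gradient step

def objective(a):
    return 0.5 * np.sum((image - Phi @ a) ** 2) + lam * np.sum(np.abs(a))

a = np.zeros(n_basis)
start = objective(a)
for _ in range(200):
    a = a + step * Phi.T @ (image - Phi @ a)                  # error term
    a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # |a_i| cost

print(objective(a) < start, np.count_nonzero(a) < n_basis)
```

Each iteration takes a gradient step on the reconstruction error and then
soft-thresholds the coefficients, which is the effect the absolute-value
cost has: small coefficients are driven exactly to zero.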

A Fourier basis, for instance, represents images as a sum of sines and

cosines. While the Fourier basis is guaranteed to be able to represent any

image, one might already be able to guess that coding with such a basis is not

sparse.

Because, as you probably recall, the power spectrum is broad, which means

that many coefficients are needed. When one runs an algorithm to find the best

basis functions, the best values of phi i, for natural images, one finds

instead a set of functions that look like this: localized, oriented features

like those that we see in V1.

So this implies that when we view an image using neuronal receptive fields

that look like this, on average only a minimal number of neurons is excited.

This is called a sparse code. So we've touched upon several different

ideas about coding principles. The idea of coding efficiency, that

neural codes should represent input stimuli as efficiently as possible.

We've seen that this implies adaptation to stimulus statistics.

As one changes the statistics of the stimulus, one should see aspects of the

coding model changing to ensure that it remains efficient.

We've also brought up the idea of sparseness.

That it would be ideal if the neural code needed as few neurons as possible to

represent its input. And this brings us to the end of our

discussion of coding. I've shown you some classic and state-of-the-art

methods for predicting how stimuli are encoded in spikes.

We've seen models for decoding stimuli from neural responses.

We've discussed information theory and

how it's used to evaluate coding schemes, and we've taken a very quick glance at

how coding strategies might be shaped by the statistics of natural inputs.

There's a lot that we've missed. In particular, let's just go through the

typical cycle of behavior of an organism. Where we have invested our time is

the idea that, starting from complex environments, animals extract some

features from the environment to solve problems, and that those features are

represented in neural activity.

What the brain is then doing is extracting that information and

synthesizing it to drive decisions. We talked about some examples of using

maximum likelihood methods that might in fact have a neural implementation.

These decisions then generate motor activity: muscles work together to

perform actions that drive behavioral output, and these actions affect

subsequent sensation.

So, we didn't really address any of this part of the behavioral feedback

loop. Next week, we'll be moving on to a new topic. Rather than handling data

analysis, we'll be moving more into the realm of modeling, and we'll start

with a brief introduction to the biophysics of coding: how do single neurons

generate action potentials? We'll talk about neuronal excitability.

And we'll close up with some simplified models that capture neuronal firing

before moving on to the second part of the course where you'll be learning about

network modeling. So that's all for this week.

Looking forward to seeing you next week.