John O. Campbell
This post is an early draft of an article published as Universal Darwinism as a process of Bayesian inference.
Although Darwin must be counted amongst history’s foremost scientific geniuses, he had very little talent for mathematics. His theory of natural selection was developed in great detail, with many compelling examples, but without a mathematical framework. Mathematics composed only a small part of his Cambridge education, and in his autobiography he wrote:
I attempted mathematics, and even went during the summer of 1828 with a
private tutor (a very dull man) to Barmouth, but I got on very slowly. The work
was repugnant to me, chiefly from my not being able to see any meaning in the
early steps in algebra. This impatience was very foolish, and in after years I
have deeply regretted that I did not proceed far enough at least to understand
something of the great leading principles of mathematics, for men thus endowed
seem to have an extra sense.
Generally, mathematics is an aid to scientific theories because a theory whose basics are described through mathematical relationships can be expanded into a larger network of predictive implications, and the entirety of the expanded theory can be subjected to the test of evidence. As a bonus, any interpretation of the theory must also conform to this larger network of implications.
Natural selection describes the change in frequency, or probability, of biological traits over succeeding generations. One might suppose that a mathematical description, complete with an insightful interpretation, would be straightforward, but even today it remains elusive. The current impasse involves conceptual difficulties arising from one of mathematics’ bitterest interpretational controversies.
That controversy is between the Bayesian and the frequentist interpretations of probability theory. Frequentists take probability, or frequency, to be a natural propensity of nature: the fact that each face of a die lands with probability 1/6 is understood by frequentists to be a physical property of the die. Bayesians, on the other hand, understand that humans assign probabilities to hypotheses on the basis of the knowledge they have. The probability of each face of a die is thus 1/6 because the observer has no knowledge which would favour one face over the others, and the only assignment that favours no face is to give each hypothesis the same probability.
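To make the maximum-ignorance argument concrete, here is a minimal sketch in Python (the candidate distributions are my own illustration): among probability assignments for a six-sided die, the uniform one maximizes Shannon entropy, which is to say it encodes the least knowledge:

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

uniform = [1 / 6] * 6                        # no face favoured
biased = [0.3, 0.2, 0.15, 0.15, 0.1, 0.1]    # pretends to extra knowledge

print(entropy(uniform))  # ~2.585 bits, the maximum for six outcomes
print(entropy(biased))   # ~2.471 bits, strictly less
```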
Frequentists have attacked the Bayesian interpretation of probability on the grounds that the knowledge a particular person has is ‘subjective’, while mathematics and science deal only with objective phenomena. Bayesians counter that their view is objective, as all observers with the same knowledge must assign the same probability. As the great Bayesian theoretician E.T. Jaynes put it (1):
In the theory we are developing, any probability assignment is necessarily “subjective” in the sense that it describes only a state of knowledge, and not anything that could be measured in a physical experiment. … Now it was just the function of our interface desiderata to make these probability assignments completely “objective” in the sense that they are independent of the personality of the user. They are a means of describing (or what is the same thing, of encoding) the information given in the statement of a problem, independently of whatever personal feelings (hopes, fears, value judgments, etc.) you or I might have about the propositions involved. It is “objectivity” in this sense that is needed for a scientifically respectable theory of inference.
The Bayesian framework is arguably more comprehensive and
has been developed into the mathematics of Bayesian inference, at the heart of
which is Bayes’ theorem describing how probabilistic models gain knowledge and
learn from evidence. In my opinion the major drawback of the Bayesian approach
is its anthropomorphic reliance on human agency.
Despite the lack of mathematics in Darwin’s initial formulation of the theory, it was not long before researchers began developing a mathematical framework describing natural selection. Perhaps the first step was taken by Darwin’s cousin, Francis Galton, who developed numerous probabilistic techniques for describing the variance in natural traits as well as natural selection in general. His conception of natural selection appears to have been curiously Bayesian, although he may never have heard of Bayes’ theorem. Evidence of his Bayesian bent survives in the form of a visual aid which he built for a lecture given to the Royal Society.
He used this device as an aid to explain natural selection
in probabilistic terms. It contains three compartments: a top one representing
the frequency of traits in the parent population, a middle one representing the
application of 'relative fitness' to this prior and a third
representing the normalization of the resulting distribution in the child
generation. Beads are loaded in the top compartment to represent the
distribution in the parent generation and then are allowed to fall
into the second compartment. The trick is in the second compartment where there
is a vertical division in the shape of the relative
fitness distribution. Some of the beads fall behind this division and are
‘wasted’; they do not survive and are removed from sight. The
remaining beads represent the distribution of the 'survivors' in the child
generation.
Galton’s device has recently been rediscovered and employed by Stephen Stigler and others in the statistics community as a visual aid, not for natural selection, but for Bayes’ theorem. The top compartment represents the prior distribution, the middle one represents the application of the likelihood to the prior, and the third represents the normalization of the resulting distribution. The change between the initial distribution and the final one is the Bayesian update.
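Galton’s device is easy to simulate. The sketch below is my own illustrative construction (the traits, frequencies and fitnesses are invented): the middle compartment acts as rejection sampling, each bead drawn from the parent (prior) distribution surviving with probability proportional to its trait’s relative fitness, so the surviving beads approximate the normalized Bayesian posterior:

```python
import random
from collections import Counter

# Illustrative numbers of my own choosing, not Galton's.
prior = {"a": 0.5, "b": 0.3, "c": 0.2}     # parent-generation frequencies
fitness = {"a": 0.2, "b": 0.6, "c": 1.0}   # relative fitness of each trait

survivors = []
for _ in range(100_000):
    bead = random.choices(list(prior), weights=list(prior.values()))[0]
    # The fitness-shaped division: a bead survives with probability
    # proportional to its trait's relative fitness.
    if random.random() < fitness[bead]:
        survivors.append(bead)

counts = Counter(survivors)
empirical = {t: round(counts[t] / len(survivors), 3) for t in prior}

# Bayes' theorem gives the same distribution analytically.
norm = sum(prior[t] * fitness[t] for t in prior)
posterior = {t: round(prior[t] * fitness[t] / norm, 3) for t in prior}

print(empirical)   # close to the analytic posterior below
print(posterior)   # {'a': 0.208, 'b': 0.375, 'c': 0.417}
```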
R.A. Fisher further developed the mathematics describing
natural selection during the 1920s and 1930s. He applied statistical methods to
the analysis of natural selection via Mendelian genetics and arrived at the
fundamental theorem of natural selection, which states (2):
the rate of increase in
fitness of any organism at any time is equal to its genetic variance in fitness
at that time.
Fisher was a fierce critic of the Bayesian interpretation, which he considered subjective. Instead he pioneered, and made many advances within, the frequentist interpretation.
The next major development in the mathematics of natural
selection came in 1970 with the publication of the Price equation which built
on the fundamental theorem of natural selection. As Wikipedia describes it (3):
Price developed a new
interpretation of Fisher's fundamental theorem of natural selection, the Price
equation, which has now been accepted as the best interpretation of a formerly
enigmatic result.
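For readers who have not met it, the Price equation in its standard form (the symbols here are the conventional ones from the literature, not quoted from Frank) partitions the change in the mean value of a trait $z$ across one generation:

$$\bar{w}\,\Delta\bar{z} = \operatorname{Cov}(w_i, z_i) + \operatorname{E}(w_i\,\Delta z_i)$$

where $w_i$ is the fitness of type $i$ and $\bar{w}$ the mean fitness. The covariance term captures selection; the expectation term captures transmission effects.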
Although the Price equation fully describes evolutionary change, its meaning has only recently begun to be unravelled, chiefly by Steven A. Frank in a series of papers spanning the last couple of decades. Frank’s insights into the meaning of the Price equation culminated in his 2012 paper (4), which derives a description of natural selection using the mathematics of information theory.
In my opinion this paper represents a huge advance in the
understanding of evolutionary change as it shifts interpretation from the
objective statistical description of frequentist probability to an
interpretation in terms of Bayesian inference. Unfortunately Frank does not
share my appreciation of his accomplishment. Instead he seems to take it for
granted, in the frequentist tradition, that a Bayesian interpretation is not
useful. While he understands that his mathematics are very close to those of
Bayesian inference he is unable to endorse a Bayesian interpretation of his
results.
However, the mathematics of information theory and Bayesian probability are joined at the hip, as their basic definitions are given in terms of one another. Information theory begins with a definition of information in terms of probability:

$$I = -\log_2 P(h_i)$$

Here we may view $h_i$ as the $i$th hypothesis in a mutually exclusive and exhaustive family of competing hypotheses composing a model. $I$ is the information gained by the model on learning that hypothesis $h_i$ is true, and $P(h_i)$ is the probability which the model had previously assigned to the hypothesis $h_i$ being true. Thus information is ‘surprise’: the less likely a model initially considered a hypothesis that turns out to be the case, the more surprise it experiences and the more information it receives.
Thus information theory, starting with the very definition of
information, is aligned with the Bayesian interpretation of probability; information
is ‘surprise’ or the gap between an existing state of knowledge and a new state
of knowledge gained through receiving new information or evidence.
The ‘expected’ information contained by the model composed of the distribution of the $p_i$ is the entropy:

$$S = -\sum_i p_i \log_2 p_i$$
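A minimal numerical illustration of these two definitions (my own sketch, in Python):

```python
import math

def information(p_h):
    """Surprise, in bits, on learning that a hypothesis to which the
    model assigned probability p_h is true: I = -log2(p_h)."""
    return -math.log2(p_h)

def model_entropy(p):
    """Expected information (entropy) of a model {p_i}, in bits."""
    return sum(p_i * information(p_i) for p_i in p if p_i > 0)

print(information(0.5))           # 1.0 bit: an even-odds hypothesis
print(information(0.01))          # ~6.64 bits: unlikely hypotheses surprise more
print(model_entropy([0.25] * 4))  # 2.0 bits for four equiprobable hypotheses
```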
Bayes’ theorem follows directly from the axioms of probability theory and may be understood as the implication which new evidence or information holds for the model described by the distribution of the $p_i$. It states that on the reception of new information $I$ by the model, the probability of each component hypothesis $h_i$ making up the model must be updated according to:

$$P(h_i \mid I X) = P(h_i \mid X)\,\frac{P(I \mid h_i X)}{P(I \mid X)}$$

where $X$ is the information we had prior to receiving the new evidence or information $I$. Bayesian inference is commonly understood as any process which employs Bayes’ theorem to accumulate evidence-based knowledge. As Wikipedia puts it (5):
Bayesian
inference is a method of statistical inference in which
Bayes' theorem is used to update the probability for a hypothesis as evidence
is acquired.
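In code the update is a one-liner over a family of hypotheses. Here is a minimal sketch in Python (my own illustration, with made-up numbers) of such an update over a three-hypothesis model:

```python
def bayes_update(prior, likelihood):
    """Posterior P(h_i | I X) from the prior P(h_i | X) and the
    likelihoods P(I | h_i X), by Bayes' theorem."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)     # P(I | X), the normalizing constant
    return [u / evidence for u in unnormalized]

prior = [0.6, 0.3, 0.1]        # state of knowledge before the evidence
likelihood = [0.1, 0.5, 0.9]   # how well each hypothesis predicts the evidence
print(bayes_update(prior, likelihood))  # ~[0.2, 0.5, 0.3]
```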
Thus we see that, contrary to Frank’s view, Bayesian inference and information theory have the same logical structure. However, it is instructive to follow Frank’s development of the mathematics of evolutionary change in terms of information theory while keeping his explicit denial of its relationship to Bayesian inference in mind. Frank begins his unpacking of Price’s equation by describing the ‘simple model’ he will develop:
A simple model starts with n different types of individuals. The frequency of each type is $q_i$. Each type has $w_i$ offspring, where $w$ expresses fitness. In the simplest case, each type is a clone producing $w_i$ copies of itself in each round of reproduction. The frequency of each type after selection is

$$q_i' = q_i\,\frac{w_i}{\bar{w}} \tag{1}$$

where $\bar{w} = \sum_i q_i w_i$ is the average fitness.
Equation (1) is clearly an instance of a Bayesian update, where the new evidence or information is given in terms of relative fitness; thus the development of his simple model is in terms of Bayesian inference. I consider this to be neither an analogy nor a speculation.
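The identity can be checked numerically. Under the same illustrative assumptions as before, and reusing the hypothetical `bayes_update` function sketched above, Frank’s equation (1) and Bayes’ theorem give the same updated distribution:

```python
q = [0.5, 0.3, 0.2]   # trait frequencies q_i in the parent generation
w = [1.0, 2.0, 4.0]   # fitness w_i of each type (illustrative)

w_bar = sum(qi * wi for qi, wi in zip(q, w))            # mean fitness, 1.9
replicator = [qi * wi / w_bar for qi, wi in zip(q, w)]  # Frank's eqn (1)

print(replicator)          # [0.263..., 0.315..., 0.421...]
print(bayes_update(q, w))  # identical, term by term: prior q_i, likelihood w_i
```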
While Frank acknowledges that there is an isomorphism between Bayes’ theorem and his simple model, he cannot bring himself to admit that this means his simple model involves a Bayesian update and therefore describes a process of Bayesian inference. Instead he refers to the relationship between Bayes’ theorem and his model as an analogy:
Part of the problem is that the analogy, as currently developed, provides little more than a match of labels between the theory of selection and Bayesian theory. As Harper (2010) shows, if one begins with the replicator equation (eqn 1), then one can label the set $\{q_i\}$ as the initial (prior) population, $\{w_i/\bar{w}\}$ as the new information through differential fitness and $\{q_i'\}$ as the updated (posterior) population.
Frank refers to Bayesian inference at many further
points in his paper and even devotes a box to a discussion of it. In the
process he gives a very coherent account of natural selection in terms of
Bayesian inference:
The Bayesian process makes an obvious analogy with selection. The
initial population encodes predictions about the fit of characters to the
environment. Selection through differential fitness provides new information.
The updated population combines the prior information in the initial population
with the new information from selection to improve the fit of the new
population to the environment.
Perhaps he considers it only an analogy because he doesn't
realize that his 'simple model' is in fact an instance of Bayes' theorem. He makes
the somewhat dismissive remark:
I am sure this Bayesian analogy has been noted many times. But it has
never developed into a coherent framework that has contributed significantly to
understanding selection.
On the contrary, I would suggest that Frank’s paper itself develops
a coherent framework for natural selection in terms of Bayesian inference.
He concludes his paper with the derivation of:
$$J = \beta_{mw}\,\frac{V_w}{\bar{w}}$$
On the right-hand side of this equation are statistical variables describing selection: the regression coefficient $\beta_{mw}$ of log fitness on fitness, the variance in fitness $V_w$, and the mean fitness $\bar{w}$. On the left is Jeffreys’ divergence. (Jeffreys, in the course of his geophysical research, had led the Bayesian revival during the 1930s.) Frank acknowledges that Jeffreys’ divergence was discovered and used by Jeffreys in his development of Bayesian inference:
Jeffreys (1946) divergence first appeared in an attempt to derive prior
distributions for use in Bayesian analysis rather than as the sort of
divergence used in this article.
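For reference, Jeffreys’ divergence between two distributions $p$ and $q$ is the symmetrized Kullback-Leibler divergence, a standard definition not specific to Frank’s paper:

$$J(p, q) = D_{KL}(p \parallel q) + D_{KL}(q \parallel p) = \sum_i (p_i - q_i)\,\log\frac{p_i}{q_i}$$

In Frank’s analysis the two distributions are the population distributions before and after selection.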
Frank regards his article as an analysis in terms of frequentist probability and information theory, and he can therefore assert that Jeffreys’ divergence is not to be understood in terms of Bayesian analysis. He contends that his analysis relates the statistics of evolutionary change to information theory:
Equation 30 shows the
equivalence between the expression of information gain and the expression of it
in terms of statistical quantities. There is nothing in the mathematics to
favour either an information interpretation or a statistical interpretation.
Frank puts himself in the awkward position of admitting that
his description of evolutionary change utilizes the mathematics of Bayesian
inference while at the same time denying that evolutionary change can be
interpreted as a process of Bayesian inference. Why is he compelled to do this?
Part of the reason may be the near-tribal loyalty demanded within both the Bayesian and frequentist camps. Frank, in the tradition of the mathematics of evolutionary change, is a frequentist, and jumping ship is not easy. A more substantial reason may be a peculiarity, and I would suggest a flaw, in the Bayesian interpretation. The consensus Bayesian position is that probability theory describes only inferences made by humans. As E.T. Jaynes put it:
it is...the job of probability theory to describe human inferences at
the level of epistemology.
Epistemology is the branch of philosophy which studies the
nature and scope of knowledge. Since Plato the accepted definition of knowledge
within epistemology has been ‘justified true belief’. In the Bayesian interpretation ‘justified’
means justified by the evidence. ‘True belief’ is the degree of belief in a
given hypothesis which is justified by the evidence; it is the probability that
the hypothesis is true within the terms of the model. Thus knowledge is the
probability, based on the evidence, that a given belief or model is true. I
have proposed a technical definition of knowledge as $2^{-S}$, where $S$ is the entropy of the model (6).
A perhaps interesting interpretation of this definition is that knowledge occurs within the confines of entropy or ignorance. For example, in a model composed of a family of 64 competing hypotheses where no evidence is available to decide amongst them, we would assign a probability of 1/64 to each hypothesis. The model has an entropy of 6 bits and knowledge of $2^{-6} = 1/64$. Say some evidence becomes available and the model’s entropy or ignorance is reduced to 3 bits. The knowledge of the updated model is then $2^{-3} = 1/8$, equivalent to the knowledge of a maximally ignorant model, one with no available evidence, composed of only 8 competing hypotheses. The effect which evidence has on the model is to increase its knowledge by reducing the scope of its ignorance.
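The arithmetic can be verified in a few lines (a sketch of the proposed definition, using the numbers from the example above):

```python
import math

def knowledge(p):
    """Proposed definition: K = 2**(-S), with S the model entropy in bits."""
    S = -sum(x * math.log2(x) for x in p if x > 0)
    return 2 ** -S

print(knowledge([1 / 64] * 64))  # 1/64: maximally ignorant 64-hypothesis model
print(knowledge([1 / 8] * 8))    # 1/8: as if entropy were reduced to 3 bits
```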
Due to their anthropomorphic focus both the fields of
epistemology and Bayesian inference deny themselves the option of commenting on
the many sources of knowledge encountered in non-human realms of nature. They
must deny some obviously true facts about the world such as that a bird ‘knows’
how to fly. Instead they equivocate that a bird has genetic and neural
information which allows it to fly but must deny it the knowledge of flight. The
notion that we are different from the rest of nature in that we have knowledge
is, in my opinion, but an anti-Copernican conceit, an anthropomorphic attempt
to claim a privileged position.
This is unfortunate because it forbids the application of Bayesian inference to phenomena other than models conceived by humans; it denies that knowledge may be accumulated in natural processes unconnected to human agency. Thus even though natural selection is clearly described in terms of the mathematics of Bayesian inference, neither Bayesians such as Jaynes nor frequentists such as Frank can acknowledge this fact, due to another hard fact: natural selection is unconnected to human agency. In both their views this rules out its having a Bayesian interpretation.
Natural selection involves a 'natural'
model rather than one conceived by humans. Biological knowledge is stored
in biological structures, principally the genome, and the probabilities
involved are frequencies, namely counts of the relative proportions of traits,
such as alleles, in a population. The process of natural selection which
updates these frequencies has nothing to do with human agency; in fact this
process of knowledge accumulation operated quite efficiently for billions of
years before the arrival of humans. How, then, should we interpret the
mathematics of evolutionary change which clearly take the form of Bayesian
inference but fall outside of the arena of Bayesian interpretation?
I believe that the way out of this conundrum is to simply
acknowledge that in many cases inference is performed by non-human agents
as in the case of natural selection. The genome may for instance be understood
as an example of a non-human conceived model involving families of
competing hypotheses in the form of competing alleles within the population.
Such models are capable of accumulating evidence-based knowledge in a Bayesian
manner. The evidence involved is simply the proportion of traits in ancestral
generations which make it into succeeding generations. In other words, we just need to broaden Jaynes' definition of
probability to include non-human agency in order to view natural selection
in terms of Bayesian inference.
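As a sketch of this reading, with illustrative allele fitnesses of my own choosing, one can iterate the update over generations and watch the model’s entropy, its ignorance, fall as knowledge accumulates:

```python
import math

def entropy(p):
    """Model entropy (ignorance) in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

q = [1 / 3, 1 / 3, 1 / 3]   # allele frequencies: a maximally ignorant prior
w = [0.9, 1.0, 1.1]         # relative fitnesses, the per-generation evidence

for generation in range(50):
    w_bar = sum(qi * wi for qi, wi in zip(q, w))
    q = [qi * wi / w_bar for qi, wi in zip(q, w)]  # one Bayesian update

print([round(qi, 4) for qi in q])  # the fittest allele now dominates
print(round(entropy(q), 3))        # entropy (ignorance) greatly reduced
```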
Bayesian probability, epistemology and science in general
tend to draw a false distinction between the human and non-human realms of
nature. In this view the human realm is replete with knowledge and thus infused
with meaning, purpose and goals and the mathematical framework describing these
knowledge-accumulating attributes is Bayesian inference. On the other hand the
non-human realm is viewed as devoid of these attributes and thus Bayesian
inference is considered inapplicable.
However, if we recognize expanded instances, such as natural selection, in which nature accumulates knowledge, then we may also recognize that Bayesian inference provides a suitable mathematical description. Evolutionary processes, as described by the mathematics of Bayesian inference, are those which accumulate knowledge: not just any arbitrary type of knowledge, but that required for increased fitness, for increased chances of continued existence. Thus the mathematics imply purpose, meaning and goals, and thus provide legitimacy for Daniel Dennett’s interpretation of natural selection in those terms (7):
If I could give a prize to the single
best idea anybody ever had, I’d give it to Darwin—ahead of Newton, ahead of
Einstein, ahead of everybody else. Why? Because Darwin’s idea
put together the two biggest worlds, the world of mechanism and material, and
physical causes on the one hand (the lifeless world of matter) and the world of
meaning, and purpose, and goals. And those had seemed really just—an
unbridgeable gap between them and he showed “no,” he showed how meaning and
purposes could arise out of physical law, out of the workings of ultimately
inanimate nature. And that’s just a stunning unification and opens up
a tremendous vista for all inquiries, not just for biology, but for
the relationship between the second law of thermodynamics and the existence of
poetry.
If we allow an expanded scope to Bayesian inference we may
view Dennett’s poetic interpretation of Darwinian processes as being supported
by their most powerful mathematical formulation.
An important aspect of these mathematics is that they apply
not only to natural selection but also to any generalized evolutionary
processes where inherited traits change in frequencies between generations. As
noted by the cosmologists Conlon and Gardner (8):
Specifically, Price’s equation
of evolutionary genetics has generalized the concept of selection acting upon
any substrate and, in principle, can be used to formalize the selection of
universes as readily as the selection of biological organisms.
Given that the Price equation is a general mathematical framework for evolutionary change, its Bayesian interpretation allows us to consider all evolutionary change as due to the accumulation of evidence-based knowledge. So quantum theory (9), biology (10), neural-based behaviour (11) and culture (12) may all be understood in terms of such evolutionary processes. Thus a wide range of subject matter may be unified within a single philosophical and mathematical framework.
Bibliography
1. Jaynes, Edwin T. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press, 2003.
2. Fisher, R.A. The Genetical Theory of Natural Selection. Oxford: Clarendon Press, 1930.
3. Wikipedia. George R. Price. [Online] [Cited: September 30, 2015.] https://en.wikipedia.org/wiki/George_R._Price.
4. Frank, S.A. Natural selection. V. How to read the fundamental equations of evolutionary change in terms of information theory. Journal of Evolutionary Biology, 2012, 25:2377-2396.
5. Wikipedia. Bayesian inference. [Online] [Cited: September 26, 2015.] https://en.wikipedia.org/wiki/Bayesian_inference.
6. Campbell, John O. Darwin Does Physics. CreateSpace, 2014.
7. Dennett, Daniel. Darwin's Dangerous Idea. 1995. ISBN-10: 068482471X.
8. Conlon, Joseph and Gardner, Andy. Cosmological natural selection and the purpose of the universe. 2013.
9. Zurek, Wojciech H. Quantum Darwinism. Nature Physics, 2009, vol. 5, pp. 181-188.
10. Darwin, Charles. The Origin of Species. Sixth edition. New York: The New American Library, 1958 (1872), pp. 391-392.
11. Fernando, Chrisantha, Szathmáry, Eörs and Husbands, Phil. Selectionist and evolutionary approaches to brain function: a critical appraisal. Computational Neuroscience, 2012. http://www.sussex.ac.uk/Users/philh/pubs/fncom-Fernandoetal2012.pdf.
12. Mesoudi, Alex, Whiten, Andrew and Laland, Kevin N. Towards a unified science of cultural evolution. Behavioral and Brain Sciences, 2006.
13. Gintis, Herbert. A framework for the unification of the behavioral sciences. Behavioral and Brain Sciences, 2007. http://www2.econ.iastate.edu/tesfatsi/FrameworkForUnificationOfBehavioralSciences.HGintis2007.pdf.