The prominent physicist David Deutsch reminds us of
the power of knowledge:
Everything that is not forbidden by the laws of nature is achievable,
given the right knowledge.
Undoubtedly knowledge is the essential ingredient of order. I
will argue, however, that the preeminent position which Deutsch bestows upon
knowledge does not go far enough: the very laws of nature have themselves
come into being through the accumulation of knowledge, and the laws of physics
may be considered but a summary of nature’s accumulated knowledge.
Information at the basis of structure
Since atomic theory came to general acceptance near the
start of the 20th century, physics has learned that atoms themselves
are composed of yet more fundamental particles.
There is little reason to suppose that the quarks and leptons currently
understood to be fundamental will not in turn be found to be composed of an
underlying family of even more fundamental ‘preons’.
All structures appear to be composed of ever more
fundamental units. At each level new structures may form because the subunits
are able to interact and exchange information. Without the exchange of
information between them no structures could exist; at that level such a
reality would be composed entirely of isolated entities, each unable to detect or
influence any other.
The complexity of the world we experience is due, in a
primary sense, to this transfer of information; the ability of one entity to
record information about another entity. The four forces of nature by which fundamental
particles are understood to influence each other may be described as instances
of information transfer. Indeed these forces appear to be the ultimate form of
all information transfer.
It is akin to the miraculous that our universe is not a
barren one where each fundamental particle is isolated and uninfluenced by
anything else. In such a world there could be no bonds between matter and no
entity could contain information regarding another. Indeed, up to the
level of the universe as a whole, science understands all structures to be
composed of interacting subunits.
Principle of incomplete information
However information transfer is highly constrained; it
appears to be a universal principle, which I will name the principle of
incomplete information, that one entity may convey only very little information
about itself to other entities[1].
This principle seems as applicable to human communications as it does to
information transfer between quantum entities. Complete information concerning
one entity is never available to another entity; some degree of ignorance is
unavoidable.
A general argument I will make is that knowledge is
essential for the existence of many natural systems including quantum systems,
biological systems, behavioural systems, and cultural systems. In this view
knowledge has played a long and illustrious role in the evolution of the
universe.
Towards a definition of knowledge
Surprisingly, within science, our primary means of
understanding the universe, the term ‘knowledge’ is used in only a vague sense
and does not have a clear technical definition. This lack of an adequate
definition for a phenomenon playing a central role in the structure of the
universe has resulted in a good deal of confusion. I will suggest a technical
definition of knowledge later in this section which may help to resolve this problem.
While science has not provided a clear definition of
knowledge it has developed a detailed understanding of ignorance. Ignorance is
the amount of information, in bits, which any entity lacks in its knowledge of
another; its technical name is entropy. Entropy is conceptually and
mathematically well understood.
My proposed definition of knowledge leverages our deep
understanding of ignorance. Knowledge is the probability obtained by
mathematically inverting entropy. This probability is essentially the chance
that a random pick within the realm of ignorance will be the correct choice; our
odds of being correct (knowledge) increase as ignorance decreases.
Knowledge and Bayesian inference
The principle of
incomplete information requires that any knowledge must be uncertain knowledge.
The field of mathematics which describes degrees of uncertainty or degrees of
plausibility is Bayesian probability. It provides all the necessary mechanisms
for determining the probability of knowledge from the information content of
supporting evidence. Thus Bayesian probability prescribes the evolution of
knowledge as evidential information is gained.
Common usage of the word knowledge usually involves an
internal representation or model of external phenomena. The definition of
knowledge I will develop is in close agreement with this view. For an
internal model to be accurate it must receive information concerning the external
phenomena and be capable of updating its representation as the phenomena
change.
In contrast to knowledge, information has a fairly
straightforward scientific definition. It is measured in bits, which may be considered
answers to ‘yes’ or ‘no’ type questions. For instance the game of twenty
questions can be considered as one where the questioner receives twenty bits of
information in order to identify the correct answer. Twenty bits of information
is powerful: it is able to distinguish between 2^20, or over a million,
different possibilities.
Information may also take the form of a coded message that
represents an entity. For instance all of the concepts dealt with by computers
are represented by messages in binary code. As a simple example we might
consider the sixteen binary states of four flipped coins represented by ones
for heads and zeroes for tails.
0000  0001  0010  0011
0100  0101  0110  0111
1000  1001  1010  1011
1100  1101  1110  1111
These sixteen distinguishable outcomes of flipping four
coins may each be identified with four bits of information. In general n
distinguishable states can be coded with log2(n)
bits of information. The probability of randomly choosing a specific state from
n states is 1/n; if we let I stand for the information required to code for
the state then the probability is 1/2^I or equivalently 2^-I.
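As a quick check on this relationship, here is a short Python sketch using the four-coin example above (sixteen states, four bits):

```python
import math

n_states = 16                  # the four-coin example: 2**4 outcomes
I = math.log2(n_states)        # bits needed to code one state
p_random = 1 / n_states        # probability of guessing a state at random

print(I)                       # 4.0 bits
print(p_random == 2 ** -I)     # True: 1/n equals 2**(-I)
```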
The history of information as a well-defined scientific
concept has been quite brief. Claude Shannon introduced our modern conception
of information in 1948. Since then it has come to be seen by many as perhaps
the most fundamental concept in science. The great physicist John Wheeler said
that he had come to view ‘everything as
information’ (52). This astonishing
ascendance, from a concept’s introduction to perhaps the most fundamental
position in science, has occurred in only fifty
years.
There can be difficulties when a colloquial term such as
‘information’ is adopted by science and given a precise technical definition.
The technical definition may be quite different from common usage and confusion
may arise.
Technical definition of information
Dictionary.com defines information:
- knowledge communicated or received concerning a particular fact or circumstance; news: information concerning a crime.
- knowledge gained through study, communication, research, instruction, etc.; factual data: His wealth of general information is amazing.
These definitions describe information in terms of knowledge
but this is not the technical definition of information. Technically
information is defined in terms of probability:

I(wn) = -log(P(wn))

Where each w is one of the possible outcomes of some event
and wn is the nth possible outcome. P(wn) is the probability that the nth possible
outcome will actually occur. In our discussion it should be assumed that the
log function is to the base 2 and thus information is given in bits.
The term on the left side of the equal
sign might be paraphrased as ‘the information (I) received on learning that the
outcome wn has occurred’. The right hand side of the expression
might be paraphrased as ‘the negative log function of the probability previously
assigned to the possibility that the outcome wn would occur’.
So this definition says that the
information received when the outcome wn occurs equals the
negative log of the probability that had been previously assigned to that outcome.
Information may be thought of as the amount of surprise experienced
when the actual outcome is learned.
Thus technically information is a measure
of probability. If we assigned a low probability to an outcome we receive a lot
of information if it does occur. If we expected it to be sunny today but it
rained we received a lot of information; our plans may have to be fully
revised. On the other hand if we expected rain and it did rain then we did not
receive much information and not much needs to be updated.
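The weather example can be sketched in a few lines of Python; the probabilities assigned to sun and rain here are illustrative assumptions, not figures from the text:

```python
import math

def information_bits(p):
    # Self-information: the surprise, in bits, of an outcome
    # to which we had previously assigned probability p.
    return -math.log2(p)

# Suppose we had assigned P(sunny) = 0.9 and P(rain) = 0.1.
print(information_bits(0.1))  # rain despite expecting sun: ~3.32 bits
print(information_bits(0.9))  # sun as expected: ~0.15 bits
```

The unlikely outcome carries far more information than the expected one, which is exactly the sense in which our plans need more revision when surprised.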
It is perhaps somewhat paradoxical that although information
has come to be considered perhaps the most fundamental concept in science it is
not simple. It requires the assignment of a probability to an outcome and in
addition it requires that this probability be compared to the actual outcome,
requiring a rather complicated mechanism for any physical instantiation. Thus
information transfer is itself a complex phenomenon.
We might also use the above equation as a definition of
probability: probability is a numerical assignment of the degree of
plausibility for a given outcome.
Bayesian interpretation of information and knowledge
In Bayesian terminology probabilities represent states of
knowledge thus making a connection with information’s colloquial meaning.
Perhaps surprisingly, although the term knowledge is used
extensively within the Bayesian scientific literature, there does not seem to be
an accepted definition. In fact Jaynes uses the term to define probability
itself:
In our terminology, a probability is something that we assign, in order to represent a state of knowledge.
However nowhere in his
writing or in other Bayesian literature have I been able to find an in-depth
description of what is meant by ‘knowledge’. Unfortunately the primary
technical definition of knowledge seems to still be the one offered by Plato
over two thousand years ago and still embraced by many philosophers today that
knowledge is ‘justified true belief’.
The first problem with
this definition is that it just refocuses our attempts at clarity onto
deciphering what is meant by ‘justified true belief’. This seems to offer only
a regress to other vague terms. A perhaps more serious problem is that this
definition has come to be understood as referring to human knowledge and justified true human
beliefs. It does not refer to knowledge found anywhere else in nature.
Jaynes himself seems to
have accepted the philosophers’ definition.
it is...the job of probability theory to describe human inferences at the level of epistemology.
I suggest that this
confusion over the nature and scope of ‘knowledge’ within Bayesian thought has
led to numerous difficulties in its proper application to fields such as
biology where the existence of non-human knowledge is evident.
In our brief review of
the proper context for knowledge we have encountered a number of related
concepts: models, information, probability, and the updating of models with
information. We can now combine these to gain an understanding of the process
by which knowledge may evolve.
Returning to the definition of information as a measure of
probability, we should consider that the probabilities assigned to the mutually
exclusive and exhaustive set of all the possible outcomes of an event must sum
to 1. One and only one outcome must occur. We may consider this
set as a list of hypotheses, each assigned a probability that the associated
outcome will occur. This kind of
complete set of hypotheses forms a model of the event. To find the correct
hypothesis in the set we must gather enough information to label one true and
the rest false.
A set of probabilities which sums to 1 is called a
probability distribution and has many interesting mathematical properties.
Perhaps foremost amongst them is entropy. Entropy is the sum of the information
contained in the set of hypotheses, the information of each hypothesis weighted
by its probability:

E(H) = -Σn P(hn) log(P(hn))

Where E is entropy, H is our model and hn are the n hypotheses making
up the model. This expression for entropy may be paraphrased as: the expected
surprise that a model of the outcomes will experience when the actual outcome
becomes known.
Surprise, and thus increased entropy, occurs when the model
lacks predictive accuracy. The entropy of every probability distribution has a
value between zero and infinity. It equals zero when the probability
distribution is a certainty; one hypothesis has a probability of 1 and the rest
of 0. The uniform distribution which has n members all having probability 1/n
has the highest entropy of any distribution with n members. Its entropy
approaches infinity as n approaches infinity.
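These two limiting cases, a certain distribution with zero entropy and a uniform distribution with maximal entropy, can be verified with a minimal sketch:

```python
import math

def entropy(dist):
    # Expected surprise in bits; terms with p == 0 contribute nothing.
    return sum(-p * math.log2(p) for p in dist if p > 0)

certain = [1.0, 0.0, 0.0, 0.0]      # one hypothesis certain
uniform = [0.25, 0.25, 0.25, 0.25]  # nothing known: all equally likely

print(entropy(certain))  # 0.0
print(entropy(uniform))  # 2.0, the maximum for four members
```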
Entropy measures what a model does not know or its
uncertainty. In the case of thermodynamics entropy is the amount of uncertainty
in the exact microstate of the system when we have some partial information
such as temperature:
The amount of additional information that would allow us to pinpoint the
actual microstate is given by the entropy of the distribution.
Definition of knowledge
As entropy measures a lack of knowledge, or ignorance, it is a
kind of inverse of knowledge and we might expect a technical definition of
knowledge could be formed in terms of entropy. A first step forward is to
recognize that knowledge, like entropy, is a property of a model; it is a
measure of a model’s predictive accuracy. Drawing on the relationship between
information and probability we noted earlier, and noting that entropy is a form
of information, I propose the technical definition for the knowledge of a model
K(H) as:

K(H) = 2^-E(H)
For example the model describing a coin flip is the two
member uniform probability distribution {.5, .5}. It has entropy = 1 bit. There
are 2^1 = 2 distinct possible outcomes about which the model is
ignorant: [Heads] and [Tails]. The model’s knowledge is 2^-1 = .5, which is the probability that an
arbitrary choice will produce the correct prediction.
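The coin-flip example can be checked directly; the entropy function below is the standard Shannon formula, and knowledge is computed as 2 to the negative entropy:

```python
import math

def entropy(dist):
    # Shannon entropy in bits (terms with p == 0 contribute nothing).
    return sum(-p * math.log2(p) for p in dist if p > 0)

def knowledge(dist):
    # Proposed definition: K(H) = 2**(-E(H)).
    return 2 ** -entropy(dist)

coin = [0.5, 0.5]        # fair-coin model
print(entropy(coin))     # 1.0 bit of ignorance
print(knowledge(coin))   # 0.5: chance a random pick predicts correctly
```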
Knowledge and ignorance
In general entropy is a measure of ignorance and ignorance
is described by the uniform distribution; when nothing is known all
possibilities are equally likely. The entropy of the uniform distribution which
has n possibilities is log(n). Using our definition its
knowledge is 2^-log(n), or more simply 1/n. But this is the
probability that any of the possibilities within the space of our ignorance is
the correct one. Our definition of knowledge is the probability of randomly
guessing the correct possibility within the boundaries of ignorance.
The amazing implication is that knowledge amounts to a
random guess within the sphere of ignorance. The only way the guess may become
more likely is if the space of ignorance is reduced.
As an example let’s consider a model in the form of a
distribution which has 16 possibilities. To begin with we have no information
which would make one possibility more likely than the others, so we assign the
uniform distribution where each probability is 1/16. This distribution has 4
bits of entropy. Let’s say we get some evidence concerning the model and when
this is applied via Bayesian updating some possibilities become more likely
than others and the entropy of the new distribution is reduced to 3 bits.
The new state of knowledge is 2^-3 = 1/8. But this
is the same knowledge as contained in a uniform distribution with only 8
possibilities; it is the same probability as a random guess amongst 8
possibilities. The change in certainty of the model due to the evidence is
equivalent to reducing the scope of our ignorance from 16 possibilities to 8.
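To make the 16-to-8 example concrete, suppose for illustration that the evidence simply rules out half of the possibilities; this is one simple way to obtain a 16-member distribution with exactly 3 bits of entropy:

```python
import math

def entropy(dist):
    # Shannon entropy in bits (terms with p == 0 contribute nothing).
    return sum(-p * math.log2(p) for p in dist if p > 0)

def knowledge(dist):
    # K(H) = 2**(-E(H))
    return 2 ** -entropy(dist)

before = [1 / 16] * 16           # uniform ignorance: 4 bits
after = [1 / 8] * 8 + [0.0] * 8  # evidence ruled out half the states

print(entropy(before), knowledge(before))  # 4.0 0.0625
print(entropy(after), knowledge(after))    # 3.0 0.125
```

The updated model’s knowledge, 1/8, is exactly the probability of a random guess among the 8 remaining possibilities.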
As this example illustrates, knowledge may be considered a
random guess within a scope of ignorance. The only way for the guess to become more
likely, for knowledge to increase, is for the scope of ignorance to be reduced.
While this definition might seem mathematically cumbersome
it has some attractive properties:
- Knowledge is a positive number between 0 and 1.
- Knowledge increases in value as entropy or ignorance decreases and vice versa.
- Knowledge approaches 1 when one of the model’s hypotheses approaches certainty.
- The knowledge of the uniform distribution is especially simple; it is 1/n, the same as the probability for any particular outcome. Thus a fair six-sided die has a distribution with knowledge of 1/6, which agrees with our common sense perception of knowledge as a measure of how close we are to certainty.
- The knowledge of the uniform distribution approaches zero as n approaches infinity. Again in agreement with common sense; all else being equal we know the least when there are a great many possibilities and we have no information that would allow us to prefer one over any of the others.
Our definition of knowledge in terms of a probability agrees
with the usual Bayesian definition of probability as a state of knowledge with
one important difference; knowledge is a property of any model in nature and
such models are not necessarily closely related to humans.
With this definition in hand we might next ask ‘how is a
model’s knowledge increased?’ Fortunately mathematicians have shown that
knowledge increase must follow a unique algorithm: the Bayesian update. On the
reception of new information (I) by the model (H) the probability of each
component hypothesis (hn) making up the model must be updated
according to:

P(hn|I,X) = P(hn|X) P(I|hn,X) / P(I|X)

Where X is the
information we had prior to receiving our new information I.
This theorem demonstrates that the model composed of the updated probabilities will have the greatest
accuracy possible given the data. Models which are updated according to Bayes’
theorem on the reception of new information will tend to have the greatest
knowledge or predictive accuracy. There are however some important caveats to
this that are explored in Appendix 1 using as an example the results of medical
tests.
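A minimal sketch of the update itself; the two hypotheses and their likelihoods below are illustrative assumptions chosen for the example:

```python
def bayes_update(priors, likelihoods):
    # P(hn | I, X) = P(hn | X) * P(I | hn, X) / P(I | X),
    # where P(I | X) normalizes the posterior so it sums to 1.
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Two hypotheses, initially equally plausible, and assumed
# likelihoods of the observed data under each.
priors = [0.5, 0.5]
likelihoods = [0.8, 0.2]
print(bayes_update(priors, likelihoods))  # [0.8, 0.2]
```

After the update the hypothesis that better predicted the data carries more probability, so the model’s entropy falls and its knowledge rises.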
We have seen
that the mathematical concepts of information, probability, entropy, knowledge,
Bayesian update, and models are intimately related. They are in fact but
different properties of a mathematical entity called inference. Inference is
the mathematical process for basing conclusions on data and for reaching the
best conclusions possible in the face of incomplete information.
A clearer view
of the intimate relation amongst these concepts might begin with probability.
- Cox showed that any consistent process of assigning real numbers to degrees of plausibility would lead to the sum and product rules of probability theory; these rules may be taken as the axioms of probability theory.
- The Bayesian update is a mere rearrangement of the terms of the product rule.
- The Bayesian update connects new information with updated probabilities which form a probability distribution over an exhaustive and mutually exclusive set of hypotheses (H); in our terms this is a model.
- Entropy and knowledge are inverse functions which are properties of models.
Thus we see
that these concepts are inseparable; they are defined in terms of one another
and any one of them implies the others. The integrated entity which they form
is an inferential agent. When we encounter any one of these concepts we should
expect to encounter them all operating together as an inferential agent.
This claim may
appear reckless. In some views probability or information are primitive
concepts which are found throughout nature and do not necessarily entail the
complications of inference. However we might consider that probability and
information are defined in terms of one another. Probability is the assignment
of a degree of plausibility of an outcome. No such assignment can be made
without considering the set of alternative outcomes, in other words without
considering a probability distribution over all possible outcomes. This is a
model and on the reception of new information the correct probabilities
entailed by the model are given by the Bayesian update. Thus I claim that
probability (and the other related concepts) has no meaning other than within
the context of inference.
Rather than narrowing the context
for probability, this view is actually an expansion of the usual Bayesian view
of probability. Bayesians have stressed that probability is related to a state
of human knowledge or inference. In the view presented here the scope of
probability is expanded beyond humans to the larger arena of inferential agents
in general.
[1]
The basis for the principle of incomplete information may reside in the nature of quantum information
which is the basic form of all information exchange. The quantum information
necessary to fully describe an entity can be divided between Holevo information
which may be communicated and the information of quantum discord which may not (91). The quantity of
Holevo, or classical information, is usually minute compared to quantum
discord.