October, 2015

# Motivating example

• A policeman spots an armed man at the scene of a robbery. He quickly concludes that the man is guilty. By what reasoning process?

# Deductive and plausible reasoning

• The policeman's conclusion could not have been a logical deduction
• Deductive reasoning consists of the following strong syllogisms:
• if $$A$$ is true, then $$B$$ is true \begin{align*} \frac{\text{$A$ is true}}{\text{therefore, $B$ is true}} \end{align*}
• \begin{align*} \frac{\text{$B$ is false}}{\text{therefore, $A$ is false}} \end{align*}
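Both strong syllogisms (modus ponens and modus tollens) can be verified mechanically: they hold in every row of the truth table for material implication. A minimal Python check:

```python
from itertools import product

def implies(a, b):
    # Material implication: "if A then B" is false only when A is true and B is false.
    return (not a) or b

for a, b in product([True, False], repeat=2):
    if implies(a, b) and a:
        assert b          # modus ponens: A true, therefore B true
    if implies(a, b) and not b:
        assert not a      # modus tollens: B false, therefore A false

print("both strong syllogisms hold in every case")
```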

# Weak syllogisms

• The reasoning of our policeman consists of the following weak syllogisms:

• if $$A$$ is true, then $$B$$ is true \begin{align*} \frac{\text{$A$ is false}}{\text{therefore, $B$ becomes less plausible}} \end{align*}
• \begin{align*} \frac{\text{$B$ is true}}{\text{therefore, $A$ becomes more plausible}} \end{align*}
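The weak syllogism "$$B$$ is true, therefore $$A$$ becomes more plausible" is exactly what Bayes' rule quantifies: if the premise gives $$P(B|A) = 1$$, observing $$B$$ multiplies the plausibility of $$A$$ by $$1/P(B) \geq 1$$. A sketch with made-up numbers for the policeman example (all three probabilities below are assumptions for illustration):

```python
# Hypothetical numbers: A = "the man is guilty", B = "the man is armed".
p_A = 0.01             # prior plausibility of guilt
p_B_given_A = 1.0      # the premise "if A then B" as a probability
p_B_given_not_A = 0.2  # assumed rate of armed innocents

# Total probability of B, then Bayes' rule for P(A|B).
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B

assert p_A_given_B > p_A   # observing B made A more plausible
print(round(p_A_given_B, 3))   # → 0.048
```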

# Crucial difference

• Strong syllogisms can be chained together without any loss of certainty

• Weak syllogisms have wider applicability

Most of the reasoning people do consists of weak syllogisms.

# Quantifying weak syllogisms

• Question: can we quantify this everyday reasoning?

"Probability theory is nothing but common sense reduced to calculation." -- Laplace, 1819

• Cox's theorem (1946) states that any system of plausible reasoning that satisfies certain common-sense requirements is isomorphic to probability theory.
• Cox's theorem relies on a few assumptions. While these assumptions do not necessarily describe how humans always think, they are requirements that rational people might well want to adopt into their way of thinking.

# Objectives of this talk

• Goal 1: show that probability theory can be regarded as an extension of logic
• Goal 2: relate this notion of probability to other interpretations of probability
• Goal 3: show applications of probability theory used as logic

# Desiderata: developing a thinking robot

1. Degrees of plausibility are represented by real numbers.
2. Correspondence with common sense:
• (2a) If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
• (2b) The robot always takes into account all of the evidence it has relevant to a question. It does not arbitrarily ignore some of the information, basing its conclusions only on what remains.
• (2c) The robot always represents equivalent states of knowledge by equivalent plausibility assignments. That is, if in two problems the robot's state of knowledge is the same (except perhaps for the labelling of the propositions), then it must assign the same plausibilities in both.

# Assumption: Single-valued theory

There exists a continuous monotonic decreasing function $$S$$ such that

$(\neg A|B) = S(A|B)$

# Assumption: Conjunction

There exists a continuous function $$F$$ such that

$(A \wedge B|C) = F[(B|C), (A| B, C)]$

Heuristic: for $$A \wedge B$$ to be true, $$B$$ has to be true, so $$(B | C)$$ is needed. If $$B$$ is false, then $$A \wedge B$$ is false regardless of $$A$$, so $$(A | C)$$ is not needed once $$(A | B, C)$$ and $$(B | C)$$ are known.

# Cox's theorem

There exists a continuous, strictly increasing function $$p$$ such that, for every $$A, B$$ and some background information $$X$$,

1. $$p(A | X) = 0$$ iff $$A$$ is known to be false given the information in $$X$$.
2. $$p(A | X) = 1$$ iff $$A$$ is known to be true given the information in $$X$$.
3. $$0 \leq p(A | X) \leq 1$$.
4. $$p(A\wedge B | X) = p(A | X)p(B | A, X)$$.
5. $$p(\neg A | X) = 1 - p(A | X)$$.
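Rules 3-5 are consistency constraints rather than empirical claims, so any properly normalised joint distribution satisfies them. A quick numerical check on a made-up joint over two binary propositions:

```python
# A made-up joint distribution over two binary propositions A and B.
joint = {(True, True): 0.3, (True, False): 0.2,
         (False, True): 0.4, (False, False): 0.1}

p_A = sum(p for (a, _), p in joint.items() if a)        # marginal of A
p_not_A = sum(p for (a, _), p in joint.items() if not a)
p_A_and_B = joint[(True, True)]
p_B_given_A = p_A_and_B / p_A                           # conditional

assert abs(p_A_and_B - p_A * p_B_given_A) < 1e-12   # rule 4: product rule
assert abs(p_not_A - (1 - p_A)) < 1e-12             # rule 5: sum rule
assert 0 <= p_A <= 1                                # rule 3: range
print("product and sum rules hold")
```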

# Approaches to probability: comparison

• Measure-theoretic (Kolmogorov): opts out of interpretation; probability is any measure with certain properties.

The principles for assigning probabilities by logical analysis of incomplete information are not present at all in the Kolmogorov system.

• Statistics:
1. Bayesian: what we derived here. Probability = plausibility of a statement
2. Frequentist: probability = long-run frequency of an event

# Convergence of Bayesian and frequentist probabilities

Conduct $$n$$ independent trials, where each trial has $$m$$ outcomes.

Start from a state of ignorance ($$I_0$$) in which every outcome is equally likely.

It can be shown that

$P(\text{trial}_i = j | \{n_j\}, n, I_0) = \frac{n_j}{n}$

where $$\frac{n_j}{n}$$ is just the observed frequency of an outcome $$j$$.
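For a small case the identity can be checked exactly by enumerating every ordering consistent with the counts (the outcomes and counts below are made up):

```python
from itertools import permutations
from fractions import Fraction

# With only the counts known, every ordering of the outcomes is equally
# plausible, so P(trial_i = j | {n_j}, n, I_0) should equal n_j / n.
outcomes = ['a', 'a', 'b']               # n_a = 2, n_b = 1, n = 3
orderings = set(permutations(outcomes))  # 3 distinct orderings

# Fraction of orderings in which the first trial shows 'a'.
p = Fraction(sum(seq[0] == 'a' for seq in orderings), len(orderings))
print(p)   # → 2/3, i.e. n_a / n
```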

# Case study 1.

Throw a die $$n$$ times and observe that the average is $$4$$. What probability distribution should we assign to the faces of this die as $$n \to \infty$$?

Let's first calculate the following:

$P(\text{Average is } 4 | I_0)$

# Calculating the multiplicity

$P(\text{Average is } 4 | I_0) = \text{Multiplicity}(\text{Average is } 4) / 6^n$

Fix $$n = 20$$.

\begin{align*} \text{Multiplicity}(\text{Average is } 4) = \frac{20!}{1!1!1!12!4!1!} &+ \frac{20!}{1!1!1!13!2!2!} \\ &+ \frac{20!}{1!1!2!10!5!1!} + \cdots \end{align*}

There are $$283$$ terms in the summation.
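The enumeration can be reproduced directly. The sketch below assumes, as the displayed terms do, that every face appears at least once ($$n_i \geq 1$$):

```python
from math import factorial, prod

n = 20
target = 4 * n  # total pips needed for an average of 4

# Enumerate (n_1, ..., n_6) with every face appearing at least once,
# sum n_i = 20 and sum i * n_i = 80, matching the terms shown above.
solutions = []
for n1 in range(1, n + 1):
    for n2 in range(1, n - n1 + 1):
        for n3 in range(1, n - n1 - n2 + 1):
            for n4 in range(1, n - n1 - n2 - n3 + 1):
                for n5 in range(1, n - n1 - n2 - n3 - n4 + 1):
                    n6 = n - n1 - n2 - n3 - n4 - n5
                    counts = (n1, n2, n3, n4, n5, n6)
                    if n6 >= 1 and sum(i * c for i, c in enumerate(counts, 1)) == target:
                        solutions.append(counts)

multiplicity = sum(factorial(n) // prod(map(factorial, c)) for c in solutions)
print(len(solutions))   # → 283 terms in the summation
```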

# Multiplicity as $$n \to \infty$$

$\frac{1}{n} \log(\text{Multiplicity}) = \frac{1}{n} \log \max_{\substack{\sum_i n_i = n \\ \sum_i i \, n_i = 4n}} \frac{n!}{n_1! n_2! \dots n_6!} + o(1) \,.$

Now take $$n \to \infty$$ under $$n_j / n \to f_j$$ to see that

$P(\text{Average is } 4 | I_0) \approx \frac{e^{n \sum_j - f_j \log f_j}}{6^n}$

where the $$f_j$$'s maximise $$\sum_{j = 1}^{6} - f_j \log f_j$$ subject to $$\sum_j f_j = 1$$ and $$\sum_j j \, f_j = 4$$.

# Solution to the die question

• Out of all probability distributions that average to $$4$$, pick the one which maximises the information entropy: $$\sum_{j = 1}^{6} -p_j \log p_j$$.

• The answer is approximately: $$0.103, 0.123, 0.146, 0.174, 0.207, 0.247$$.
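The constrained maximiser has the exponential form $$p_j \propto r^j$$ for some $$r > 0$$ fixed by the mean constraint. A bisection sketch (the bracket $$[1, 2]$$ is an assumption that happens to contain the root):

```python
def mean_for(r):
    # Mean of the distribution p_j proportional to r**j for faces j = 1..6.
    weights = [r**j for j in range(1, 7)]
    z = sum(weights)
    return sum(j * w for j, w in zip(range(1, 7), weights)) / z

# mean_for is increasing in r; mean_for(1) = 3.5 < 4 < mean_for(2).
lo, hi = 1.0, 2.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) < 4:
        lo = mid
    else:
        hi = mid

r = (lo + hi) / 2
z = sum(r**j for j in range(1, 7))
p = [r**j / z for j in range(1, 7)]
print([round(x, 3) for x in p])   # ≈ [0.103, 0.123, 0.146, 0.174, 0.207, 0.247]
```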

# Consequences of the Cox theorem

• This interpretation of probability unifies statistical inference, probability theory, and information theory under one mathematical framework.

"Information theory must precede probability theory and not be based on it." -- Kolmogorov

• E.T. Jaynes derived procedures for multiple hypothesis testing, parameter estimation, significance testing, and more directly from Cox's theorem.
• Often such derivations yield a new and deeper understanding of the statistical tools.

# Final remark

In the future, workers in all the quantitative sciences will be obliged, as a matter of practical necessity, to use probability theory in the manner expounded here.

-- E.T. Jaynes. Probability: The Logic of Science