October, 2015

# Motivating example

• A policeman spots an armed man at the scene of a robbery. He quickly concludes that the man is guilty. By what reasoning process?

# Deductive and plausible reasoning

• The policeman's conclusion could not have been a logical deduction
• Deductive reasoning consists of the following strong syllogisms:
• if $$A$$ is true, then $$B$$ is true \begin{align*} \frac{\text{$A$ is true}}{\text{therefore, $B$ is true}} \end{align*}
• \begin{align*} \frac{\text{$B$ is false}}{\text{therefore, $A$ is false}} \end{align*}
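Both strong syllogisms (modus ponens and modus tollens) can be verified mechanically: they hold in every row of the truth table for material implication. A minimal Python check:

```python
from itertools import product

def implies(a, b):
    # Material implication: "if A then B" is false only when A is true and B is false.
    return (not a) or b

for a, b in product([True, False], repeat=2):
    if implies(a, b) and a:
        assert b          # modus ponens: A true, therefore B true
    if implies(a, b) and not b:
        assert not a      # modus tollens: B false, therefore A false

print("both strong syllogisms hold in every case")
```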

# Weak syllogisms

• The reasoning of our policeman consists of the following weak syllogisms:

• if $$A$$ is true, then $$B$$ is true \begin{align*} \frac{\text{$A$ is false}}{\text{therefore, $B$ becomes less plausible}} \end{align*}
• \begin{align*} \frac{\text{$B$ is true}}{\text{therefore, $A$ becomes more plausible}} \end{align*}
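The weak syllogism "$$B$$ is true, therefore $$A$$ becomes more plausible" is exactly what Bayes' rule quantifies: if the premise gives $$P(B|A) = 1$$, observing $$B$$ multiplies the plausibility of $$A$$ by $$1/P(B) \geq 1$$. A sketch with made-up numbers for the policeman example (all three probabilities below are assumptions for illustration):

```python
# Hypothetical numbers: A = "the man is guilty", B = "the man is armed".
p_A = 0.01             # prior plausibility of guilt
p_B_given_A = 1.0      # the premise "if A then B" as a probability
p_B_given_not_A = 0.2  # assumed rate of armed innocents

# Total probability of B, then Bayes' rule for P(A|B).
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B

assert p_A_given_B > p_A   # observing B made A more plausible
print(round(p_A_given_B, 3))   # → 0.048
```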

# Crucial difference

• Strong syllogisms can be chained together without any loss of certainty

• Weak syllogisms have wider applicability

Most of the reasoning people do consists of weak syllogisms.

# Quantifying weak syllogisms

• Question: can we quantify this everyday reasoning?

"Probability theory is nothing but common sense reduced to calculation." -- Laplace, 1819

• Cox's theorem (1946) states that any system of plausible reasoning that satisfies certain common-sense requirements is isomorphic to probability theory.
• Cox's theorem relies on a few assumptions. While these assumptions do not necessarily describe how humans always think, they are requirements that rational people might well want to adopt into their way of thinking.

# Objectives of this talk

• Goal 1: show that probability theory can be regarded as an extension of logic
• Goal 2: relate this notion of probability to other interpretations of probability
• Goal 3: show applications of probability theory used as logic

# Desiderata: developing a thinking robot

1. Degrees of plausibility are represented by real numbers.
2. Correspondence with common sense:
• (2a) If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
• (2b) The robot always takes into account all of the evidence it has relevant to a question. It does not arbitrarily ignore some of the information, basing its conclusions only on what remains.
• (2c) The robot always represents equivalent states of knowledge by equivalent plausibility assignments. That is, if in two problems the robot's state of knowledge is the same (except perhaps for the labelling of the propositions), then it must assign the same plausibilities in both.

# Assumption: Single-valued theory

There exists a continuous monotonic decreasing function $$S$$ such that

$(\neg A|B) = S(A|B)$

# Assumption: Conjunction

There exists a continuous function $$F$$ such that

$(A \wedge B|C) = F[(B|C), (A| B, C)]$

Heuristic: for $$A \wedge B$$ to be true, $$B$$ has to be true, so $$(B | C)$$ is needed. If $$B$$ is false, then $$A \wedge B$$ is false regardless of $$A$$, so $$(A | C)$$ is not needed once $$(A | B, C)$$ and $$(B | C)$$ are known.

# Cox's theorem

There exists a continuous, strictly increasing function $$p$$ such that, for every $$A, B$$ and some background information $$X$$,

1. $$p(A | X) = 0$$ iff $$A$$ is known to be false given the information in $$X$$.
2. $$p(A | X) = 1$$ iff $$A$$ is known to be true given the information in $$X$$.
3. $$0 \leq p(A | X) \leq 1$$.
4. $$p(A\wedge B | X) = p(A | X)p(B | A, X)$$.
5. $$p(\neg A | X) = 1 - p(A | X)$$.
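Rules 3-5 are consistency constraints rather than empirical claims, so any properly normalised joint distribution satisfies them. A quick numerical check on a made-up joint over two binary propositions:

```python
# A made-up joint distribution over two binary propositions A and B.
joint = {(True, True): 0.3, (True, False): 0.2,
         (False, True): 0.4, (False, False): 0.1}

p_A = sum(p for (a, _), p in joint.items() if a)        # marginal of A
p_not_A = sum(p for (a, _), p in joint.items() if not a)
p_A_and_B = joint[(True, True)]
p_B_given_A = p_A_and_B / p_A                           # conditional

assert abs(p_A_and_B - p_A * p_B_given_A) < 1e-12   # rule 4: product rule
assert abs(p_not_A - (1 - p_A)) < 1e-12             # rule 5: sum rule
assert 0 <= p_A <= 1                                # rule 3: range
print("product and sum rules hold")
```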

# Approaches to probability: comparison

• Measure-theoretic (Kolmogorov): opts out of interpretation; probability is any measure with certain properties.

The principles for assigning probabilities by logical analysis of incomplete information are not present at all in the Kolmogorov system.

• Statistics:
1. Bayesian: what we derived here. Probability = plausibility of a statement
2. Frequentist: probability = long-run frequency of an event

# Convergence of Bayesian and frequentist probabilities

Conduct $$n$$ independent trials, where each trial has $$m$$ outcomes.

Start from a state of ignorance ($$I_0$$) in which every outcome is equally likely.

It can be shown that

$P(\text{trial}_i = j | \{n_j\}, n, I_0) = \frac{n_j}{n}$

where $$\frac{n_j}{n}$$ is just the observed frequency of an outcome $$j$$.
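For a small case the identity can be checked exactly by enumerating every ordering consistent with the counts (the outcomes and counts below are made up):

```python
from itertools import permutations
from fractions import Fraction

# With only the counts known, every ordering of the outcomes is equally
# plausible, so P(trial_i = j | {n_j}, n, I_0) should equal n_j / n.
outcomes = ['a', 'a', 'b']               # n_a = 2, n_b = 1, n = 3
orderings = set(permutations(outcomes))  # 3 distinct orderings

# Fraction of orderings in which the first trial shows 'a'.
p = Fraction(sum(seq[0] == 'a' for seq in orderings), len(orderings))
print(p)   # → 2/3, i.e. n_a / n
```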

# Case study 1.

Throw a die $$n$$ times and observe that the average is $$4$$. What probability distribution should we assign to the faces of this die as $$n \to \infty$$?

Let's first calculate the following:

$P(\text{Average is } 4 | I_0)$

# Calculating the multiplicity

$P(\text{Average is } 4 | I_0) = \text{Multiplicity}(\text{Average is } 4) / 6^n$

Fix $$n = 20$$.

\begin{align*} \text{Multiplicity}(\text{Average is } 4) = \frac{20!}{1!1!1!12!4!1!} &+ \frac{20!}{1!1!1!13!2!2!} \\ &+ \frac{20!}{1!1!2!10!5!1!} + \cdots \end{align*}

There are $$283$$ terms in the summation.
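The enumeration can be reproduced directly. The sketch below assumes, as the displayed terms do, that every face appears at least once ($$n_i \geq 1$$):

```python
from math import factorial, prod

n = 20
target = 4 * n  # total pips needed for an average of 4

# Enumerate (n_1, ..., n_6) with every face appearing at least once,
# sum n_i = 20 and sum i * n_i = 80, matching the terms shown above.
solutions = []
for n1 in range(1, n + 1):
    for n2 in range(1, n - n1 + 1):
        for n3 in range(1, n - n1 - n2 + 1):
            for n4 in range(1, n - n1 - n2 - n3 + 1):
                for n5 in range(1, n - n1 - n2 - n3 - n4 + 1):
                    n6 = n - n1 - n2 - n3 - n4 - n5
                    counts = (n1, n2, n3, n4, n5, n6)
                    if n6 >= 1 and sum(i * c for i, c in enumerate(counts, 1)) == target:
                        solutions.append(counts)

multiplicity = sum(factorial(n) // prod(map(factorial, c)) for c in solutions)
print(len(solutions))   # → 283 terms in the summation
```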

# Multiplicity as $$n \to \infty$$

$\frac{1}{n} \log(\text{Multiplicity}) = \frac{1}{n} \log \max_{\substack{\sum_i n_i = n \\ \sum_i i \, n_i = 4n}} \frac{n!}{n_1! n_2! \dots n_6!} + o(1) \,.$

Now take $$n \to \infty$$ under $$n_j / n \to f_j$$ to see that

$P(\text{Average is } 4 | I_0) \approx \frac{e^{n \sum_j - f_j \log f_j}}{6^n}$

where the $$f_j$$'s maximise $$\sum_{j = 1}^{6} - f_j \log f_j$$ subject to $$\sum_j f_j = 1$$ and $$\sum_j j \, f_j = 4$$.

# Solution to the die question

• Out of all probability distributions that average to $$4$$, pick the one which maximises the information entropy: $$\sum_{j = 1}^{6} -p_j \log p_j$$.

• The answer is approximately: $$0.103, 0.123, 0.146, 0.174, 0.207, 0.247$$.
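The constrained maximiser has the exponential form $$p_j \propto r^j$$ for some $$r > 0$$ fixed by the mean constraint. A bisection sketch (the bracket $$[1, 2]$$ is an assumption that happens to contain the root):

```python
def mean_for(r):
    # Mean of the distribution p_j proportional to r**j for faces j = 1..6.
    weights = [r**j for j in range(1, 7)]
    z = sum(weights)
    return sum(j * w for j, w in zip(range(1, 7), weights)) / z

# mean_for is increasing in r; mean_for(1) = 3.5 < 4 < mean_for(2).
lo, hi = 1.0, 2.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) < 4:
        lo = mid
    else:
        hi = mid

r = (lo + hi) / 2
z = sum(r**j for j in range(1, 7))
p = [r**j / z for j in range(1, 7)]
print([round(x, 3) for x in p])   # ≈ [0.103, 0.123, 0.146, 0.174, 0.207, 0.247]
```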

# Consequences of the Cox theorem

• This interpretation of probability unifies statistical inference, probability theory, and information theory under one mathematical framework.

"Information theory must precede probability theory and not be based on it." -- Kolmogorov

• E.T. Jaynes derived procedures for multiple hypothesis testing, parameter estimation, significance testing, and more directly from Cox's theorem.
• Often such derivations yield a new and deeper understanding of the statistical tools.

# Final remark

In the future, workers in all the quantitative sciences will be obliged, as a matter of practical necessity, to use probability theory in the manner expounded here.

-- E.T. Jaynes. Probability: The Logic of Science