Artiom Fiodorov (Tom)
October, 2015
The reasoning of our policeman consists of the following weak syllogisms:
Strong syllogisms can be chained together without any loss of certainty
Weak syllogisms have wider applicability
Most of the reasoning people do consists of weak syllogisms.
Can weak syllogisms be made quantitative? Answer: yes, using probability theory.
Probability theory is nothing but common sense reduced to calculation -- Laplace, 1819
There exists a continuous monotonic decreasing function \(S\) such that
\[ (\neg A|B) = S(A|B) \]
There exists a continuous function \(F\) such that
\[(A \wedge B|C) = F[(B|C), (A| B, C)]\]
Heuristic: for \(A \wedge B\) to be true, \(B\) has to be true, so \((B | C)\) is needed. If \(B\) is false then \(A \wedge B\) is false regardless of \(A\), so \((A | C)\) is not needed once \((A | B, C)\) and \((B | C)\) are known.
There exists a continuous, strictly increasing function \(p\) such that, for every \(A, B\) and some background information \(X\),
\[ p(A \wedge B | X) = p(A | B, X) \, p(B | X) \qquad \text{and} \qquad p(A | X) + p(\neg A | X) = 1 \,. \]
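With these two rules, the policeman's weak syllogism becomes quantitative. A short sketch (the labels are mine: write \(A\) for the policeman's hypothesis and \(B\) for the observed evidence): by the product rule,
\[ p(A | B, X) = p(A | X) \, \frac{p(B | A, X)}{p(B | X)} \,. \]
If the evidence is more likely under \(A\) than on \(X\) alone, i.e. \(p(B | A, X) > p(B | X)\), then \(p(A | B, X) > p(A | X)\): \(A\) becomes more plausible, but not certain.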
The measure-theoretic (Kolmogorov) approach opts out: probability is simply any measure with certain properties.
The principles for assigning probabilities by logical analysis of incomplete information are not present at all in the Kolmogorov system.
Conduct \(n\) independent trials, where each trial has \(m\) outcomes.
Start with the state of ignorance (\(I_0\)) in which every outcome of every trial is equally likely.
It can be shown that
\[ P(\text{trial}_i = j | \{n_j\}, n, I_0) = \frac{n_j}{n} \]
where \(\frac{n_j}{n}\) is just the observed frequency of an outcome \(j\).
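For small \(n\) and \(m\) this claim can be checked by brute force; a minimal sketch (the helper name conditional_prob is mine, not from the text): enumerate all \(m^n\) equally likely sequences, keep those with the given counts \(\{n_j\}\), and look at how often trial \(i\) shows outcome \(j\).

    from itertools import product
    from collections import Counter

    def conditional_prob(n, m, counts, i, j):
        """P(trial_i = j | {n_j}, n, I_0): enumerate all m^n equally likely
        sequences, keep those whose outcome counts match `counts`, and return
        the fraction in which position i equals j."""
        matching = [seq for seq in product(range(1, m + 1), repeat=n)
                    if Counter(seq) == counts]
        return sum(seq[i] == j for seq in matching) / len(matching)

    # e.g. n = 4 trials, m = 3 outcomes, observed counts {1: 2, 2: 1, 3: 1}
    counts = Counter({1: 2, 2: 1, 3: 1})
    print(conditional_prob(4, 3, counts, i=0, j=1))   # 0.5, i.e. n_1 / n = 2/4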
Throw a die \(n\) times. Average is \(4\). What is the probability distribution of such a die as \(n \to \infty\)?
Let's first calculate the following:
\[ P(\text{Average is } 4 | I_0) \]
\[ P(\text{Average is } 4 | I_0) = \text{Multiplicity}(\text{Average is } 4) / 6^n \]
Fix \(n = 20\).
\begin{align*} \text{Multiplicity}(\text{Average is } 4) = \frac{20!}{1!\,1!\,1!\,12!\,4!\,1!} &+ \frac{20!}{1!\,1!\,1!\,13!\,2!\,2!} \\ &+ \frac{20!}{1!\,1!\,2!\,10!\,5!\,1!} + \cdots \end{align*}
There are \(283\) terms in the summation.
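This count can be reproduced numerically; a sketch under the setup above (the helper multinomial is mine): enumerate all count vectors \((n_1, \dots, n_6)\) with \(\sum_j n_j = 20\) and \(\sum_j j\, n_j = 80\), then sum the multinomial coefficients. The first printed number should match the \(283\) terms quoted above.

    from itertools import product
    from math import factorial

    def multinomial(counts):
        """Multinomial coefficient n! / (n_1! n_2! ... n_6!)."""
        out = factorial(sum(counts))
        for c in counts:
            out //= factorial(c)
        return out

    n = 20
    vectors = []
    for tail in product(range(n + 1), repeat=5):      # candidate (n_2, ..., n_6)
        n1 = n - sum(tail)
        if n1 < 0:
            continue
        v = (n1,) + tail
        if sum(j * c for j, c in zip(range(1, 7), v)) == 4 * n:
            vectors.append(v)

    print(len(vectors))                               # number of terms in the sum
    print(sum(multinomial(v) for v in vectors))       # Multiplicity(Average is 4)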
\[ \frac{1}{n} \log(\text{Multiplicity}) = \frac{1}{n} \log \max_{\substack{\sum_j n_j = n \\ \sum_j j\, n_j = 4n}} \frac{n!}{n_1! \, n_2! \cdots n_6!} + o(1) \,. \]
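The step behind what follows is Stirling's approximation \(\log k! = k \log k - k + O(\log k)\), which for a fixed number of outcomes gives
\[ \frac{1}{n} \log \frac{n!}{n_1! \cdots n_6!} = -\sum_{j=1}^{6} \frac{n_j}{n} \log \frac{n_j}{n} + O\!\left(\frac{\log n}{n}\right) \,. \]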
Now take \(n \to \infty\) with \(n_j / n \to f_j\) to see that
\[ P(\text{Average is } 4 | I_0) \approx \frac{e^{n \sum_{j=1}^{6} - f_j \log f_j}}{6^n} \]
for the \(f_j\) that maximise \(\sum_{j = 1}^{6} - f_j \log f_j\) subject to \(\sum_j f_j = 1\) and \(\sum_j j\, f_j = 4\).
Out of all probability distributions that average to \(4\), pick the one which maximises the information entropy \(\sum_{j = 1}^{6} -p_j \log p_j\).
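This constrained maximisation can be sketched with standard Lagrange multipliers (not spelled out in the original): maximising \(\sum_j -p_j \log p_j\) subject to \(\sum_j p_j = 1\) and \(\sum_j j\, p_j = 4\) gives an exponential-family solution
\[ p_j = \frac{e^{\lambda j}}{\sum_{k=1}^{6} e^{\lambda k}}, \qquad j = 1, \dots, 6, \]
with \(\lambda\) chosen so that \(\sum_j j\, p_j = 4\).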
The answer is approximately: \(0.103,\ 0.123,\ 0.146,\ 0.174,\ 0.207,\ 0.247\).
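A numeric check of that answer, as a sketch (plain bisection on \(\lambda\), nothing beyond the standard library; the function name is mine):

    from math import exp

    def maxent_die(mean, lo=-10.0, hi=10.0, iters=100):
        """Maximum-entropy distribution on {1,...,6} with the given mean:
        p_j proportional to exp(lambda * j), lambda found by bisection."""
        def moment(lam):
            w = [exp(lam * j) for j in range(1, 7)]
            return sum(j * wj for j, wj in zip(range(1, 7), w)) / sum(w)
        for _ in range(iters):
            mid = (lo + hi) / 2
            if moment(mid) < mean:      # moment is increasing in lambda
                lo = mid
            else:
                hi = mid
        lam = (lo + hi) / 2
        w = [exp(lam * j) for j in range(1, 7)]
        z = sum(w)
        return [wj / z for wj in w]

    print([round(p, 3) for p in maxent_die(4.0)])
    # approximately [0.103, 0.123, 0.146, 0.174, 0.207, 0.247]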
"Information theory must precede probability theory and not be based on it." -- Kolmogorov
In the future, workers in all the quantitative sciences will be obliged, as a matter of practical necessity, to use probability theory in the manner expounded here.
-- E.T. Jaynes. Probability: The Logic of Science