Probability Definition#

Probabilistic Model#

A probabilistic model is a mathematical framework to describe an experiment or process that involves uncertainty. It captures all possible outcomes and their associated probabilities, allowing for analysis and prediction.

Definition of Event#

An event represents a specific subset of outcomes from the sample space that we are interested in. For example, in a dice roll, the event of rolling an even number corresponds to the subset {2, 4, 6}.

Three Fundamental Components to Define a Probability#

Sample Space (\( \mathbb{S} \))
The sample space is the universal set of all possible outcomes of a random experiment. For a coin toss, \( \mathbb{S} = \{ \text{Heads}, \text{Tails} \} \). It serves as the foundational set from which events are defined.

Class of Events (\( \mathsf{E} \))
This is the collection of subsets of the sample space \( \mathbb{S} \). Each subset represents an event that could be studied. For instance, if \( \mathbb{S} = \{1, 2, 3, 4, 5, 6\} \) for a dice roll, possible events could be \( \{2, 4, 6\} \) (rolling an even number) or \( \{1, 3, 5\} \) (rolling an odd number).

Probability Law
The probability law assigns a value \( \Pr(A) \) to each event \( A \), quantifying how likely the event is to occur. It adheres to specific axioms, ensuring that probabilities are non-negative, sum to 1 across all possible outcomes, and are additive for disjoint events.

Special Types of Events#

  • Sure Event
    A sure event is the entire sample space \( \mathbb{S} \), representing a scenario where every possible outcome occurs. For example, when rolling a die, the event of rolling any number between 1 and 6 is a sure event.

  • Null/Impossible Event
    This refers to the empty set (\( \varnothing \)) and represents an event that cannot occur. For instance, rolling a 7 on a standard six-sided die is a null event.

Example: Rolling a Die#

Sample Space (\( \mathbb{S} \))#

For a single roll of a standard six-sided die, the sample space is:

\[ \mathbb{S} = \{1, 2, 3, 4, 5, 6\} \]

Examples of Events#

An event is any subset of \( \mathbb{S} \), which can be:

Single-Outcome Events:

  • Rolling a 3: \( A = \{3\} \)

  • Rolling a 5: \( B = \{5\} \)

Multi-Outcome Events:

  • Rolling an even number: \( C = \{2, 4, 6\} \)

  • Rolling an odd number: \( D = \{1, 3, 5\} \)

  • Rolling a number less than 4: \( E = \{1, 2, 3\} \)

Compound Events:

  • Rolling a number greater than 4 or even: \( F = \{2, 4, 5, 6\} \)

  • Rolling a prime number (prime numbers on a die: \( 2, 3, 5 \)): \( G = \{2, 3, 5\} \)

Complementary Events:

  • The complement of rolling a 6: \( H = \{1, 2, 3, 4, 5\} \)

Null Event:

  • Rolling a 7 on a six-sided die: \( I = \varnothing \)

Sure Event:

  • Rolling any number: \( J = \mathbb{S} = \{1, 2, 3, 4, 5, 6\} \)

Class of Events (\( \mathsf{E} \))#

The class of events is the collection of all subsets of \( \mathbb{S} \), from the empty set to the entire sample space.

For this example, \( \mathsf{E} \) includes all possible subsets of \( \mathbb{S} \), including:

Single-Outcome Subsets:
\(\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}\)

Multi-Outcome Subsets:
\(\{1, 2\}, \{1, 3, 5\}, \{2, 4, 6\}, \{1, 2, 3, 4\}\), etc.

Special Subsets:

  • The empty set (\( \varnothing \)), corresponding to the null event.

  • The entire sample space (\( \mathbb{S} \)), corresponding to the sure event.
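For the six-sided die there are \( 2^6 = 64 \) such subsets. As a quick, purely illustrative check, the short Python sketch below enumerates all of them with itertools:

from itertools import chain, combinations

# Enumerate every subset of the die's sample space (the class of events here)
S = {1, 2, 3, 4, 5, 6}
subsets = list(chain.from_iterable(combinations(S, k) for k in range(len(S) + 1)))

len(subsets)  # 2^6 = 64 subsets, from the empty set up to S itself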

Properties of \( \mathsf{E} \)#

A set is said to be closed under an operation if applying that operation to elements of the set results in an element that is also in the set.

  • Closed Under Union: If \( A, B \in \mathsf{E} \), then \( A \cup B \in \mathsf{E} \).
    Example: \( A = \{2, 4\} \), \( B = \{3, 5\} \), so \( A \cup B = \{2, 3, 4, 5\} \).

  • Closed Under Intersection: If \( A, B \in \mathsf{E} \), then \( A \cap B \in \mathsf{E} \).
    Example: \( A = \{2, 4, 6\} \), \( B = \{4, 5, 6\} \), so \( A \cap B = \{4, 6\} \).

  • Closed Under Complement: If \( A \in \mathsf{E} \), then \( A^c \in \mathsf{E} \), where \( A^c = \mathbb{S} \setminus A \).
    Example: \( A = \{2, 4, 6\} \), \( A^c = \{1, 3, 5\} \).
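These closure properties can be checked directly with Python sets; a minimal sketch for the die example (illustrative only) is:

# Checking closure of the class of events under union, intersection, and complement
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

union = A | B          # {2, 4, 5, 6}
intersection = A & B   # {4, 6}
complement_A = S - A   # {1, 3, 5}

# Every result is again a subset of S, i.e., another event in the class
all(event <= S for event in (union, intersection, complement_A))  # True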

Probability Measure \( \Pr(A) \)#

Definition: A probability measure \( \Pr(A) \) is a function that assigns a numerical value to an event \( A \), representing the likelihood of its occurrence.

  • Other notation: \( P(A) \).

  • Note that this is different from \( p(A) \), which denotes a specific numerical value of a probability.

Assigning Probabilities#

The assignment of probabilities to events is a fundamental task in probability theory.

Mathematically, any assignment that satisfies the axioms of probability is valid.

However, depending on the context, different approaches are used to assign probabilities to specific events.

Two widely recognized approaches are the classical approach and the relative frequency approach.

Classical Approach#

The classical approach to probability is used in experiments where all outcomes are equally likely.

This method involves specifying all possible outcomes of the experiment, referred to as atomic outcomes.

Atomic outcomes are the most basic events in a sample space and cannot be decomposed further.

If there are \( M \) mutually exclusive and exhaustive atomic outcomes, the probability of each atomic outcome is assigned as:

\[ \Pr(\text{Atomic outcome}) = \frac{1}{M}. \]

Example:
Consider rolling a fair six-sided die. The sample space consists of six atomic outcomes: \( \{1, 2, 3, 4, 5, 6\} \). Since the die is fair, the probability of each outcome is:

\[ \Pr(\text{each side}) = \frac{1}{6}. \]

The classical approach assumes symmetry and equal likelihood among all outcomes, making it ideal for situations like dice rolls, coin tosses, or card draws from a well-shuffled deck.
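Under the classical approach, the probability of any event is the number of atomic outcomes it contains divided by \( M \). The short sketch below applies this rule to the fair-die example; the helper classical_probability is introduced here purely for illustration:

# Classical (equally likely) probability: |A| / M for an event A
S = {1, 2, 3, 4, 5, 6}

def classical_probability(event, sample_space):
    return len(event) / len(sample_space)

classical_probability({3}, S), classical_probability({2, 4, 6}, S)  # (1/6, 1/2)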

Relative Frequency Approach#

The relative frequency approach defines probability based on repeated observations or experiments. This method is particularly useful when outcomes are not assumed to be equally likely or when probabilities need to be estimated empirically.

The probability of an event \( A \) is calculated as the long-run relative frequency of \( A \) occurring in repeated trials of the experiment. Mathematically, this is expressed as:

\[ \Pr(A) = \lim_{n \to \infty} \frac{n_A}{n}, \]

where:

  • \( n_A \): The number of times event \( A \) occurs.

  • \( n \): The total number of trials.

Example:
To estimate the probability of heads in a coin toss, the coin can be flipped \( n \) times. If heads appears \( n_{\text{heads}} \) times, the relative frequency approximation is:

\[ \Pr(\text{Heads}) \approx \frac{n_{\text{heads}}}{n}. \]

As \( n \) increases, the relative frequency converges to the true probability.
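A simple simulation illustrates this convergence for a fair coin; the sketch below follows the same style as the dice simulations later in this section (the printed estimates vary from run to run):

import numpy as np

# Relative frequency estimate of Pr(Heads) for increasing numbers of trials
for n in [10, 100, 10000, 1000000]:
    flips = np.random.randint(0, 2, size=n)  # 1 = Heads, 0 = Tails
    print(n, np.mean(flips))                 # n_heads / n approaches 0.5 as n grows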

The classical approach is theoretical and assumes symmetry, while the relative frequency approach is empirical and relies on observed data. Both methods are valid as long as they adhere to the axioms of probability.

Axioms of Probability#

  • The axioms of probability are foundational principles that form the basis of probability theory.

  • The term axiom implies that these statements are assumed to be self-evident truths and do not require proof.

Three Governing Axioms

  1. Nonnegativity: Ensures all probabilities are nonnegative.

  2. Normalization: Ensures the total probability across the sample space equals 1.

  3. Additivity: Governs how probabilities of disjoint events are combined.

Axiom I: Nonnegativity#

  • Statement: For any event \( A \), the probability \( \Pr(A) \geq 0 \).

  • Implication: Probabilities cannot be negative. They represent nonnegative values between 0 and 1, where 0 indicates impossibility and values closer to 1 represent higher likelihoods.

This axiom ensures the mathematical consistency of probability theory, as negative probabilities are not physically or logically meaningful.

Axiom II: Normalization#

  • Statement: The probability of the entire sample space \( \mathbb{S} \) is \( \Pr(\mathbb{S}) = 1 \).

  • Implication: The sum of probabilities over all possible outcomes in a sample space must equal 1, reflecting the certainty that one of the outcomes will occur.

This axiom establishes a reference point, ensuring all probabilities are normalized and meaningful within a finite range.

Axiom III: Additivity#

  • Statement: For two disjoint events \( A \) and \( B \), where \( A \cap B = \varnothing \), the probability of their union is \( \Pr(A \cup B) = \Pr(A) + \Pr(B) \).

  • Implication: When events do not overlap (are mutually exclusive), the probability of either occurring is simply the sum of their individual probabilities.

This axiom provides a foundation for combining probabilities of independent or non-overlapping events in practical scenarios.

Generalization of Axiom III#

Corollary: Finite Additivity#

  • Statement: For \( M \) mutually exclusive events \( A_1, A_2, \ldots, A_M \), where \( A_i \cap A_j = \varnothing \) for \( i \neq j \), the probability of their union is the sum of their individual probabilities:

    \[ \Pr\left(\bigcup_{i=1}^{M} A_i\right) = \sum_{i=1}^{M} \Pr(A_i). \]
  • Implication: Extends Axiom III (Additivity) to a finite number of events.

This principle is often applied when analyzing probabilities across a finite number of disjoint scenarios, such as rolling a die or drawing from a shuffled deck.

Theorem: Infinite Additivity (Axiom “IV”)#

  • Statement: For an infinite sequence of mutually exclusive events \( A_1, A_2, A_3, \ldots \), where \( A_i \cap A_j = \varnothing \) for \( i \neq j \), the probability of their union is given by:

    \[ \Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i). \]
  • Implication: Extends the finite additivity rule to an infinite sequence of disjoint events. This property is fundamental to measure theory and provides the basis for dealing with infinite sample spaces, such as continuous probability distributions.

This is critical for advanced applications like integration in probability, where an infinite sum of probabilities converges to a meaningful value.

These results generalize the Additivity Axiom, ensuring that the probability framework holds consistently for both finite and infinite collections of mutually exclusive events.

Extended Axioms of Probability: Union of Two Events#

The basic axioms of probability provide a foundation for dealing with mutually exclusive events, but they do not directly address how to calculate the probability of the union of two events that are not mutually exclusive. This can be derived from the axioms and is formalized in the following theorem.

Theorem: Probability of the Union of Two Events#

Statement: For any two sets \( A \) and \( B \), not necessarily mutually exclusive, the probability of their union is given by:

\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B). \]

\( \Pr(A \cap B) \) is the joint probability of \( A \) and \( B \), denoted alternatively as \( \Pr(A, B) \). It represents the probability that both events \( A \) and \( B \) occur simultaneously.

Calculating the Joint Probability#

In cases where \( A \) and \( B \) are not mutually exclusive, the joint probability can be estimated using the relative frequency approach. Let:

  • \( n_{A,B} \): The number of trials in which both \( A \) and \( B \) occur simultaneously.

  • \( n \): The total number of trials.

The joint probability is then defined as:

\[ \Pr(A, B) = \lim_{n \to \infty} \frac{n_{A,B}}{n}. \]

We can see that:

  • This formula estimates \( \Pr(A, B) \) by observing the proportion of trials where \( A \) and \( B \) occur together.

  • As the number of trials (\( n \)) increases, the relative frequency converges to the true probability.

The formula for \( \Pr(A \cup B) \) accounts for the overlap between \( A \) and \( B \) to avoid double-counting. This theorem, combined with the relative frequency approach, provides a practical way to handle probabilities for events that are not mutually exclusive.

Example: Joint Probability in Rolling a Die#

Consider the experiment of rolling a six-sided fair die. The sample space is:

\[ \mathbb{S} = \{1, 2, 3, 4, 5, 6\}. \]

Now, define two events:

  • \( A \): The outcome is an even number (\( A = \{2, 4, 6\} \)).

  • \( B \): The outcome is greater than 3 (\( B = \{4, 5, 6\} \)).

Identify the Joint Event: The joint event \( A \cap B \) represents outcomes that satisfy both conditions:

  • The number is even (from \( A \)), and

  • The number is greater than 3 (from \( B \)).

From the sample space, \( A \cap B = \{4, 6\} \).

Calculate the Joint Probability: Since the die is fair, each outcome has an equal probability of \( \frac{1}{6} \). The joint probability \( \Pr(A \cap B) \) is the sum of probabilities of the outcomes in \( A \cap B \):

\[ \Pr(A \cap B) = \Pr(\{4\}) + \Pr(\{6\}) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}. \]

Verify with the Union Formula (Optional): Using the formula for the union of two events:

\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B), \]

we calculate:

  • \( \Pr(A) = \frac{3}{6} = \frac{1}{2} \) (even numbers: \( 2, 4, 6 \)),

  • \( \Pr(B) = \frac{3}{6} = \frac{1}{2} \) (numbers greater than 3: \( 4, 5, 6 \)),

  • \( \Pr(A \cap B) = \frac{2}{6} = \frac{1}{3} \).

Substitute these into the formula:

\[ \Pr(A \cup B) = \frac{1}{2} + \frac{1}{2} - \frac{1}{3} = \frac{2}{3}. \]

This matches the direct count, since \( A \cup B = \{2, 4, 5, 6\} \) gives \( \Pr(A \cup B) = \frac{4}{6} = \frac{2}{3} \), confirming that the probabilities are consistent.

We can say that the joint probability \( \Pr(A \cap B) = \frac{1}{3} \) quantifies the likelihood of rolling a number that is both even and greater than 3.

Simulation using Relative Frequency Approach#

The simulation implements the Relative Frequency Approach to approximate the joint probability by repeatedly observing outcomes and calculating the ratio \( n_{A,B}/n \).

This approach is practical for estimating probabilities empirically when theoretical calculations are difficult or when validating theoretical results.

import numpy as np

# Simulation parameters
num_trials = 100000  # Number of dice rolls
outcomes = np.random.randint(1, 7, size=num_trials)  # Simulate dice rolls (1 to 6)

# Define the events
A = (outcomes % 2 == 0)  # Event A: Even number (2, 4, 6)
B = (outcomes > 3)       # Event B: Greater than 3 (4, 5, 6)

# Joint event: A and B
joint_event = A & B
joint_count = np.sum(joint_event)  # Count the joint occurrences

# Joint probability
joint_probability = joint_count / num_trials

joint_count, joint_probability
(33175, 0.33175)

This simulation uses the Relative Frequency Approach to estimate the joint probability, i.e.:

\[ \Pr(A, B) = \lim_{n \to \infty} \frac{n_{A,B}}{n}. \]

where:

Experiment Trials (\( n \)):
The simulation runs \( n = 100,000 \) dice rolls, which represents a large number of trials.

Count of Joint Event (\( n_{A,B} \)):
The code calculates the number of outcomes where both conditions (\( A \) and \( B \)) are satisfied. This count is \( n_{A,B} \), the occurrences of the joint event.

Relative Frequency Estimate:
The joint probability is computed as:

\[ \Pr(A, B) = \frac{n_{A,B}}{n}. \]

This is directly implemented in the code:

joint_probability = joint_count / num_trials

Convergence to True Probability:
As \( n \to \infty \), the relative frequency converges to the theoretical probability. While the code uses \( n = 100,000 \), this large number of trials provides a good approximation of the true joint probability.

The Relative Frequency Approach used in the code is closely related to the Monte Carlo simulation method.
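The same simulation style can also be used to check the union formula \( \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A, B) \) empirically; a short sketch (re-running the experiment) is given below:

import numpy as np

# Empirical check of Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A, B) for the dice events
num_trials = 100000
outcomes = np.random.randint(1, 7, size=num_trials)

A = (outcomes % 2 == 0)  # Event A: Even number
B = (outcomes > 3)       # Event B: Greater than 3

Pr_A = np.mean(A)
Pr_B = np.mean(B)
Pr_A_and_B = np.mean(A & B)
Pr_A_or_B = np.mean(A | B)

# Both values should be close to the theoretical 2/3
Pr_A_or_B, Pr_A + Pr_B - Pr_A_and_B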

Properties of Probability#

Probability theory is governed by several important properties that follow directly from the axioms of probability.

These properties provide essential tools for reasoning about and calculating probabilities.

Property 1: Probability of an Impossible Event#

The probability of an impossible event is always zero:

\[ \Pr(\varnothing) = 0. \]

This means that an event that cannot occur under any circumstances has a probability of zero.

Example: Rolling a 7 with a standard six-sided die is impossible; the corresponding event is \( \varnothing \), so its probability is \( \Pr(\varnothing) = 0 \).

Property 2: Complement Rule#

For any event \( A \), the probability of its complement \( \bar{A} \) (the event that \( A \) does not occur) is:

\[ \Pr(\bar{A}) = 1 - \Pr(A). \]

The complement rule reflects the fact that the total probability of all possible outcomes in the sample space is 1.

Example: If the probability of rolling an even number on a die is \( \Pr(A) = \frac{1}{2} \), then the probability of rolling an odd number is \( \Pr(\bar{A}) = 1 - \frac{1}{2} = \frac{1}{2} \).

Property 3: Subset Rule#

If event \( A \) is a subset of event \( B \) (\( A \subset B \)), then:

\[ \Pr(A) \leq \Pr(B). \]

This property reflects the idea that the probability of a smaller event (a subset) cannot exceed the probability of a larger event that contains it.

Example: Let \( A \) be the event of rolling a number less than 3 (\( \{1, 2\} \)) and \( B \) be the event of rolling a number less than 5 (\( \{1, 2, 3, 4\} \)). Since \( A \subset B \), \( \Pr(A) \leq \Pr(B) \).

Property 4: Additivity for Exhaustive and Disjoint Events#

If \( A_1, A_2, \ldots, A_N \) are \( N \) disjoint events such that their union covers the entire sample space \( \mathbb{S} \) (\( A_1 \cup A_2 \cup \ldots \cup A_N = \mathbb{S} \)), then:

\[ \Pr(A_1) + \Pr(A_2) + \ldots + \Pr(A_N) = 1. \]

This property follows directly from the normalization axiom and the additivity axiom.

Example: In a die-rolling experiment, the six outcomes (\( A_1 = \{1\}, A_2 = \{2\}, \ldots, A_6 = \{6\} \)) are disjoint and exhaustive. Thus:

\[ \Pr(A_1) + \Pr(A_2) + \ldots + \Pr(A_6) = 1. \]

We can see that these properties extend the axioms of probability, making them practical for solving problems and analyzing probabilistic systems.
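These properties can be verified numerically for the fair-die example using classical probabilities; the sketch below (illustrative only, using exact fractions) checks each property in turn:

from fractions import Fraction

# Checking Properties 1-4 for a fair six-sided die with classical probabilities
S = {1, 2, 3, 4, 5, 6}

def Pr(event):
    return Fraction(len(event), len(S))  # equally likely outcomes

print(Pr(set()))                                # Property 1: Pr(null event) = 0
print(Pr(S - {2, 4, 6}) == 1 - Pr({2, 4, 6}))   # Property 2: complement rule
print(Pr({1, 2}) <= Pr({1, 2, 3, 4}))           # Property 3: subset rule
print(sum(Pr({k}) for k in S))                  # Property 4: probabilities sum to 1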

Conditional Probability#

The probability of an event \( A \) can often depend on the occurrence of another event \( B \). When we know that \( B \) has occurred, the likelihood of \( A \) may change based on this information.

This revised probability is called the conditional probability of \( A \) given \( B \). It quantifies the likelihood of \( A \) occurring under the assumption that \( B \) is true.

The shorthand notation \( \Pr(A|B) \) is used to represent this conditional probability:

  • \( \Pr(A|B) \): The probability of \( A \) given that \( B \) has occurred.

  • This is often referred to as “the probability of \( A \) conditional on \( B \).”

Conditional probabilities allow us to refine our assessments of likelihood when partial information about the system or experiment is available. The relationship between conditional and joint probabilities forms the foundation for deeper concepts in probability theory.

This is particularly useful when additional information is available about the occurrence of related events.

Mathematical Description#

For two events \( A \) and \( B \), the conditional probability of \( A \) given \( B \), denoted by \( \Pr(A|B) \), is defined as:

\[ \Pr(A|B) = \frac{\Pr(A, B)}{\Pr(B)}, \]

where \( \Pr(B) > 0 \).

This measures the probability of \( A \) happening under the condition that \( B \) has already occurred.

Note that the denominator \( \Pr(B) \) ensures that the conditioning event \( B \) has a non-zero probability.

Computing Joint Probabilities#

The relationship between joint probability and conditional probability is given by:

\[ \Pr(A, B) = \Pr(B|A)\Pr(A) = \Pr(A|B)\Pr(B). \]

This formula is particularly useful because:

  • Conditional probabilities can often be easier to calculate than joint probabilities.

  • It provides a straightforward method for breaking down complex problems.

Example: Conditional Probability in Rolling a Die#

Consider the experiment of rolling a fair six-sided die. The sample space is:

\[ \mathbb{S} = \{1, 2, 3, 4, 5, 6\}. \]

Events

  1. Event \( A \): The outcome is an even number (\( A = \{2, 4, 6\} \)).

  2. Event \( B \): The outcome is less than or equal to 3 (\( B = \{1, 2, 3\} \)).

Compute the Conditional Probability \( \Pr(A|B) \)
The conditional probability \( \Pr(A|B) \) represents the probability of rolling an even number (\( A \)) given that the outcome is less than or equal to 3 (\( B \)).

From the definition of conditional probability:

\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \]
  • Find \( A \cap B \): The intersection of \( A \) and \( B \) (even numbers that are less than or equal to 3) is \( A \cap B = \{2\} \).

  • Calculate \( \Pr(A \cap B) \): The probability of \( A \cap B \) is:

    \[ \Pr(A \cap B) = \frac{\text{Number of outcomes in } A \cap B}{\text{Total outcomes in } \mathbb{S}} = \frac{1}{6}. \]
  • Calculate \( \Pr(B) \): The probability of \( B \) (outcomes less than or equal to 3) is:

    \[ \Pr(B) = \frac{\text{Number of outcomes in } B}{\text{Total outcomes in } \mathbb{S}} = \frac{3}{6} = \frac{1}{2}. \]
  • Compute \( \Pr(A|B) \):

    \[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\frac{1}{6}}{\frac{1}{2}} = \frac{1}{3}. \]

Given that the outcome is less than or equal to 3 (\( B \)), the probability of rolling an even number (\( A \)) is \( \frac{1}{3} \). Without this information, the unconditional probability of \( A \) would be \( \frac{1}{2} \).

We can see that the conditional probability \( \Pr(A|B) \) reflects how the likelihood of \( A \) (rolling an even number) is updated based on the knowledge that \( B \) (the outcome is less than or equal to 3) has occurred.

Numerical Results#

import numpy as np

# Simulation parameters
num_trials = 100000  # Number of dice rolls
outcomes = np.random.randint(1, 7, size=num_trials)  # Simulate dice rolls (1 to 6)

# Define events
A = (outcomes % 2 == 0)  # Event A: Outcome is even
B = (outcomes <= 3)  # Event B: Outcome is less than or equal to 3

# Compute intersection of A and B
A_and_B = A & B  # Event A ∩ B: Even numbers less than or equal to 3

# Conditional probability
Pr_B = np.sum(B) / num_trials  # Probability of event B
Pr_A_given_B = np.sum(A_and_B) / np.sum(B)  # Conditional probability Pr(A|B)

Pr_A_given_B
0.33829458608308427

Extension to Multiple Events#

Conditional probability can be extended to calculate joint probabilities for three or more events. For example:

  1. Three Events: The joint probability of events \( A \), \( B \), and \( C \) is:

    \[ \Pr(A, B, C) = \Pr(C|A, B)\Pr(B|A)\Pr(A). \]
  2. General Case for \( M \) Events: For \( M \) events \( A_1, A_2, \ldots, A_M \), the joint probability is:

    \[ \Pr(A_1, \ldots, A_M) = \Pr(A_M|A_1, \ldots, A_{M-1}) \Pr(A_{M-1}|A_1, \ldots, A_{M-2}) \cdots \Pr(A_2|A_1)\Pr(A_1). \]

This sequential approach allows the decomposition of a complex probability into a series of conditional probabilities.
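The sketch below illustrates this decomposition for three dice events using the same simulation style as before; the particular events \( A \), \( B \), and \( C \) are chosen here only for illustration:

import numpy as np

# Chain rule check: Pr(A, B, C) = Pr(C|A, B) Pr(B|A) Pr(A) for a fair die
num_trials = 100000
outcomes = np.random.randint(1, 7, size=num_trials)

A = (outcomes % 2 == 0)  # Event A: Even number
B = (outcomes > 2)       # Event B: Greater than 2
C = (outcomes > 4)       # Event C: Greater than 4

Pr_A = np.mean(A)
Pr_B_given_A = np.sum(A & B) / np.sum(A)
Pr_C_given_AB = np.sum(A & B & C) / np.sum(A & B)

# Both quantities should be close to the theoretical value 1/6 (only outcome 6 qualifies)
np.mean(A & B & C), Pr_C_given_AB * Pr_B_given_A * Pr_A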

We can see that conditional probability is a foundational concept in probability theory that provides a structured way to compute probabilities under known conditions. By leveraging its relationship with joint probabilities, it enables the analysis of intricate systems involving multiple events.

Bayes’ Theorem#

Two fundamental theorems provide powerful tools for reasoning about probabilities in complex systems involving conditional probabilities and mutually exclusive events: Bayes’ theorem and the total probability theorem.

Bayes’ Theorem (for Two Events)#

Bayes’ Theorem relates the conditional probabilities of two events \( A \) and \( B \). For \( \Pr(B) \neq 0 \):

\[ \Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)}. \]

where:

  • \( \Pr(A|B) \): The probability of \( A \) given \( B \).

  • \( \Pr(B|A) \): The probability of \( B \) given \( A \).

  • \( \Pr(A) \): The prior probability of \( A \).

  • \( \Pr(B) \): The probability of \( B \), which acts as a normalizing constant.

Alternative Representation of Bayes’ Theorem#

Bayes’ Theorem can be expressed in terms of a hypothesis \( H \) and observed evidence \( E \).

This representation is particularly useful in applications like Bayesian inference, where the goal is to update beliefs about a hypothesis based on observed data.

\[ \Pr(H \mid E) = \frac{\Pr(E \mid H)}{\Pr(E)} \Pr(H), \]

where:

  • \( \Pr(H) \): The Prior
    The initial belief about the probability of the hypothesis \( H \) before observing any evidence.

  • \( \Pr(E) \): The Marginal
    The overall probability of observing the evidence \( E \). When there are multiple mutually exclusive hypotheses \( H_i \), it can be computed as \( \Pr(E) = \sum_{i} \Pr(E \mid H_i) \Pr(H_i) \).

  • \( \Pr(E \mid H) \): The Likelihood
    The probability of observing the evidence \( E \), assuming the hypothesis \( H \) is true.

  • \( \Pr(H \mid E) \): The Posterior
    The updated belief about the probability of \( H \) after considering the evidence \( E \).

We can understand them as:

  • Prior (\( \Pr(H) \)): Encodes what we initially believe about \( H \).

  • Likelihood (\( \Pr(E \mid H) \)): Reflects how well \( H \) explains \( E \).

  • Marginal (\( \Pr(E) \)): Normalizes the posterior to ensure probabilities sum to 1.

  • Posterior (\( \Pr(H \mid E) \)): The revised belief about \( H \) after incorporating \( E \).
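As a concrete, purely illustrative example with the fair die, take the hypothesis \( H \) = “the roll is even” and the evidence \( E \) = “the roll is greater than 3”; the sketch below computes the posterior with Bayes’ theorem and checks it against the direct definition of conditional probability:

from fractions import Fraction

# Bayes' theorem on a fair die: H = even number, E = number greater than 3
S = {1, 2, 3, 4, 5, 6}
H = {2, 4, 6}
E = {4, 5, 6}

def Pr(event):
    return Fraction(len(event), len(S))

prior = Pr(H)                              # Pr(H)   = 1/2
likelihood = Pr(H & E) / Pr(H)             # Pr(E|H) = 2/3
marginal = Pr(E)                           # Pr(E)   = 1/2
posterior = likelihood * prior / marginal  # Pr(H|E) = 2/3

posterior == Pr(H & E) / Pr(E)             # True: matches the direct definition Pr(H, E) / Pr(E)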

Generalization of Bayes’ Theorem#

Bayes’ Theorem can be extended to handle scenarios involving multiple hypotheses or events. This generalized form is particularly useful when the sample space is divided into mutually exclusive and exhaustive events.

Statement: Let \( B_1, B_2, \ldots, B_n \) be a set of mutually exclusive and exhaustive events such that:

\[ B_i \cap B_j = \varnothing \quad \text{for all } i \neq j \quad \text{and} \quad \bigcup_{i=1}^n B_i = \mathbb{S}. \]

For any event \( A \) where \( \Pr(A) > 0 \), the probability of \( B_i \) given \( A \) is:

\[ \Pr(B_i|A) = \frac{\Pr(A|B_i)\Pr(B_i)}{\sum_{j=1}^{n} \Pr(A|B_j)\Pr(B_j)}. \]

where:

  • \( \Pr(B_i) \): A Priori Probability
    The initial probability of the hypothesis \( B_i \), before observing evidence \( A \). This reflects prior knowledge or belief about \( B_i \).

  • \( \Pr(A|B_i) \): Likelihood
    The probability of observing \( A \) under the assumption that \( B_i \) is true. This measures how well the evidence \( A \) supports \( B_i \).

  • \( \Pr(B_i|A) \): A Posteriori Probability
    The updated probability of \( B_i \), incorporating the new evidence \( A \). This is the result of applying Bayes’ Theorem.

  • \( \sum_{j=1}^n \Pr(A|B_j)\Pr(B_j) \): Normalization Constant
    Ensures that the sum of posterior probabilities across all hypotheses equals 1.

We can see that Bayes’ theorem in this form allows us to update the probabilities of multiple hypotheses \( B_1, B_2, \ldots, B_n \) based on new evidence \( A \). This framework is foundational for probabilistic reasoning and inference.

Total Probability Theorem#

The Total Probability theorem expands the computation of probabilities when events are partitioned into mutually exclusive and exhaustive subsets.

Let \( B_1, B_2, \ldots, B_n \) be a set of such events, satisfying:

  • Mutually Exclusive: \( B_i \cap B_j = \varnothing \) for \( i \neq j \).

  • Exhaustive: \( \bigcup_{i=1}^n B_i = \mathbb{S} \), meaning the events cover the entire sample space.

Statement: The probability of any event \( A \) can then be expressed as:

\[ \Pr(A) = \sum_{i=1}^n \Pr(A|B_i)\Pr(B_i). \]

where:

  • \( \Pr(A|B_i) \): The probability of \( A \) conditioned on \( B_i \).

  • \( \Pr(B_i) \): The prior probability of \( B_i \).

We can see that:

  • Bayes’ Theorem: Allows for updating probabilities based on evidence.

  • Theorem of Total Probability: Breaks down probabilities into contributions from mutually exclusive cases.

These theorems provide a robust framework for handling conditional and joint probabilities in diverse applications.
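The following sketch illustrates both results on the fair die, using the (arbitrarily chosen) partition \( B_1 = \{1, 2\} \), \( B_2 = \{3, 4\} \), \( B_3 = \{5, 6\} \) and the event \( A \) = “an even number is rolled”:

from fractions import Fraction

# Total probability and generalized Bayes' theorem on a fair die
S = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]  # mutually exclusive and exhaustive
A = {2, 4, 6}                         # observed event: an even number

def Pr(event):
    return Fraction(len(event), len(S))

likelihoods = [Pr(A & B) / Pr(B) for B in partition]  # Pr(A|B_i) = 1/2 for each block
priors = [Pr(B) for B in partition]                   # Pr(B_i)   = 1/3 for each block

# Total probability theorem: Pr(A) = sum_i Pr(A|B_i) Pr(B_i) = 1/2
Pr_A = sum(l * p for l, p in zip(likelihoods, priors))

# Generalized Bayes' theorem: Pr(B_i|A) = Pr(A|B_i) Pr(B_i) / Pr(A) = 1/3 each
posteriors = [l * p / Pr_A for l, p in zip(likelihoods, priors)]

Pr_A, posteriors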

Independent Events#

Key Concepts#

Two events \( A \) and \( B \) are considered independent if the occurrence of one provides no information about the occurrence of the other. Mathematically, this is expressed as:

\[ \Pr(B|A) = \Pr(B) \quad \text{and} \quad \Pr(A|B) = \Pr(A). \]

We can interpret as:

  • Knowledge of the occurrence of \( A \) does not affect the probability of \( B \), and vice versa.

  • The probabilities of the two events remain unaffected by each other.

Mathematical Implication
From the definition of conditional probability:

\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}, \]

the condition \( \Pr(A|B) = \Pr(A) \) leads to:

\[ \Pr(A \cap B) = \Pr(A)\Pr(B). \]

This relationship is the defining criterion for statistical independence.

Formal Definition#

Two events \( A \) and \( B \) are statistically independent if and only if:

\[ \Pr(A \cap B) = \Pr(A)\Pr(B). \]

Note that:

  • Special Case When \( \Pr(B) = 0 \):
    The product definition \( \Pr(A \cap B) = \Pr(A)\Pr(B) \) remains valid even when \( \Pr(B) = 0 \), although the conditional probability \( \Pr(A|B) \) is undefined in that case; with \( \Pr(B) = 0 \), any event \( A \) satisfies the definition and is therefore independent of \( B \).

  • Symmetry of Independence:
    If \( A \) is independent of \( B \), then \( B \) is also independent of \( A \). Independence is a symmetric property.

We can see that:

  • Independence implies no probabilistic influence between events.

  • The relationship \( \Pr(A \cap B) = \Pr(A)\Pr(B) \) is central to reasoning about independent events.

  • Independence does not imply mutual exclusivity; mutually exclusive events (\( \Pr(A \cap B) = 0 \)) are generally not independent unless \( \Pr(A) = 0 \) or \( \Pr(B) = 0 \).
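A quick way to test this criterion on the fair die is sketched below; the events are chosen only for illustration (one pair turns out to be independent, the other does not):

from fractions import Fraction

# Testing independence via Pr(A ∩ B) = Pr(A) Pr(B) on a fair die
S = {1, 2, 3, 4, 5, 6}

def Pr(event):
    return Fraction(len(event), len(S))

A = {2, 4, 6}      # even number
B1 = {1, 2, 3, 4}  # at most 4
B2 = {4, 5, 6}     # greater than 3

print(Pr(A & B1) == Pr(A) * Pr(B1))  # True:  1/3 == 1/2 * 2/3 -> independent
print(Pr(A & B2) == Pr(A) * Pr(B2))  # False: 1/3 != 1/2 * 1/2 -> not independent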

Generalization of Independence#

Independence of Multiple Events#

The concept of independence can be extended to a collection of events. A set of events \( A_1, A_2, \dots, A_n \) is said to be independent if the following conditions hold:

  • Subset Independence: The events in any subset of size \( k < n \) drawn from \( \{A_1, A_2, \dots, A_n\} \) are themselves independent.

  • Joint Independence: The joint probability of all \( n \) events satisfies:

    \[ \Pr(A_1, A_2, \dots, A_n) = \Pr(A_1)\Pr(A_2)\dots\Pr(A_n). \]

In other words, independence across a collection of events requires both pairwise independence and joint (higher-order) independence for every subset of the collection.

The concept of independence has two main applications:

  • Testing for Independence:

    • Approach: Compute joint or conditional probabilities and compare them against the definitions of independence.

    • Example: Use the condition \( \Pr(A \cap B) = \Pr(A)\Pr(B) \) to test if two events \( A \) and \( B \) are independent.

  • Assuming Independence:

    • Approach: Assume independence to simplify the computation of joint or conditional probabilities, especially in complex systems where direct calculation is infeasible.

    • Use Case: This is widely applied in engineering and other fields, where modeling assumes independent components to enable tractable analysis.

We can see that:

  • Independence simplifies the computation of joint probabilities.

  • Independence assumptions are practical and frequently used in real-world applications, especially in communication systems, where quantities are often modeled as independent and identically distributed (i.i.d.), and in machine learning.