
Metrics Notes

Table of contents
  1. Entropy
  2. Perplexity
  3. References

Entropy

Figure: visualization of the relationship between entropy and the probability distribution (credit: ML Mastery)

Entropy is a measure of information content or uncertainty: the higher the entropy, the harder it is to predict the value of a random variable drawn from a given distribution.

The entropy of a discrete random variable $X$ with distribution $p$ over $K$ states is:

$$
\begin{aligned}
H(X) &\triangleq -\sum^K_{k=1}{p(X=k)\log{p(X=k)}}\\
&= -\mathbb{E}_X[\log{p(X)}]
\end{aligned}
$$
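To make the definition concrete, here is a minimal sketch in Python (the `entropy` helper is mine, not from any particular library):

```python
import numpy as np

def entropy(p, base=2):
    """H(X) = -sum_k p(X=k) * log p(X=k) for a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p(X=k) = 0 contribute 0 by convention
    return -np.sum(p * np.log(p)) / np.log(base)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.25, 0.75]))  # ~0.811 bits: somewhat easier to predict
print(entropy([1.0, 0.0]))    # 0.0 bits: a deterministic outcome
```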

Intuitions

For example, let $X_n$ be a random variable drawn from distribution $p$ over $K$ states: $X_n \sim p$.

Suppose we have three random variables $X^1_n, X^2_n, X^3_n$ drawn from three corresponding distributions $p^1, p^2, p^3$:

| $k$   | $P(X^1=k)$ | $P(X^2=k)$ | $P(X^3=k)$ |
|-------|------------|------------|------------|
| $k_0$ | 0.25       | 0.75       | 0.5        |
| $k_1$ | 0.75       | 0.25       | 0.5        |

Intuitively, given this information, predictions about $X^1$ and $X^2$ can be made with higher confidence than predictions about $X^3$:

  • For $X^1$, we can predict with some confidence that $k_1$ is likely to be observed; likewise, $k_0$ for $X^2$.
  • For $X^3$, however, no prediction does better than chance (50:50). The computation below makes this concrete.
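Computing the entropy of each column bears this intuition out; a quick cross-check using SciPy's discrete entropy (variable names are mine):

```python
from scipy.stats import entropy  # discrete Shannon entropy

p1, p2, p3 = [0.25, 0.75], [0.75, 0.25], [0.5, 0.5]
for name, dist in [("p1", p1), ("p2", p2), ("p3", p3)]:
    print(name, entropy(dist, base=2))
# p1 ~ 0.811, p2 ~ 0.811, p3 = 1.0 (bits):
# X^3 has the highest entropy, i.e. it is the hardest to predict.
```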

Is there a way to define a measure of how “confident” our prediction can be, given the distribution over the $K$ states? Dividing and conquering this question, the sub-question is:

How do we measure the prediction’s confidence for a given state $k_i$?

  • $X$ is said to have low entropy, or to carry a rich amount of information, if knowing its distribution $p$ lets us easily predict a specific event $X = k_i$.

Perplexity

Perplexity measures “predictability”: it is the exponentiated entropy, $\text{perplexity}(p) \triangleq 2^{H(p)}$. If $p$ is a uniform distribution over $K$ states, then $H(p) = \log_2{K}$, so the perplexity is exactly $K$: the distribution is as unpredictable as a fair $K$-sided die.
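A minimal sketch of this relationship (the `perplexity` helper is mine):

```python
import numpy as np

def perplexity(p):
    """perplexity(p) = 2**H(p), with the entropy H measured in bits."""
    p = np.asarray(p, dtype=float)
    h = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # entropy in bits
    return 2.0 ** h

K = 4
print(perplexity(np.full(K, 1 / K)))  # 4.0: uniform over K states -> perplexity K
print(perplexity([0.25, 0.75]))       # ~1.755: more predictable, lower perplexity
```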

References

  1. Murphy, K. P. (2022). Probabilistic Machine Learning: An introduction. MIT Press. probml.ai