Thoughts on weighing observations

Suppose you have a set of observations (measurements) and want to assess how well they fall into an ideal target range. Here are a few thoughts on how to go beyond the most obvious measure: percentage of “in-range values”.

Suppose our observations lie in a metric space <$ (X, \rho) $> and our target range is a subset <$ S \subset X $> closed in <$ X. $> For each observation <$ x \in X $> we want to calculate its weight <$ \omega(x) \in \lbrack 0, 1 \rbrack. $> The total weight of a finite, non-empty set of observations will then naturally be the mean of the individual weights:

<$$ \omega(\Xi) := \frac{1}{|\Xi|} \sum_{x \in \Xi} \omega(x), \quad \Xi \subset X. $$>
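As a minimal sketch of this definition in Python (the names `total_weight` and `in_unit_interval` are illustrative, and the toy weight is only there to make the example runnable), the total weight is just the mean of the individual weights:

```python
from typing import Callable, Iterable


def total_weight(observations: Iterable[float],
                 weight: Callable[[float], float]) -> float:
    """Mean weight of a finite, non-empty sample of observations."""
    xs = list(observations)
    if not xs:
        raise ValueError("total weight is undefined for an empty sample")
    return sum(weight(x) for x in xs) / len(xs)


# Hypothetical toy weight for illustration: 1 inside [0, 1], 0 outside.
def in_unit_interval(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Two of the four observations are in range, so the total weight is 0.5.
print(total_weight([0.2, 0.5, 1.5, 3.0], in_unit_interval))
```

With an indicator weight like this one, the total weight reduces to the plain fraction of in-range values.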

Properties of <$ \omega $> weights

Feel free to skip to examples of specific weighting functions if you are not interested in an abstract discussion of their possible properties.

Note on notation. For <$ x \in X, A \subset X $> let <$ \rho(x, A) := \inf \{ \rho(x, y): y \in A \} $> be the distance of <$ x $> from the set <$ A. $> Let <$ \overline{A} $> denote the closure of <$ A \subset X $> in <$ X, $> let <$ \mathrm{int}\; A $> denote the interior of <$ A $> and let <$ \partial A := \overline{A} \setminus \mathrm{int}\; A $> be the boundary of <$ A. $>
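For a closed interval on the real line with the Euclidean metric, the infimum in <$ \rho(x, A) $> is attained at the nearest endpoint. A small Python sketch under that assumption (the function name and the interval <$ \lbrack 2, 5 \rbrack $> are hypothetical, chosen only for illustration):

```python
def dist_to_interval(x: float, lo: float, hi: float) -> float:
    """Euclidean distance rho(x, [lo, hi]) on the real line.

    Zero inside the interval, otherwise the distance to the nearer endpoint.
    """
    if lo > hi:
        raise ValueError("expected lo <= hi")
    return max(lo - x, 0.0, x - hi)


# Hypothetical interval [2, 5]: one point below, one inside, one above.
for x in (1.0, 3.0, 7.5):
    print(x, dist_to_interval(x, 2.0, 5.0))
```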

Let us consider some properties we would expect this <$ \omega: X \to \lbrack 0, 1 \rbrack $> to have. First, I regard these as essential:

  (i) <$ \forall s \in S: \omega(s) = 1. $>
    All observations within the target range have maximum weight.
  (ii) <$ \forall x, y \in X: \rho(x, S) = \rho(y, S) \Rightarrow \omega(x) = \omega(y). $>
    <$ \omega(x) $> depends only on the distance of <$ x $> from <$ S. $> Instead of the abstract <$ \omega: X \to \lbrack 0, 1 \rbrack $> we can therefore work with a real function <$ \omega_{\rho}: \lbrack 0, \infty ) \to \lbrack 0, 1 \rbrack $> such that <$ \omega(x) = \omega_{\rho}(\rho(x, S)). $>
  (iii) <$ \forall x, y \in X: \rho(x, S) \lt \rho(y, S) \Rightarrow \omega(x) \ge \omega(y). $>
    Observations farther from the target never have greater weight.
  (iv) <$ \forall x \in X: \omega(x) \to 0 $> as <$ \rho(x, S) \to \infty. $>
    The weight of outliers approaches zero.

Some properties are rather desirable:

  (v) <$ \omega \in C^0(X). $>
    <$ \omega $> is continuous on <$ X. $> Prevents sudden drops in weight.
  (vi) <$ \omega \in C^0(\partial S). $>
    <$ \omega $> is continuous at every point of <$ \partial S. $> Weaker form of (v). Guarantees a smoothish drop-off at least around the boundary of the target. Obviously (v) <$ \Rightarrow $> (vi).
  (vii) <$ \omega \in C^1(X). $>
    <$ \omega $> is continuously differentiable on <$ X. $> Guarantees smooth weight transitions. Obviously (vii) <$ \Rightarrow $> (v).
  (viii) <$ \omega \in C^1(\partial S). $>
    <$ \omega $> is continuously differentiable at every point of <$ \partial S. $> Weaker form of (vii). Guarantees a smooth weight transition at least on the boundary of the target. Obviously (vii) <$ \Rightarrow $> (viii).

And finally, some properties are application dependent.

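  (ix) <$ \forall x \in X: \omega(x) \gt 0. $>
    All weights are positive.
  (x) <$ \omega_{\rho} $> is (strictly) concave on an interval <$ \lbrack 0, r ), $> <$ r \gt 0. $>
    The decrease in weight speeds up with growing distance from the target, at least near the target. Concavity on all of <$ \lbrack 0, \infty ) $> cannot be combined with (i), (iii) and (iv), because a bounded concave function on <$ \lbrack 0, \infty ) $> is non-decreasing.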

Examples of <$ \omega $> functions

I used the following setup for the example plots. This target range just happens to be specific to my application.

<$$ X = \mathbb{R}, \\ \rho = \rho_e \quad\mathrm{(Euclidean)}, \\ S = \lbrack 3.6, 7.8 \rbrack. $$>

Discrete weight (in-range indicator)

<$$ \omega_d(x) := \begin{cases} 1, & x \in S, \\ 0, & x \not\in S. \end{cases} $$>

The most basic binary indicator. It has the essential properties (i)–(iv), but that is all. Because it is discontinuous at <$ \partial S, $> it is too sensitive. For example, an observation of 7.9, a mere 1.3% above the upper bound of 7.8, is immediately discarded with zero weight.
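A minimal Python sketch of the indicator for the setup above (the names `LO`, `HI` and `discrete_weight` are illustrative) makes the cliff at the boundary visible:

```python
LO, HI = 3.6, 7.8  # target range S = [3.6, 7.8] from the setup above


def discrete_weight(x: float) -> float:
    """In-range indicator: 1 inside S, 0 outside."""
    return 1.0 if LO <= x <= HI else 0.0


# The upper bound itself still counts, a value just above it does not.
for x in (7.8, 7.9):
    print(x, discrete_weight(x))
```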

Polynomial weight

<$$ \omega_{P(\alpha, \beta)}(x) := \begin{cases} 1, & x \in S, \\ \max \left\lbrace 1 - \left( \frac{\rho(x, S)}{\beta \mathrm{diam}\; S} \right)^{\alpha}, 0 \right\rbrace, & x \not\in S, \end{cases} $$>

where <$ \alpha, \beta \gt 0 $> and <$ \mathrm{diam}\; S := \sup \{ \rho(x, y): x, y \in S \} $> is the diameter of <$ S. $>

This looks complicated but is not. The distance of <$ x $> from <$ S $> is scaled by a <$ \beta $>-multiple of the “size” of <$ S $> and raised to the power of <$ \alpha. $> The result is subtracted from one, and negative values are clipped to zero.

This weight function has properties (i)–(vi), and also (viii) when <$ \alpha \gt 1 $> (for <$ \alpha \le 1 $> the weight has a kink at <$ \partial S $>). It does not have (vii): it is not differentiable at points with <$ \rho(x, S) = \beta \mathrm{diam}\; S, $> where the polynomial hits the ground. When <$ \alpha \gt 1 $> it is strictly concave in the distance as long as <$ \rho(x, S) \lt \beta \mathrm{diam}\; S, $> but it does not have property (ix) (it does assign zero weights).

The parameters <$ \alpha $> and <$ \beta $> will depend on your application and requirements. For example, let <$ \alpha = 2 $> and <$ \beta = \frac{1}{2} $> (weight is zero for points farther than one half of the target’s diameter):

Or <$ \alpha = 3 $> and <$ \beta = 1: $>
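A rough Python sketch of the polynomial weight for the setup above, evaluated for both parameter choices. The function name and the default bounds are illustrative assumptions; the case <$ x \in S $> needs no special branch because the distance is then zero.

```python
def polynomial_weight(x: float, alpha: float, beta: float,
                      lo: float = 3.6, hi: float = 7.8) -> float:
    """Polynomial weight on the real line with target S = [lo, hi]."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    dist = max(lo - x, 0.0, x - hi)   # rho(x, S)
    scale = beta * (hi - lo)          # beta * diam S
    return max(1.0 - (dist / scale) ** alpha, 0.0)


# alpha = 2, beta = 1/2: the weight reaches zero 2.1 away from the target.
print(polynomial_weight(8.85, 2, 0.5))   # about 0.75
print(polynomial_weight(10.0, 2, 0.5))   # clipped to zero

# alpha = 3, beta = 1: a flatter start and a longer reach of 4.2.
print(polynomial_weight(9.9, 3, 1.0))    # about 0.875
```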

Instead of scaling by multiples of <$ \mathrm{diam}\; S $> we could perhaps scale by <$ \beta \sqrt{\strut\mathrm{Var}\;\Xi} $> or a <$ \beta $>-quantile for a sample of observations <$ \Xi \subset X. $>
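One way the standard-deviation variant might look in Python. This is only a sketch under assumptions: it uses the population standard deviation from the standard library (`statistics.pstdev`) for <$ \sqrt{\mathrm{Var}\;\Xi}, $> and the function name and sample values are made up for illustration.

```python
import statistics
from typing import Sequence


def polynomial_weight_std(x: float, sample: Sequence[float],
                          alpha: float, beta: float,
                          lo: float = 3.6, hi: float = 7.8) -> float:
    """Polynomial weight scaled by beta * (population std of the sample)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    scale = beta * statistics.pstdev(sample)  # assumption: population variance
    if scale <= 0:
        raise ValueError("sample has zero variance, cannot scale by it")
    dist = max(lo - x, 0.0, x - hi)           # rho(x, S)
    return max(1.0 - (dist / scale) ** alpha, 0.0)


# Made-up sample with population standard deviation 2.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(polynomial_weight_std(8.8, sample, alpha=2, beta=1.0))  # about 0.75
```

Note that the weight of a single observation then depends on the whole sample, which may or may not be what your application wants.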

Exponential weight

<$$ \omega_{\exp(\alpha)}(x) := \begin{cases} 1, & x \in S, \\ \exp \left( -\alpha\rho(x, S) \right), & x \not\in S, \end{cases} $$>

where <$ \alpha \gt 0. $> This function is interesting because it is strictly positive, so it has property (ix). It is continuous everywhere and differentiable everywhere except on <$ \partial S. $> The sharp drop-off around <$ \partial S $> and its convexity outside <$ S $> might be problematic. <$ \alpha $> controls the speed of convergence to zero.

For example, <$ \alpha = \frac{1}{2}: $>
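A small Python sketch of the exponential weight for the setup above (the function name and default bounds are illustrative assumptions). As with the polynomial weight, points inside <$ S $> have distance zero and therefore weight one without a separate branch.

```python
import math


def exponential_weight(x: float, alpha: float,
                       lo: float = 3.6, hi: float = 7.8) -> float:
    """Exponential weight on the real line with target S = [lo, hi]."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    dist = max(lo - x, 0.0, x - hi)  # rho(x, S)
    return math.exp(-alpha * dist)


# alpha = 1/2: two units above the target the weight is exp(-1), about 0.37,
# and even a far outlier keeps a small positive weight.
for x in (5.0, 7.9, 9.8, 30.0):
    print(x, exponential_weight(x, 0.5))
```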

Much more could be written and explored in this area; these are just a few methods I have been playing with in a certain project. Other weighting functions can be considered and there is always the question of choosing parameters (they can either be fixed a priori or based on stochastic properties of the observation sample).

October 1, MMXIII — Mathematics.