notes

Personal notes
git clone git://git.laack.co/notes.git
Log | Files | Refs

ShapleyValue.md (3548B)


      1 # Shapley Value
      2 
      3 **Source:** [https://en.wikipedia.org/wiki/Shapley_value](https://en.wikipedia.org/wiki/Shapley_value)
      4 
      5 **Definition:** The Shapley value is a method for distributing gains / costs among a group that contributed to an objective.
      6 
      7 ## Collaborative Game Theory Formalization
      8 
      9 Assume we have a group of players who are trying to win rewards by cooperating (forming a coalition). For a given coalition $S$ the reward function $v(S)$ gives the total sum of rewards for each member of the coalition when cooperating.
     10 
     11 The Shapley value gives a way to divide the reward created by a coalition among its members. According to this metric:
     12 
     13 $\phi(i) = \frac{1}{n!} \sum_R [v(P^R_i \cup \{i\}) - v(P_i^R)$
     14 
     15 Where $i$ is the player, $n$ is the total number of players, $R$ is all permutations of players, $v$ is defined as above, $P^R_i$ is the set of players from $N$ which preced $i$ in the order $R$.
     16 
     17 ## Approximation for Machine Learning
     18 
     19 In machine learning we don't have the luxury of computing true Shapley values because evaluating all possible coalitions of features is often too expensive, and we can't do counter-factual evaluations across all possible coalitions, so we settle for different approaches that attempt to quantify the same idea. One popular approximation uses Monte-Carlo sampling, referred to as the Monte Carlo Shapley value. Below is this formalization.
     20 
     21 $\phi(i) = \frac{1}{n} \sum_{\sigma\in\Pi} [v([\sigma]_{i-1} \cup \{i\}) - v([\sigma]_{i-1})]$
     22 
     23 For a uniform smaple of permutations $\Pi$ of size $n$. 
     24 
     25 ### Example
     26 
     27 1. Assume our baseline output $a$ is $E[f(X)]$.
     28 2. Initialize $\phi$ as array of size $n$ with all zeroes
     29 3. For m = 1 to M
     30     - Sample a random permutation $\sigma$ of $\{1,2,...,n\}$
     31         - $S = \empty$
     32         - $\text{current\_value} = \text{baseline}$
     33         - for feature j in $\sigma$
     34             - $S = S \cup \{j\}$
     35             - new_value $= E[f(X) | X_S = x_S]$
     36             - $\phi[j] = \phi[j] + (\text{new\_value} - \text{current\_value})$
     37             - current_value = new_value
     38 4. For i = 1 to n
     39     - $\phi[i] = \phi[i] / M$
     40 
     41 ### Put Simply
     42 
     43 To calculate the approximate Shapley value for a given sample $x$ from the dataset:
     44 
     45 - We start with a baseline of the average output across our dataset $X$ with respect to the model $f$. We call this baseline $a$.
     46 - We then create our baseline importance score vector with all zeroes, each index representing how much the given feature contributes to the output of the function $f$.
     47 - We then repeat the following procedure M times:
     48     1. Shuffle the features randomly
     49     2. Start with the empty set of known features
     50     3. Go through the features one by one in the shuffled order
     51          - Replace the baseline feature's value for the specified feature with the feature's value in $x$.
     52          - Compute how much this impacts the model's output
     53              - current iteration's output - prior iteration's output wrt. $f$
     54          - Add this change to the feature's score in the importance score vector
     55 - We then divide each feature's score from the importance score vector by $M$ to find the average change the feature caused wrt. the baseline.
     56 
     57 The above calculates the approximate Shapley value for a single sample from the dataset where $M$ is the number of permutations we test for the given sample. We often do the above loop (inner loop) a number of times to average across randomly selected samples from our dataset to compute an approximation for the Shapley value across the dataset.
     58 
     59 ### Implementation
     60 
     61 TODO