ShapleyValue.md (3548B)
1 # Shapley Value 2 3 **Source:** [https://en.wikipedia.org/wiki/Shapley_value](https://en.wikipedia.org/wiki/Shapley_value) 4 5 **Definition:** The Shapley value is a method for distributing gains / costs among a group that contributed to an objective. 6 7 ## Collaborative Game Theory Formalization 8 9 Assume we have a group of players who are trying to win rewards by cooperating (forming a coalition). For a given coalition $S$ the reward function $v(S)$ gives the total sum of rewards for each member of the coalition when cooperating. 10 11 The Shapley value gives a way to divide the reward created by a coalition among its members. According to this metric: 12 13 $\phi(i) = \frac{1}{n!} \sum_R [v(P^R_i \cup \{i\}) - v(P_i^R)$ 14 15 Where $i$ is the player, $n$ is the total number of players, $R$ is all permutations of players, $v$ is defined as above, $P^R_i$ is the set of players from $N$ which preced $i$ in the order $R$. 16 17 ## Approximation for Machine Learning 18 19 In machine learning we don't have the luxury of computing true Shapley values because evaluating all possible coalitions of features is often too expensive, and we can't do counter-factual evaluations across all possible coalitions, so we settle for different approaches that attempt to quantify the same idea. One popular approximation uses Monte-Carlo sampling, referred to as the Monte Carlo Shapley value. Below is this formalization. 20 21 $\phi(i) = \frac{1}{n} \sum_{\sigma\in\Pi} [v([\sigma]_{i-1} \cup \{i\}) - v([\sigma]_{i-1})]$ 22 23 For a uniform smaple of permutations $\Pi$ of size $n$. 24 25 ### Example 26 27 1. Assume our baseline output $a$ is $E[f(X)]$. 28 2. Initialize $\phi$ as array of size $n$ with all zeroes 29 3. For m = 1 to M 30 - Sample a random permutation $\sigma$ of $\{1,2,...,n\}$ 31 - $S = \empty$ 32 - $\text{current\_value} = \text{baseline}$ 33 - for feature j in $\sigma$ 34 - $S = S \cup \{j\}$ 35 - new_value $= E[f(X) | X_S = x_S]$ 36 - $\phi[j] = \phi[j] + (\text{new\_value} - \text{current\_value})$ 37 - current_value = new_value 38 4. For i = 1 to n 39 - $\phi[i] = \phi[i] / M$ 40 41 ### Put Simply 42 43 To calculate the approximate Shapley value for a given sample $x$ from the dataset: 44 45 - We start with a baseline of the average output across our dataset $X$ with respect to the model $f$. We call this baseline $a$. 46 - We then create our baseline importance score vector with all zeroes, each index representing how much the given feature contributes to the output of the function $f$. 47 - We then repeat the following procedure M times: 48 1. Shuffle the features randomly 49 2. Start with the empty set of known features 50 3. Go through the features one by one in the shuffled order 51 - Replace the baseline feature's value for the specified feature with the feature's value in $x$. 52 - Compute how much this impacts the model's output 53 - current iteration's output - prior iteration's output wrt. $f$ 54 - Add this change to the feature's score in the importance score vector 55 - We then divide each feature's score from the importance score vector by $M$ to find the average change the feature caused wrt. the baseline. 56 57 The above calculates the approximate Shapley value for a single sample from the dataset where $M$ is the number of permutations we test for the given sample. We often do the above loop (inner loop) a number of times to average across randomly selected samples from our dataset to compute an approximation for the Shapley value across the dataset. 58 59 ### Implementation 60 61 TODO