In part 1, I used a card shuffling model for WOWY. It is, I think, a model that substantially underestimates the variability in these systems. First, let me explain why:
Not playing with a full deck.
(Insert your own joke here.) The card shuffling model only works if you have a fairly full deck. Imagine you were using Monte Carlo to estimate the probability of various poker hands. You would need a 52 card deck to get correct results. If you lost a couple of cards and used a 49 or 50 card deck, your results would still be pretty good. If you used a 20 card deck, your results would be pretty useless. Unfortunately, constructing the deck the way I did in part 1 gives us a very incomplete deck.
When I looked at Corsi error, I talked about lambda (λ), the event rate. I showed that λ varies dramatically from game to game over the course of the season. If λ were constant, or mostly constant, this deck would probably be complete enough.
Suppose David Backes has a game where, in 20 shifts, he has 20 offensive Corsi events for and 12 defensive Corsi events against. This model would weight the possible outcomes as:

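| For (rows) / Against (columns) | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|
| 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |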
But with λOff = 1.0 and λDef = 0.6, a 20 shift game might not result in exactly 20 For events or exactly 12 Against events. You might get 22 For and 9 Against. Or 18 For and 14 Against. The weightings for the outcomes look more like:

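| For (rows) / Against (columns) | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|
| 15 | 0.34 | 0.45 | 0.54 | 0.59 | 0.59 | 0.55 | 0.47 | 0.47 | 0.28 |
| 16 | 0.42 | 0.56 | 0.68 | 0.74 | 0.74 | 0.68 | 0.58 | 0.58 | 0.35 |
| 17 | 0.50 | 0.66 | 0.80 | 0.87 | 0.87 | 0.80 | 0.69 | 0.69 | 0.41 |
| 18 | 0.55 | 0.74 | 0.88 | 0.97 | 0.97 | 0.89 | 0.76 | 0.76 | 0.46 |
| 19 | 0.58 | 0.78 | 0.93 | 1.02 | 1.02 | 0.94 | 0.80 | 0.80 | 0.48 |
| 20 | 0.58 | 0.78 | 0.93 | 1.02 | 1.02 | 0.94 | 0.80 | 0.80 | 0.48 |
| 21 | 0.55 | 0.74 | 0.89 | 0.97 | 0.97 | 0.89 | 0.77 | 0.77 | 0.46 |
| 22 | 0.50 | 0.67 | 0.81 | 0.88 | 0.88 | 0.81 | 0.70 | 0.70 | 0.42 |
| 23 | 0.44 | 0.58 | 0.70 | 0.76 | 0.76 | 0.71 | 0.61 | 0.61 | 0.36 |
| 24 | 0.37 | 0.49 | 0.58 | 0.64 | 0.64 | 0.59 | 0.50 | 0.50 | 0.30 |
| 25 | 0.29 | 0.39 | 0.47 | 0.51 | 0.51 | 0.47 | 0.40 | 0.40 | 0.24 |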
These are percentages! The observed outcome of 20 For and 12 Against is only about 1% of the possible outcomes. Even this table is an underestimate of the range of possibilities. I used a Poisson calculator to get these values, and Corsi events are more spread out than Poisson events. I also cut off the rows and columns whose marginal values were below 5%. As a result, this table only shows about 65% of the total possible outcomes. To see the full magnitude of the variability, we need to use a full two-part Monte Carlo like I used in the Corsi error articles.
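For readers who want to reproduce a table like this without a Poisson calculator, here is a minimal sketch. It assumes the For and Against counts are independent Poisson variables with means 20 and 12 (20 shifts at λOff = 1.0 and λDef = 0.6). The function names are made up for illustration, and the independence assumption is mine.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson variable with this mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def joint_percent(events_for: int, events_against: int,
                  mean_for: float = 20.0, mean_against: float = 12.0) -> float:
    """Percent chance of one (For, Against) cell, assuming independence."""
    return 100.0 * poisson_pmf(events_for, mean_for) * poisson_pmf(events_against, mean_against)

def table_coverage_percent(for_range=range(15, 26), against_range=range(8, 17)) -> float:
    """Share of all outcomes that falls inside the displayed rows and columns."""
    return sum(joint_percent(f, a) for f in for_range for a in against_range)

if __name__ == "__main__":
    print(f"20 For / 12 Against: {joint_percent(20, 12):.2f}%")
    print(f"Share of outcomes inside the table: {table_coverage_percent():.1f}%")
```

The 20 For / 12 Against cell comes out at about 1%, matching the observed-outcome cell in the table. Real Corsi data would spread out further than this, for the reasons given above.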
Two Part Monte Carlo
As in the Corsi Error articles: Choose a game at random. Use the event table to create a probability density function for that pair of λ. Use a random number generator to "play" that game and create a set of results. Repeat as needed to get the desired number of With shifts. Do the same thing to get the desired number of Without shifts. Calculate the results. Repeat over and over. Essentially, here we are making a deck "on the fly".
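The steps above can be sketched roughly as follows. This is not the actual simulation behind the numbers in this article: the game log is invented, and each shift is drawn from a plain Poisson distribution where the real version uses the event table, so it will understate the spread. All names here are hypothetical.

```python
import math
import random
import statistics

# Invented game log: (shifts, Corsi For, Corsi Against) per game.
GAMES = [(20, 20, 12), (18, 11, 15), (22, 25, 20), (19, 14, 9), (21, 17, 22)]

def draw_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's method; adequate for the small per-shift means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_corsi_pct(target_shifts: int, rng: random.Random) -> float:
    """Build one sample of target_shifts shifts and return its Corsi%."""
    shifts = events_for = events_against = 0
    while shifts < target_shifts:
        # Part 1: choose a game at random and take its pair of rates.
        game_shifts, game_for, game_against = rng.choice(GAMES)
        lam_off = game_for / game_shifts
        lam_def = game_against / game_shifts
        # Part 2: "play" that game shift by shift at those rates.
        n = min(game_shifts, target_shifts - shifts)
        for _ in range(n):
            events_for += draw_poisson(lam_off, rng)
            events_against += draw_poisson(lam_def, rng)
        shifts += n
    total = events_for + events_against
    return 100.0 * events_for / total if total else 50.0  # guard for empty samples

def wdiff_sd(with_shifts: int, without_shifts: int, trials: int = 500, seed: int = 1) -> float:
    """Standard deviation of (Corsi% With - Corsi% Without) over many trials."""
    rng = random.Random(seed)
    diffs = [simulate_corsi_pct(with_shifts, rng) - simulate_corsi_pct(without_shifts, rng)
             for _ in range(trials)]
    return statistics.stdev(diffs)

if __name__ == "__main__":
    for with_shifts, without_shifts in [(80, 1520), (800, 800)]:
        print(with_shifts, without_shifts, round(wdiff_sd(with_shifts, without_shifts), 1))
```

Because a new game is drawn every time the sample needs more shifts, the game-to-game variation in λ ends up in the result instead of being frozen into a fixed deck.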
Over the last 5 seasons, David Backes has played 365 games and 7077 shifts at 5v5. That works out to about 1551 shifts in a hypothetical 80 game season. I rounded this up to 1600 shifts to make the math easier. I looked at 5% With/95% Without (80/1520), 10% With/90% Without (160/1440), etc., on up to 50% With/50% Without (800/800). The smaller samples had larger standard deviations (as you would expect).
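The shift arithmetic is easy to check. A small sketch, using only the figures quoted above (the constant and function names are invented for illustration):

```python
GAMES_PLAYED = 365     # Backes, last 5 seasons
SHIFTS_5V5 = 7077      # 5v5 shifts over those games
SEASON_GAMES = 80      # hypothetical season length
SEASON_SHIFTS = 1600   # rounded up from the per-season estimate

def shifts_per_season() -> float:
    """Average shifts per game scaled to a hypothetical 80 game season."""
    return SHIFTS_5V5 / GAMES_PLAYED * SEASON_GAMES

def with_without_split(with_percent: int, total: int = SEASON_SHIFTS) -> tuple[int, int]:
    """Split the season's shifts into (With, Without) for a given With percentage."""
    with_shifts = total * with_percent // 100
    return with_shifts, total - with_shifts

if __name__ == "__main__":
    print(round(shifts_per_season()))            # about 1551
    for pct in (5, 10, 20, 30, 40, 50):
        print(pct, with_without_split(pct))
```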
Corsi With

| Percent Sample | Standard Deviation | 95% Confidence Interval |
|---|---|---|
| 5% | 12.2% | +/- 24.0% |
| 10% | 9.0% | +/- 17.7% |
| 20% | 6.6% | +/- 12.9% |
| 30% | 5.4% | +/- 10.7% |
| 40% | 4.7% | +/- 9.2% |
| 50% | 4.2% | +/- 8.3% |
Corsi Without

| Percent Sample | Standard Deviation | 95% Confidence Interval |
|---|---|---|
| 95% | 3.1% | +/- 6.0% |
| 90% | 3.2% | +/- 6.2% |
| 80% | 3.4% | +/- 6.6% |
| 70% | 3.6% | +/- 7.0% |
| 60% | 3.9% | +/- 7.6% |
| 50% | 4.3% | +/- 8.3% |
WDiff

| Percent Sample | Standard Deviation | 95% Confidence Interval |
|---|---|---|
| 5%/95% | 12.7% | +/- 24.8% |
| 10%/90% | 9.6% | +/- 18.8% |
| 20%/80% | 7.4% | +/- 14.5% |
| 30%/70% | 6.6% | +/- 12.9% |
| 40%/60% | 6.1% | +/- 12.0% |
| 50%/50% | 6.0% | +/- 11.8% |
With 80 shifts together, the 95% Confidence Interval for WDiff is +/- 24.8%. Even with 800 shifts together and 800 shifts apart, the 95% Confidence Interval for WDiff is still +/- 11.8%.
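Two relationships seem to hold across these tables, and both can be checked from the published numbers. The interval widths look like the usual normal approximation of 1.96 standard deviations, and the WDiff standard deviation tracks the With and Without standard deviations added in quadrature, which is what you would expect if the two samples are independent. Both are inferences from the numbers, not formulas stated anywhere above; the helper names are made up.

```python
import math

# Standard deviations from the tables above, keyed by With percentage.
SD_WITH = {5: 12.2, 10: 9.0, 20: 6.6, 30: 5.4, 40: 4.7, 50: 4.2}
SD_WITHOUT = {5: 3.1, 10: 3.2, 20: 3.4, 30: 3.6, 40: 3.9, 50: 4.3}
SD_WDIFF = {5: 12.7, 10: 9.6, 20: 7.4, 30: 6.6, 40: 6.1, 50: 6.0}

def ci95_half_width(sd: float) -> float:
    """Half-width of a 95% interval under a normal approximation (assumed)."""
    return 1.96 * sd

def combined_sd(sd_with: float, sd_without: float) -> float:
    """SD of a difference of two independent quantities (assumed independent)."""
    return math.hypot(sd_with, sd_without)

if __name__ == "__main__":
    for pct in SD_WITH:
        estimate = combined_sd(SD_WITH[pct], SD_WITHOUT[pct])
        print(f"{pct}%: quadrature {estimate:.1f}, table {SD_WDIFF[pct]}, "
              f"CI +/- {ci95_half_width(SD_WDIFF[pct]):.1f}")
```

The small mismatches are about what you would expect from rounding and simulation noise.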
Comparing the models from Part 1 and Part 4
The two part Monte Carlo gives much larger standard deviations (and 95% Confidence Intervals).
| Percent Sample | Monte Carlo SD | Card Shuffling SD |
|---|---|---|
| 10%/90% | 9.6% | 4.8% |
| 20%/80% | 7.4% | 3.6% |
| 30%/70% | 6.6% | 3.1% |
| 40%/60% | 6.1% | 2.9% |
| 50%/50% | 6.0% | 2.8% |
The Monte Carlo standard deviations (and Confidence Intervals) are roughly twice as large as the ones from the Card Shuffling model. If you use the 95% Confidence Intervals from the Monte Carlo and look at the Blues' 2013-14 results, there are no WOWY pairs above the upper limit of the confidence intervals. There is only one pair that still falls below the lower limit. Jackman and Bouwmeester had 28 minutes and 38 seconds of 18.4% magic together. Jackman was 54.0% without Bouwmeester and Bouwmeester was 54.6% without Jackman.