Dice Probability


Probability…now there’s a joyous subject…
As time goes on in my ASL career I have started to get paranoid about certain sets of dice. In several games the dice gods seemed to be against me and, after a while, I started noticing that it was always the same dice involved, and that started me thinking.
Now probability is a horrible topic for the uninitiated, and perception can make things appear strange (e.g. people assume that in the lottery the chances of 1,2,3,4,5,6 are worse than, say, 1,23,34,44,45,49, yet the probability of any specific set of numbers is exactly the same; it is just so unlikely that either would be seen. This is also why the ‘Monty Hall’ problem gives so many people conniptions*).
For dice you could use the mean: add up the values of the 6 faces and divide by 6, which comes to 3.5 (obviously, as the faces run from 1 to 6 and not 0 to 6). If you sum your rolls in a game and divide by the total number of rolls, how far that deviates from 3.5 is one indication of a dodgy die, especially as the sample size grows considerably. That is not good enough on its own, though: rolling a d6 600 times and getting 3 every single time would produce a mean of 3, which looks reasonable but absolutely is not.
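A minimal sketch of that check in Python (the numbers are purely illustrative) shows exactly why the mean on its own is too blunt:

```python
def mean_check(rolls):
    """Compare the observed mean of a list of d6 rolls against the fair-die mean of 3.5."""
    observed_mean = sum(rolls) / len(rolls)
    return observed_mean, observed_mean - 3.5

# 600 rolls that all came up 3 look passable by this test despite being obviously broken.
print(mean_check([3] * 600))   # (3.0, -0.5)
```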
So I would need something better. I resolved to test my hypothesis by the following method: I would track the suspicious dice through an entire game, writing down each roll (white die first). As long as the number of rolls was reasonably statistically significant (60 double rolls minimum), I could calculate how far each face differed from the expected count. For ease of calculation any rolls beyond the last multiple of six at the end of the game get ignored (so 60 rolls good; 61, 62, 63, 64 or 65 bad, with the excess trimmed off).
First up we need to total the number of times each face is rolled and compare it to the number of times we would expect (in an exactly even universe) that face to appear.
e.g. out of 150 rolls:
1 – 55 times
2 – 25 times
3 – 25 times
4 – 15 times
5 – 15 times
6 – 15 times
In this experiment the expected count for each face is 150 / 6 = 25.
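A minimal sketch of the tallying, assuming the rolls are just kept as a flat list of faces:

```python
from collections import Counter

def tally_faces(rolls):
    """Return the observed count for each face 1-6 and the count a perfectly fair die would give."""
    counts = Counter(rolls)
    observed = [counts.get(face, 0) for face in range(1, 7)]
    expected = len(rolls) / 6          # e.g. 150 rolls -> 25 per face
    return observed, expected
```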
From here we calculate the ChiSq, which is basically the sum over all six faces of (amount rolled – amount expected) squared / amount expected.
This figure is then converted to a p-value (probability value), which is used to work out how biased the die is OR whether bad dice results are within an expected range of probability (i.e. a 1% chance of those results would still occur 1% of the time). The closer the probability of that many rolls occurring is to zero, the more biased the die looks. This calculation is… hard… but thankfully we can compare the ChiSq against published tables to see what the probability is and let other people do the maths. All we really need to know is the number of degrees of freedom; happily this is the number of die faces minus 1 (since the counts must add up to the total number of rolls), so it is 5.
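Wrapped up in a few lines of Python for the 150-roll example above, it looks something like the sketch below (using scipy's chi-square tables in place of printed ones; this is not the exact program from the post):

```python
from scipy.stats import chi2

observed = [55, 25, 25, 15, 15, 15]          # the 150-roll example above
expected = sum(observed) / 6                 # 25 per face

# ChiSq = sum over all faces of (amount rolled - amount expected)^2 / amount expected
chisq = sum((o - expected) ** 2 / expected for o in observed)

degrees_of_freedom = 6 - 1                   # number of faces minus 1
p_value = chi2.sf(chisq, degrees_of_freedom) # chance of a fair die being at least this far out

print(f"ChiSq = {chisq:.1f}, p = {p_value:.2e}")   # ChiSq = 48.0 and a vanishingly small p
```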
Once I had this I slapped the calculations into a test program to restrict user error and simply generated a report. So as a sense test I pretended each face rolled exactly 25 times! I expected the result to show a high probability.
These results show the probability curve at its highest point. It is not reported as 100% because on a bell curve you never actually get to that level of certainty, with 95% being the standard ‘best’ band.
For the next test I added slight over-rolls (5 extra) for 1 & 2 and under-rolls (5 fewer) for 4 & 5.
Though the ChiSq has increased to 1, this is still within the expected probability range. Let’s try something very buggered.
Here you can see the impact: the 1 and 4 results are beyond the expected random range (which is 50% for those who are bothered), so are at the very least unexpected. If I assumed this dice was dodgy then this would appear to back me up, though it would need considerably more rolls to establish whether this was just bad luck or a genuinely dodgy die.
One last drastic test

Excellent. The probability test allows quite a wide range of variation before it starts ringing alarm bells. You’ll notice that a mere 10 extra rolls was enough to push the probability from 10% to 0.1%, showing that once the counts are already some way from expected, the same size of adjustment pushes the probability down far more aggressively than it does near the highest probabilities.
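To see that effect in isolation, here is a throwaway sketch (numbers invented, not taken from the report above) that moves rolls 5 at a time from face 4 to face 1 and watches the probability fall away:

```python
from scipy.stats import chi2

def p_value(observed):
    expected = sum(observed) / 6
    chisq = sum((o - expected) ** 2 / expected for o in observed)
    return chi2.sf(chisq, 5)

# Start from a perfectly even 150 rolls and shift rolls, 5 at a time, from face 4 to face 1.
for shift in range(0, 30, 5):
    counts = [25 + shift, 25, 25, 25 - shift, 25, 25]
    print(shift, p_value(counts))
# Each extra 5-roll shift costs far more probability than the previous one:
# roughly 1.0, 0.85, 0.16, 0.003, 6e-6, 1.4e-9
```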
My next task was some tests on my own dice. These are all ‘Battle School’ dice and are used via a dice tower. This matters, as you need the dice to tumble to ensure randomness; dice that are thrown and do not turn are far more open to manipulation. To simplify things I just tried the various German pairs (bar some of the SS dice, which I own but will not use: I won’t use the Death’s Head dice, though I don’t have particular problems with some of the other SS units’ dice).
Colour coding is there to assist: light green is the expected ‘best’ probability band (95%); purple is less than green but still within the expected probability range; light yellow is at the border of what is considered probable; any red indicates improbable results.
As a rule most of the dice performed well. The only ‘interesting’ exception was the German Soldier dice, which managed the 20% chance of rolling 18 sixes in 60 rolls. If that run of values had occurred in a game then I would no doubt have been griping about the dice.
To increase the rigour of the test I then picked two further German dice (the 6th Army black dice and the 295th Division white dice, both from the Stalingrad set) and resolved to test these with 600 rolls in 10 groups of 60. I could then track individual results and how they impacted a steadily accumulating total. If the dice are truly random then the odd unexpected result (which you would expect to occur with random behaviour) should get gradually smoothed out over time.
In the charts below the 10 tests of 60 rolls each are on the left and the steadily accumulating totals are on the right (i.e. accumulator 2 is Rd1 + Rd2 and accumulator 3 is Rd1 + Rd2 + Rd3).
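A minimal sketch of that bookkeeping (with simulated rolls standing in for the hand-rolled data) looks something like this:

```python
import random
from collections import Counter
from scipy.stats import chi2

def face_counts(rolls):
    counts = Counter(rolls)
    return [counts.get(face, 0) for face in range(1, 7)]

def p_value(observed):
    expected = sum(observed) / 6
    chisq = sum((o - expected) ** 2 / expected for o in observed)
    return chi2.sf(chisq, 5)

# Ten rounds of 60 rolls each (simulated here; the real test used hand-rolled dice).
rounds = [[random.randint(1, 6) for _ in range(60)] for _ in range(10)]

accumulator = [0] * 6
for rd, rolls in enumerate(rounds, start=1):
    counts = face_counts(rolls)
    accumulator = [a + c for a, c in zip(accumulator, counts)]
    print(f"Rd{rd}: round p={p_value(counts):.2f}  accumulated p={p_value(accumulator):.2f}")
# A fluky individual round shows up in the left-hand figures; the accumulated figure on the
# right shows whether it leaves a lasting dent once the other rounds are added in.
```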
The 600 Test

So the results are broadly as expected. With 600 rolls in an exactly average scenario every face should appear 100 times, and as can be seen the fluctuations are minor (no face appears more than 13 above this mean and none falls more than 8 below it). The white dice, which has the highest ‘1’ count, was impacted heavily by the lucky round 4. For the black dice more of the faces had marginal adjustments, but all of these came out in the wash and the end result was exactly the same probabilities.
I am now happy that the maths is valid and that these dice are as balanced as dice can be. I can also now apply this process to my gaming dice to see how well they perform; conversely, this will also show whether my perception of bad dice is valid or just the negative reinforcement of noticing bad dice more than good.
* The Monty Hall problem presumes a situation where you are on a game show and presented with three doors. Behind two of the doors is a goat and behind the other is a car. If you pick the door with the car you win it! Once you have made your random pick (1 chance in 3), the presenter ‘Monty Hall’ checks the other two doors and opens one to show you a goat, then asks if you want to stick with your current choice or switch to the other door.
The correct answer is to always switch, but that often really confuses people, who assume the choice is now 1/2 or 50:50 so it does not matter. It isn’t, and it does matter.
I have seen various ways of explaining the situation and thought the best was this: when you make your initial choice there is a 2/3 chance that there is a goat behind it. The presenter looks at the other two doors and must show you a door with a goat, so the remaining unopened door now carries a 2 in 3 chance of being the car. If you stick then the car will be behind your door 1/3 of the time (those odds have not changed), but since the revealed door is a goat the odds that the other door contains the car have improved.
This is best done visually, so assuming you always start by picking door 1 the only possible goat/car variations are as follows:
Situation A – Door 1 has a Goat, Door 2 has a Goat, Door 3 has a Car – Monty Hall opens door 2.
Situation B – Door 1 has a Goat, Door 2 has a Car, Door 3 has a Goat – Monty Hall opens door 3.
Situation C – Door 1 has a Car, Door 2 has a Goat, Door 3 has a Goat – Monty Hall opens either door 2 or 3.
If you stick you only have a 1 in 3 chance of winning the car, as there will be a car behind your door only 1/3 of the time (Situation C). By switching you have a 2 in 3 chance of winning the car: in both Situation A and Situation B Monty Hall has removed the goat and what is left must be the car; it is only in Situation C that switching would lose. The odds for switching have therefore improved because Monty Hall has provided further information.
This can be tested by a slight change to the scenario. Assume the same setup as above, but now, once you have made your choice, Monty Hall opens the central door automatically without first checking what is behind it. Now the odds switch to 50:50. In the one case where the car is behind the central door you automatically lose. In the other two cases you have the option of switching, but there is now a 50% chance of the car being behind your door and 50% of it being behind the other. This is what a lot of people expect the original Monty Hall problem to relate to, but as you can see it is the presenter’s knowledge of both the other doors that improves the odds of switching. In this second scenario the odds of both unopened doors increase from the original 1 in 3 to 1 in 2.
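A quick simulation (not part of the original argument, just a sanity check of both scenarios) bears this out:

```python
import random

def monty_hall(trials=100_000, monty_knows=True):
    """Win rates for sticking and switching. You always pick door 1, as in the table above.

    monty_knows=True  - the classic problem: Monty always opens a door hiding a goat.
    monty_knows=False - the variant: Monty blindly opens door 2; games where he reveals
                        the car are discarded because there is no choice left to make.
    """
    stick_wins = switch_wins = valid = 0
    for _ in range(trials):
        car = random.randint(1, 3)
        if monty_knows:
            opened = random.choice([2, 3]) if car == 1 else (2 if car == 3 else 3)
        else:
            opened = 2
            if opened == car:
                continue
        valid += 1
        remaining = 3 if opened == 2 else 2
        stick_wins += (car == 1)
        switch_wins += (car == remaining)
    return stick_wins / valid, switch_wins / valid

print(monty_hall(monty_knows=True))    # roughly (0.33, 0.67)
print(monty_hall(monty_knows=False))   # roughly (0.50, 0.50)
```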
** So for those who love maths the actual calculation is…..
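For a ChiSq value of x with k degrees of freedom (5 here), the probability of a fair die producing a result at least that extreme is the upper tail of the chi-square distribution:

\[
p \;=\; P\!\left(\chi^2_k \ge x\right) \;=\; \frac{1}{2^{k/2}\,\Gamma\!\left(k/2\right)} \int_x^{\infty} t^{\,k/2-1}\, e^{-t/2}\, dt
\]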
There, go knock yourself out. This might explain why I wrote a program and used tables to avoid having my head explode.
*** When using Pearson’s ChiSq test you really do need lots of rolls to validate the dice. In reality it is unlikely that you would get enough rolls in a standard ASL game to validate the dice. I have read that with smaller samples the Kolmogorov-Smirnov goodness-of-fit test is better at picking out disruptive dice, so I may look at replicating that in future as an extra control.

