The Birthday Paradox, also called the Birthday Problem, is the surprisingly high probability that two people will have the same birthday even in a small group of people. In a group of 70 people, there’s a 99.9 percent chance of two people having a matching birthday. But even in a group as small as 23 people, there’s a 50 percent chance of a matching birthday. This program performs several probability experiments to determine the percentages for groups of different sizes. We call these types of experiments, in which we conduct multiple random trials to understand the likely outcomes, Monte Carlo experiments.
You can find out more about the Birthday Paradox at https://en.wikipedia.org/wiki/Birthday_problem.
When you run birthdayparadox.py, the output will look like this:
Birthday Paradox, by Al Sweigart [email protected]
--snip--
How many birthdays shall I generate? (Max 100)
> 23
Here are 23 birthdays:
Oct 9, Sep 1, May 28, Jul 29, Feb 17, Jan 8, Aug 18, Feb 19, Dec 1, Jan 22,
May 16, Sep 25, Oct 6, May 6, May 26, Oct 11, Dec 19, Jun 28, Jul 29, Dec 6,
Nov 26, Aug 18, Mar 18
In this simulation, multiple people have a birthday on Jul 29
Generating 23 random birthdays 100,000 times...
Press Enter to begin...
Let's run another 100,000 simulations.
0 simulations run...
10000 simulations run...
--snip--
90000 simulations run...
100000 simulations run.
Out of 100,000 simulations of 23 people, there was a
matching birthday in that group 50955 times. This means
that 23 people have a 50.95 % chance of
having a matching birthday in their group.
That's probably more than you would think!
Running 100,000 simulations can take a while, which is why lines 95 and 96 report that another 10,000 simulations have finished. This feedback can assure the user that the program hasn’t frozen. Notice that some of the integers, like 10_000
on line 95 and 100_000
on lines 93 and 103, have underscores. These underscores have no special meaning, but Python allows them so that programmers can make integer values easier to read. In other words, it’s easier to read “one hundred thousand” from 100_000
than from 100000
.
1. """Birthday Paradox Simulation, by Al Sweigart [email protected]
2. Explore the surprising probabilities of the "Birthday Paradox".
3. More info at https://en.wikipedia.org/wiki/Birthday_problem
4. This code is available at https://nostarch.com/big-book-small-python-programming
5. Tags: short, math, simulation"""
6.
7. import datetime, random
8.
9.
10. def getBirthdays(numberOfBirthdays):
11. """Returns a list of number random date objects for birthdays."""
12. birthdays = []
13. for i in range(numberOfBirthdays):
14. # The year is unimportant for our simulation, as long as all
15. # birthdays have the same year.
16. startOfYear = datetime.date(2001, 1, 1)
17.
18. # Get a random day into the year:
19. randomNumberOfDays = datetime.timedelta(random.randint(0, 364))
20. birthday = startOfYear + randomNumberOfDays
21. birthdays.append(birthday)
22. return birthdays
23.
24.
25. def getMatch(birthdays):
26. """Returns the date object of a birthday that occurs more than once
27. in the birthdays list."""
28. if len(birthdays) == len(set(birthdays)):
29. return None # All birthdays are unique, so return None.
30.
31. # Compare each birthday to every other birthday:
32. for a, birthdayA in enumerate(birthdays):
33. for b, birthdayB in enumerate(birthdays[a + 1 :]):
34. if birthdayA == birthdayB:
35. return birthdayA # Return the matching birthday.
36.
37.
38. # Display the intro:
39. print('''Birthday Paradox, by Al Sweigart [email protected]
40.
41. The birthday paradox shows us that in a group of N people, the odds
42. that two of them have matching birthdays is surprisingly large.
43. This program does a Monte Carlo simulation (that is, repeated random
44. simulations) to explore this concept.
45.
46. (It's not actually a paradox, it's just a surprising result.)
47. ''')
48.
49. # Set up a tuple of month names in order:
50. MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
51. 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
52.
53. while True: # Keep asking until the user enters a valid amount.
54. print('How many birthdays shall I generate? (Max 100)')
55. response = input('> ')
56. if response.isdecimal() and (0 < int(response) <= 100):
57. numBDays = int(response)
58. break # User has entered a valid amount.
59. print()
60.
61. # Generate and display the birthdays:
62. print('Here are', numBDays, 'birthdays:')
63. birthdays = getBirthdays(numBDays)
64. for i, birthday in enumerate(birthdays):
65. if i != 0:
66. # Display a comma for each birthday after the first birthday.
67. print(', ', end='')
68. monthName = MONTHS[birthday.month - 1]
69. dateText = '{} {}'.format(monthName, birthday.day)
70. print(dateText, end='')
71. print()
72. print()
73.
74. # Determine if there are two birthdays that match.
75. match = getMatch(birthdays)
76.
77. # Display the results:
78. print('In this simulation, ', end='')
79. if match != None:
80. monthName = MONTHS[match.month - 1]
81. dateText = '{} {}'.format(monthName, match.day)
82. print('multiple people have a birthday on', dateText)
83. else:
84. print('there are no matching birthdays.')
85. print()
86.
87. # Run through 100,000 simulations:
88. print('Generating', numBDays, 'random birthdays 100,000 times...')
89. input('Press Enter to begin...')
90.
91. print('Let\'s run another 100,000 simulations.')
92. simMatch = 0 # How many simulations had matching birthdays in them.
93. for i in range(100000):
94. # Report on the progress every 10,000 simulations:
95. if i % 10000 == 0:
96. print(i, 'simulations run...')
97. birthdays = getBirthdays(numBDays)
98. if getMatch(birthdays) != None:
99. simMatch = simMatch + 1
100. print('100,000 simulations run.')
101.
102. # Display simulation results:
103. probability = round(simMatch / 100000 * 100, 2)
104. print('Out of 100,000 simulations of', numBDays, 'people, there was a')
105. print('matching birthday in that group', simMatch, 'times. This means')
106. print('that', numBDays, 'people have a', probability, '% chance of')
107. print('having a matching birthday in their group.')
108. print('That\'s probably more than you would think!')
Try to find the answers to the following questions. Experiment with some modifications to the code and rerun the program to see what effect the changes have.
numBDays = int(response)
on line 57?'January'
instead of 'Jan'
?'X simulations run...'
appear every 1,000 simulations instead of every 10,000?