A Real Quantum Computer Playing Monty Hall Against Itself

Since the 15-qubits ibmqx5 is available through QISKit in the IBM Q experience, it has become possible to simulate the game of Monty Hall, whose counterintuitive, albeit mathematically correct, aspect of optimal winning strategy is the subject of an abundant literature, including quantum models.
In this blog, I shall limit myself to present a program in which the computer is measured to itself. This program can be easily modified to allow an interactive game between the computer and a player.

First draft of the project using the simulator on the composer

The following circuit has been developed (FIG1).

FIG1: Partition running on the online simulator. Details: bottom left: realization of a W state - bottom right: Toffoli gate incorporating a swap.


This circuit is compatible with ibmqx5 as far as connections are concerned, except that the cNOT gates whose direction is incorrect must be replaced by reverse cNOT gates in the QISKIt version1.

Initially, two W-states are established, respectively concerning the qubits Q3, Q4, Q5 and the qubits Q13, Q14 and Q15. Then the qubit Q2, starting in the basic state, migrates up to Q5 by successive swaps, being on the way the target of three Toffoli gates whose control qubits are respectively Q3 original and Q15, Q4 original and Q14, Q5 original and Q13.
The qubit Q6 is subjected to a Hadamard gate and controls by a cNOT the target Q5 before the final measurements.

In the end, we obtain (by convention):
Q2, Q3 and Q4 respectively representing the choice of the left, central or right door to hide the car (1) or one of the two goats (0).
Q15, Q14 and Q13 respectively representing the initial choice (1) or not (0) of the left, central or right door by the player.
Q6 representing the decision to switch (1) or to stick with the first choice (0)
Q5 representing a final victory (1) or a defeat (0)

We see that the model assumes that the game master chooses randomly with probability 1/3 (using a true random generator) the door that hides the car. The door that opens in case of possible choice between two is also supposed to be chosen at random, but this does not influence the results.
The player meanwhile randomly designates each door with probability 1/3 and switches with probability 1/2. True random generators are also used here.

A typical result of an experiment comprising 1024 runs practiced on the online simulator is presented in FIG2. In total, 51% of victory was observed, for a strategy consisting of switching half the time at random. Counter-intuitively and as expected by the theory, sticking to the initial choice leads to 1/3 of gains (third column) and switching leads to 2/3 of gains (fourth column).

FIG2
Result frequencies observed after a Z-measurement of Q5 (sixth bit before the end) and Q6 (seventh bit before the end).

Experimentation on the Real Quantum Computer

The program tested above on the simulator (FIG1) was rewritten in Python language as a circuit on the QISKit platform of IBM Q experience. The cNOT gates were reverted whenever necessary. The U3 gates were replaced by Ry gates. The program has been successfully tested again using the online simulator2.

An experiment comprising 8192 runs practiced on the ibmqx5, with measurement of Q5 and Q6, is presented in FIG3. The quantum computer predicts a win in 48% of the cases for the mixed strategy (1/2 random switch), 40% for sticking systematically and 55% for switching systematically. 


FIG3. Experiment on the ibmqx5
Result frequencies observed after a Z-measurement of Q5 (sixth bit before the end) and Q6 (seventh bit before the end).
This particular experiment demonstrates the general ability of this superconducting quantum computing system to discriminate the most effective strategies in a problem of this type by running a single circuit3

In order to analyze this circuit in more detail, the same experiment of 8192 runs was made by measuring this time all the concerned qubits. Here the results have somewhat been less decided, with however a percentage of win always more marked for the switching strategy  (53% of the cases vs 43% for the sticking strategy and 48% for the balanced strategy). This decrease in circuit performance is likely due, at least in part, to additional readout errors introduced by measuring more qubits4.

The next step was to allow the quantum computer to play against itself only one gaming session at a time using this circuit stripped of its three Toffoli gates. To do this, after each experiment limited to one single run, the conventional part of the program checks whether the condition of two W states is fulfilled. If this is not the case, a new experience of one run is practiced, and so on until the condition is satisfied. The "game master" part of the program then "chooses" a door where to hide the car according to the measurement of the first W state. The "player" part of the program designates a door, its decision being represented by the measurement of the 2nd W state. The game master then opens a door behind which a goat is discovered. In this version, an additional qubit, in a state of superposition thanks to a Hadamard gate, is used to decide, if necessary, which door to open. The "player" then decides whether to switch or not, its decision being represented by the Q6 measurement. A typical output of this program, which was actually obtained on the ibmqx5, is as follows5:

Game Master: One car and two goats are now hidden behind these doors.
                        Which door do you choose?

Player:      My choice is the left door

Game Master: Now I open the central door and you see a goat
                        You get an opportunity to change your choice!
                         Do you want to switch for the right door?

Player:      I stick with my first choice, the left door

Game Master: You opened the left door and won the car! Congratulations!


Notes:
1. This modification was not possible on the IBM Q experience composer because the length of the program presented here is at the extreme limit allowed by this programming option.
2. Indeed, the local simulator appeared excessively slow to run this program which includes 78 Hadamard gates, 86 cNOT gates, 8 Y-rotation gates, 2 Pauli X gates, 3 Pauli Z gates, 9 T gates, 12 T-dart gates, and 30 barriers.
3. To test strategies with switching varying from 0% to 100%, simply replace the Hadamard gate acting on the qubit representing the choice of switching by an appropriate sequence (Hadamard, then Ry-gate, then Hadamard) with an adapted theta angle for the rotation.
4. The data analysis also shows that only 46% of the Q2Q3Q4 measurements and 50% of the Q13Q14Q15 measurements corresponded to a W state after measurements. Moreover, for the 23% of runs where a result compatible with the presence of two W states was observed, Q5 correctly reflected a win or loss in only 59% of the cases.
5. The reader will immediately understand that it is very easy to switch from this program to an interactive program between the quantum computer and a human player, which has been done. In this case, only one W state remains and the switch qubit is no longer needed, the decisions of the player manifesting themselves naturally. Such a five-qubits version is adaptable without problems to the ibmqx2 and the ibmqx4.

Comments