
ALMOST UNLOCKING THE REPEATING PRISONER’S DILEMMA

For people familiar with game theory, any introduction to the repeating Prisoner’s Dilemma and its Tit for Tat solution will be remedial. Robert Axelrod’s The Evolution of Cooperation (1984) is legendary for demonstrating that cooperation in the Prisoner’s Dilemma can emerge without any use of externally applied coercive force to induce it.[651] It is good to recall why theorists believe the Prisoner’s Dilemma game is not only prevalent but also paradigmatic of the problem of achieving cooperation among dyads of individuals or in vast collective undertakings.

Steven Kuhn’s entry “Prisoner’s Dilemma” in the Stanford Encyclopedia of Philosophy makes it clear: “[Either] my temptation is to enjoy some benefits brought about by burdens shouldered by others... [or my] temptation is to benefit myself by hurting others.”[652] It is generally understood that in Prisoner’s Dilemma scenarios, one benefits specifically because someone else is made worse off. Somebody else pays the price for one’s success. Selfish gene theory, if accepted, requires that agents must adopt this behavioral trait consistent with noncooperative game theory as a condition of survival. Dawkins argues that we must accept the truth of “the gene’s law of universal ruthless selfishness.”[653] If we agree with this chain of argument, mandating the necessity of individualistic strategic optimization, then we must take the Prisoner’s Dilemma seriously.[654]

Game theorists agree that in a series of Prisoner’s Dilemma games played with exacting precision and a well-known end point, it is rational to defect on the first round of engagement, thereby missing out on the cooperative gains that could have been made.[655] Also, if the subsequent move’s outcome is significantly less important than the current round’s, then it will also pay to consistently defect in a repeating Prisoner’s Dilemma.[656] Yet for game theorists, solving the Prisoner’s Dilemma is pressing because they consider it both to exist everywhere and to entail a large-scale waste of resources that seems counterintuitive to accept.[657] Axelrod is emphatic about the significance of the Prisoner’s Dilemma that is encountered by “not only people but also nations and bacteria.”[658] His treatment of the repeating Prisoner's Dilemma is prescriptive.

He tells us, “Since the Prisoner's Dilemma is so common in everything from personal relations to international relations, it would be useful to know how best to act when in this type of setting.”[659]

Axelrod finds his research pertinent to the question of social order. He is dissatisfied with Thomas Hobbes’s alleged solution of the Prisoner’s Dilemma relying on a heavy-handed state. Thus, he poses the alternative question, “Under what conditions will cooperation emerge in a world of egoists without central authority?”[660] Axelrod exemplifies how by 1984, when The Evolution of Cooperation was published, theorists widely accepted that all manner of interactions from the worlds of social relations and warfare to the animal kingdom are best described as Prisoner’s Dilemma games. Axelrod’s fame lies in presenting a solution to the Prisoner's Dilemma that does not rely on externally applied coercive force to achieve the cooperative outcome. Connecting the international relations problem of anarchy to the central thrust of his investigation, Axelrod observes,

Today nations interact without central authority. Therefore the requirements for the emergence of cooperation have relevance to many of the central issues of international politics. The most important problem is the security dilemma: nations often seek their own security through means which challenge the security of others.[661]

Axelrod finds the Prisoner’s Dilemma in the Soviet Union’s invasion of Afghanistan in 1979, because in his view, the Soviet Union was defecting while hoping the United States would cooperate. Inviting friends over to dinner is a Prisoner's Dilemma game because they may not reciprocate.[662] Journalists, business leaders, and congressional representatives are similarly locked into repeating Prisoner's Dilemma games. Each wants to gain from the other's cooperation without reciprocating but finds it necessary to reciprocate as a condition of the other’s cooperation.

Axelrod claims that his result applies to political philosophy, international politics, economic and social exchange, and international political economy.[663] Axelrod refers to the “norm of reciprocity” and proceeds to deduce it from a tournament with combatants structured by a repeating Prisoner’s Dilemma game in which the winning strategy was that of conditional cooperation, referred to as “Tit for Tat.”[664]

Axelrod held an open tournament, inviting fellow academicians from any field to submit programs. These programs were instructions for play in every round of the 200-round supergame that specified which move the protagonist would take depending on the opponent’s action and the outcome of the previous round. Axelrod designed a Prisoner’s Dilemma game with the following rewards: suckering the other player yields 5 points, mutual cooperation yields 3 points each, mutual defection yields 1 point each, and being suckered yields 0 points. Note that as with the use of the models in evolutionary game theory, the rewards are fixed and objectively defined. Furthermore, the success criteria, in this case winning the tournament, are set at a systemic level beyond the subjective evaluation of participants. Axelrod’s tournament went on for 200 rounds per dyad of players who could perfectly recollect the entire history of the game. The identity of the other agent is constant. Every entrant into the tournament “writes a program that embodies a rule to select the cooperative or noncooperative choice on each move.”[665] The computer program uses this input and plays entrants’ strategies against one another. All entrants fully understand the rules of noncooperative game theory and the logic of the Prisoner’s Dilemma.
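The tournament mechanics just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not Axelrod’s original tournament code; the function and variable names are assumptions made for the sketch.

```python
# Axelrod's payoff schedule, from the row player's perspective:
# temptation = 5, mutual cooperation = 3, mutual defection = 1, sucker = 0.
PAYOFFS = {
    ("D", "C"): (5, 0),  # suckering the other player
    ("C", "C"): (3, 3),  # mutual cooperation
    ("D", "D"): (1, 1),  # mutual defection
    ("C", "D"): (0, 5),  # being suckered
}

def supergame(rule_a, rule_b, rounds=200):
    """One 200-round dyad; each rule sees the full history of both players."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = rule_a(hist_a, hist_b)
        move_b = rule_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Two entrants who cooperate on every move each accumulate 3 × 200 = 600 points over a supergame, the cooperative benchmark against which tournament scores are judged.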

The result of this tournament stands as a eureka moment in the history of game theory. The winning strategy, submitted by Anatol Rapoport, was Tit for Tat. This strategy cooperates on the first move and thereafter plays, in each round, whatever the other player did in the previous round.[666] Here is what’s interesting.

Given Axelrod’s payoff structure of 5, 3, 1, 0 for the four possible Prisoner’s Dilemma outcomes, the Tit for Tat strategy averaged 504 points, less than half of the 1,200 points that two unconditional cooperators would share over 200 rounds of play. Axelrod’s point is that even though it is possible to write programs that will steal points from cooperators, those who cooperate will gain more points overall than those who are prone to undermining others by defecting. Tit for Tat has the property of cooperating with other cooperators but defecting when paired with an exploitative strategy. Axelrod called this strategy “nice” because it always seeks cooperation first and only defects if the other player did so in the previous round. This strategy is considered benign because it does not try to get more than half of the spoils possible through cooperation.[667]
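Tit for Tat itself is short enough to state in full. The following sketch redefines a minimal payoff table so that it stands alone; the helper names are illustrative, not drawn from Axelrod’s study.

```python
# Minimal payoff lookup, (row player, column player) points per round.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    """Cooperate on the first move; thereafter copy the opponent's last move."""
    return "C" if not their_hist else their_hist[-1]

def always_defect(my_hist, their_hist):
    return "D"

def total(rule_a, rule_b, rounds=200):
    """Accumulate both players' scores over one 200-round supergame."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        a, b = rule_a(ha, hb), rule_b(hb, ha)
        pa, pb = PAYOFFS[(a, b)]
        sa, sb = sa + pa, sb + pb
        ha.append(a)
        hb.append(b)
    return sa, sb
```

Paired with itself, Tit for Tat earns the full cooperative 600 points per player over 200 rounds, while against an unconditional defector it concedes only the opening move (199 points to the defector’s 204): it cooperates with cooperators but refuses to be farmed by exploiters.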

The Tit for Tat strategy applied to an indefinitely repeating Prisoner’s Dilemma game may be evaluated for its mathematical robustness and for its social implications. From the perspective of evolutionary game theory, the main question is whether Tit for Tat represents an evolutionarily stable strategy. This strategy does well when paired with itself, and both Axelrod and Dawkins stress that it is relatively impervious to challenge by mutant invaders.[668] Such an invader plays differently from the incumbents and, if it gains more fitness points than they do, will be represented by more copies of its phenotype in the next generation of play. However, despite the allure of a demonstration that purely self-interested optimization, under precisely repeating circumstances of dyadic play, favors this nice strategy, Tit for Tat is not an evolutionarily stable strategy.

To see this, assume that over many rounds of indefinite dyadic play, successful parents yield more offspring in the subsequent generation. Tit for Tat gradually becomes the predominant strategy, prevailing over its nasty cousins that are not able to reap rewards from cooperating.

However, as Tit for Tat becomes more successful, an even nicer strategy, Tit for Two Tats, proves more successful still. This nicer strategy is forgiving and defects only after the other player has defected in two successive rounds of play. Tit for Two Tats is comparatively successful because Tit for Tat can easily become embroiled in a cycle of endless payback against another Tit for Tat strategy if the two players fall out of step because of erroneous play or imperfect memory recall. Tit for Two Tats is more forgiving and hence does not inadvertently spark an endless conflict. However, as Tit for Two Tats becomes predominant over Tit for Tat, it has the weakness that it reopens the door for the nasty strategy Probatory Retaliator, which initiates play by first defecting and then cooperating. The moral is that even though a tendency toward cooperation is beneficial when many share the trait, it is susceptible to clever exploitation by a narrowly self-interested, individually maximizing agent. Thus, even though Axelrod and Dawkins celebrate the outcome that the indefinitely repeated Prisoner’s Dilemma could result in nice agents who prosper by enacting reciprocal altruism, this result is suggestive but not mathematically robust.[669] Tit for Tat thus sounds good on paper as a solution to a narrowly circumscribed PD encounter, but even in this idealized form, it fails to solidify cooperation as either necessary or likely.[670] Despite this weakness of Tit for Tat, even when considered solely between two players with good recall and perfectly repeating conditions, this strategy of conditional cooperation remains the mainstay of the neoliberal approach to explaining how cooperation can emerge from narrow self-interest and individualistic maximization.[671]
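The echo effect described above is easy to exhibit in a short simulation. In this sketch a single forced defection stands in for erroneous play; that device, and the function names, are illustrative assumptions rather than a model taken from Axelrod or Dawkins.

```python
def tit_for_tat(their_history):
    """Cooperate first; thereafter copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def tit_for_two_tats(their_history):
    """Defect only after the opponent defects twice in a row."""
    return "D" if their_history[-2:] == ["D", "D"] else "C"

def run_with_error(strategy, rounds=10, error_round=3):
    """Two copies of `strategy` play each other; player A mis-plays D once."""
    ha, hb = [], []
    for r in range(rounds):
        a = strategy(hb)
        b = strategy(ha)
        if r == error_round:
            a = "D"  # a single slip by player A, standing in for noise
        ha.append(a)
        hb.append(b)
    return "".join(ha), "".join(hb)
```

A single slip locks two Tit for Tat players into alternating retaliation for the remainder of play, whereas two Tit for Two Tats players absorb the error and return to mutual cooperation.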

Dawkins devotes the chapter “Nice Guys Finish First” to his treatment of Axelrod’s tournament and the relative triumph of Tit for Tat in the 1989 edition of The Selfish Gene. Like Axelrod, he is optimistic about pure selfishness leading to cooperative conduct without introducing any extraneous motives or considerations.

However, not only is Tit for Tat not evolutionarily stable, but it is also not clear that it can deliver on its promise to achieve social order in the Prisoner’s Dilemma games widely believed by game theorists to characterize a significant class of interactions throughout society. The hope was that Tit for Tat, and the demonstration that cooperation can evolve in populations of egoists, could render order out of anarchy without the introduction of a central authority. This is especially true for the neoliberal school of international relations theorists.[672]

Both Dawkins and Axelrod move seamlessly in their discussions between actors who are simply replicating strategies and those who are human agents.[673] The keystone chapter in Axelrod’s Evolution of Cooperation is “The Live-and-Let-Live System in Trench Warfare in World War I.”[674] Like many other social interactions, trench warfare is determined by Axelrod to be a Prisoner’s Dilemma game with the following payoff structure. Each agent chooses between either shooting to kill or shooting without aim. One’s best outcome is to shoot to kill the other, while the other shoots aimlessly. Both prefer mutual aimlessness to mutually shooting to kill. Both least prefer to be the sucker who may be killed without attempting self-defense. Axelrod explains how by his analysis, trench warfare is “an iterated Prisoner’s Dilemma in which conditional strategies are possible.”[675] His idea is that a soldier would conditionally cooperate by withholding fire if the enemy did so in the prior round.

Yet counter to Axelrod’s argument, Andrew Gelman argues in a recent paper that Axelrod’s identification of a Prisoner’s Dilemma payoff matrix underlying trench warfare is mistaken.[676] Gelman questions whether any single soldier cares whether or not he kills an opposing soldier. He further suggests that it is in no one’s immediate self-interest to fire at all because this draws attention to oneself and makes it possible to identify one’s location, thereby making oneself a more accessible target. Thus, again, even in an empirical application, it is not clear how relevant Tit for Tat conditional cooperation is to actors’ choices.[677]

Despite the fact that Tit for Tat is not an evolutionarily stable strategy even under the refined conditions of Axelrod’s tournament, neoliberals have pinned great hope on this apparent derivation of cooperation from the strict assumptions of myopic self-interest. Here is a typical explanation of this hope in the domain of international relations theory:

Most game theoretic studies of international cooperation [including Axelrod (1984)] and regimes have focused on the Prisoner’s Dilemma (PD). PD is attractive since it can produce cooperative behavior under “realist” conditions. If play is repeated, the costs of defecting on any single move must be calculated not only with reference to the immediate payoff, but with reference to the opportunity costs associated with future interactions. Yet under assumptions of complex interdependence, the “dilemma” of PD diminishes. The very existence of a network of regimes and transnational relations among the advanced industrial states facilitates communication, enhances the importance of reputation, and lengthens the “shadow of the future.” In its heuristic use PD indicates why these institutions deter suboptimal outcomes.[678]

Axelrod’s suggestive demonstration that cooperation can spontaneously evolve without the imposition of a reward structure by a centralized authority led to enthusiasm for building institutions that created conditions of transparency. The idea is that if agents’ actions are visible, then the alchemy of Axelrod’s experimental setting would lead the actors to adopt the nice Tit for Tat strategy. Hume’s knaves and Hobbes’s Foole are transmuted, like lead into gold, to behave as enlightened individuals. This optimism about institutional design deserves to be considered an iteration of liberalism because it looks to structural constraints to induce cooperation among rational egoists. Institutions that work through the principle of transparency are more attractive than those working through surveillance, monitoring, policing, and sanctioning.

However, the hopes for institutions that could emulate Axelrod’s tournaments to induce the intentional adoption of Tit for Tat are overstated. In the replicator dynamics used to identify evolutionarily stable strategies, behavior is understood to be programmed into actors. Thus, nasty strategies are eliminated in successive generations of play. At first, they succeed rather well. However, as the easy prey is starved out of existence and the competition becomes increasingly nasty, nastiness cannot pay off against itself. At this point, Tit for Tat prevails. However, as discussed earlier, Tit for Tat does not play as well as the more forgiving strategy Tit for Two Tats among less myopically self-interested actors. Biological evolution does not presume that actors may adopt strategies at will or whim, depending on a specific environment. Therefore, even if a clever institutional designer were to build institutions perfectly on the premise of indefinitely repeated, perfectly transparent rounds of dyadic PD play, predatory opportunism cannot be effectively removed from the rational actor’s motivation set. As soon as the end of play is in sight, or the next round is not almost as important as the present one, or players interact with other players only a few times, the hopes for Tit for Tat fade.[679]

Both Russell Hardin and Ken Binmore, two ardent devotees of the parsimony of the rational choice approach to politics, recognize this reality. Hardin, although not critical of the implications of Axelrod’s study for ongoing two-person interactions, levels heavy criticism at the ability of Tit for Tat to achieve cooperation in an anarchic situation with many actors.[680] Hardin, as discussed in Chapter 9, views basic exchange as a Prisoner’s Dilemma.[681] He emphasizes a conclusion he did not budge from over the course of his career: “my failure to cooperate in a large-number collective action... is most likely to be a genuinely self-serving action that makes me better off now and in the longer run.”[682] Hardin argues that individualistic optimization is the course determined by rational choice. He directly counters Axelrod’s hopeful counsel for institutional designers, noting that “an understanding of dyadic iterated exchange relations does not yield us an analogous understanding of large-number iterated collective actions.”[683]

Ken Binmore, a mathematical game theorist who migrated into political philosophy, similarly has a fondness for the Tit for Tat strategy.[684] However, by the end of his naturalized approach to justice, which recognizes solely rationally self-interested actors, he too must accept that dyadic conditional cooperation is insufficient to achieve large-scale social cooperation. Contemporary society, which gave rise to the Enron scandal and then worse lapses in collective encounters during the 2007 financial crisis, is far beyond repeated dyadic encounters or even small groups of individuals. Even though Binmore stresses the value of decentralization, institutions must still be designed to prevent the failure of collective action predetermined by rational agency. In his conclusion “Designing a Social Mechanism,” he laments,

To achieve an efficient outcome, it is necessary to decentralize decision-making, but the agents to whom decisions are decentralized won’t usually share the objectives the designer is seeking to achieve. Some of them will want to embezzle the funds entrusted to their care... To some extent, the agents’ behavior can be controlled by straightforward regulation that forbids certain practices, and makes others mandatory.[685]

Even though the liberal hope is to avoid a Leviathan state, and neoliberalism strives to design institutions purely on the premise of transparency, even Binmore acknowledges that ultimately behavior must be policed to enforce compliance. He states, “But only behavior that can be effectively monitored can be controlled in this way. Deviants can then be detected and punished by the external enforcement agency whose existence is taken for granted in classical mechanism design.”[686] Despite the optimistic reception of Tit for Tat, it remains unclear how this remedy for two-person indefinitely repeating PDs can possibly be extended to situations in which a new actor is encountered in almost every round of play.[687]

Source: Amadae, S. M. Prisoners of Reason: Game Theory and Neoliberal Political Economy. Cambridge University Press, 2016. 355 p.
