View the full thesis here.


Despite concerted efforts by researchers and policymakers, governments are failing to achieve the global coordination needed to implement policies that could avert the disaster of unmitigated climate change. Existing economic models are often ill-equipped to capture the complexities of dynamic, strategic interactions among multiple agents. Research on international mechanisms such as climate clubs, for instance, is often limited to one-shot games because of the combinatorial explosion of sequential negotiation steps. To address this gap, this thesis evaluates state-of-the-art multi-agent reinforcement learning (MARL) algorithms on the Rice-N integrated assessment model (IAM). The first approach evaluates whether the meta-learning opponent-shaping algorithm ‘Shaper’ can exploit the learning dynamics of other agents to outperform them in a competitive climate policy setting. Although Shaper performs well on new economic games introduced here and cooperates in self-play, it fails to achieve the same results on Rice-N. The second approach uses the meta-learning ‘Good Shepherd’ algorithm to train a policy that tunes the mitigation efforts and tariff of a climate club that other agents can join or leave unilaterally. This approach produces club structures based on several climate and economic objectives that align with the literature while yielding a novel perspective on dynamic club participation. While these results overall suggest a strong use case for applying MARL to climate policy, more research into both algorithms and economic models is needed, as well as interdisciplinary alignment on terminology and goals.