# Available algorithms : thoroughly-tested. In many cases, we verified against known values and/or reproduced results from papers. <font color="orange"><b>~</b></font>: implemented but lightly tested. <font color="red"><b>X</b></font>: known problems; please see Github issues. Algorithms | Category | Reference | Status --------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ Information Set Monte Carlo Tree Search (IS-MCTS) | Search | [Cowley et al. '12](https://ieeexplore.ieee.org/abstract/document/6203567) | <font color="orange"><b>~</b></font> Max^n | Search | [Luckhart & Irani '86](https://www.semanticscholar.org/paper/An-Algorithmic-Solution-of-N-Person-Games-Luckhart-Irani/6ab06950332412d25b0915d7796d60040228decd) | <font color="orange"><b>~</b></font> Minimax (and Alpha-Beta) Search | Search | [Wikipedia1](https://en.wikipedia.org/wiki/Minimax#Minimax_algorithm_with_alternate_moves), [Wikipedia2](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning), Knuth and Moore '75 |  Monte Carlo Tree Search | Search | [Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search), [UCT paper](http://ggp.stanford.edu/readings/uct.pdf), [Coulom '06](https://hal.inria.fr/inria-00116992/document), [Cowling et al. survey](http://www.incompleteideas.net/609%20dropbox/other%20readings%20and%20resources/MCTS-survey.pdf) |  Perfect Information Monte Carlo (PIMC) | Search | [Long et al. '10](https://ojs.aaai.org/index.php/AAAI/article/view/7562) | <font color="orange"><b>~</b></font> Lemke-Howson (via <tt>nashpy</tt>) | Opt. | [Wikipedia](https://en.wikipedia.org/wiki/Lemke%E2%80%93Howson_algorithm), [Shoham & Leyton-Brown '09](http://masfoundations.org/) |  ADIDAS | Opt. | [Gemp et al '22](https://arxiv.org/abs/2106.01285) | <font color="orange"><b>~</b></font> Least Core via Linear Programming | Opt. | [Yan & Procaccia '21](https://ojs.aaai.org/index.php/AAAI/article/view/16721) | <font color="orange"><b>~</b></font> Least Core via Saddle-Point (Lagrangian) Programming | Opt. | Gemp et al '24 | <font color="orange"><b>~</b></font> Sequence-form linear programming | Opt. | [Koller, Megiddo, and von Stengel '94](http://theory.stanford.edu/~megiddo/pdf/stoc94.pdf), <br> [Shoham & Leyton-Brown '09](http://masfoundations.org/) |  Shapley Values (incl. approximations via Monte Carlo sampling) | Opt. | [Mitchell et al. '22](https://www.jmlr.org/papers/v23/21-0439.html) | <font color="orange"><b>~</b></font> Stackelberg equilibrium solver | Opt. | [Conitzer & Sandholm '06](https://users.cs.duke.edu/~conitzer/commitEC06.pdf) | <font color="orange"><b>~</b></font> MIP-Nash | Opt. | [Sandholm et al. '05](https://dl.acm.org/doi/10.5555/1619410.1619413) | <font color="orange"><b>~</b></font> Magnetic Mirror Descent (MMD) with dilated entropy | Opt. | [Sokota et al. '22](https://arxiv.org/abs/2206.05825) | <font color="orange"><b>~</b></font> Counterfactual Regret Minimization (CFR) | Tabular | [Zinkevich et al '08](https://poker.cs.ualberta.ca/publications/NIPS07-cfr.pdf), [Neller & Lanctot '13](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf) |  CFR against a best responder (CFR-BR) | Tabular | [Johanson et al '12](https://poker.cs.ualberta.ca/publications/AAAI12-cfrbr.pdf) |  Exploitability / Best response | Tabular | [Shoham & Leyton-Brown '09](http://masfoundations.org/) |  External sampling Monte Carlo CFR | Tabular | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf) |  Fixed Strategy Iteration CFR (FSICFR) | Tabular | [Neller & Hnath '11](https://cupola.gettysburg.edu/csfac/2/) | <font color="orange"><b>~</b></font> Extensive-form Regret Minimization | Tabular | [Morrill et. al. '22](https://arxiv.org/abs/2102.06973) | <font color="orange"><b>~</b></font> Mean-field Ficticious Play for MFG | Tabular | [Perrin et. al. '20](https://arxiv.org/abs/2007.03458) | <font color="orange"><b>~</b></font> Online Mirror Descent for MFG | Tabular | [Perolat et. al. '21](https://arxiv.org/abs/2103.00623) | <font color="orange"><b>~</b></font> Munchausen Online Mirror Descent for MFG | Tabular | [Lauriere et. al. '22](https://arxiv.org/pdf/2203.11973) | <font color="orange"><b>~</b></font> Fixed Point for MFG | Tabular | [Huang et. al. '06](https://zbmath.org/?q=an:1136.91349) | <font color="orange"><b>~</b></font> Boltzmann Policy Iteration for MFG | Tabular | [Lauriere et. al. '22](https://arxiv.org/pdf/2203.11973) | <font color="orange"><b>~</b></font> Outcome sampling Monte Carlo CFR | Tabular | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf) |  Policy Iteration | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) |  Q-learning | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) |  Regret Matching | Tabular | [Hart & Mas-Colell '00](https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0262.00153) |  Restricted Nash Response (RNR) | Tabular | [Johanson et al '08](http://johanson.ca/publications/poker/2007-nips-rnash/2007-nips-rnash.html) | <font color="orange"><b>~</b></font> SARSA | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) |  Value Iteration | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) |  Advantage Actor-Critic (A2C) | RL | [Mnih et al. '16](https://arxiv.org/abs/1602.01783) |  Deep Q-networks (DQN) | RL | [Mnih et al. '15](https://www.nature.com/articles/nature14236) |  Ephemeral Value Adjustments (EVA) | RL | [Hansen et al. '18](https://arxiv.org/abs/1810.08163) | <font color="orange"><b>~</b></font> Proximal Policy Optimization (PPO) | RL | [Schulman et al. '18](https://arxiv.org/abs/1707.06347) | <font color="orange"><b>~</b></font> Mean Field Proximal Policy Optimization (MF-PPO) | RL | [Algumaei et al. '23](https://link.springer.com/chapter/10.1007/978-3-031-33377-4_28) | <font color="orange"><b>~</b></font> AlphaZero (C++/LibTorch) | MARL | [Silver et al. '18](https://science.sciencemag.org/content/362/6419/1140) |  AlphaZero (Python/TF) | MARL | [Silver et al. '18](https://science.sciencemag.org/content/362/6419/1140) |  Correlated Q-Learning | MARL | [Greenwald & Hall '03](https://www.aaai.org/Papers/ICML/2003/ICML03-034.pdf) | <font color="orange"><b>~</b></font> Asymmetric Q-Learning | MARL | [Kononen '04](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9458&rep=rep1&type=pdf) | <font color="orange"><b>~</b></font> Deep CFR | MARL | [Brown et al. '18](https://arxiv.org/abs/1811.00164) |  DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE) | MARL | [Foerster, Farquhar, Al-Shedivat et al. '18](http://proceedings.mlr.press/v80/foerster18a/foerster18a.pdf) | <font color="orange"><b>~</b></font> Exploitability Descent (ED) | MARL | [Lockhart et al. '19](https://arxiv.org/abs/1903.05614) |  (Extensive-form) Fictitious Play (XFP) | MARL | [Heinrich, Lanctot, & Silver '15](http://proceedings.mlr.press/v37/heinrich15.pdf) |  Learning with Opponent-Learning Awareness (LOLA) | MARL | [Foerster, Chen, Al-Shedivat, et al. '18](https://arxiv.org/pdf/1709.04326.pdf) | <font color="orange"><b>~</b></font> Nash Q-Learning | MARL | [Hu & Wellman '03](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf) | <font color="orange"><b>~</b></font> Neural Fictitious Self-Play (NFSP) | MARL | [Heinrich & Silver '16](https://arxiv.org/abs/1603.01121) |  Neural Replicator Dynamics (NeuRD) | MARL | [Omidshafiei, Hennes, Morrill, et al. '19](https://arxiv.org/abs/1906.00190) | <font color="red"><b>X</b></font> Regret Policy Gradients (RPG, RMPG) | MARL | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026) |  Policy-Space Response Oracles (PSRO) | MARL | [Lanctot et al. '17](https://arxiv.org/abs/1711.00832) |  GQ-based ("all-actions") Policy Gradient (QPG) | MARL | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026) |  Regression CFR (RCFR) | MARL | [Waugh et al. '15](https://arxiv.org/abs/1411.7974), [Morrill '16](https://poker.cs.ualberta.ca/publications/Morrill_Dustin_R_201603_MSc.pdf) |  Rectified Nash Response (PSRO_rn) | MARL | [Balduzzi et al. '19](https://arxiv.org/abs/1901.08106) | <font color="orange"><b>~</b></font> Mean-Field PSRO (MFPSRO) | MARL | [Muller et al. '21](https://arxiv.org/abs/2111.08350.08106) | <font color="orange"><b>~</b></font> Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC) | MARL | [Bowling & Veloso '02](https://www.sciencedirect.com/science/article/pii/S0004370202001212) | <font color="orange"><b>~</b></font> α-Rank | Eval. / Viz. | [Omidhsafiei et al. '19](https://www.nature.com/articles/s41598-019-45619-9), [arXiv](https://arxiv.org/abs/1903.01373) |  Nash Averaging | Eval. / Viz. | [Balduzzi et al. '18](https://arxiv.org/abs/1806.02643) | <font color="orange"><b>~</b></font> Replicator / Evolutionary Dynamics | Eval. / Viz. | [Hofbaeur & Sigmund '98](https://www.cambridge.org/core/books/evolutionary-games-and-population-dynamics/A8D94EBE6A16837E7CB3CED24E1948F8), [Sandholm '10](https://mitpress.mit.edu/books/population-games-and-evolutionary-dynamics) |  Voting-as-Evaluation (VasE) | Eval. / Viz. | [Lanctot et al. '23](https://arxiv.org/abs/2312.03121) | 