# Available algorithms

![](_static/green_circ10.png "green circle"): thoroughly-tested. In many cases,
we verified against known values and/or reproduced results from papers.

<font color="orange"><b>~</b></font>: implemented but lightly tested.

<font color="red"><b>X</b></font>: known problems; please see Github issues.

Algorithms                                                            | Category     | Reference                                                                                                                                                                                                                                                                                                       | Status
--------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------
Information Set Monte Carlo Tree Search (IS-MCTS)                     | Search       | [Cowley et al. '12](https://ieeexplore.ieee.org/abstract/document/6203567)                                                                                                                                                                                                                                      | <font color="orange"><b>~</b></font>
Max^n                                                                 | Search       | [Luckhart &amp; Irani '86](https://www.semanticscholar.org/paper/An-Algorithmic-Solution-of-N-Person-Games-Luckhart-Irani/6ab06950332412d25b0915d7796d60040228decd)                                                                                                                                             | <font color="orange"><b>~</b></font>
Minimax (and Alpha-Beta) Search                                       | Search       | [Wikipedia1](https://en.wikipedia.org/wiki/Minimax#Minimax_algorithm_with_alternate_moves), [Wikipedia2](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning), Knuth and Moore '75                                                                                                                         | ![](_static/green_circ10.png "green circle")
Monte Carlo Tree Search                                               | Search       | [Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search), [UCT paper](http://ggp.stanford.edu/readings/uct.pdf), [Coulom '06](https://hal.inria.fr/inria-00116992/document), [Cowling et al. survey](http://www.incompleteideas.net/609%20dropbox/other%20readings%20and%20resources/MCTS-survey.pdf) | ![](_static/green_circ10.png "green circle")
Perfect Information Monte Carlo (PIMC)                                | Search       | [Long et al. '10](https://ojs.aaai.org/index.php/AAAI/article/view/7562)                                                                                                                                                                                                                                        | <font color="orange"><b>~</b></font>
Lemke-Howson (via <tt>nashpy</tt>)                                    | Opt.         | [Wikipedia](https://en.wikipedia.org/wiki/Lemke%E2%80%93Howson_algorithm), [Shoham &amp; Leyton-Brown '09](http://masfoundations.org/)                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")
ADIDAS                                                                | Opt.         | [Gemp et al '22](https://arxiv.org/abs/2106.01285)                                                                                                                                                                                                                                                              | <font color="orange"><b>~</b></font>
Least Core via Linear Programming                                     | Opt.         | [Yan &amp; Procaccia '21](https://ojs.aaai.org/index.php/AAAI/article/view/16721)                                                                                                                                                                                                                               | <font color="orange"><b>~</b></font>
Least Core via Saddle-Point (Lagrangian) Programming                  | Opt.         | Gemp et al '24                                                                                                                                                                                                                                                                                                  | <font color="orange"><b>~</b></font>
Sequence-form linear programming                                      | Opt.         | [Koller, Megiddo, and von Stengel '94](http://theory.stanford.edu/~megiddo/pdf/stoc94.pdf), <br> [Shoham &amp; Leyton-Brown '09](http://masfoundations.org/)                                                                                                                                                    | ![](_static/green_circ10.png "green circle")
Shapley Values (incl. approximations via Monte Carlo sampling)        | Opt.         | [Mitchell et al. '22](https://www.jmlr.org/papers/v23/21-0439.html)                                                                                                                                                                                                                                             | <font color="orange"><b>~</b></font>
Stackelberg equilibrium solver                                        | Opt.         | [Conitzer &amp; Sandholm '06](https://users.cs.duke.edu/~conitzer/commitEC06.pdf)                                                                                                                                                                                                                               | <font color="orange"><b>~</b></font>
MIP-Nash                                                              | Opt.         | [Sandholm et al. '05](https://dl.acm.org/doi/10.5555/1619410.1619413)                                                                                                                                                                                                                                           | <font color="orange"><b>~</b></font>
Magnetic Mirror Descent (MMD) with dilated entropy                    | Opt.         | [Sokota et al. '22](https://arxiv.org/abs/2206.05825)                                                                                                                                                                                                                                                           | <font color="orange"><b>~</b></font>
Counterfactual Regret Minimization (CFR)                              | Tabular      | [Zinkevich et al '08](https://poker.cs.ualberta.ca/publications/NIPS07-cfr.pdf), [Neller &amp; Lanctot '13](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf)                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
CFR against a best responder (CFR-BR)                                 | Tabular      | [Johanson et al '12](https://poker.cs.ualberta.ca/publications/AAAI12-cfrbr.pdf)                                                                                                                                                                                                                                | ![](_static/green_circ10.png "green circle")
Exploitability / Best response                                        | Tabular      | [Shoham &amp; Leyton-Brown '09](http://masfoundations.org/)                                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
External sampling Monte Carlo CFR                                     | Tabular      | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf)                                                                                                                                                            | ![](_static/green_circ10.png "green circle")
Fixed Strategy Iteration CFR (FSICFR)                                 | Tabular      | [Neller &amp; Hnath '11](https://cupola.gettysburg.edu/csfac/2/)                                                                                                                                                                                                                                                | <font color="orange"><b>~</b></font>
Extensive-form Regret Minimization                                    | Tabular      | [Morrill et. al. '22](https://arxiv.org/abs/2102.06973)                                                                                                                                                                                                                                                         | <font color="orange"><b>~</b></font>
Mean-field Ficticious Play for MFG                                    | Tabular      | [Perrin et. al. '20](https://arxiv.org/abs/2007.03458)                                                                                                                                                                                                                                                          | <font color="orange"><b>~</b></font>
Online Mirror Descent for MFG                                         | Tabular      | [Perolat et. al. '21](https://arxiv.org/abs/2103.00623)                                                                                                                                                                                                                                                         | <font color="orange"><b>~</b></font>
Munchausen Online Mirror Descent for MFG                              | Tabular      | [Lauriere et. al. '22](https://arxiv.org/pdf/2203.11973)                                                                                                                                                                                                                                                        | <font color="orange"><b>~</b></font>
Fixed Point for MFG                                                   | Tabular      | [Huang et. al. '06](https://zbmath.org/?q=an:1136.91349)                                                                                                                                                                                                                                                        | <font color="orange"><b>~</b></font>
Boltzmann Policy Iteration for MFG                                    | Tabular      | [Lauriere et. al. '22](https://arxiv.org/pdf/2203.11973)                                                                                                                                                                                                                                                        | <font color="orange"><b>~</b></font>
Outcome sampling Monte Carlo CFR                                      | Tabular      | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf)                                                                                                                                                            | ![](_static/green_circ10.png "green circle")
Policy Iteration                                                      | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Q-learning                                                            | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Regret Matching                                                       | Tabular      | [Hart &amp; Mas-Colell '00](https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0262.00153)                                                                                                                                                                                                                    | ![](_static/green_circ10.png "green circle")
Restricted Nash Response (RNR)                                        | Tabular      | [Johanson et al '08](http://johanson.ca/publications/poker/2007-nips-rnash/2007-nips-rnash.html)                                                                                                                                                                                                                | <font color="orange"><b>~</b></font>
SARSA                                                                 | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Value Iteration                                                       | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Advantage Actor-Critic (A2C)                                          | RL           | [Mnih et al. '16](https://arxiv.org/abs/1602.01783)                                                                                                                                                                                                                                                             | ![](_static/green_circ10.png "green circle")
Deep Q-networks (DQN)                                                 | RL           | [Mnih et al. '15](https://www.nature.com/articles/nature14236)                                                                                                                                                                                                                                                  | ![](_static/green_circ10.png "green circle")
Ephemeral Value Adjustments (EVA)                                     | RL           | [Hansen et al. '18](https://arxiv.org/abs/1810.08163)                                                                                                                                                                                                                                                           | <font color="orange"><b>~</b></font>
Proximal Policy Optimization (PPO)                                    | RL           | [Schulman et al. '18](https://arxiv.org/abs/1707.06347)                                                                                                                                                                                                                                                         | <font color="orange"><b>~</b></font>
Mean Field Proximal Policy Optimization (MF-PPO)                      | RL           | [Algumaei et al. '23](https://link.springer.com/chapter/10.1007/978-3-031-33377-4_28)                                                                                                                                                                                                                           | <font color="orange"><b>~</b></font>
AlphaZero (C++/LibTorch)                                              | MARL         | [Silver et al. '18](https://science.sciencemag.org/content/362/6419/1140)                                                                                                                                                                                                                                       | ![](_static/green_circ10.png "green circle")
AlphaZero (Python/TF)                                                 | MARL         | [Silver et al. '18](https://science.sciencemag.org/content/362/6419/1140)                                                                                                                                                                                                                                       | ![](_static/green_circ10.png "green circle")
Correlated Q-Learning                                                 | MARL         | [Greenwald &amp; Hall '03](https://www.aaai.org/Papers/ICML/2003/ICML03-034.pdf)                                                                                                                                                                                                                                | <font color="orange"><b>~</b></font>
Asymmetric Q-Learning                                                 | MARL         | [Kononen '04](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9458&rep=rep1&type=pdf)                                                                                                                                                                                                             | <font color="orange"><b>~</b></font>
Deep CFR                                                              | MARL         | [Brown et al. '18](https://arxiv.org/abs/1811.00164)                                                                                                                                                                                                                                                            | ![](_static/green_circ10.png "green circle")
DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE) | MARL         | [Foerster, Farquhar, Al-Shedivat et al. '18](http://proceedings.mlr.press/v80/foerster18a/foerster18a.pdf)                                                                                                                                                                                                      | <font color="orange"><b>~</b></font>
Exploitability Descent (ED)                                           | MARL         | [Lockhart et al. '19](https://arxiv.org/abs/1903.05614)                                                                                                                                                                                                                                                         | ![](_static/green_circ10.png "green circle")
(Extensive-form) Fictitious Play (XFP)                                | MARL         | [Heinrich, Lanctot, &amp; Silver '15](http://proceedings.mlr.press/v37/heinrich15.pdf)                                                                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")
Learning with Opponent-Learning Awareness (LOLA)                      | MARL         | [Foerster, Chen, Al-Shedivat, et al. '18](https://arxiv.org/pdf/1709.04326.pdf)                                                                                                                                                                                                                                 | <font color="orange"><b>~</b></font>
Nash Q-Learning                                                       | MARL         | [Hu &amp; Wellman '03](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf)                                                                                                                                                                                                                                     | <font color="orange"><b>~</b></font>
Neural Fictitious Self-Play (NFSP)                                    | MARL         | [Heinrich &amp; Silver '16](https://arxiv.org/abs/1603.01121)                                                                                                                                                                                                                                                   | ![](_static/green_circ10.png "green circle")
Neural Replicator Dynamics (NeuRD)                                    | MARL         | [Omidshafiei, Hennes, Morrill, et al. '19](https://arxiv.org/abs/1906.00190)                                                                                                                                                                                                                                    | <font color="red"><b>X</b></font>
Regret Policy Gradients (RPG, RMPG)                                   | MARL         | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026)                                                                                                                                                                                                                                             | ![](_static/green_circ10.png "green circle")
Policy-Space Response Oracles (PSRO)                                  | MARL         | [Lanctot et al. '17](https://arxiv.org/abs/1711.00832)                                                                                                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")
GQ-based ("all-actions") Policy Gradient (QPG)                         | MARL         | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026)                                                                                                                                                                                                                                             | ![](_static/green_circ10.png "green circle")
Regression CFR (RCFR)                                                 | MARL         | [Waugh et al. '15](https://arxiv.org/abs/1411.7974), [Morrill '16](https://poker.cs.ualberta.ca/publications/Morrill_Dustin_R_201603_MSc.pdf)                                                                                                                                                                   | ![](_static/green_circ10.png "green circle")
Rectified Nash Response (PSRO_rn)                                     | MARL         | [Balduzzi et al. '19](https://arxiv.org/abs/1901.08106)                                                                                                                                                                                                                                                         | <font color="orange"><b>~</b></font>
Mean-Field PSRO (MFPSRO)                                              | MARL         | [Muller et al. '21](https://arxiv.org/abs/2111.08350.08106)                                                                                                                                                                                                                                                     | <font color="orange"><b>~</b></font>
Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC)                     | MARL         | [Bowling &amp; Veloso '02](https://www.sciencedirect.com/science/article/pii/S0004370202001212)                                                                                                                                                                                                                 | <font color="orange"><b>~</b></font>
&alpha;-Rank                                                          | Eval. / Viz. | [Omidhsafiei et al. '19](https://www.nature.com/articles/s41598-019-45619-9), [arXiv](https://arxiv.org/abs/1903.01373)                                                                                                                                                                                         | ![](_static/green_circ10.png "green circle")
Nash Averaging                                                        | Eval. / Viz. | [Balduzzi et al. '18](https://arxiv.org/abs/1806.02643)                                                                                                                                                                                                                                                         | <font color="orange"><b>~</b></font>
Replicator / Evolutionary Dynamics                                    | Eval. / Viz. | [Hofbaeur &amp; Sigmund '98](https://www.cambridge.org/core/books/evolutionary-games-and-population-dynamics/A8D94EBE6A16837E7CB3CED24E1948F8), [Sandholm '10](https://mitpress.mit.edu/books/population-games-and-evolutionary-dynamics)                                                                       | ![](_static/green_circ10.png "green circle")
Voting-as-Evaluation (VasE)                                           | Eval. / Viz. | [Lanctot et al. '23](https://arxiv.org/abs/2312.03121)                                                                                                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")