# Available algorithms ![](_static/green_circ10.png "green circle"): thoroughly-tested. In many cases, we verified against known values and/or reproduced results from papers. ~: implemented but lightly tested. X: known problems; please see github issues. Algorithms | Category | Reference | Status --------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ Information Set Monte Carlo Tree Search (IS-MCTS) | Search | [Cowley et al. '12](https://ieeexplore.ieee.org/abstract/document/6203567) | ~ Max^n | Search | [Luckhart & Irani '86](https://www.semanticscholar.org/paper/An-Algorithmic-Solution-of-N-Person-Games-Luckhart-Irani/6ab06950332412d25b0915d7796d60040228decd) | ~ Minimax (and Alpha-Beta) Search | Search | [Wikipedia1](https://en.wikipedia.org/wiki/Minimax#Minimax_algorithm_with_alternate_moves), [Wikipedia2](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning), Knuth and Moore '75 | ![](_static/green_circ10.png "green circle") Monte Carlo Tree Search | Search | [Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search), [UCT paper](http://ggp.stanford.edu/readings/uct.pdf), [Coulom '06](https://hal.inria.fr/inria-00116992/document), [Cowling et al. survey](http://www.incompleteideas.net/609%20dropbox/other%20readings%20and%20resources/MCTS-survey.pdf) | ![](_static/green_circ10.png "green circle") Perfect Information Monte Carlo (PIMC) | Search | [Long et al. '10](https://ojs.aaai.org/index.php/AAAI/article/view/7562) | ~ Lemke-Howson (via nashpy) | Opt. | [Wikipedia](https://en.wikipedia.org/wiki/Lemke%E2%80%93Howson_algorithm), [Shoham & Leyton-Brown '09](http://masfoundations.org/) | ![](_static/green_circ10.png "green circle") ADIDAS | Opt. | [Gemp et al '22](https://arxiv.org/abs/2106.01285) | ~ Least Core via Linear Programming | Opt. | [Yan & Procaccia '21](https://ojs.aaai.org/index.php/AAAI/article/view/16721) | ~ Least Core via Saddle-Point (Lagrangian) Programming | Opt. | Gemp et al '24 | ~ Sequence-form linear programming | Opt. | [Koller, Megiddo, and von Stengel '94](http://theory.stanford.edu/~megiddo/pdf/stoc94.pdf),
[Shoham & Leyton-Brown '09](http://masfoundations.org/) | ![](_static/green_circ10.png "green circle") Shapley Values (incl. approximations via Monte Carlo sampling) | Opt. | [Mitchell et al. '22](https://www.jmlr.org/papers/v23/21-0439.html) | ~ Stackelberg equilibrium solver | Opt. | [Conitzer & Sandholm '06](https://users.cs.duke.edu/~conitzer/commitEC06.pdf) | ~ MIP-Nash | Opt. | [Sandholm et al. '05](https://dl.acm.org/doi/10.5555/1619410.1619413) | ~ Magnetic Mirror Descent (MMD) with dilated entropy | Opt. | [Sokota et al. '22](https://arxiv.org/abs/2206.05825) | ~ Counterfactual Regret Minimization (CFR) | Tabular | [Zinkevich et al '08](https://poker.cs.ualberta.ca/publications/NIPS07-cfr.pdf), [Neller & Lanctot '13](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf) | ![](_static/green_circ10.png "green circle") CFR against a best responder (CFR-BR) | Tabular | [Johanson et al '12](https://poker.cs.ualberta.ca/publications/AAAI12-cfrbr.pdf) | ![](_static/green_circ10.png "green circle") Exploitability / Best response | Tabular | [Shoham & Leyton-Brown '09](http://masfoundations.org/) | ![](_static/green_circ10.png "green circle") External sampling Monte Carlo CFR | Tabular | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf) | ![](_static/green_circ10.png "green circle") Fixed Strategy Iteration CFR (FSICFR) | Tabular | [Neller & Hnath '11](https://cupola.gettysburg.edu/csfac/2/) | ~ Mean-field Ficticious Play for MFG | Tabular | [Perrin et. al. '20](https://arxiv.org/abs/2007.03458) | ~ Online Mirror Descent for MFG | Tabular | [Perolat et. al. '21](https://arxiv.org/abs/2103.00623) | ~ Munchausen Online Mirror Descent for MFG | Tabular | [Lauriere et. al. '22](https://arxiv.org/pdf/2203.11973) | ~ Fixed Point for MFG | Tabular | [Huang et. al. '06](https://zbmath.org/?q=an:1136.91349) | ~ Boltzmann Policy Iteration for MFG | Tabular | [Lauriere et. al. '22](https://arxiv.org/pdf/2203.11973) | ~ Outcome sampling Monte Carlo CFR | Tabular | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf) | ![](_static/green_circ10.png "green circle") Policy Iteration | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) | ![](_static/green_circ10.png "green circle") Q-learning | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) | ![](_static/green_circ10.png "green circle") Regret Matching | Tabular | [Hart & Mas-Colell '00](https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0262.00153) | ![](_static/green_circ10.png "green circle") Restricted Nash Response (RNR) | Tabular | [Johanson et al '08](http://johanson.ca/publications/poker/2007-nips-rnash/2007-nips-rnash.html) | ~ SARSA | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) | ![](_static/green_circ10.png "green circle") Value Iteration | Tabular | [Sutton & Barto '18](http://incompleteideas.net/book/the-book-2nd.html) | ![](_static/green_circ10.png "green circle") Advantage Actor-Critic (A2C) | RL | [Mnih et al. '16](https://arxiv.org/abs/1602.01783) | ![](_static/green_circ10.png "green circle") Deep Q-networks (DQN) | RL | [Mnih et al. '15](https://www.nature.com/articles/nature14236) | ![](_static/green_circ10.png "green circle") Ephemeral Value Adjustments (EVA) | RL | [Hansen et al. '18](https://arxiv.org/abs/1810.08163) | ~ Proximal Policy Optimization (PPO) | RL | [Schulman et al. '18](https://arxiv.org/abs/1707.06347) | ~ AlphaZero (C++/LibTorch) | MARL | [Silver et al. '18](https://science.sciencemag.org/content/362/6419/1140) | ![](_static/green_circ10.png "green circle") AlphaZero (Python/TF) | MARL | [Silver et al. '18](https://science.sciencemag.org/content/362/6419/1140) | ![](_static/green_circ10.png "green circle") Correlated Q-Learning | MARL | [Greenwald & Hall '03](https://www.aaai.org/Papers/ICML/2003/ICML03-034.pdf) | ~ Asymmetric Q-Learning | MARL | [Kononen '04](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9458&rep=rep1&type=pdf) | ~ Deep CFR | MARL | [Brown et al. '18](https://arxiv.org/abs/1811.00164) | ![](_static/green_circ10.png "green circle") DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE) | MARL | [Foerster, Farquhar, Al-Shedivat et al. '18](http://proceedings.mlr.press/v80/foerster18a/foerster18a.pdf) | ~ Exploitability Descent (ED) | MARL | [Lockhart et al. '19](https://arxiv.org/abs/1903.05614) | ![](_static/green_circ10.png "green circle") (Extensive-form) Fictitious Play (XFP) | MARL | [Heinrich, Lanctot, & Silver '15](http://proceedings.mlr.press/v37/heinrich15.pdf) | ![](_static/green_circ10.png "green circle") Learning with Opponent-Learning Awareness (LOLA) | MARL | [Foerster, Chen, Al-Shedivat, et al. '18](https://arxiv.org/pdf/1709.04326.pdf) | ~ Nash Q-Learning | MARL | [Hu & Wellman '03](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf) | ~ Neural Fictitious Self-Play (NFSP) | MARL | [Heinrich & Silver '16](https://arxiv.org/abs/1603.01121) | ![](_static/green_circ10.png "green circle") Neural Replicator Dynamics (NeuRD) | MARL | [Omidshafiei, Hennes, Morrill, et al. '19](https://arxiv.org/abs/1906.00190) | X Regret Policy Gradients (RPG, RMPG) | MARL | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026) | ![](_static/green_circ10.png "green circle") Policy-Space Response Oracles (PSRO) | MARL | [Lanctot et al. '17](https://arxiv.org/abs/1711.00832) | ![](_static/green_circ10.png "green circle") Q-based ("all-actions") Policy Gradient (QPG) | MARL | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026) | ![](_static/green_circ10.png "green circle") Regularized Nash Dynamics (R-NaD) | MARL | [Perolat, De Vylder, et al. '22](https://arxiv.org/abs/2206.15378) | ![](_static/green_circ10.png "green circle") Regression CFR (RCFR) | MARL | [Waugh et al. '15](https://arxiv.org/abs/1411.7974), [Morrill '16](https://poker.cs.ualberta.ca/publications/Morrill_Dustin_R_201603_MSc.pdf) | ![](_static/green_circ10.png "green circle") Rectified Nash Response (PSRO_rn) | MARL | [Balduzzi et al. '19](https://arxiv.org/abs/1901.08106) | ~ Mean-Field PSRO (MFPSRO) | MARL | [Muller et al. '21](https://arxiv.org/abs/2111.08350.08106) | ~ Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC) | MARL | [Bowling & Veloso '02](https://www.sciencedirect.com/science/article/pii/S0004370202001212) | ~ α-Rank | Eval. / Viz. | [Omidhsafiei et al. '19](https://www.nature.com/articles/s41598-019-45619-9), [arXiv](https://arxiv.org/abs/1903.01373) | ![](_static/green_circ10.png "green circle") Nash Averaging | Eval. / Viz. | [Balduzzi et al. '18](https://arxiv.org/abs/1806.02643) | ~ Replicator / Evolutionary Dynamics | Eval. / Viz. | [Hofbaeur & Sigmund '98](https://www.cambridge.org/core/books/evolutionary-games-and-population-dynamics/A8D94EBE6A16837E7CB3CED24E1948F8), [Sandholm '10](https://mitpress.mit.edu/books/population-games-and-evolutionary-dynamics) | ![](_static/green_circ10.png "green circle") Voting-as-Evaluation (VasE) | Eval. / Viz. | [Lanctot et al. '23](https://arxiv.org/abs/2312.03121) | ![](_static/green_circ10.png "green circle")