---
title: Using R to compute the Kantorovich distance
date: 2013-07-02
---

```{r setup0, echo=FALSE}
opts_chunk$set(fig.path="assets/fig/KantorovichWithR", tidy=FALSE)
```

```{r setup, echo=FALSE, message=FALSE}
library(rgl)
library(rcdd)
knit_hooks$set(webgl = hook_webgl)
```

The Kantorovich distance between two probability measures $\mu$ and $\nu$ on a finite set $A$ equipped with a metric $d$ is defined as
$$d'(\mu,\nu)= \min_{\Lambda \in {\cal J}(\mu, \nu)} \int d(x,y)\textrm{d}\Lambda(x,y)$$
where ${\cal J}(\mu, \nu)$ is the set of all *joinings* of $\mu$ and $\nu$, that is, the set of all probability measures $\Lambda$ on $A \times A$ whose margins are $\mu$ and $\nu$. The Kantorovich distance can also be defined for more general metric spaces $(A,d)$, but our purpose is to show how to compute the Kantorovich distance in R when $A$ is finite.

Actually, the main part of the work will be to get the *extreme points* of the set of joinings ${\cal J}(\mu, \nu)$. Indeed, this set has a convex structure, and the map
$${\cal J}(\mu, \nu) \ni \Lambda \mapsto \int d(x,y)\textrm{d}\Lambda(x,y)$$
is linear. Therefore, any extremal value of this map, in particular the Kantorovich distance $d'(\mu,\nu)$, is attained at an extreme joining $\Lambda \in {\cal J}(\mu, \nu)$. This latter point will be explained below, and we will also see that ${\cal J}(\mu, \nu)$ is a *convex polyhedron*.

## Computing extreme joinings in R

What is an extreme joining in ${\cal J}(\mu, \nu)$? First of all, what is a joining in ${\cal J}(\mu, \nu)$? Consider for instance $A=\{a_1,a_2,a_3\}$; then a joining of $\mu$ and $\nu$ is given by a matrix
$$P=\begin{pmatrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33}
\end{pmatrix}$$
whose $(i,j)$-th entry is the probability $p_{ij}=\Pr(X=a_i, Y=a_j)$ where $X \sim \mu$ and $Y \sim \nu$ are random variables on $A$. Given a distance $d$ on $A$, the Kantorovich distance $d'(\mu,\nu)$ is then the minimal possible value of the mean distance $\mathbb{E}[d(X,Y)]$ between $X$ and $Y$. Note that $\mathbb{E}[d(X,Y)]=\Pr(X \neq Y)$ when taking $d$ as the discrete $0$-$1$ distance on $A$.

### The $H$-representation of ${\cal J}(\mu, \nu)$

The possible values of the $p_{ij}$ satisfy the following three sets of linear equality/inequality constraints:
$$\begin{cases}
{\rm (1a)} \quad \sum_j p_{ij} = \mu(a_i) & \forall i \\
{\rm (1b)} \quad \sum_i p_{ij} = \nu(a_j) & \forall j \\
{\rm (2)} \quad p_{ij} \geq 0 & \forall i,j
\end{cases}$$
Considering $P$ written in stacked form:
$$P={\begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{21} & p_{22} & p_{23} & p_{31} & p_{32} & p_{33} \end{pmatrix}}'$$
the first set ${\rm (1a)}$ of linear equality constraints is $M_1 P = \begin{pmatrix} \mu(a_1) \\ \mu(a_2) \\ \mu(a_3) \end{pmatrix}$ with
$$M_1 = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix}$$
and the second set ${\rm (1b)}$ of linear equality constraints is $M_2 P = \begin{pmatrix} \nu(a_1) \\ \nu(a_2) \\ \nu(a_3) \end{pmatrix}$ with
$$M_2 = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}.$$
With the terminology of the [cddlib library](http://web.mit.edu/sage/export/tmp/cddlib-094b/doc/cddlibman.pdf), ${\cal J}(\mu, \nu)$ is a *convex polyhedron* and the linear constraints above define its *$H$-representation*.
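Note that ${\cal J}(\mu, \nu)$ is never empty: the *independent* joining $p_{ij}=\mu(a_i)\nu(a_j)$ always satisfies these constraints. Here is a quick numerical check of this fact (a small illustrative sketch, using for concreteness the two measures that will serve as the running example later in this post):

```{r}
mu <- c(1/7, 2/7, 4/7)
nu <- c(1/4, 1/4, 1/2)
P <- outer(mu, nu)          # independent joining: p_ij = mu_i * nu_j
all.equal(rowSums(P), mu)   # constraints (1a): row margins are mu
all.equal(colSums(P), nu)   # constraints (1b): column margins are nu
all(P >= 0)                 # constraints (2)
```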
Schematically, one can imagine ${\cal J}(\mu, \nu)$ as a convex polyhedron embedded in a higher-dimensional space:

```{r rglkantorovich1, echo=FALSE, webgl=TRUE}
M <- rbind(
  c(0,0,0),
  c(-1,4,0),
  c(4,9,0),
  c(6,3,0)
)
points3d(M, col='red')
quads3d(M, col='green')
text3d(t(t(M)-c(0,0,0.5)), text=c("a","b","c","d"), col="green")
```

Here ${\cal J}(\mu, \nu)$ is a $4$-dimensional convex polyhedron embedded in a $9$-dimensional space: its elements are given by $9$ parameters $p_{ij}$, but they are determined by only $4$ of them because of the linear equality constraints ${\rm (1a)}$ and ${\rm (1b)}$.

### Vertices of ${\cal J}(\mu, \nu)$ achieve the Kantorovich distance

In the introduction we mentioned that the map
$${\cal J}(\mu, \nu) \ni \Lambda \mapsto \int d(x,y)\textrm{d}\Lambda(x,y)$$
is linear, and we claimed that consequently any of its extremal values is attained at an *extreme point* of ${\cal J}(\mu, \nu)$. Why? Extreme points of a convex polyhedron are nothing but its vertices. Consider a point $x$ in a convex polyhedron which is not a vertex, as in the figure below. Then $x$ is a convex combination of the vertices $a$, $b$, $c$, $d$, and therefore the image $\ell(x)$ of $x$ under any linear function $\ell$ is the same convex combination of $\ell(a)$, $\ell(b)$, $\ell(c)$, $\ell(d)$. In the figure, the polyhedron lies in the $xy$-plane and the value of $\ell$ is shown on the $z$-axis.

```{r rglkantorovich2, echo=FALSE, webgl=TRUE}
a <- M[1,-3]
b <- M[2,-3]
c <- M[3,-3]
d <- M[4,-3]
f <- function(x,y) 4*x+6*y+11
x <- seq(-3,9,length=30)
y <- seq(-3,12,length=30)
z <- outer(x, y, f)
#open3d()
bg3d("white")
material3d(col="black")
plot3d(x, y, z, aspect=c(1, 1, 0.5), col="lightblue", type="n",
       zlim=c(min(z)-3,max(z)+3),
       xlab="", ylab="", zlab="", box=FALSE, axes=FALSE)
surface3d(x, y, z, back="lines", col="lightblue", add=TRUE, alpha=0.5)
segments3d(rbind(c(0,0,0),c(0,0,80)), col='black')  # z axis
points3d(M, col='red', size=8)
quads3d(M, col='green')
fa <- c(a, f(a[1],a[2]))
fb <- c(b, f(b[1],b[2]))
fc <- c(c, f(c[1],c[2]))
fd <- c(d, f(d[1],d[2]))
fM <- rbind(fa,fb,fc,fd)
points3d(fM, col='red', size=8)
Segments <- rbind(M[1,], fa, M[2,], fb, M[3,], fc, M[4,], fd)
segments3d(Segments, col='red')
zfa <- c(0,0,fa[3])
zfb <- c(0,0,fb[3])
zfc <- c(0,0,fc[3])
zfd <- c(0,0,fd[3])
segments3d(rbind(fa, zfa, fb, zfb, fc, zfc, fd, zfd), col='red')
X <- c(2.25, 4, 0)
fX <- c(X[1:2], f(X[1],X[2]))
zfX <- c(0, 0, f(X[1],X[2]))
points3d(rbind(X, fX, zfX), col='green', size=8)
points3d(rbind(zfa, zfb, zfc, zfd), col='red', size=8)
lines3d(rbind(X, fX, zfX), col='green')
text3d(t(t(rbind(M,X))-c(0,0,6)), text=c("a","b","c","d","x"),
       col=c(rep("red",4),"green"))
text3d(t(t(rbind(zfa, zfb, zfc, zfd, zfX))-c(0.35,0.35,0)),
       text=c("l(a)","l(b)","l(c)","l(d)","l(x)"),
       col=c(rep("red",4),"green"))
```

Consequently, among the four values $\ell(a)$, $\ell(b)$, $\ell(c)$, $\ell(d)$ of $\ell$ at the vertices, at least one is no greater than $\ell(x)$ and at least one is no smaller than $\ell(x)$.

### Getting the $V$-representation of ${\cal J}(\mu, \nu)$ and the Kantorovich distance with R

The [rcdd package](http://cran.r-project.org/web/packages/rcdd/vignettes/vinny.pdf) for R provides an interface to the cddlib library. Its main function `scdd()` performs the conversion between the *$H$-representation* and the *$V$-representation* of a convex polyhedron; the latter is given by the set of vertices of the polyhedron. The set of vertices provides a representation of the polyhedron because each point in the polyhedron is a convex combination of the vertices; we say that the convex polyhedron is the convex hull of its extreme points.
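As a warm-up, let us first see what this conversion looks like on a tiny polyhedron whose vertices we know in advance (a minimal illustration, independent of the Kantorovich computation): the unit square $[0,1]^2$, whose $H$-representation consists of the four inequalities $x \leq 1$, $y \leq 1$, $-x \leq 0$, $-y \leq 0$:

```{r}
asq <- rbind(diag(2), -diag(2))  # inequality rows: x <= 1, y <= 1, -x <= 0, -y <= 0
bsq <- c(1, 1, 0, 0)
Hsq <- makeH(asq, bsq)           # H-representation of the unit square
scdd(Hsq)$output                 # V-representation: one row per vertex
```

The first two columns of the output are cddlib bookkeeping codes (we will drop them below); the remaining columns contain the coordinates of the four vertices of the square.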
Let us use `scdd()` to get the vertices of ${\cal J}(\mu, \nu)$. We consider the following example:

```{r}
mu <- c(1/7, 2/7, 4/7)
nu <- c(1/4, 1/4, 1/2)
```

We first define its $H$-representation in an R object with the `makeH()` function, whose help page begins as follows:

```{r, eval=FALSE, message=FALSE}
library(rcdd)
help(makeH)
```

![](assets/img/capture_makeH_help.png)

The matrix `a1` and the vector `b1` specify the linear inequality constraints. Our linear inequality constraints ${\rm (2)}$, written as $-p_{ij} \leq 0$, are defined in R as follows:

```{r}
m <- length(mu)
n <- length(nu)
a1 <- -diag(m*n)
b1 <- rep(0, m*n)
```

The matrix `a2` and the vector `b2` specify the linear equality constraints. We simply construct `b2` by concatenating $\mu$ and $\nu$:

```{r}
b2 <- c(mu, nu)
```

The matrix `a2` is obtained by stacking the two matrices $M_1$ and $M_2$ defined above. We construct it as follows in R:

```{r}
M1 <- t(model.matrix(~0+gl(m,n)))[,]
M2 <- t(model.matrix(~0+factor(rep(1:n,m))))[,]
a2 <- rbind(M1, M2)
```

Then we can get the vertices of ${\cal J}(\mu, \nu)$ in a list as follows:

```{r}
H <- makeH(a1, b1, a2, b2)     # H-representation
V <- scdd(H)$output[,-c(1,2)]  # V-representation (vertices), but not convenient
V <- lapply(1:nrow(V), function(i) matrix(V[i,], ncol=n, byrow=TRUE))
```

The command lines below show that there are `r length(V)` vertices (extreme joinings) and display the first five of them:

```{r}
length(V)
head(V, 5)
```

Now, to get the Kantorovich distance, it suffices to evaluate $\mathbb{E}[d(X,Y)]$ for each extreme joining and to take the smallest value. Consider for instance the discrete distance $d$. Set it as a matrix in R:

```{r}
( D <- 1-diag(3) )
sapply(V, function(P){ sum(D*P) })
```

Then call `min()` to get the Kantorovich distance $d'(\mu,\nu)$.
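In one line:

```{r}
min(sapply(V, function(P) sum(D*P)))
```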
## Use exact arithmetic!

Are you satisfied? I'm not: my probability measures $\mu$ and $\nu$ have rational weights, so the coordinates of the vertices should be rational too. How to get the vertices in exact rational arithmetic? Remember [my article about the Gauss hypergeometric function](http://stla.github.io/stlapblog/posts/BS_F21_v3.html). Let's load the `gmp` package:

```{r, message=FALSE}
library(gmp)
```

Actually this is the point suggested by the message appearing when loading `rcdd`:

```{r, results='asis', eval=FALSE}
library(rcdd)
If you want correct answers, use rational arithmetic.
See the Warnings sections added to help pages for
functions that do computational geometry.
```

There's almost nothing to change in our previous code: everything is done in `rcdd` to handle rational arithmetic. You just have to set the input parameters as `bigq` numbers. First define $\mu$ and $\nu$ as `bigq` objects:

```{r}
( mu <- as.bigq(c(1,2,4), 7) )
( nu <- as.bigq(c(1,1,1), c(4,4,2)) )
```

Then define `b2` as before:

```{r}
b2 <- c(mu, nu)
```

The other input parameters `a1`, `b1`, `a2` contain integers only, so they are more straightforward to convert. These parameters must actually be in character mode to be given to the `makeH()` function, so we convert them as follows:

```{r}
asab <- function(x) as.character(as.bigq(x))
a1 <- asab(a1)
b1 <- asab(b1)
a2 <- asab(a2)
b2 <- asab(b2)
```

And now we run exactly the same code as before:

```{r}
H <- makeH(a1, b1, a2, b2)     # H-representation
V <- scdd(H)$output[,-c(1,2)]  # V-representation (vertices), but not convenient
V <- lapply(1:nrow(V), function(i) matrix(V[i,], ncol=n, byrow=TRUE))
head(V, 5)
```

There's a slight problem when applying the `sapply()` function to a list of `bigq` objects. You can use the `lapply()` function, which works well but returns a list:

```{r}
D <- as.bigq(D)
lapply(V, function(P){ sum(D*P) })
```

We can't convert this list to a `bigq` vector with the `unlist()` function: this does not work, and this is why `sapply()` does not work either. To get a vector, use a loop:

```{r}
distances <- as.bigq(rep(NA, length(V)))
for(i in 1:length(V)){
  distances[i] <- sum(D*V[[i]])
}
distances
```

The column structure of `distances` is unexpected. To get a usual vector, type:

```{r}
attr(distances, "nrow") <- NULL
distances
```

The `min()` function directly works with `bigq` objects:

```{r}
( Kantorovich <- min(distances) )
```

To get the numeric evaluation, use:

```{r}
asNumeric(Kantorovich)
```

That's all.
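As a final sanity check: for the discrete $0$-$1$ distance, the Kantorovich distance is known to coincide with the total variation distance $\frac{1}{2}\sum_i \lvert\mu(a_i)-\nu(a_i)\rvert$, which we can compute exactly with the `bigq` versions of `mu` and `nu`:

```{r}
sum(abs(mu - nu)) / 2   # total variation distance: should equal Kantorovich
```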