{ "metadata": { "name": "", "signature": "sha256:bd1260c9e4f218e8b512539ac98f627e49966a42fdd727bf542b5c578a0a6219" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Advanced IPython breakout session\n", "===\n", "Created for the Berkeley Python Bootcamp August 2012 by Henrik Brink <brink@berkeley.edu>.\n", "\n", "Introduction\n", "---\n", "In this breakout session we're going to try some of the more advanced features of IPython, mostly serving to optimize your code by using other languages (such as C/Cython) inline as well as seamless parallelization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The problem\n", "\n", "We are going to define a simple problem that some of you might have run into during your work. Given some function that is complicated to calculate, you might want to use some sampling technique to trace out the function more efficiently than the brute force approach.\n", "\n", "Let's first define some function that is supposed to be complicated. And pretty. Sorry if it's neither." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pylab\n", "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "\"\"\"\n", "A complicated function consisting of 3 2D gaussians.\n", "\"\"\"\n", "def gaussian(x, y):\n", " c_x1 = 0.5; c_y1 = 0.2; w_x1 = 0.15; w_y1 = 0.04\n", " c_x2 = 0.7; c_y2 = 0.7; w_x2 = 0.05; w_y2 = 0.05\n", " c_x3 = 0.3; c_y3 = 0.7; w_x3 = 0.05; w_y3 = 0.07\n", " return \\\n", " exp( -( ((c_x1 - x)/w_x1)**2. + ((c_y1 - y)/w_y1)**2. )/2 ) + \\\n", " exp( -( ((c_x2 - x)/w_x2)**2. + ((c_y2 - y)/w_y2)**2. )/2 ) + \\\n", " exp( -( ((c_x3 - x)/w_x3)**2. + ((c_y3 - y)/w_y3)**2. )/2 )" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how it looks (and pretend this is computationally intractable)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "resolution = 50.\n", "inds = indices((resolution,resolution))/resolution\n", "imshow( gaussian(*inds).T, origin=\"lower\", interpolation=\"nearest\", extent=(0,1,0,1) );" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait, who is that?\n", "\n", "### Metropolis-Hastings sampler\n", "\n", "Anyways, let's look at a way to sample arbitrary functions: The [Metropolis-Hastings algorithm](http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm).\n", "\n", "A very simple Metropolis-Hastings 2D random walker is defined below, along with a function for plotting the points returned by the walker:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"\"\"\n", "A very simple Metropolis-Hastings random walker in 2D\n", "\"\"\"\n", "def random_walker(func, n_steps=1000, start=[0,0], burn_in=100):\n", " points = []\n", " point = start\n", " for i in np.arange(n_steps):\n", " # Our jumping distribution is the [-0.1,0.1] Uniform distribution around the current point\n", " new_point = point + (rand(2)-0.5)/5.\n", " accept = func(*new_point) / func(*point)\n", " if accept >= 1 or rand() <= accept:\n", " point = new_point\n", " if i > burn_in: points.append(point)\n", " return asarray(points)\n", "\n", "\"\"\"\n", "A function to plot the points\n", "\"\"\"\n", "def plot_points(points):\n", " h2,_,_ = histogram2d(points[:,0], points[:,1], [resolution,resolution], range=[[0,1],[0,1]])\n", " imshow(h2.T, origin=\"lower\", extent=(0,1,0,1), interpolation='nearest')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "In the cell below, sample the function defined above using the Metropolis-Hastings sampler and plot the result." ] }, { "cell_type": "code", "collapsed": true, "input": [ "# Yes, you're up!" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try different walker parameters and see what happens.\n", "\n", "What's wrong with that image?\n", "\n", "### [Optional] Exercise 2\n", "\n", "Time the random walker with n_steps=1e5 steps." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Time it" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "In the cell below, finish the code to start several random walkers at different locations and combine the output to show a more complete picture. We're creating 10 random starting positions, so you can go an order of magnitude down in sample size. Remember that this will take 10 times whatever you got in Exercise 2." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Fill out the missing pieces\n", "starts = rand(10,2)\n", "points = []\n", "for start in starts:\n", " #points.append( ... )\n", "plot_points( concatenate(points) )" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hopefully, you were able to sample a larger part of the function. But this comes at the cost of far more computing time...\n", "\n", "Parallelization\n", "----\n", "\n", "At this point it would be very obvious to cut down on computation time by parallelizing the independent random walkers. But who wants to go through all the hastleof parallization?\n", "\n", "Luckily, IPython makes it much easier to start parallel computing clusters and use those from IPython. If you're using version 0.13 or later of IPython, you can go to the Notebook Dashboard that appeared when you started the notebook server (you can get there again by hitting File -> Open above). Select the Clusters tab, insert the number of cores on your computer (usually 2 or 4) and hit Start." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only thing you have to do now is enable parallel support in your running notebook:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.parallel import Client\n", "c = Client()\n", "clust = c[:]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have all of the cores in the cluster available, and we can execute code on all of them by prefixing with a `%px`. We will use this to load the pylab interface on all engines:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%px %pylab" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should see some startup output from all the engines in the cluster.\n", "\n", "### Exercise 4\n", "\n", "To run all the code in a notebook shell, you can start the whole cell with `%%px`. \n", "\n", "First, since we already defined the functions `random_walker` and `gaussian`, we need to push them to the external nodes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "clust.push(dict(random_walker=random_walker, gaussian=gaussian));" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, run the `random_walker` example again from exercise 1:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px\n", "# ..." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interacting with the engines\n", "\n", "But how to get the results? We need to use the parallel Client object to get the variables defined on each machine. Below we do this and plot the result:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "plot_points(concatenate( c[:][\"points\"] ))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This hints to a pretty nice feature. With our cluster defined, we can get a single engine with `c[id]`, and we can extract their variables using a standard Dict syntax. By using the --targets argument with `%%px`, we can control exactly what code is run on which engines. See more in the [IPython manual](http://ipython.org/ipython-doc/rel-0.13/parallel/magics.html).\n", "\n", "It's out of the scope of this session, but the very cool thing about these engines is that they need not run on the local machine. In fact, they can run on any supercomputer in the world and still be used in exactly the same way from the Notebook or other IPython clients.\n", "\n", "Other languages in the Notebook\n", "----\n", "\n", "But what if we're still not satisfied with the performance? In the IPython Notebook we can turn any cell into C code using the excellent Cython language and `%cythonmagic` feature. Note that running this requires Cython and a C compiler. You most likely have a C compiler, and Cython can easily be installed with easy_install (or pip):" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "$ sudo easy_install Cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now load Cython functionality into the Notebook:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%px %load_ext cythonmagic" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 5\n", "\n", "Now we can transform any cell into a Cython cell by starting if off with `%%cython`. We will now use this to optimize our code even further, by reimplementing the 2D gaussian function in Cython code and hopefully gain some efficiency. Remember that this function is called at least twice pr. step by a random walker. Fill out the missing parts below. Hint: Cython is often just copy-pastable from regular Python. Hint 2: Use the imported pow instead of \"**\" for exponentiation." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px\n", "%%cython\n", "cimport cython\n", "from math import pow, exp\n", "\"\"\"\n", "A complicated function consisting of 3 2D gaussians, now in C with Cython.\n", "\"\"\"\n", "cdef double c_x1 = 0.5, c_y1 = 0.2, w_x1 = 0.15, w_y1 = 0.04\n", "cdef double c_x2 = 0.7, c_y2 = 0.7, w_x2 = 0.05, w_y2 = 0.05\n", "cdef double c_x3 = 0.3, c_y3 = 0.7, w_x3 = 0.05, w_y3 = 0.07\n", "def gaussian_cython(x, y):\n", " return \\\n", " #...\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Optional] Exercise 6\n", "\n", "Now run the random walker using the cython function as a replacement and time it. Any difference?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Your timing code here" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Putting everything together\n", "\n", "We can now do something very cool: dispath the Cython'ized version of our code to our cluster of engines. Before this works, go back and load the cythonmagic and define the Cython function on all the cores (just place a `%%px` at the top, remember?).\n", "\n", "### Exercise 7\n", "\n", "Once again, fill out the missing parts below." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%px\n", "# ..." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Plot it" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fast eh?" ] } ], "metadata": {} } ] }