{ "metadata": { "name": "", "signature": "sha256:5ac2eda4f8e4ed177155cc89ca2e5f3a22fedddb7b2f05305189ef6a9667ed16" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Data Management with pandas (Python) 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "May, 2014\n", "\n", "Chang Y. Chung" ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pandas provides data structures that are flexible containers for lower-dimensional data. The two main objects are `Series` and `DataFrame`:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* `Series`: 1D labeled NumPy `ndarray`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* `DataFrame`: 2D labeled table of potentially heterogeneously-typed `Series`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "There are also `TimeSeries` (a `Series` indexed by datetimes) and `Panel` (a 3D labeled table of `DataFrame`s)." ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`Series` is a NumPy `ndarray` with an axis `index` and `name`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an `ndarray`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import pandas as pd\n", "\n", "data = np.array([23, 31, 2, 3])\n", "data" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "array([23, 31, 2, 3])" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an `ndarray`, super-charged." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series(data, index=['a', 'b', 'c', 'd'], name='mySeries')\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "a 23\n", "b 31\n", "c 2\n", "d 3\n", "Name: mySeries, dtype: int64" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "s.index" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "Index([u'a', u'b', u'c', u'd'], dtype='object')" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`Series` is like `ndarray`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get by an integer index:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s[0]" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "23" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Slicing returns a view:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s3 = s[:3]\n", "s3[0] = 999\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "a 999\n", "b 31\n", "c 2\n", "d 3\n", "Name: mySeries, dtype: int64" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "a 999\n", "b 31\n", "c 2\n", "d 3\n", "Name: mySeries, dtype: int64" ] } ], 
"prompt_number": 6 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Boolean indexing works, as well." ] }, { "cell_type": "code", "collapsed": false, "input": [ "above_median = s[s > s.median()]\n", "above_median" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "a 999\n", "b 31\n", "Name: mySeries, dtype: int64" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Mathematical operations and functions are *vectorized*." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s ** 0.5" ], "language": "python", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "a 31.606961\n", "b 5.567764\n", "c 1.414214\n", "d 1.732051\n", "Name: mySeries, dtype: float64" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "np.sqrt(s)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "a 31.606961\n", "b 5.567764\n", "c 1.414214\n", "d 1.732051\n", "Name: mySeries, dtype: float64" ] } ], "prompt_number": 9 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`Series` behaves like a dictionary, as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create a Series from a dictionary." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series({\n", " 'Tom': 0,\n", " 'Mike': 1,\n", " 'Jane': 2,\n", " 'Mary': 3,\n", " 'Claudia': 4\n", "})\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "Claudia 4\n", "Jane 2\n", "Mary 3\n", "Mike 1\n", "Tom 0\n", "dtype: int64" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In-place sort by value. Default is ascending=True." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s.sort()\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "Tom 0\n", "Mike 1\n", "Jane 2\n", "Mary 3\n", "Claudia 4\n", "dtype: int64" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "Tom 0\n", "Mike 1\n", "Jane 2\n", "Mary 3\n", "Claudia 4\n", "dtype: int64" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`sort_index()` returns a new Series, sorted by index." ] }, { "cell_type": "code", "collapsed": false, "input": [ "t = s.sort_index()\n", "t" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "Claudia 4\n", "Jane 2\n", "Mary 3\n", "Mike 1\n", "Tom 0\n", "dtype: int64" ] } ], "prompt_number": 13 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Getting and setting are just like a dictionary." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "s['Claudia'] = 23\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "Tom 0\n", "Mike 1\n", "Jane 2\n", "Mary 3\n", "Claudia 23\n", "dtype: int64" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "'Tom' in s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "True" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Pass `np.nan` as the default value to `get()`. Otherwise, `get()` returns `None` when the key is not found." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s.get('NOBODY', np.nan)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "nan" ] } ], "prompt_number": 16 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`Series` automatically *aligns* data based on index." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "s1 = pd.Series({'a':1, 'b':2, 'c':3})\n", "s1" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "a 1\n", "b 2\n", "c 3\n", "dtype: int64" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "s2 = pd.Series({'b':20, 'c':10, 'd':9})\n", "s2" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "b 20\n", "c 10\n", "d 9\n", "dtype: int64" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The resulting index is a *union* of the input indices." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s = s1 + s2\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "a NaN\n", "b 22\n", "c 13\n", "d NaN\n", "dtype: float64" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Missing values (`np.nan` and `None`) can be dropped easily." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s.dropna()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "b 22\n", "c 13\n", "dtype: float64" ] } ], "prompt_number": 20 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Example: Age-Specific Mortality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Following German Rodriguez's nice Stata code available at his [web page](http://tinyurl.com/lndec87), let's graph Age-Specific Mortality." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to read somewhat newer data, on the white population of the US in 2009, from the CDC FTP server address given in the publication:\n", "\n", "> Arias, Elizabeth (2014) \"United States Life Tables, 2009\" *National Vital Statistics Reports* Vol. 62, No. 7. Hyattsville, MD: National Center for Health Statistics. Available [here](http://www.cdc.gov/nchs/data/nvsr/nvsr62/nvsr62_07.pdf)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download an Excel file from the CDC's FTP site and save it locally. If successful, `urllib.urlretrieve()` returns a tuple of the local filename and the header information." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import urllib\n", "\n", "nchs = r'ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS' \n", "ftp = r'{0}/Publications/NVSR/62_07/Table04.xls'.format(nchs)\n", "xls = 'white2009.xls'\n", "urllib.urlretrieve(ftp, xls)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "('white2009.xls', )" ] } ], "prompt_number": 21 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The `pandas.read_excel` function can read the Excel file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use zero-based indices for column and row numbers." ] }, { "cell_type": "code", "collapsed": false, "input": [ "xls = 'white2009.xls'\n", "options = {'header': 2, 'parse_cols': [2], 'skiprows': 6, 'skip_footer': 2}\n", "df = pd.read_excel(xls, 'Sheet1', **options)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We then copy a single column (a Series) out of the DataFrame returned by `read_excel()`. Let's look at the first few rows." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "lx = df['l(x)'].copy()\n", "lx.head()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "0 100000.000000\n", "1 99472.078125\n", "2 99434.945312\n", "3 99409.906250\n", "4 99390.539062\n", "Name: l(x), dtype: float64" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And the last few rows." ] }, { "cell_type": "code", "collapsed": false, "input": [ "lx.tail()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "96 7127.180664\n", "97 5392.454102\n", "98 3973.862549\n", "99 2848.187988\n", "100 1982.758057\n", "Name: l(x), dtype: float64" ] } ], "prompt_number": 24 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We do some wrangling (munging, recoding, ...) 
and graphing" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "\n", "# convert to per-person\n", "lx /= 100000.0\n", "\n", "# cummulative hazard\n", "Hx = - np.log(lx)\n", "\n", "# shift(-1) brings up the value of the next row\n", "hx = Hx.shift(-1) - Hx \n", "\n", "# take the mid-range value for age\n", "hx.index += 0.5\n", "\n", "# finally\n", "hx.plot(logy=True)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 25, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEDCAYAAADdpATdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X1cVGXaB/AfCutWtpGbUgJFCwiiKBbqLoWNmUu6wqZW\nQhsFaC8aGWSlbvsEuIlg5UtrVtb6EhWiWQ+WMqXmuK4v8JRZFqioTCHYi6iZ+IKM9/PHHRxR0YE5\nM+fMnN/38+Hz2TPLnLnmarzm5jr3uW8vIYQAEREZSgetAyAiItdj8SciMiAWfyIiA2LxJyIyIBZ/\nIiIDYvEnIjIgFn8iIgNi8SciMiBvZ568vr4eEyZMQKdOnWAymXDfffc58+WIiMhOTh35v//++7j3\n3nuxYMECrFy50pkvRUREbdDm4p+WlgY/Pz9ERka2eNxsNiM8PByhoaHIz88HANTU1CAwMBAA0LFj\nRxXCJSIiNbS5+KempsJsNrd4zGazIT09HWazGeXl5SgsLERFRQUCAgJQXV0NADhz5ow6ERMRkcPa\nXPxjY2Nx9dVXt3isrKwMISEhCAoKgo+PDxITE1FcXIxRo0ZhxYoVmDBhAhISElQLmoiIHKPKBd+z\n2zsAEBAQgNLSUlx++eVYuHChGi9BREQqUqX4e3l5tfu5/v7+qK2tVSMMIiLDCA4Oxp49e9r9fFVm\n+/j7+zf39gGguroaAQEBdj23trYWQgj+CIGsrCzNY9DLD3PBXDAXF//Zu3evQ3VbleIfHR2NyspK\nWK1WNDQ0oKioqE09/uzsbFgsFjVCcWtWq1XrEHSDuVAwFwrmArBYLMjOznb4PG0u/klJSYiJicHu\n3bsRGBiIRYsWwdvbG/PmzUNcXBwiIiIwZswY9OzZ0+5zZmdnw2QytTUUIiLDMZlMqhT/Nvf8CwsL\nL/j4sGHDMGzYsHYF0VT8jf4FkJKSonUIusFcKJgLBXMhR/5qdEq8hBCa7uHr5eUFjUMgInI7jtZO\nLuymI7zuoWAuFMyFgrlQjy6KPy/4EhHZR60Lvmz7EBG5IbZ9iIiozXRR/Nn2kZgDBXOhYC4UzIV6\nbR+nbuZiLzXeCBGRETRNi8/JyXHoPOz5ExG5IY/o+bPtQ0RkH8728UAWi8Xwdzk3YS4UzIWCuVB4\nxMifiIhciyN/IiI35BEjf/b8iYjsw56/B2I/U8FcKJgLBXMBfPkl0KcP0KGDB4z8iYiodUIA69cD\nQ4YAd90F/Pij4+fkyJ+ISKeEANasAXJygIMHgb//HbjvPsDHx/HaqYs7fImISCEEsG4dkJUFHDoE\n/M//AGPGAB07qvcabPvoCC96K5gLBXOhMEI
u/vtfwGQCHnsMmDAB+PprOdpXs/ADOhn5cxtHIjK6\nL74Ann0WKC8HsrOB++8HvC9QobmNIxGRB9i7F/jHP4ANG2TxHzcO6NTp0s/ziHn+RERG89NPwMSJ\nwIABQEQEsHu3bPXYU/jVwOKvI0boZ9qLuVAwFwpPyMWJE0BeHtCzpzyuqJAXdDt3dm0cuuj5ExF5\nOiGAwkJg6lTg5puBLVuA0FDt4mHPn4jIybZuBTIzgdOngVmzgEGDHD8ne/5ERDpVUyNn7YweDTz6\nKFBWpk7hVwOLv454Qj9TLcyFgrlQuEsuTp0CZswA+vYFbrgB2LULePBBoIOOKq4uev6c509EnmLV\nKuCJJ4DevYHSUiA4WN3zc54/EZGO7NsHZGTIUf7LLwNxcc59Pfb8iYg0dPKkXHhtwAAgJgb46ivn\nF341sPjriLv0M12BuVAwFwq95eKTT4DISFnwt20Dpkxx3U1ajtJFz5+IyJ3U1soWz2efAfPmAcOH\nax1R27HnT0RkJ5sNmD9ftnkefVSuxXPZZdrEwvX8iYhc4IsvgIcfBi6/HNi4UVmewV2x568jeutn\naom5UDAXCi1yUV8PPP20vIg7fjxgsbh/4QecXPyrqqowbtw43HPPPc58GSIip2i6oFtbKzdVSUsD\nvLy0jkodLun533PPPVi+fPmFA2DPn4h05uBB4MknZXvn1VeBO+/UOqLzuWSef1paGvz8/BAZGdni\ncbPZjPDwcISGhiI/P7/dQRAR6YEQwLvvyrtzr7kG2LFDn4VfDXYV/9TUVJjN5haP2Ww2pKenw2w2\no7y8HIWFhaioqEBBQQEyMzNRW1vrlIA9GXu7CuZCwVwonJmL/fuB+Hi5Js/KlXL1TVevse9KdhX/\n2NhYXH311S0eKysrQ0hICIKCguDj44PExEQUFxcjOTkZs2fPRvfu3XHo0CE8+uij2L59O/8yICJd\nEgJYsADo1w/o3x/4/HN5t66na/dUz5qaGgQGBjYfBwQEoLS0tMXvdOnSBa+99tolz5WSkoKgoCAA\ngK+vL6KiopoXeWv6pjfCsclk0lU8PNbPcRO9xKPVcdNjap3v3XctePFFwNvbhPXrgYMHLdi8WT/v\n9+xji8WCxYsXA0BzvXSE3Rd8rVYr4uPjsWPHDgDAihUrYDab8cYbbwAA3n77bZSWluJf//pX2wLg\nBV8icrEzZ4BXXpE3a02eLDda8Xazu540W9jN398f1dXVzcfV1dUICAho17mys7PPG+EYEXOgYC4U\nzIVCjVzs3QuYTEBREbBpk5zD706F32KxIDs72+HztLv4R0dHo7KyElarFQ0NDSgqKkJCQkK7ztW0\nnj8RkbMIIadtDhwI3HUXsGEDEBamdVRtZzKZVCn+drV9kpKSsGHDBtTV1aFbt26YNm0aUlNTUVJS\ngoyMDNhsNowdOxZTp05tewBs+xCRk337rVya4fBhYMkSz7hD19HaqYuF3bKysmDiTl5EpLLGRrmx\nSm4uMGmS+7V4LsTy605eOTk57l/8OfKXzp7FYHTMhYK5ULQlF59/Lkf7V18NvPYaEBLi3NhczSN2\n8uIFXyJSyy+/yLX2//IXuZfumjWeVfjVuuDLkT8ReQQhgOJiYOJE4I47gJkz5RINnorr+ROR4e3b\nJ4v+nj3AW2/JqZx0cWz76AhzoGAuFMyF4txcnDgBTJsml2O49Va5l66nF3612j66GPmr8UaIyDiE\nAJYvB555Brj5Znlx94YbtI7KNZpmRubk5Dh0Hvb8icitfPaZvKBbXw/Mnu35I/3WeMRsHyKiS/n+\ne7mTVnw8kJIivwSMWvjVoIviz56/xBwomAuF0XPR0AC88ILcYKW+3oKdO4Fx44COHbWOTBvs+ROR\nxyspkS2e4GBg82a5l+5VV2kdlbbY8ycij7Vnj1xmeedOYM4cecMWtcSePxF5jOPHgX/8A/jjH+XU\nza+/ZuF
3Fl0Uf/b8JeZAwVwojJALIYD33pOrbe7bB3z5pdxkpVOnlr9nhFxcCnv+ROQRtm+Xa/Ac\nOSKXW+YMnotjz5+I3FptLZCVBXz4odxO0cgzeNqDPX8iciuHDwNTpgCRkXK55Z07gUceYeF3NRZ/\nHWE/U8FcKDwlF8eOyU1VevSQXwBffSVX3vT1tf8cnpILPWDxJyKnOnkSmDsXCA0FduyQm6a//jrg\n7691ZMami54/t3Ek8jxCAEuXAlOnyhbP888DfftqHZX74zaORKRb//2v3DPXZgNmzQIGDdI6Is/D\nC74ehP1MBXOhcKdcVFYCo0cD990HPP44UFambuF3p1zoHYs/ETnsp5/kTlp/+hPQvz+waxdw//1A\nB1YY3WLbh4ja7eefZVtn3jw52n/uOaBrV62jMga2fYjI5Y4fl8ssh4YC334r19b/179Y+N0Ji7+O\nsJ+pYC4UesrFqVOyyIeEAKWlwPr1wOLFwI03uub19ZQLd6ebtX041ZNIv06cABYuBPLz5XTNVauA\nfv20jsqYmqZ6Ooo9fyJq1bFj8oasl16SF3KffRYYMEDrqAhwvHbqYuRPRPpy8KBs78yfDwweDKxe\nDURFaR0VqYk9fx1hP1PBXChcmYv9++Xyyj16AAcOyKUYli3TT+Hn50I9LP5EBKsVePRRoE8fwMcH\n+OYbYMEC+SVAnok9fyID279frrmzfLlcVjkzk9M13QXn+RNRm/3wA5CRIWfu+PoCu3fL5ZZZ+I2D\nxV9H2M9UMBcKNXNx5IjcID0iQq66+c03QF4e8Pvfq/YSTsXPhXqcXvyLi4vx8MMPIzExEWvWrHH2\nyxHRBdTXyzn6oaFy+8Rt2+Qa+9deq3VkpBWX9fyPHDmCp556Cm+++WbLANjzJ3Ka48eB116TO2YN\nGiT3yu3ZU+uoSA0u6/mnpaXBz88PkZGRLR43m80IDw9HaGgo8vPzW33+888/j/T09HYHSkT2++UX\n4MUX5TIMmzYBn3wip2yy8FMTu4t/amoqzGZzi8dsNhvS09NhNptRXl6OwsJCVFRUoKCgAJmZmait\nrYUQApMnT8awYcMQpZfJwjrFfqaCuVC0JRcHD8qVNf/wB7nY2urVwIoVcgqnJ+DnQj123+EbGxsL\nq9Xa4rGysjKEhIQgKCgIAJCYmIji4mJMmTIFycnJAICXX34Z69atw9GjR7Fnzx488sgjqgVPRNKe\nPXJp5aVLgbvvBrZskaN+otY4tLxDTU0NAgMDm48DAgJQWlra4ncmTpyIiRMnOvIyhsGF7RTMheJi\nudi8Wa6785//yJu0KioAPz/XxeZq/Fyox6Hi7+XlpUoQKSkpzX89+Pr6Iioqqvk/ctOfeTzmMY/l\nsc0GHDpkwqxZwLffWnD33YDVasIVV8j/v6JCX/HyWJ1ji8WCxYsXA0BzvXSIaIOqqirRu3fv5uMt\nW7aIuLi45uPc3FyRl5fXllOKNobg0davX691CLrBXCiacnH0qBBz5ggRFCRETIwQK1YI0diobWyu\nxs+FwtHa6dA8/+joaFRWVsJqtaKhoQFFRUVISEho83mys7Obv+GIqKWffgKmTJEbpmzaJPv6mzYB\no0YBHTtqHR25msViQXZ2tsPnsXuef1JSEjZs2IC6ujp069YN06ZNQ2pqKkpKSpCRkQGbzYaxY8di\n6tSpbQuA8/yJLmjHDtnPX7kSSE6WyzG4ascs0j9Ha6cuFnbLysqCiTt5EUEIYN06OUf/yy+BiRPl\ngmtdumgdGemF5dedvHJycty/+HPkL1ksFn4B/spouWhokO2c2bPl/37qKeC++4BOnYyXi4thLhTc\nyYvIjR08KLdJfOUVoFcvYPp04M47gQ5ccpGcTBcjf7Z9yGjKyuQWicXFwMiRch39c1ZOIbogtn2I\n3MyxY7K18/rrcsQ/fjyQlgZcc43WkZE74mYuHoTTXRWekgshgNJSedE2M
BBYtUqurLlnD/DMM/YV\nfk/JhRqYC/XoouefnZ3Ntg95lNpaoKAAWLwYaGwEUlLkxindu2sdGbm7praPo3TR9tm/X+CnnwAu\n+knu7NQpOSd/0SK5sNro0UBqKhATA6i0EgpRM4+Y7bNxI/C//yv7oUTuZudOYMECOdLv00cW/OXL\ngSuu0Doyotbpouf/0UfZ2LvXonUYmmM/U6H3XNhswPvvA4MHAyaTnI9fWipv0Lr/fnULv95z4UrM\nhXrLO+ii+D/+eDa8vExah0F0SfX1wLx5QFgY8MILcsbOd98BM2bIDVSInM1kMrl2bR9n8fLywq5d\nAiNGALt3axkJUeuOHpU3Ys2ZA9xyi7wDNyZG66jIyDyi5+/rCxw+rHUUROc7ehSYOxd4+WXgz38G\n1q8HIiK0jorIcbpo+1x9NXDkiJwTbWTsZyq0zsXx47KtExIi/yLdtAl45x1tCr/WudAT5kI9uhj5\nT5+eDW9vE+rrTejcWetoyMgaGoA33gByc4E//lGO9Hv10joqIoVHzfMXQiAgQM6NPmtLYCKXaWyU\nUzVzcoCePYF//hOIjtY6KqLWeUTPH5Ctn8OHWfzJtWw2oKgImDZNbnxeUADExmodFZHz6aLnDyjF\n38jYz1Q4Oxc2G7BsmVxJc948+WOx6LPw83OhYC7Uo5uRv6+vvOhL5EyNjUBhoezpX3WV3Dzlz3/m\n8gtkPLoo/tnZ2Th1yoTDh01ah6IpLmynUDsXp04BS5YAeXnADTfIkf7tt7tH0efnQsFceOAF34wM\n+Y8yM1PLaMjT1NfLdXdeekm2eJ59Frj1Vq2jInKcx6znz7YP+5lnczQXJ07Ilk5wsJyjv3IlUFLi\nnoWfnwsFc6Ee3RR/XvAlNZw+Dbz6KhAaCvznP8AnnwDvvQfcdJPWkRHpi27aPm+9BaxZI6faEbWV\nEHKXrKeektOFc3OB/v21jorIeTxmnj/bPtReX30FPPkkUFMDzJoFDBvmHhdyibTEto+OsJ+psCcX\nP/4o98a94w7grrvkl8Dw4Z5X+Pm5UDAX6mHxJ7dz8iTw4otykbXLLgN27QLS0wEfH60jI3Ifuuj5\nZ2VloVcvEzIyTKip0TIa0jObTV4Teu45oF8/ID8fCA/XOioi12qa55+Tk+NQz18XxV8Igfp6oGtX\nuZQu0dlsNmDFCrn+jq+vLPq33KJ1VETa8ph5/pdfLm+9P3VK60i0w36mwmKxoLEReOstuaTyrFmy\n6G/caLzCz8+FgrlQj25m+3h5KTN+/Py0joa0dOyYnJufkgIEBcntE91lKQYid6Gbtg8gN8UuLmYf\n16hqa2WhX7AAMJnknP2BA7WOikifPKbtA3DGj1F9+SXw4INA797Azz8DW7cCy5ez8BM5k66Kv9Fv\n9DJSP/PkSeDtt+VaO8OHy92z9uyRq20GBxsrF5fCXCiYC/XopucPcOTv6YQAvvhCXsR95x253s6k\nScCIEZyjT+RqTu3579y5E3PnzkVdXR3i4uIwduzY8wM4q281YYKc2fHYY86KiLSwb5/cNevtt+VU\n3uRk4IEH5AifiNrH0Z6/Sy74njlzBomJiVi2bNn5AZz1Bv7+d+CKK+Sa6+S+hAC++Qb48EM5a2f/\nfmDUKOBvf5PTNDlrh8hxLrngm5aWBj8/P0RGRrZ43Gw2Izw8HKGhocjPz7/gcz/88EP85S9/QWJi\n4iVfx+htH3fuZ/7wg7xI+9BDwPXXA/Hxsui/+KKcxfPqq7K/b2/hd+dcqI25UDAX6rGr+KempsJs\nNrd4zGazIT09HWazGeXl5SgsLERFRQUKCgqQmZmJ2tpaAEB8fDxKSkqwZMmSS76O0Yu/O/nuO9nG\neeQROTU3PFy5IWvtWtnqeeUVYPBgoGNHraMlonPZdcE3NjYWVqu1xWNlZWUICQlBUFAQACAxMRHF\nxcWYMmUKkpOTAQAbNmzA+++/j5MnT
2Lw4MGXfB2jz/bR6/6kjY3Ajh3A5s3yZ9MmuVNWbKz8GT9e\nbpGoZpHXay60wFwomAv1tHu2T01NDQIDA5uPAwICUFpa2uJ3brvtNtx22212n5Mjf+0JIUf1n38O\nlJbKOffbtskNUm65RS6f/NxzQI8e7N0TubN2F38vFf/lp6SkICgoCAcOALt2+cJiiWr+hm/q8Rnh\n+Ox+prNfb9AgE378EVi2zIJvvwXOnDHh66+B0lILfHyAP/3JhIEDgREjLHj6aWDECOX5Bw4AYWHO\nje/cnOjhv49Wx9u3b0dGRoZu4tHyeM6cOYiKMm59WLx4MQA0d1wcYfdsH6vVivj4eOzYsQMAsHXr\nVmRnZzdfC5gxYwY6dOiAyZMnty2As65Y79sHDBkCVFW16RQew2KxNP9Hv5gTJ4CdO+UIvb5e/hw/\nDpw5I3+EkAvknTghf+rrgaNH5d2zhw/LC7AHDgC/+x0QEiJvsAoPl/36m24CrrvO+e/1UuzNhREw\nFwrmQqHZNo7R0dGorKyE1WpF9+7dUVRUhMLCwnadKzs7GyaTCX37mgzd9mntQ336tNzf+N13gbIy\noLpaFu2gIKBzZzk99rLLAG9v2Yrx8gI6dZKP+frKFVOvukr++PoC3bvLn9/+1qVvr034D1zBXCiY\nC2U9f0fZNfJPSkrChg0bUFdXh27dumHatGlITU1FSUkJMjIyYLPZMHbsWEydOrXtAZz17XXmjLzT\ns6GBM0QAZapkYSHwhz/IefKDBwOhocBvfqN1dESkJUdH/hAaAyCysrLE+vXrhRBCXHWVEIcOaRuT\nVppycPCgEJMmCdGlixDPPCNEZaW2cWmhKRfEXJyNuZA5yMrKEo6Wb10s7NbU9gGMPeOnsRF46SW5\ntPXx48DXX8sNTEJCtI6MiPTCZDIhOzvb4fPoYmG3puJvMpkMW/y3bAEmTTLBz0/Ope/RQ+uItMXe\nroK5UDAXLu75O9O5favbb5dr+wwZomFQLvTzz8DkyXIdnFmzgHvv5fx5Iro0j9rMBTBW26ekRN4Z\nKwRQXg74+VlY+H+lxsjGUzAXCuZCPWz7aODoUeDxx+Vm5IsWGeevHCJynMe2fZ56Sm7g/vTTGgbl\nRBUVcnnj2FjZ5uncWeuIiMgdse3jRlasAAYNkl9sCxaw8BORdlj8XUAI4J//BJ58Uvb509Iu/Hvs\nZyqYCwVzoWAu1KO7nr+nLesshJy9tHKlXCXz2mu1joiI3JnH9vxLSoC5c4Fz9o5xS0LIaxiffirX\n5rnmGq0jIiJPodnCbs7iKW0fIYDMTLnxybp1QJcuWkdERKTQXc/fU9o+8+fL7QzXrrW/8LOfqWAu\nFMyFgrlQjy5G/p42z3/dOnmBd/NmuYwyEZFaPLbn39AgNxk5eNA9p0Lu3QvExABLl8rll4mInMHj\n5vn/5jfAiBHAkiVaR9J29fVAQgKQlcXCT0T6prviDwAZGXLGz5kzWkfSNs8/D/TpA4wf377ns5+p\nYC4UzIWCuVCPLov/LbfI1s/q1VpHYr+dO4E335RLNnBxNiLSO931/Ju8845c9GztWg2CaiMhgKFD\ngfh44IkntI6GiIzAI3r+2dnZ5/05d889chG0HTu0iaktli0DfvoJeOwxrSMhIk9nsVhU2clLtyN/\nAJg+Hdi3D/j3v10cVBv88gvQs6ec3XPrrY6dy2KxcKeiXzEXCuZCwVwoPGLk35pHHgHefx/48Uet\nI2ndzJlyPX5HCz8RkSvpeuQPyE1PTp4E3njDhUHZ6dgxICgI2LqVm6wTkWt59MgfkK2fNWv0udDb\nv/8NmEws/ETkfnRf/H/3O1lkH3pIX2v+NDYCs2eru+MY5zArmAsFc6FgLtSj++IPyJ56QoJcJVMv\n3nsPuP56YOBArSMhImo73ff8mxw7BvTtK+/8HTHCBYFdhBBAdLRcxiEhQdtYiMiYPKLnf6F5/ufq\n3
BlYuFAunXD0qGvias369cDx49p/CRGR8Rhinv+FjBsHXH458PLLTgzqEoYPB0aNkrGoiXOYFcyF\ngrlQMBcKjxj5t8XMmcDy5UBZmTavv28f8H//B/ztb9q8PhGRGtxu5A/IdX9eeEEWYR8fJwXWiqlT\ngVOn5AJuRERacXTk75bFXwjgzjuBO+5Qd6rlpTQ0yBk+FgsQHu661yUiOpfh2j6AXDJ5/nwgPx+Y\nMsV1yz8UF8ui76zCzznMCuZCwVwomAv1uGXxB4DgYODzz+XMn/BweQ+As78EXn9drjdEROTu3LLt\nc67aWiAvDygqkheEH3hA/Q1V9uyRe/NWVwOdOql7biKittJ926e+vh79+/fHqlWrnPYa3bvLqZ8l\nJcCcOUBcHFBVpe5rLFgAPPggCz8ReQanF/+ZM2dizJgxzn4ZAMBNN8kpoEOGyDtwn3tObqruqFOn\n5IbyDz/s+Lkuhv1MBXOhYC4UzIV67Cr+aWlp8PPzQ2RkZIvHzWYzwsPDERoaivz8/POet2bNGkRE\nRKBr167qRGsHHx9g8mRg+3bZqgkPl1NDHeksLVwI9OsHhIaqFycRkZbs6vlv3LgRnTt3xgMPPIAd\nv+6raLPZEBYWhrVr18Lf3x/9+/dHYWEhPvvsM2zbtg1PP/005s+fj/r6epSXl+Oyyy7DBx98AK9z\nmvFq9PwvZtMmID1dtobefBO47rq2Pb++Xhb9jz6Sf1kQEemBy+b5W61WxMfHNxf/LVu2ICcnB+Zf\nF9rPy8sDAEyZMuW85y5ZsgRdu3bF8OHDVX8D9mhoAJ5/Xs7WmTdP7g9sr+nT5T7CS5c6Lz4iorZy\ntHZ6t/eJNTU1CAwMbD4OCAhAaWnpBX/3wQcfvOi5UlJSEBQUBADw9fVFVFRU8/odTT0+R4+nTTNh\nxAjg7rstWLgQ+OADE37724s/v64OmDnTgvnzAUDdeC50fHY/0xnnd6fjpsf0Eo+Wx9u3b0dGRoZu\n4tHyeM6cOU6pD+5wbLFYsHjxYgBorpcOEXaqqqoSvXv3bj5+7733xLhx45qPCwoKRHp6ur2na9aG\nEFTxyy9C3HuvEDffLMS33178d598Uojx410TlxBCrF+/3nUvpnPMhYK5UDAXCkdrZ7tn+/j7+6O6\nurr5uLq6GgEBAe06lz1LOqulc2fZwklKkhuxrFlz4d/77jtg8WI5Y8hVmr7tibk4G3OhYC40WNL5\n3J5/Y2MjwsLCsG7dOnTv3h0DBgxAYWEhevbs2bYAXNDzb82nnwKpqUBUFDBjBhARITeLX7xY3iyW\nkuLa4k9EZC+X3OSVlJSEmJgY7N69G4GBgVi0aBG8vb0xb948xMXFISIiAmPGjGlz4W/iypH/2W6/\nHdi1Cxg0SG7EPno0cOONwKpVwFtvub7wa5EDvWIuFMyFgrkw8GYuznL4MFBQIL8QevfWJgYLN6po\nxlwomAsFc6Ew5JLORERGp/u1feyhVduHiMjdsO3jgfgnrYK5UDAXCuZC4REjfyIici1djPyzsrJg\nMpn4jU5EdAkWiwUWiwU5OTm84EtEZDRs+3gQXvRWMBcK5kLBXKiHxZ+IyIB00fZhz5+IyD7s+RMR\nGRh7/h6E/UwFc6FgLhTMhXpY/ImIDEgXbR/2/ImI7MOePxGRgbHn70HYz1QwFwrmQsFcqIfFn4jI\ngNj2ISJyQ2z7EBFRm+mi+HMzF4k5UDAXCuZCwVyot5mLt+OhOE6NN0JEZARN0+JzcnIcOg97/kRE\nbog9fyIiajMWfx1hP1PBXCiYCwVzoR4WfyIiA2LPn4jIDXlEz59TPYmI7KPWVE+O/HXEYrFwZdNf\nMRcK5kLBXCg8YuRPRESuxZE/EZEb4sifiIjajMVfR3jRW8FcKJgLBXOhHhZ/IiIDYs+fiMgN6brn\nb7FYEBsY8tEAAAAGIElEQVQbi/Hjx2PDhg3OfCkiImoDpxb/Dh0
64Morr8SpU6cQEBDgzJfyCOxn\nKpgLBXOhYC7UY1fxT0tLg5+fHyIjI1s8bjabER4ejtDQUOTn55/3vNjYWKxevRp5eXnIyspSJ2IP\ntn37dq1D0A3mQsFcKJgL9dhV/FNTU2E2m1s8ZrPZkJ6eDrPZjPLychQWFqKiogIFBQXIzMxEbW0t\nvLy8AAC+vr44deqU+tF7mCNHjmgdgm4wFwrmQsFcqMeunbxiY2NhtVpbPFZWVoaQkBAEBQUBABIT\nE1FcXIwpU6YgOTkZAPDBBx/g448/xpEjR/D444+rGjgREbVfu7dxrKmpQWBgYPNxQEAASktLW/zO\nyJEjMXLkyPZHZzDnfsEaGXOhYC4UzIV62l38m1o6jgoODlbtXJ5gyZIlWoegG8yFgrlQMBdScHCw\nQ89vd/H39/dHdXV183F1dXW7ZvTs2bOnvSEQEVE7tXuqZ3R0NCorK2G1WtHQ0ICioiIkJCSoGRsR\nETmJXcU/KSkJMTEx2L17NwIDA7Fo0SJ4e3tj3rx5iIuLQ0REBMaMGYOePXs6O14iIlKD0FBJSYkI\nCwsTISEhIi8vT8tQXO67774TJpNJREREiF69eom5c+cKIYSoq6sTd9xxhwgNDRVDhw4Vhw8f1jhS\n12lsbBRRUVFixIgRQgjj5uLw4cNi9OjRIjw8XPTs2VNs3brVsLnIzc0VERERonfv3iIpKUmcPHnS\nMLlITU0V3bp1E717925+7GLvPTc3V4SEhIiwsDDx8ccfX/L8mi3s1tp9Akbh4+OD2bNn45tvvsHW\nrVvxyiuvoKKiAnl5eRg6dCh2796NIUOGIC8vT+tQXWbu3LmIiIhongBg1Fw88cQTGD58OCoqKvDV\nV18hPDzckLmwWq144403sG3bNuzYsQM2mw1Lly41TC4udH9Va++9vLwcRUVFKC8vh9lsxoQJE3Dm\nzJmLv4BTvrLssHnzZhEXF9d8PGPGDDFjxgytwtHcX//6V7FmzRoRFhYmvv/+eyGEEAcOHBBhYWEa\nR+Ya1dXVYsiQIeLTTz9tHvkbMRdHjhwRN95443mPGzEXdXV1okePHuLQoUPi9OnTYsSIEeKTTz4x\nVC6qqqpajPxbe++5ubktuidxcXFiy5YtFz23ZiP/C90nUFNTo1U4mrJarfjiiy8wcOBA/PDDD/Dz\n8wMA+Pn54YcfftA4OtfIzMzECy+8gA4dlI+kEXNRVVWFrl27IjU1FTfddBMeeugh1NfXGzIXXbp0\nwaRJk3D99deje/fu8PX1xdChQw2Ziyatvffa2toWsy3tqaeaFX/O7ZeOHTuG0aNHY+7cubjyyitb\n/H9eXl6GyNNHH32Ebt26oV+/fq0uUWuUXDQ2NmLbtm2YMGECtm3bhiuuuOK8toZRcrF3717MmTMH\nVqsVtbW1OHbsGN5+++0Wv2OUXFzIpd77pfKiWfFX6z4Bd3b69GmMHj0aycnJuOuuuwDIb/Pvv/8e\nAHDgwAF069ZNyxBdYvPmzVi5ciVuvPFGJCUl4dNPP0VycrIhcxEQEICAgAD0798fAHD33Xdj27Zt\nuPbaaw2Xi88++wwxMTH4/e9/D29vb4waNQpbtmwxZC6atPZv4tx6un//fvj7+1/0XJoVf6PfJyCE\nwNixYxEREYGMjIzmxxMSEprvYFyyZEnzl4Iny83NRXV1NaqqqrB06VLcfvvtKCgoMGQurr32WgQG\nBmL37t0AgLVr16JXr16Ij483XC7Cw8OxdetWnDhxAkIIrF27FhEREYbMRZPW/k0kJCRg6dKlaGho\nQFVVFSorKzFgwICLn0ztCxRtsXr1atGjRw8RHBwscnNztQzF5TZu3Ci8vLxE3759RVRUlIiKihIl\nJSWirq5ODBkyxOOnsbXGYrGI+Ph4IYQwbC62b98uoqOjRZ8+fcTIkSPFkSNHDJuL/Pz85qmeDzzw\ngGhoaDBMLhITE8V1110nfHx
8REBAgFi4cOFF3/v06dNFcHCwCAsLE2az+ZLn13wbRyIicj1u4E5E\nZEAs/kREBsTiT0RkQCz+REQGxOJPRGRALP5ERAbE4k9EZEAs/kREBvT/2bmwxQ4sH/cAAAAASUVO\nRK5CYII=\n", "text": [ "" ] } ], "prompt_number": 25 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's see how the Series, hx, looks:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "hx.head()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "0.5 0.005293\n", "1.5 0.000373\n", "2.5 0.000252\n", "3.5 0.000195\n", "4.5 0.000148\n", "Name: l(x), dtype: float64" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "hx.tail()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "96.5 0.278915\n", "97.5 0.305262\n", "98.5 0.333056\n", "99.5 0.362194\n", "100.5 NaN\n", "Name: l(x), dtype: float64" ] } ], "prompt_number": 27 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Fit a line for those over 30 years old." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "import statsmodels.api as sm\n", "\n", "# more munging\n", "loghx = pd.Series(np.log(hx), name='loghx')[30:-1]\n", "am30 = pd.Series(hx.index, index=hx.index, name='am30')[30:-1] - 30.0\n", "\n", "# model fit\n", "model = sm.OLS(loghx, sm.add_constant(am30))\n", "result = model.fit()\n", "print result.params\n", "print 'R^2 : {:6.4f}'.format(result.rsquared)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "const -7.308683\n", "am30 0.087695\n", "dtype: float64\n", "R^2 : 0.9949\n" ] } ], "prompt_number": 28 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Finally, we graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# predicted value\n", "pred = model.predict(result.params).astype(np.float64, copy=False)\n", "hf = pd.Series(np.exp(pred), index=am30.index, name='hf')\n", "\n", "# plot\n", "hx.plot(logy=True)\n", "hf.plot(logy=True)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAX8AAAEDCAYAAADdpATdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlcVHX3wPEPCLmkRZZSCoUJgihppvXLXCbNhzS13NEe\nSlwyDRU1U1uYgVzAnty10h6XqHBJzUyZchsec8FKTRMUUSgUs8QtcUHg/v64wdVyYRnmDjPn/Xrx\nx70y9545TWe+nPu93+uiKIqCEEIIp+KqdwBCCCFsT4q/EEI4ISn+QgjhhKT4CyGEE5LiL4QQTkiK\nvxBCOCEp/kII4YSk+AshhBNyK8+D5+TkMGzYMCpXrozBYKBfv37leTohhBDFVK4j/1WrVtG7d2/m\nz5/PV199VZ6nEkIIUQIlLv4DBgzA09OToKCg6/abzWYCAgLw8/MjNjYWgOPHj+Pt7Q1ApUqVrBCu\nEEIIayhx8Q8LC8NsNl+3Lz8/n/DwcMxmM8nJycTHx5OSkoKXlxeZmZkAFBQUWCdiIYQQZVbi4t+6\ndWvuueee6/bt2rULX19ffHx8cHd3JyQkhDVr1tC9e3dWrlzJsGHD6Nq1q9WCFkIIUTZWueB7bXsH\nwMvLi6SkJKpVq8bChQutcQohhBBWZJXi7+LiUurX1q1bl6ysLGuEIYQQTqN+/fqkpaWV+vVWme1T\nt27dot4+QGZmJl5eXsV6bVZWFoqiyI+iYDQadY/BXn4kF5ILycWtf44cOVKmum2V4t+8eXMOHz5M\nRkYGubm5LFu2rEQ9fpPJhMVisUYoFVpGRobeIdgNyYVGcqGRXIDFYsFkMpX5OCUu/n379qVly5ak\npqbi7e3NokWLcHNzY86cOQQHBxMYGEifPn1o2LBhsY9pMpkwGAwlDUUIIZyOwWCwSvEvcc8/Pj7+\nhvs7duxIx44dSxVEYfF39i+A/v376x2C3ZBcaCQXGsmFOvK3RqfERVEUXZ/h6+Ligs4hCCFEhVPW\n2ikLu9kRue6hkVxoJBcayYX12EXxlwu+QghRPNa64CttHyGEqICk7SOEEKLE7KL4S9tHJTnQSC40\nkguN5MJ6bZ9yfZhLcVnjjQghhDMonBYfFRVVpuNIz18IISogh+j5S9tHCCGKR2b7OCCLxeL0dzkX\nklxoJBcayYXGIUb+QgghbEtG/kIIUQE5xMhfev5CCFE80vN3QNLP1EguNJILjeQCfvoJHnkEXF0d\nYOQvhBDi5hQFtmyB9u3hhRfg99/LfkwZ+QshhJ1SFNiwAaKi4NQpePNN6NcP3N3LXjvt4g5fIYQQ\nGkWBTZvAaITTp+Gdd6BPH6hUyXrnkLaPHZGL3hrJhUZyoXGGXHz3HRgM8NprMGwY/PyzOtqvVAmu\n5l9l2c/LKFAKynweuxj5y2MchRDObs8eeOstSE4Gkwn+/W9w+6tC5xXk8fn+z4lOjOauE3fxg8sP\nZT6f9PyFEEJHR47A229DYqJa/AcNgsqV1X/LL8hn2YFlRCVG4XmnJ9FPR2PwMQDS8xdCiArpjz/g\n3Xfhs88gIgIWLIDq1dV/K1AK+CL5C0wWEx5VPJjbaS7t67XHxcXFaueXnr8dcYZ+ZnFJLjSSC40j\n5OLSJYiJgYYN1e2UFPWCbvXqatFflbKKJh824f0d7zM9eDrbBmzjmYefsWrhBxn5CyGETSgKxMfD\nhAnw2GOwYwf4+RX+m8La1LUYLUZcXVyZ0n4Kz/k9Z/WCfy3p+QshRDnbuRNGjYKrV2HaNGjTRt2v\nKAoJaQlEbonkasFVog3RdPXvWqyiLz1/IYSwU8ePw7hx6t25kydDaCi4uqpF/9sj32K0GLmQewGT\nwUT3ht1xdbFdJ156/nbEEfqZ1iK50EguNBUlF1euwJQp0KQJPPQQHDoEL78MLi4Km45uotWiVkR8\nE0HE/0Wwb+g+egb2tGnhBzsZ+cs8fyGEo1i3DkaOhMaNISkJ6
tdX9ydmJGK0GMn6MwuTwUSfRn2o\n5FryW3YtFotVvgSl5y+EEFZw9Kg6ZfPQIZg1C4KD1f3bft2G0WIk/Ww6kW0iefGRF3FzLfu4W3r+\nQgiho8uXITYWZs+G11+HFSvUm7SSjiURaYnk0KlDvNPmHV5q8hLuldz1DreI9PztSEXpZ9qC5EIj\nudDYWy6+/RaCgmDfPti9G8aPh5+zf6Tz553ptaIX3QO6kzo8lYHNBtpV4QcZ+QshRIllZaktnh9+\ngDlzoFMn2PvbXkYsNfF91vdMaDWBlb1XUtmtst6h3pT0/IUQopjy82HePHV9/VdfVdfiOfLnz5gs\nJrZlbmP8U+N55bFXqOpetdxjkZ6/EELYwJ498MorUK0abN0K3JdC2LooLBkWxrYcyyfdPqGaezW9\nwyw26fnbEXvrZ+pJcqGRXGj0yEVODowdq87eGToU5q9MZVLKv2m7uC2P3v8oaSPSGNNyTIUq/FDO\nxT89PZ1BgwbRq1ev8jyNEEKUi8ILullZ8PW2I/yvZn9aLXqKhvc1JG1EGuNajaP6HdX1DrNUbNLz\n79WrFytWrLhxANLzF0LYmVOnYPRotb1jmpHBVpeJrD64muGPD2fU/43i7ip36x1imWtnsUb+AwYM\nwNPTk6CgoOv2m81mAgIC8PPzIzY2ttRBCCGEPVAU+Pxz9e7cO2r9SrtprzL60GPcX/1+Dg8/jMlg\nsovCbw3FKv5hYWGYzebr9uXn5xMeHo7ZbCY5OZn4+HhSUlKIi4tj1KhRZGVllUvAjkx6uxrJhUZy\noSnPXBw7Bl26QPT047SeEs5qz0epXeMeDoUfYmK7idSsWrPczq2HYhX/1q1bc88991y3b9euXfj6\n+uLj44O7uzshISGsWbOG0NBQpk+fTp06dTh9+jSvvvoqe/fulb8MhBB2SVFg/nxo8tRvnGoRwe89\ng/CpW5WU11KY8swU7qt2n94hlotST/U8fvw43t7eRdteXl4kJSVd9zs1a9bkww8/vO2x+vfvj4+P\nDwAeHh40bdq0aJG3wm96Z9g2GAx2FY9s2892IXuJR6/twn3WOt7nn1uYMuMMfzTcRt6QhXi5teON\noI/p/q/udvF+r922WCwsXrwYoKhelkWxL/hmZGTQpUsX9u/fD8DKlSsxm80sWLAAgE8//ZSkpCRm\nz55dsgDkgq8QwsYKCiB29imiN72H62Mf0795P95qM4E6NeroHVqx2eSC743UrVuXzMzMou3MzEy8\nvLxKdSyTyfSPEY4zkhxoJBcayYXGGrn4Mfk0PgPeIvJ3f17oc56DEXuZ+9zsClP4LRYLJpOpzMcp\nddunefPmHD58mIyMDOrUqcOyZcuIj48v1bGs8UaEEOJWzl46x4tzppOQPYcWDV5gy5AfqX+vj95h\nlZjhrxZxVFRUmY5TrLZP3759SUxMJDs7m9q1axMdHU1YWBgJCQlERESQn5/PwIEDmTBhQskDkLaP\nEKIcnb9ynuhvZjFr10zu/q0znw55m+AW9fUOq8zKWjvtYmE3o9FY9G0mhBDWcCH3ArN2zmGKZRq5\nKcGEB71D7BsNcKvgK5pZ/nqSV1RUVMUv/jLyV107i8HZSS40kgtNcXJx8epF5n0/jymJ76GkP43/\nCSNx0xvi62ubGG3FIVb1lGf4CiHK6tLVS3z040fEfBdLjTOtcFm9ienjGxMaCi4uekdnPYUj/7KS\nkb8QokK7nHeZBT8uIGZbDF4uLfhlSRSdmjVh6lS4zzHvzwIcZOQvhBAllZufy8I9C5m0dRINajTl\n4Z1f8ce+x1j6IUgT4fbsYj1/meevkhxoJBcayYXGYrFwNf8qH+/+mAazG7Ay+Us6nv+Cn8av5blm\nj7Fvn+MXft3n+VuTzPMXQtxOXkEe5jQzA/cNpJ5HPQZ6fM5/I1ty12Pw44/w0EN6R2gbNp3nX56k\n5y+EuJX8gnzif44nKjGKu
jXq0u+BaD55tw05OTB9uuOP9G/GIeb5S/EXQvxdgVLA8gPLiUqM4t6q\n9zKqybusm/s0CQnw7rsQFgaVKukdpX50W9vHmqTnr5IcaCQXGmfLRYFSwBfJX/DIB48wM2km7z8z\ni66ntjIk+GlyciwcPAiDBjlv4ZeevxDCoSiKwppDazBajNxR6Q6mdpgKhzsyqqsL9evD9u3qs3Tv\ndowHaZWa9PyFEA5BURS+Tv0aU6KJAqWAaEM0Aa6dGT3ahYMHYcYMeO45vaO0PzLPXwhRISmKgjnN\njNFi5HLeZaIMUfzrwReYMsWFsA9h7Fj44guoXFnvSB2T9PztiORAI7nQOFouFEVh49GNPLXwKcZ8\nO4bXW77OniF7yT/QjcBAF44ehZ9+gnHj/ln4HS0XpSE9fyFEhWPJsBC5JZKTOScxtTXRu1Fv9u+r\nRLun4exZWLLEeaduFpf0/IUQFcbWX7ZitBj59dyvRLaNpF9QP37/zQ2jEdauhago557BUxrS8xdC\n2K0dmTswWoyknU7jnTbvENoklD/PufH2m7BgAQwcCAcPgoeH3pE6H7vo+QuV9DM1kgtNRczF98e/\np9NnnQhZGUKvwF4cDD9IL78wpsa40aABnDkD+/bB1KklK/wVMRf2Skb+Qgir2X1iN0aLkb2/7eXN\nVm+yus9qlLzKfDAHYmLUfv62bdCggd6RCrvo+ctjHIWo2Pad3IfRYiTpWBITWk1g8GODqVypCkuX\nwoQJEBQEEydCkyZ6R1rxyWMchRC6O/D7AUyJJrb+spU3nnqDoc2HUtW9Kt99B2PGQH4+TJsGbdro\nHanjcYi1fYRK+pkayYXGHnNx8NRB+q3sR7tP2vF4ncc5MuIIo58czbGMqvToAf36wfDhsGuXdQu/\nPeaiopLiL4QotrTTaby0+iXaLGpDUO0g0oanMfapsVw8dycjRsCTT0KLFnDoEPz73+AqFcZuSdtH\nCHFbR88cZeL/JvLVoa8Y+cRIRv7fSO6qfBfnzqltnTlz1NF+ZCTUqqV3tM5B5vkLIcrNL2d/YdLW\nSaxKWcVrLV4jbUQaHlU8uHgR3ntP/enUCX74AerV0ztaURLyR5kdkX6mRnKh0SMXx84fY9i6YTSb\n34z7qt1H6vBUop6OoqqLB7Nng68vJCXBli2weLHtCr98LqzHLkb+JpNJpnoKYQey/swi5rsYPt33\nKYObDeZQ+CHuq3Yfly7B3LkQG6tO11y3Dh59VO9onVPhVM+ykp6/EIKTF04Suy2WxXsX079pf8Y9\nNQ7P6p5cuAAffQTvv69eyH3rLXj8cb2jFSBTPYUQZfBHzh+8seENAucFkl+Qz4FhB5gWPI1Klz0x\nGtV2TlISrF8Pa9ZI4XckUvztiPQzNZILTXnkIvtiNm9uepOAuQHk5Obw06s/MbPjTPLPPcDIkery\nCydOqEsxLF8OTZtaPYRSkc+F9UjxF8KJnLl0hsgtkTSY04BTF0+xZ8ge5j43l7zTXrz6KjzyCLi7\nw4EDMH++rMHjyKTnL4QTOHf5HDOTZjIraRbP+z/P223ept499Th2TF1zZ8UKGDIERo2SefoVhczz\nF0Lc1J9X/mT2rtlM3zmdjr4d2TloJ741fTl5EiIiIC4OBg+G1FS49169oxW2JG0fOyL9TI3kQlOa\nXFzIvUDsd7H4zvblwB8H+C7sOz7p9gn3ufry9tsQGAiKorZ3YmIqTuGXz4X1lPvIf82aNaxbt47z\n588zcOBAOnToUN6nFMJpXbx6kQ++/4D3tr9HW5+2bHl5C4G1AsnJUefo/+c/0KUL7N4NDz2kd7RC\nTzbr+Z89e5bXX3+djz/++PoApOcvRJldzrvMRz98RMy2GJ70epIoQxRBnkFcvAgffqg+MatNG/VZ\nuQ0b6h2tsAabzfMfMGAAnp6eBAUFXbffbDYTEBCAn58fsbGxN339xIkTCQ8PL3WgQoh/upJ
3hXnf\nz8N3li+bMzazvt96VvVZhU+1IP7zH3UZhm3b4Ntv1SmbUvhFoWIX/7CwMMxm83X78vPzCQ8Px2w2\nk5ycTHx8PCkpKcTFxTFq1CiysrJQFIVx48bRsWNHmtrLZGE7Jf1MjeRCc6Nc5Obn8tEPH9FgTgPW\nHV7HlyFfsiZkDd7ujxIZCQ8/rC62tn49rFypTuF0BPK5sJ5i9/xbt25NRkbGdft27dqFr68vPj4+\nAISEhLBmzRrGjx9PaGgoALNmzWLTpk2cP3+etLQ0hgwZYrXghXA2V/OvErcvjnf/9y7+9/qzvOdy\nnvB6grQ0GDYMli6Fnj1hxw511C/EzZTpgu/x48fx9vYu2vby8iIpKem63xkxYgQjRowoy2mchixs\np5FcaAwGA3kFeXy+/3OiE6N5yOMh4rrF0erBVmzfDj1Gwv/+B6++Cikp4Ompd8TlRz4X1lOm4u/i\n4mKVIPr371/014OHhwdNmzYt+o9c+GeebMu2M25v2ryJzembWXFxBQ/UeIDw2uEE1WrKiaRWPNkH\nfvnFQs+ekJFh4M471denpNhP/LJtvW2LxcLixYsBiuplmSglkJ6erjRu3Lhoe8eOHUpwcHDR9uTJ\nk5WYmJiSHFIpYQgObcuWLXqHYDecPRf5BfnK0v1LlYA5AUqjsY2UjUc2KufOFSgzZiiKj4+itGyp\nKCtXKkpent6R2pazfy6uVdbaWaaRf/PmzTl8+DAZGRnUqVOHZcuWER8fX+LjyHr+QqgKlAJWp6zG\nlGiiqltVZgTP4PyeO9gw/2n6fAzt2ql9/See0DtSoReLrdfz79u3L4mJiWRnZ1O7dm2io6MJCwsj\nISGBiIgI8vPzGThwIBMmTChZADLPXwgURWFt6lqMFiOuLq5EG6LxvtyJadNc+OorCA1Vl2OQRyWK\nQmWtnXaxsJvRaJSRv3BKiqKQkJZA5JZI8gryMLY1Uf3Y87z/vgs//QQjRqgLrtWsqXekwl4Ujvyj\noqIqfvGXkb/KYrHIF+BfHD0XiqLw7ZFvMVqMXMi9wDutori0pxszZ7iSmwuvvw79+kHlyo6fi5KQ\nXGhkVU8hKhBFUdicvhmjxUj2pWxGNzPx26ZejHrWlUaNYNIkePZZcJUlF0U5s4uRv7R9hDNIzEgk\n0hLJiT9P8G8vI2lrQli7phLduqnr6P9t5RQhbkjaPkJUENt+3YbRYuTo6XQMLpHs++xFsv9wY+hQ\nGDAA7rtP7whFRSQPcHcg1pi+5SgcIRdJx5J49tNn6Rn/Ilf3hHD63YOcsbxMtMmNtDR4443iFX5H\nyIW1SC6sxy56/jLPXziSH7N+ZJzZyA+Z+6jy/VtUPxzGv166g/j9UKeO3tGJis7m8/zLi4uLC8eO\nKfzxB8iin6Ii2/XrXl77wsT+7O9x/e5N+vgNYlBYZVq2BCuthCJEEYeY7bN1K3z5pXrnohAVzVc7\n9zPm6yiO5m6n/olxzOsQT583q3LnnXpHJsTN2UXP/+uvTRw5YtE7DN1JP1Nj77nIz4dZn6dQ+7U+\ndFvdgQdpyb5BaaTGjWTAS9Yt/PaeC1uSXKg5MJlMZT6OXRT/4cNNuLgY9A5DiNvKyYF3Zh7CY+CL\njPm5LR0aNePUO0fYNHE0jRpU0zs84QQMBoNVir9d9PwPHVLo3BlSU/WMRIibO38eouekMXf/uxT4\nrqe/fwT/6TWCGpVr6B2acFIO0fP38IAzZ/SOQoh/On8eTDMymPfzRAj4kkHdhzOxy2E8qnjoHZoQ\nZWIXbZ977oGzZ8HZ7/WSfqZG71xcvAhvxv6K58BXmZv3GAN6P8CJCYeZ08to88Kvdy7sieTCeuxi\n5D9pkgk3NwM5OQaqV9c7GuHMcnPhvQ+PM/m7KeT5xxPa4xViu6Zyb7V79Q5NCMDB5vkrioKXl/rQ\n6WseCSyEzeTlwezFJzBuiOFygzh6+w1gWo83qH1nbb1
DE+KGHKLnD2rr58wZKf7CtvLzYf5nv/PW\n+qn86buIbsEvMat3MvdXv1/v0IQoV3bR8wet+Dsz6WdqyjsX+fnw8een8HxxHCMONaRNuytkvLGf\n5QOm213hl8+FRnJhPXYz8vfwUC/6ClGe8vJgwaeneWf9+5zz+5BnDX2YG7KXBz3kT07hXOyi+JtM\nJq5cMXDmjEHvUHQlC9tprJ2LK1fgg0VnifpmOhcazeVf7bozt89ufO55yKrnKQ/yudBILhzwgm9E\nBDz0kPpQCyGsJScHZn10nilbZnK5yUzaP9iZub0iefieh/UOTYgycZj1/KXtI/3Ma5U1F5cuQcy0\nC9zfM4aoM760fj6Vn0dtJ+GVxRWu8MvnQiO5sB67aPuAesH3yBG9oxAV3dWrMG/BRSLXzuPKY+/x\ndM92TOuaSMNaDfUOTQi7Yjdtn08+gQ0bIC5Oz2hERaUosGrtJYb+9yPONY6l1YOtmNnNSOPajfUO\nTYhy4TDz/KXtI0rr+z2X+fe0jzlaZwot2j3OvN5mmj7QRO+whLBrdtPzl3n+0s+8VnFykXniCq1H\nfcD/febHHQ2/YVv4WraPXO1whV8+FxrJhfXYzchfir8orj9zrjJg1mJWnZqI952BfPPiSp5p+Lje\nYQlRodhFz99oNNKokYGICAPHj+sZjbBnV67m8dpHn7I4PZp7eJhZ3aLp26ql3mEJYVOF8/yjoqLK\n1PO3i+KvKAo5OVCrlrqUrhDXyr2az5jF8cw/FE2Vq3WZ3CGK1zq30TssIXTlMPP8q1VTb72/ckXv\nSPQj/UyNxWLhSm4+w+Ytpcb4Riz5+SOMj33I2RlbnK7wy+dCI7mwHrvp+bu4aDN+PD31jkbo6fyf\nBcQstbD5y9eo4noX77aczevdn8HV1UXv0IRwGHbT9gHw94c1ayAgQM+IhF6OH1cY+cGXfHnOyF3V\nqhDZOpqRzwXj4iJFX4i/c5h5/iAzfpzV3r0Koz/6mv+5GrnnHvig52QGtXlOir4Q5chuev4gN3o5\nUz/z8mWIi1No9HwCjy94gpQH3mJh/3f4/d0fGdy2M4mJiXqHaDec6XNxO5IL65GRv7AZRYE9e2DJ\nJwqLEzfh0i6Su548x+KORkIe6Ymri12NRYRwaOXa8z948CAzZ84kOzub4OBgBg4c+M8ArulbDRsG\njRrBa6+VV0RCD0ePwvLl8OmnkF3Dwh3BkVS66ySTOpjo3ag3lVwr6R2iEBVOWXv+NrngW1BQQEhI\nCMuXL/9nANe8gTffhDvvhLfeKu+IRHlSFDhwANauhS++gGPH4Mk+35HpG8k5fsXY1kjfoL64udrV\nH55CVCg2mec/YMAAPD09CQoKum6/2WwmICAAPz8/YmNjb/jatWvX8txzzxESEnLb8zh726ci9zNP\nnoQVK2DwYHjwQejSRS36YZE7eOS9f/FT/VDCW4dyMPwgoU1Cb1v4K3IurE1yoZFcWE+xin9YWBhm\ns/m6ffn5+YSHh2M2m0lOTiY+Pp6UlBTi4uIYNWoUWVlZAHTp0oWEhASWLFly2/M4e/GvSH79VW3j\nDBmiTs0NCIBPPlHbdhs3wrKt35PeshPvpYfQu1EvUsNTCXs0TEb7QtiJYv2f2Lp1azIyMq7bt2vX\nLnx9ffHx8QEgJCSENWvWMH78eEJDQwFITExk1apVXL58maeffvq253H22T72+nzSvDzYvx+2b1d/\ntm1Tn5TVurX6M3QoBAVBpUqw+8RuXrcY2XNiD2+2fpPVfVZT2a1yic9pr7nQg+RCI7mwnlIPw44f\nP463t3fRtpeXF0lJSdf9Ttu2bWnbtm2xjykjf/0pijqq//FHSEqCnTth927w9oannoJnnoHISGjQ\nQL0ru9BPv/2EKdFE0rEkJrSawIpeK6jiVkW/NyKEuKVSF39r3oDTv39/fHx8OHECDh3ywGJpWvQN\nX9jjc4bta/uZ5X2
+Nm0M/P47LF9u4ZdfoKDAwM8/Q1KSBXd3ePJJA088AZ07Wxg7Fjp31l5/4gT4\n+6vbi1YvYvFPizlU/RDjnhrHkHuHUOVSlaLCX9r4/p4Te/jvo9f23r17iYiIsJt49NyeMWMGTZs6\nb31YvHgxQFHHpSyKPdsnIyODLl26sH//fgB27tyJyWQquhYwZcoUXF1dGTduXMkCuOaK9dGj0L49\npKeX6BAOw2KxFP1Hv5VLl+DgQXWEnpOj/ly8CAUF6o+iqAvkXbqk/uTkwPnzcO6c+pdVVhacOAF3\n3QW+vtCwodqzb9QImjWDBx64fawHTx0kKjGKzembef3J1xnWYhh33nFn2ZPwl+LmwhlILjSSC41u\nyzs0b96cw4cPk5GRQZ06dVi2bBnx8fGlOpbJZMJgMNCkicGp2z43+1Bfvao+3/jzz2HXLsjMVIu2\njw9Ur65Oj61aFdzc1FaMiwtUrqzu8/BQV0y9+271x8MD6tRRf6qUoiuTdjqN6MRoEtISGP1/o5nf\neT41Ktco0/u+EfkfXCO50EgutPX8y6pYI/++ffuSmJhIdnY2tWvXJjo6mrCwMBISEoiIiCA/P5+B\nAwcyYcKEkgdwzbdXQQG4u0Nurnrx0NkdOwb/+Q/Ex8PDD8OLL8LTT4OfH9xxh21jOXrmKBP/N5G1\nqWsZ/vhwIv4vgrsq32XbIIQQRco68kfRGaAYjUZly5YtiqIoyt13K8rp0/rGpJfCHJw6pShjxihK\nzZqK8sYbinL4sH4xZZzJUAZ/NVi5N/ZeJXJzpHLm0hmbnLcwF0JycS3JhZoDo9GolLV828ViKoVt\nH3DuGT95efD+++rS1hcvws8/Q2ys2uKxtcxzmQz9eijN5jejVrVapA5PJerpKDyqeNg+GCFEEYPB\ngMlkKvNx7OKOm8LibzAYnLb479gBY8YY8PRU59I3aKBfLPkF+fzr03/xvP/zHAo/xH3V7rN5DNLb\n1UguNJILG/f8y9Pf+1bt2qlr+7Rvr2NQNnTuHIwbp66DM20a9O59/fx5vVzNv4p7JXe9wxBC3ITD\nPMO3kDON/BMS1DtjFQWSk8HT02IXhR/QvfBbY2TjKCQXGsmF9UjbRwfnz8Pw4bB1Kyxa5Dx/5Qgh\nys5h2z6vv64+wH3sWB2DKkcpKdC9u7omzrRp6jx9IYQoKWn7VCArV0KbNuoX2/z5UviFEPqR4m8D\nigLvvgujR6t9/gEDbvx70s/USC40kguN5MJ67K7n72jLOiuKOnvpq6/UVTLvv1/viIQQFZnD9vwT\nEmDmTPj7kSyrAAAN+klEQVTbs2MqJEVRr2Fs3qyuzXOf7afLCyEclG4Lu5UXR2n7KAqMGqU++GTT\nJqhZU++IhBBCY3c9f0dp+8ybpz7OcOPG4hd+6WdqJBcayYVGcmE9djHyd7R5/ps2qRd4t29Xl1EW\nQghrcdief26u+pCRU6cq5lTII0egZUtYulRdflkIIcqDw83zv+MO6NwZlizRO5KSy8mBrl3BaJTC\nL4Swb3ZX/AEiItQZPwUFekdSMhMnwiOPwNChpXu99DM1kguN5EIjubAeuyz+Tz2ltn7Wr9c7kuI7\neBA+/lhdssFeFmcTQoibsbuef6HPPlMXPdu4UYegSkhRoEMH6NIFRo7UOxohhDNwiJ6/yWT6x59z\nvXqpi6Dt369PTCWxfDn88Qe89prekQghHJ3FYrHKk7zsduQPMGkSHD0K//2vjYMqgT//hIYN1dk9\nrVqV7VgWi0WeVPQXyYVGcqGRXGgcYuR/M0OGwKpV8Pvvekdyc1Onquvxl7XwCyGELdn1yB/Uh55c\nvgwLFtgwqGK6cAF8fGDnTn0esi6EcF4OPfIHtfWzYYN9LvT23/+CwSCFXwhR8dh98b/rLrXIDh5s\nX2v+5OXB9OnWfeKYzGHWSC40kguN5MJ67L74g9pT79pVXSXTXnzxBTz4IDzxhN6RC
CFEydl9z7/Q\nhQvQpIl652/nzjYI7BYUBZo3V5dx6NpV31iEEM7JIXr+N5rn/3fVq8PCherSCefP2yaum9myBS5e\n1P9LSAjhfJxinv+NDBoE1arBrFnlGNRtdOoE3bursViTzGHWSC40kguN5ELjECP/kpg6FVasgF27\n9Dn/0aPw/ffw4ov6nF8IIayhwo38QV3357331CLs7l5Ogd3EhAlw5Yq6gJsQQuilrCP/Cln8FQWe\nfRaeeca6Uy1vJzdXneFjsUBAgO3OK4QQf+d0bR9Ql0yeNw9iY2H8eNst/7BmjVr0y6vwyxxmjeRC\nI7nQSC6sp0IWf4D69eHHH9WZPwEB6j0A5f0l8NFH6npDQghR0VXIts/fZWVBTAwsW6ZeEH7pJes/\nUCUtTX02b2YmVK5s3WMLIURJ2X3bJycnhxYtWrBu3bpyO0edOurUz4QEmDEDgoMhPd2655g/H15+\nWQq/EMIxlHvxnzp1Kn369Cnv0wDQrJk6BbR9e/UO3MhI9aHqZXXlivpA+VdeKfuxbkX6mRrJhUZy\noZFcWE+xiv+AAQPw9PQkKCjouv1ms5mAgAD8/PyIjY39x+s2bNhAYGAgtWrVsk60xeDuDuPGwd69\naqsmIECdGlqWztLChfDoo+DnZ704hRBCT8Xq+W/dupXq1avz0ksvsf+v5yrm5+fj7+/Pxo0bqVu3\nLi1atCA+Pp4ffviB3bt3M3bsWObNm0dOTg7JyclUrVqV1atX4/K3Zrw1ev63sm0bhIerraGPP4YH\nHijZ63Ny1KL/9dfqXxZCCGEPbDbPPyMjgy5duhQV/x07dhAVFYX5r4X2Y2JiABg/fvw/XrtkyRJq\n1apFp06drP4GiiM3FyZOVGfrzJmjPh+4uCZNUp8jvHRp+cUnhBAlVdba6VbaFx4/fhxvb++ibS8v\nL5KSkm74uy+//PItj9W/f398fHwA8PDwoGnTpkXrdxT2+Mq6HR1toHNn6NnTwsKFsHq1gSpVbv36\n7GyYOtXCvHkA1o3nRtvX9jPL4/gVabtwn73Eo+f23r17iYiIsJt49NyeMWNGudSHirBtsVhYvHgx\nQFG9LBOlmNLT05XGjRsXbX/xxRfKoEGDirbj4uKU8PDw4h6uSAlCsIo//1SU3r0V5bHHFOWXX279\nu6NHK8rQobaJS1EUZcuWLbY7mZ2TXGgkFxrJhaastbPUs33q1q1LZmZm0XZmZiZeXl6lOlZxlnS2\nlurV1RZO377qg1g2bLjx7/36KyxerM4YspXCb3shubiW5EIjudBhSee/9/zz8vLw9/dn06ZN1KlT\nh8cff5z4+HgaNmxYsgBs0PO/mc2bISwMmjaFKVMgMFB9WPzixerNYv3727b4CyFEcdnkJq++ffvS\nsmVLUlNT8fb2ZtGiRbi5uTFnzhyCg4MJDAykT58+JS78hWw58r9Wu3Zw6BC0aaM+iL1HD6hXD9at\ng08+sX3h1yMH9kpyoZFcaCQXTvwwl/Jy5gzExalfCI0b6xODRR5UUURyoZFcaCQXGqdc0lkIIZyd\n3a/tUxx6tX2EEKKikbaPA5I/aTWSC43kQiO50DjEyF8IIYRt2cXI32g0YjAY5BtdCCFuw2KxYLFY\niIqKkgu+QgjhbKTt40DkordGcqGRXGgkF9YjxV8IIZyQXbR9pOcvhBDFIz1/IYRwYtLzdyDSz9RI\nLjSSC43kwnqk+AshhBOyi7aP9PyFEKJ4pOcvhBBOTHr+DkT6mRrJhUZyoZFcWI8UfyGEcELS9hFC\niApI2j5CCCFKzC6KvzzMRSU50EguNJILjeTCeg9zcSt7KGVnjTcihBDOoHBafFRUVJmOIz1/IYSo\ngKTnL4QQosSk+NsR6WdqJBcayYVGcmE9UvyFEMIJSc9fCCEqIIfo+ctUTyGEKB5rTfWUkb8dsVgs\nsrLpXyQXGsmFRnKhcYiRvxBCCNuSkb8QQlRAM
vIXQghRYlL87Yhc9NZILjSSC43kwnqk+AshhBOS\nnr8QQlRAdt3zt1gstG7dmqFDh5KYmFiepxJCCFEC5Vr8XV1dqVGjBleuXMHLy6s8T+UQpJ+pkVxo\nJBcayYX1FKv4DxgwAE9PT4KCgq7bbzabCQgIwM/Pj9jY2H+8rnXr1qxfv56YmBiMRqN1InZge/fu\n1TsEuyG50EguNJIL6ylW8Q8LC8NsNl+3Lz8/n/DwcMxmM8nJycTHx5OSkkJcXByjRo0iKysLFxcX\nADw8PLhy5Yr1o3cwZ8+e1TsEuyG50EguNJIL6ynWk7xat25NRkbGdft27dqFr68vPj4+AISEhLBm\nzRrGjx9PaGgoAKtXr+abb77h7NmzDB8+3KqBCyGEKL1SP8bx+PHjeHt7F217eXmRlJR03e9069aN\nbt26lT46J/P3L1hnJrnQSC40kgvrKXXxL2zplFX9+vWtdixHsGTJEr1DsBuSC43kQiO5UNWvX79M\nry918a9bty6ZmZlF25mZmaWa0ZOWllbaEIQQQpRSqad6Nm/enMOHD5ORkUFubi7Lli2ja9eu1oxN\nCCFEOSlW8e/bty8tW7YkNTUVb29vFi1ahJubG3PmzCE4OJjAwED69OlDw4YNyzteIYQQ1qDoKCEh\nQfH391d8fX2VmJgYPUOxuV9//VUxGAxKYGCg0qhRI2XmzJmKoihKdna28swzzyh+fn5Khw4dlDNn\nzugcqe3k5eUpTZs2VTp37qwoivPm4syZM0qPHj2UgIAApWHDhsrOnTudNheTJ09WAgMDlcaNGyt9\n+/ZVLl++7DS5CAsLU2rXrq00bty4aN+t3vvkyZMVX19fxd/fX/nmm29ue3zdFna72X0CzsLd3Z3p\n06dz4MABdu7cydy5c0lJSSEmJoYOHTqQmppK+/btiYmJ0TtUm5k5cyaBgYFFEwCcNRcjR46kU6dO\npKSksG/fPgICApwyFxkZGSxYsIDdu3ezf/9+8vPzWbp0qdPk4kb3V93svScnJ7Ns2TKSk5Mxm80M\nGzaMgoKCW5+gXL6yimH79u1KcHBw0faUKVOUKVOm6BWO7p5//nllw4YNir+/v/Lbb78piqIoJ06c\nUPz9/XWOzDYyMzOV9u3bK5s3by4a+TtjLs6ePavUq1fvH/udMRfZ2dlKgwYNlNOnTytXr15VOnfu\nrHz77bdOlYv09PTrRv43e++TJ0++rnsSHBys7Nix45bH1m3kf6P7BI4fP65XOLrKyMhgz549PPHE\nE5w8eRJPT08APD09OXnypM7R2caoUaN47733cHXVPpLOmIv09HRq1apFWFgYzZo1Y/DgweTk5Dhl\nLmrWrMmYMWN48MEHqVOnDh4eHnTo0MEpc1HoZu89KyvrutmWxamnuhV/mduvunDhAj169GDmzJnU\nqFHjun9zcXFxijx9/fXX1K5dm0cfffSmS9Q6Sy7y8vLYvXs3w4YNY/fu3dx5553/aGs4Sy6OHDnC\njBkzyMjIICsriwsXLvDpp59e9zvOkosbud17v11edCv+1rpPoCK7evUqPXr0IDQ0lBdeeAFQv81/\n++03AE6cOEHt2rX1DNEmtm/fzldffUW9evXo27cvmzdvJjQ01Clz4eXlhZeXFy1atACgZ8+e7N69\nm/vvv9/pcvHDDz/QsmVL7r33Xtzc3OjevTs7duxwylwUutn/E3+vp8eOHaNu3bq3PJZuxd/Z7xNQ\nFIWBAwcSGBhIRERE0f6uXbsW3cG4ZMmSoi8FRzZ58mQyMzNJT09n6dKltGvXjri4OKfMxf3334+3\ntzepqakAbNy4kUaNGtGlSxeny0VAQAA7d+7k0qVLKIrCxo0bCQwMdMpcFLrZ/xNdu3Zl6dKl5Obm\nkp6ezuHDh3n88cdvfTBrX6AoifXr1ysNGjRQ6tevr0yePFnPUGxu69atiouLi9KkSROladOmStOm\nTZWEhAQlO
ztbad++vcNPY7sZi8WidOnSRVEUxWlzsXfvXqV58+bKI488onTr1k05e/as0+YiNja2\naKrnSy+9pOTm5jpNLkJCQpQHHnhAcXd3V7y8vJSFCxfe8r1PmjRJqV+/vuLv76+YzebbHl/3xzgK\nIYSwPXmAuxBCOCEp/kII4YSk+AshhBOS4i+EEE5Iir8QQjghKf5CCOGEpPgLIYQTkuIvhBBO6P8B\nSHPGl5LeQEMAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 29 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Quiz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are other life tables in the same [report](http://www.cdc.gov/nchs/data/nvsr/nvsr62/nvsr62_07.pdf):\n", "\n", "* Total population\n", "* Males\n", "* Females\n", "* and so on ..." ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "String Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Series` has the built-in string method equivalents. They, however:\n", "\n", "* are *vectorized*, so that it can be called with a whole `Series`;\n", "* made aware of the missing value (i.e., `np.nan`); and\n", "* have names that starts with `.str`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series(['Aaba', 'Baca', np.nan, 'CcDD'])\n", "s" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "0 Aaba\n", "1 Baca\n", "2 NaN\n", "3 CcDD\n", "dtype: object" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "lowered = s.str.lower()\n", "lowered" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "0 aaba\n", "1 baca\n", "2 NaN\n", "3 ccdd\n", "dtype: object" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`str.replace` and `str.findall` take regular expression, as well!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Finding out why Gracie could not medal in Sochi! :-) Notice that `str.findall` returns a `Series` whose elements are lists." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series(['Adelina', 'Yuna', 'Carolina', 'Gracie'])\n", "ends_with_na = s.str.findall(r'.+na$')\n", "ends_with_na" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 32, "text": [ "0 [Adelina]\n", "1 [Yuna]\n", "2 [Carolina]\n", "3 []\n", "dtype: object" ] } ], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "* `str.replace()` relies on `re.sub()` when the pattern is a regular expression, so `re.sub()` replacement syntax such as `\\g<0>` (the whole match) is available. Recent pandas versions treat the pattern as a literal string unless `regex=True` is passed." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 33, "text": [ "0 Adelina\n", "1 Yuna\n", "2 Carolina\n", "3 Gracie\n", "dtype: object" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "s.str.replace(r'(.+na)', r'\\g<0> medals', regex=True)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ "0 Adelina medals\n", "1 Yuna medals\n", "2 Carolina medals\n", "3 Gracie\n", "dtype: object" ] } ], "prompt_number": 34 }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Summary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The main data structures in pandas are `Series` and `DataFrame`.\n", "* `Series` has *indexed* values and a *name*.\n", "* `Series` is (like) a NumPy `ndarray`.\n", "* `Series` is (like) a dictionary, as well.\n", "* `Series` automatically aligns data based on the index.\n", "* `Series` has many vectorized `.str` methods." ] } ], "metadata": {} } ] }