{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `numpy`\n", "## A multidimensional array framework and more" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0.]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.zeros((10,5)) # create an array filled with zeros\n", "A" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dtype # default data type is f8, aka double-precision floating point (8 bytes per number)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "B = np.zeros((10,5), dtype='i4') # 4 byte (32 bit) integer" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "dtype('int32')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B.dtype" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Some common data types:\n", "- `f` floating-point number: `f4` single precision, `f8` double precision\n", "- `i` (signed) integer number: `i4` 32-bit integer, `i8` 64-bit integer\n", "- `u` 
unsigned integer\n", "- `c` complex floating-point: `c16` double precision for both the real and the imaginary part\n", "- `S` (byte) string\n", "- `O` arbitrary Python objects (inefficient but flexible)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Arrays can be initialized from any sequence-like object. Most commonly this is a (nested) list." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C = np.array([[1,2,3],[4,5,6],[7,8,9]])\n", "C" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C[0,2] # indices always start at 0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Array Operations\n", "Many operators are overloaded so that operations are applied element-wise." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 4, 6],\n", " [ 8, 10, 12],\n", " [14, 16, 18]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 * C" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 6, 12],\n", " [20, 30, 42],\n", " [56, 72, 90]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C + C**2" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 6, 10],\n", " [ 6, 10, 14],\n", " [10, 14, 18]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C + C.T # .T transposes the array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Shapes\n", "Arrays can easily be flattened (converted to 1D) or reshaped, provided that the total size does not change." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 3)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.shape" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.flatten()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n", " 17, 18, 19])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D = np.arange(20) # like range but returns an array instead of an iterator\n", "D" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D.reshape((4,5))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Broadcasting\n", "For operations between two arrays to succeed the corresponding dimensions have to be equal or one of them has to be one. In the latter case the size-one dimension is broadcast over the entries of the other array in that dimension." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "E = np.array([10, 20, 30])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 20, 30])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[11, 22, 33],\n", " [14, 25, 36],\n", " [17, 28, 39]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C + E" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To control how arrays are broadcast, new dimensions of size one can be inserted using `None` (an alias for `np.newaxis`)." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "F = E[None,:]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[10, 20, 30]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[11, 22, 33],\n", " [14, 25, 36],\n", " [17, 28, 39]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C + F" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Broadcasting can create large arrays from 1D arrays.\n", "\n", "For example, compute radius on a 2D grid from 1D coordinate arrays." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = np.arange(10)[:,None]\n", "Y = np.arange(10)[None,:]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((10, 1), (1, 10))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.shape, Y.shape" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 10)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(X**2 + Y**2).shape" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 0. , 1. , 2. , 3. ,\n", " 4. , 5. , 6. , 7. ,\n", " 8. , 9. ],\n", " [ 1. , 1.41421356, 2.23606798, 3.16227766,\n", " 4.12310563, 5.09901951, 6.08276253, 7.07106781,\n", " 8.06225775, 9.05538514],\n", " [ 2. 
, 2.23606798, 2.82842712, 3.60555128,\n", " 4.47213595, 5.38516481, 6.32455532, 7.28010989,\n", " 8.24621125, 9.21954446],\n", " [ 3. , 3.16227766, 3.60555128, 4.24264069,\n", " 5. , 5.83095189, 6.70820393, 7.61577311,\n", " 8.54400375, 9.48683298],\n", " [ 4. , 4.12310563, 4.47213595, 5. ,\n", " 5.65685425, 6.40312424, 7.21110255, 8.06225775,\n", " 8.94427191, 9.8488578 ],\n", " [ 5. , 5.09901951, 5.38516481, 5.83095189,\n", " 6.40312424, 7.07106781, 7.81024968, 8.60232527,\n", " 9.43398113, 10.29563014],\n", " [ 6. , 6.08276253, 6.32455532, 6.70820393,\n", " 7.21110255, 7.81024968, 8.48528137, 9.21954446,\n", " 10. , 10.81665383],\n", " [ 7. , 7.07106781, 7.28010989, 7.61577311,\n", " 8.06225775, 8.60232527, 9.21954446, 9.89949494,\n", " 10.63014581, 11.40175425],\n", " [ 8. , 8.06225775, 8.24621125, 8.54400375,\n", " 8.94427191, 9.43398113, 10. , 10.63014581,\n", " 11.3137085 , 12.04159458],\n", " [ 9. , 9.05538514, 9.21954446, 9.48683298,\n", " 9.8488578 , 10.29563014, 10.81665383, 11.40175425,\n", " 12.04159458, 12.72792206]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sqrt(X**2 + Y**2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Fancy Indexing\n", "Indexes for `numpy` arrays can be more than simple numbers and slices." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G = np.arange(10) - 5\n", "G" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, False, False, False, True, True, True, True], dtype=bool)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = G > 0 # This creates a boolean array. 
It can be used as a mask.\n", "M" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G[M] # This picks only the elements for which the mask is True.\n", "# A very easy way to filter data." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([-3, 2])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G[[2,7]] # Pick only a few indices using a list or array as the index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All this also works in multiple dimensions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Structured Arrays\n", "Structured arrays can hold a different data type in each column (field)." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# a list of people\n", "names = [ \"Aaron\", \"Freddy\", \"Xavier\", \"Kyong\", \"Carole\", \"Arla\"]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Let's generate some sample data\n", "from numpy.random import normal # normal distribution\n", "height = normal(loc=1.75, scale=0.2, size=len(names))\n", "weight = normal(loc=75, scale=15, size=len(names))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[('Aaron', 1.6415660400240297, 50.060908347007967),\n", " ('Freddy', 1.377722441962725, 52.688480059609802),\n", " ('Xavier', 1.5195474593946803, 70.869882382122924),\n", " ('Kyong', 1.9918895922796507, 99.821972101433801),\n", " ('Carole', 1.7294163329715906, 90.685158799677012),\n", " ('Arla', 
1.7767299770080927, 51.06830862621657)]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# generate a combined list\n", "L = list(zip(names, height, weight))\n", "L" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([['Aaron', '1.64156604002', '50.060908347'],\n", " ['Freddy', '1.37772244196', '52.6884800596'],\n", " ['Xavier', '1.51954745939', '70.8698823821'],\n", " ['Kyong', '1.99188959228', '99.8219721014'],\n", " ['Carole', '1.72941633297', '90.6851587997'],\n", " ['Arla', '1.77672997701', '51.0683086262']], \n", " dtype='<U13')" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array(L)\n", "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without an explicit `dtype`, NumPy falls back to the most general common data type, a string in this case." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We want a mixed data type so that numbers are treated as numbers. The fields should also have descriptive names." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "A = np.array(L, dtype=[('name','O'), ('height','f8'), ('weight','f8')])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([('Aaron', 1.64156604, 50.06090835),\n", " ('Freddy', 1.37772244, 52.68848006),\n", " ('Xavier', 1.51954746, 70.86988238),\n", " ('Kyong', 1.99188959, 99.8219721 ),\n", " ('Carole', 1.72941633, 90.6851588 ),\n", " ('Arla', 1.77672998, 51.06830863)], \n", " dtype=[('name', 'O'), ('height', '<f8'), ('weight', '<f8')])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We store the strings as generic \"objects\". This avoids setting an upper limit on the string length beforehand and also solves several portability issues between Python 2 and 3.\n", "\n", "Reconsider this choice for very large datasets, where total memory is an issue." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "('Freddy', 1.37772244, 52.68848006)" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[1] # Prints all entries from the second row. 
" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 1.64156604, 1.37772244, 1.51954746, 1.99188959, 1.72941633,\n", " 1.77672998])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A['height'] # just the height column" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Kyong\n", "Kyong\n" ] } ], "source": [ "# Arbitrary combinations are possible.\n", "print(A[3]['name'])\n", "print(A['name'][3])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([('Aaron', 1.64156604), ('Freddy', 1.37772244),\n", " ('Xavier', 1.51954746)], \n", " dtype=[('name', 'O'), ('height', '<f8')])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Indexing rules work as expected.\n", "A[['name','height']][:3]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Sorting\n", "You can sort by individual fields." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "array([('Freddy', 1.37772244, 52.68848006),\n", " ('Xavier', 1.51954746, 70.86988238),\n", " ('Aaron', 1.64156604, 50.06090835),\n", " ('Carole', 1.72941633, 90.6851588 ),\n", " ('Arla', 1.77672998, 51.06830863),\n", " ('Kyong', 1.99188959, 99.8219721 )], \n", " dtype=[('name', 'O'), ('height', '<f8'), ('weight', '<f8')])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sort(A, order='height') # This creates a new sorted array and leaves the original one untouched." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "A.sort(order='weight') # This changes the order of the original array." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Extending Structured Arrays\n", "Arrays have a fixed data type. To add fields, we need to create a new array." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 18.57727489, 16.17739593, 27.7582578 , 30.69256431,\n", " 30.32055213, 25.15913009])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "BMI = A['weight'] / A['height']**2\n", "BMI" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/tux/.local/lib/python3.5/site-packages/ipykernel_launcher.py:1: FutureWarning: Assignment between structured arrays with different field names will change in numpy 1.13.\n", "\n", "Previously fields in the dst would be set to the value of the identically-named field in the src. 
In numpy 1.13 fields will instead be assigned 'by position': The Nth field of the dst will be set to the Nth field of the src array.\n", "\n", "See the release notes for details\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] } ], "source": [ "B = np.array(A, dtype=[('name','O'), ('height','f8'), ('weight','f8'), ('BMI','f8')])" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([('Aaron', 1.64156604, 50.06090835, 0.),\n", " ('Arla', 1.77672998, 51.06830863, 0.),\n", " ('Freddy', 1.37772244, 52.68848006, 0.),\n", " ('Xavier', 1.51954746, 70.86988238, 0.),\n", " ('Carole', 1.72941633, 90.6851588 , 0.),\n", " ('Kyong', 1.99188959, 99.8219721 , 0.)], \n", " dtype=[('name', 'O'), ('height', '<f8'), ('weight', '<f8'), ('BMI', '<f8')])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "B['BMI'] = BMI" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array(['Freddy', 'Xavier', 'Carole', 'Kyong'], dtype=object)" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B['name'][B['BMI'] > 25.] # Selecting data is easy." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Record Arrays\n", "These provide a convenient interface to structured arrays." 
] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "C = np.rec.array(B)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 18.57727489, 16.17739593, 27.7582578 , 30.69256431,\n", " 30.32055213, 25.15913009])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.weight / C.height**2 # access via attributes possible" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Input / Output\n", "`numpy` has many routines for reading and writing data in plain text and binary form." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "R = np.random.rand(30, 4) * 100 - 50" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.savetxt('mydata.dat', R) # writes the table into a plain text file" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def printlines(fname, n=4):\n", "    \"\"\"Print the first n lines of the file fname.\"\"\"\n", "    # Open the file passed in as fname (not a hard-coded name).\n", "    with open(fname, 'r') as f:\n", "        for i in range(n):\n", "            print(f.readline().strip())" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.784805495288934196e+01 4.173971821362576407e+01 4.998161922760758102e+01 1.783026450910197980e+01\n", "-4.022866923187805810e+01 1.001728759925784829e+01 -4.126332461183306322e+01 -1.907670533747464248e+01\n", "-3.051301990960874022e+01 1.498539902764959209e+01 2.293480495792766760e+01 4.709002459397736118e+01\n", "-3.253390068379028577e+01 5.470338543828347611e+00 -2.605682913043973059e+01 1.833121045153937700e+00\n" ] } ], "source": [ "printlines('mydata.dat')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": "fragment" } }, "source": [ "The default format is `'%.18e'`, 18 decimal digits in exponential notation." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "np.savetxt('mydata.dat', R, fmt='%.2g', delimiter='\\t')" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18\t42\t50\t18\n", "-40\t10\t-41\t-19\n", "-31\t15\t23\t47\n", "-33\t5.5\t-26\t1.8\n" ] } ], "source": [ "printlines('mydata.dat')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This can be used to create simple CSV (comma-separated values). The `csv` package offers more comprehensive support, specifically also for data exchange with spreadsheet software." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`savetxt` also takes open file handles as an argument." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "with open('mydata.dat','wb') as f:\n", " # First write a header field.\n", " f.write(b\"#A\\tB\\tfield3\\tfour\\n\")\n", " # Now save the data to the open file.\n", " np.savetxt(f, R, fmt=\"%.2g\", delimiter='\\t')" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#A\tB\tfield3\tfour\n", "18\t42\t50\t18\n", "-40\t10\t-41\t-19\n", "-31\t15\t23\t47\n" ] } ], "source": [ "printlines('mydata.dat')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Reading plain text files" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 18. , 42. , 50. , 18. ],\n", " [-40. , 10. , -41. , -19. ],\n", " [-31. , 15. , 23. 
, 47. ],\n", " [-33. , 5.5 , -26. , 1.8 ],\n", " [ 49. , 31. , 0.92, -18. ],\n", " [ 11. , 37. , 4.4 , -5.4 ],\n", " [ 25. , 46. , -20. , 5.9 ],\n", " [ 18. , -38. , -19. , -5.7 ],\n", " [ 29. , 21. , 44. , 17. ],\n", " [-36. , 15. , 13. , -9.1 ],\n", " [-30. , -9.9 , -26. , 20. ],\n", " [-12. , 4.8 , -48. , 31. ],\n", " [ -4.2 , 4. , -34. , 32. ],\n", " [-31. , 29. , 48. , 18. ],\n", " [ 13. , -25. , 45. , 35. ],\n", " [ 39. , -19. , 11. , -8. ],\n", " [-44. , -39. , -37. , -38. ],\n", " [ 5.1 , -23. , 11. , -19. ],\n", " [-44. , -34. , 20. , -13. ],\n", " [ -6.1 , -5.8 , 6.6 , 26. ],\n", " [ 33. , -48. , -43. , 11. ],\n", " [ 19. , 36. , -37. , 29. ],\n", " [-18. , -34. , -14. , -14. ],\n", " [ 42. , 38. , 48. , -25. ],\n", " [ 14. , -43. , -20. , 3.6 ],\n", " [-33. , -35. , 11. , 24. ],\n", " [ 48. , -33. , 31. , -24. ],\n", " [ 17. , 34. , 6.5 , 27. ],\n", " [-48. , -49. , -24. , 23. ],\n", " [ 26. , 40. , 18. , 37. ]])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E = np.loadtxt('mydata.dat')\n", "E" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`loadtxt` skips the header line because it starts with the comment character (`#`). A different comment character can be set with the keyword argument `comments`. The keyword argument `skiprows` skips a given number of lines at the beginning of a file." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`genfromtxt` is a more powerful version of `loadtxt`. It can read field names directly from a header line and can apply arbitrary conversions to columns." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "F = np.genfromtxt('mydata.dat', names=True)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([( 18. , 42. , 50. , 18. ), (-40. , 10. , -41. , -19. ),\n", " (-31. , 15. , 23. , 47. ), (-33. , 5.5, -26. , 1.8),\n", " ( 49. , 31. , 0.92, -18. 
), ( 11. , 37. , 4.4 , -5.4),\n", " ( 25. , 46. , -20. , 5.9), ( 18. , -38. , -19. , -5.7),\n", " ( 29. , 21. , 44. , 17. ), (-36. , 15. , 13. , -9.1),\n", " (-30. , -9.9, -26. , 20. ), (-12. , 4.8, -48. , 31. ),\n", " ( -4.2, 4. , -34. , 32. ), (-31. , 29. , 48. , 18. ),\n", " ( 13. , -25. , 45. , 35. ), ( 39. , -19. , 11. , -8. ),\n", " (-44. , -39. , -37. , -38. ), ( 5.1, -23. , 11. , -19. ),\n", " (-44. , -34. , 20. , -13. ), ( -6.1, -5.8, 6.6 , 26. ),\n", " ( 33. , -48. , -43. , 11. ), ( 19. , 36. , -37. , 29. ),\n", " (-18. , -34. , -14. , -14. ), ( 42. , 38. , 48. , -25. ),\n", " ( 14. , -43. , -20. , 3.6), (-33. , -35. , 11. , 24. ),\n", " ( 48. , -33. , 31. , -24. ), ( 17. , 34. , 6.5 , 27. ),\n", " (-48. , -49. , -24. , 23. ), ( 26. , 40. , 18. , 37. )], \n", " dtype=[('A', '<f8'), ('B', '<f8'), ('field3', '<f8'), ('four', '<f8')])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Binary Data\n", "Large datasets take up a lot of space and a long time to read and write if saved as plain text. A binary format is much more efficient, but additional information (such as the data type and shape) is needed to read it back." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 2 }