{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading and Writing Audio Files with wave\n", "\n", "[back to overview page](index.ipynb)\n", "\n", "The `wave` module is part of the Python standard library.\n", "\n", "Documentation:\n", "\n", "* http://docs.python.org/2/library/wave.html (Python 2.x)\n", "* http://docs.python.org/3/library/wave.html (Python 3.x)\n", "\n", "Audio data is handled with the Python type `str` (Python 2.x) or `bytes` (Python 3.x).\n", "\n", "Advantages:\n", "\n", "* part of the standard library, no further dependencies\n", "* 24-bit files can be used (but manual conversion is necessary)\n", "* partial reading is possible\n", "* works with both Python 2 and 3\n", "\n", "Disadvantages:\n", "\n", "* 32-bit float not supported\n", "* WAVEX doesn't work\n", "* manual de-interleaving and conversion is necessary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading\n", "\n", "Reading a 16-bit WAV file into a NumPy array is not hard, but it requires a few lines of code:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import wave\n", "import numpy as np\n", "import utility\n", "\n", "with wave.open('data/test_wav_pcm16.wav') as w:\n", " channels = w.getnchannels()\n", " assert w.getsampwidth() == 2\n", " data = w.readframes(w.getnframes())\n", "\n", "sig = np.frombuffer(data, dtype='" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(sig);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that neither `frombuffer()` nor `reshape()` made a copy of the data. We are still using the buffer of bytes we got from `readframes()`. 
By using the `.base` attribute of the array, we get the result of `frombuffer()` (before `reshape()`), and by using `.base` a second time, we get a reference to the original buffer object:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sig.base.base is data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the `flags` attribute we get a few details about the buffer. `C_CONTIGUOUS` means that the rows are contiguous (in row-major format, as used in C). We also see that `sig` doesn't \"own\" the data (it's rather \"borrowed\" from the `data` object) and that it's not writable (because the `bytes` object returned by `readframes()` above is immutable):" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " C_CONTIGUOUS : True\n", " F_CONTIGUOUS : False\n", " OWNDATA : False\n", " WRITEABLE : False\n", " ALIGNED : True\n", " WRITEBACKIFCOPY : False\n", " UPDATEIFCOPY : False" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sig.flags" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've already got the correct values, but if we want to do some signal processing with the data, it's usually easier to convert the numbers to floating-point format and to normalize them to the range from -1 to 1.\n", "\n", "To do that, I wrote a little helper function called `utility.pcm2float()`, located in the file [utility.py](https://github.com/mgeier/python-audio/blob/master/audio-files/utility.py). Let's load it and apply it to our signal:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import utility\n", "\n", "normalized = utility.pcm2float(sig, 'float32')\n", "\n", "plt.plot(normalized);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because we change the data type from int16 to float32, a copy of the array is created, which now is writable and \"owns\" its data:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " C_CONTIGUOUS : True\n", " F_CONTIGUOUS : False\n", " OWNDATA : True\n", " WRITEABLE : True\n", " ALIGNED : True\n", " WRITEBACKIFCOPY : False\n", " UPDATEIFCOPY : False" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "normalized.flags" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "24-bit files can be opened with `wave.open()` as well, but the conversion is a little more complicated." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sampling rate: 44100 Hz\n", "length: 15 samples\n", "channels: 7\n", "sample width: 3 bytes\n" ] } ], "source": [ "with wave.open('data/test_wav_pcm24.wav') as w:\n", " framerate = w.getframerate()\n", " frames = w.getnframes()\n", " channels = w.getnchannels()\n", " width = w.getsampwidth()\n", " print('sampling rate:', framerate, 'Hz')\n", " print('length:', frames, 'samples')\n", " print('channels:', channels)\n", " print('sample width:', width, 'bytes')\n", " \n", " data = w.readframes(frames)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the sample width is 3 bytes (because we loaded a 24-bit file). Sadly, there is no 3-byte integer type in NumPy. Therefore, we have to fill each 3-byte number with a further byte and then convert it to a 4-byte integer.\n", "We can add this byte (filled with zero-bits) either as most significant or a least significant byte. 
This doesn't change the precision of the data; we just have to remember which one it was when we do calculations with the stored values.\n", "If we add the zero byte as the LSB, the resulting values span the full range of a 4-byte integer, so we can use the `utility.pcm2float()` function from above.\n", "If we added the zero byte as the MSB instead, the range would be limited to that of a 3-byte integer, negative values would need sign extension, and we would have to write a new function for normalization." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 255, 255, 127],\n", " [ 0, 219, 182, 109],\n", " [ 0, 182, 109, 91],\n", " [ 0, 146, 36, 73],\n", " [ 0, 109, 219, 54],\n", " [ 0, 73, 146, 36],\n", " [ 0, 36, 73, 18],\n", " [ 0, 242, 82, 115],\n", " [ 0, 222, 103, 68],\n", " [ 0, 67, 88, 20],\n", " [ 0, 100, 185, 239],\n", " [ 0, 17, 204, 221],\n", " [ 0, 223, 12, 223],\n", " [ 0, 220, 182, 237],\n", " [ 0, 131, 206, 79],\n", " [ 0, 22, 150, 231],\n", " [ 0, 47, 160, 173],\n", " [ 0, 191, 25, 190],\n", " [ 0, 11, 203, 243],\n", " [ 0, 74, 205, 22],\n", " [ 0, 36, 73, 18],\n", " [ 0, 145, 123, 28],\n", " [ 0, 158, 38, 157],\n", " [ 0, 199, 254, 198],\n", " [ 0, 148, 154, 45],\n", " [ 0, 177, 108, 49],\n", " [ 0, 178, 220, 247],\n", " [ 0, 220, 182, 237],\n", " [ 0, 111, 132, 227],\n", " [ 0, 158, 38, 157],\n", " [ 0, 57, 1, 57],\n", " [ 0, 148, 154, 45],\n", " [ 0, 79, 147, 206],\n", " [ 0, 178, 220, 247],\n", " [ 0, 36, 73, 18],\n", " [ 0, 125, 49, 176],\n", " [ 0, 22, 150, 231],\n", " [ 0, 209, 95, 82],\n", " [ 0, 191, 25, 190],\n", " [ 0, 245, 52, 12],\n", " [ 0, 74, 205, 22],\n", " [ 0, 220, 182, 237],\n", " [ 0, 14, 173, 140],\n", " [ 0, 222, 103, 68],\n", " [ 0, 189, 167, 235],\n", " [ 0, 100, 185, 239],\n", " [ 0, 239, 51, 34],\n", " [ 0, 223, 12, 223],\n", " [ 0, 36, 73, 18],\n", " [ 0, 1, 0, 128],\n", " [ 0, 219, 182, 109],\n", " [ 0, 74, 146, 164],\n", " [ 0, 146, 36, 73],\n", " [ 0, 147, 36, 201],\n", " [ 0, 73, 146, 36],\n", " [ 0,
220, 182, 237],\n", " [ 0, 14, 173, 140],\n", " [ 0, 222, 103, 68],\n", " [ 0, 189, 167, 235],\n", " [ 0, 100, 185, 239],\n", " [ 0, 239, 51, 34],\n", " [ 0, 223, 12, 223],\n", " [ 0, 36, 73, 18],\n", " [ 0, 125, 49, 176],\n", " [ 0, 22, 150, 231],\n", " [ 0, 209, 95, 82],\n", " [ 0, 191, 25, 190],\n", " [ 0, 245, 52, 12],\n", " [ 0, 74, 205, 22],\n", " [ 0, 220, 182, 237],\n", " [ 0, 111, 132, 227],\n", " [ 0, 158, 38, 157],\n", " [ 0, 57, 1, 57],\n", " [ 0, 148, 154, 45],\n", " [ 0, 79, 147, 206],\n", " [ 0, 178, 220, 247],\n", " [ 0, 36, 73, 18],\n", " [ 0, 145, 123, 28],\n", " [ 0, 158, 38, 157],\n", " [ 0, 199, 254, 198],\n", " [ 0, 148, 154, 45],\n", " [ 0, 177, 108, 49],\n", " [ 0, 178, 220, 247],\n", " [ 0, 220, 182, 237],\n", " [ 0, 131, 206, 79],\n", " [ 0, 22, 150, 231],\n", " [ 0, 47, 160, 173],\n", " [ 0, 191, 25, 190],\n", " [ 0, 11, 203, 243],\n", " [ 0, 74, 205, 22],\n", " [ 0, 36, 73, 18],\n", " [ 0, 242, 82, 115],\n", " [ 0, 222, 103, 68],\n", " [ 0, 67, 88, 20],\n", " [ 0, 100, 185, 239],\n", " [ 0, 17, 204, 221],\n", " [ 0, 223, 12, 223],\n", " [ 0, 220, 182, 237],\n", " [ 0, 255, 255, 127],\n", " [ 0, 219, 182, 109],\n", " [ 0, 182, 109, 91],\n", " [ 0, 146, 36, 73],\n", " [ 0, 109, 219, 54],\n", " [ 0, 73, 146, 36],\n", " [ 0, 36, 73, 18]], dtype=uint8)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "assert width == 3\n", "\n", "temp = bytearray()\n", "\n", "for i in range(0, len(data), 3):\n", " temp.append(0)\n", " temp.extend(data[i:i+3])\n", "\n", "# Using += instead of .extend() may be faster\n", "# (see https://youtu.be/z9Hmys8ojno?t=35m50s).\n", "# But starting with an empty bytearray and\n", "# extending it on each iteration might be slow, anyway.\n", "# See further below for how to reserve all necessary memory in the beginning.\n", "\n", "four_bytes = np.frombuffer(temp, dtype='B').reshape(-1, 4)\n", "four_bytes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a 
bytearray where each group of 4 bytes represents an integer (in little-endian order).\n", "\n", "Next, let's convert it to actual integers and reshape the channels into columns." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2147483392, 1840700160, 1533916672, 1227133440, 920349952,\n", " 613566720, 306783232],\n", " [ 1934815744, 1147657728, 341328640, -273062912, -573828864,\n", " -552804608, -306783232],\n", " [ 1338934016, -409594368, -1382011136, -1105608960, -204797184,\n", " 382552576, 306783232],\n", " [ 477860096, -1658413568, -956381440, 765105152, 829206784,\n", " -136531456, -306783232],\n", " [ -477860096, -1658413568, 956381440, 765105152, -829206784,\n", " -136531456, 306783232],\n", " [-1338934016, -409594368, 1382011136, -1105608960, 204797184,\n", " 382552576, -306783232],\n", " [-1934815744, 1147657728, -341328640, -273062912, 573828864,\n", " -552804608, 306783232],\n", " [-2147483392, 1840700160, -1533916672, 1227133440, -920349952,\n", " 613566720, -306783232],\n", " [-1934815744, 1147657728, -341328640, -273062912, 573828864,\n", " -552804608, 306783232],\n", " [-1338934016, -409594368, 1382011136, -1105608960, 204797184,\n", " 382552576, -306783232],\n", " [ -477860096, -1658413568, 956381440, 765105152, -829206784,\n", " -136531456, 306783232],\n", " [ 477860096, -1658413568, -956381440, 765105152, 829206784,\n", " -136531456, -306783232],\n", " [ 1338934016, -409594368, -1382011136, -1105608960, -204797184,\n", " 382552576, 306783232],\n", " [ 1934815744, 1147657728, 341328640, -273062912, -573828864,\n", " -552804608, -306783232],\n", " [ 2147483392, 1840700160, 1533916672, 1227133440, 920349952,\n", " 613566720, 306783232]], dtype=int32)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sig = np.frombuffer(temp, dtype='" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ 
"normalized = utility.pcm2float(sig, 'float32')\n", "plt.plot(normalized);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WAVEX doesn't work:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"\", line 3, in \n", " w = wave.open('data/test_wavex_pcm16.wav')\n", " File \"/usr/lib/python3.7/wave.py\", line 510, in open\n", " return Wave_read(f)\n", " File \"/usr/lib/python3.7/wave.py\", line 164, in __init__\n", " self.initfp(f)\n", " File \"/usr/lib/python3.7/wave.py\", line 144, in initfp\n", " self._read_fmt_chunk(chunk)\n", " File \"/usr/lib/python3.7/wave.py\", line 269, in _read_fmt_chunk\n", " raise Error('unknown format: %r' % (wFormatTag,))\n", "wave.Error: unknown format: 65534\n" ] } ], "source": [ "import traceback\n", "try:\n", " w = wave.open('data/test_wavex_pcm16.wav')\n", "except:\n", " traceback.print_exc()\n", "else:\n", " print('It works (unexpectedly)!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try 32-bit float:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"\", line 2, in \n", " w = wave.open('data/test_wav_float32.wav')\n", " File \"/usr/lib/python3.7/wave.py\", line 510, in open\n", " return Wave_read(f)\n", " File \"/usr/lib/python3.7/wave.py\", line 164, in __init__\n", " self.initfp(f)\n", " File \"/usr/lib/python3.7/wave.py\", line 144, in initfp\n", " self._read_fmt_chunk(chunk)\n", " File \"/usr/lib/python3.7/wave.py\", line 269, in _read_fmt_chunk\n", " raise Error('unknown format: %r' % (wFormatTag,))\n", "wave.Error: unknown format: 3\n" ] } ], "source": [ "try:\n", " w = wave.open('data/test_wav_float32.wav')\n", "except:\n", " traceback.print_exc()\n", "else:\n", " print('It works (unexpectedly)!')" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## Writing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TODO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way (without NumPy): http://soledadpenades.com/2009/10/29/fastest-way-to-generate-wav-files-in-python-using-the-wave-module/\n", "\n", "Another way for 24-bit WAV files (with NumPy): https://github.com/WarrenWeckesser/wavio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Version Info" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Versions: NumPy = 1.16.2; IPython = 7.5.0.dev\n", "Python interpreter:\n", "3.7.4 (default, Jul 11 2019, 10:43:21) \n", "[GCC 8.3.0]\n" ] } ], "source": [ "import sys, IPython\n", "print('Versions: NumPy = {}; IPython = {}'.format(np.__version__, IPython.__version__))\n", "\n", "print('Python interpreter:')\n", "print(sys.version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }