{ "metadata": { "language": "haskell", "name": "", "signature": "sha256:4f6ce8d83dc489a314b6f638e3d40e8714c39e475f01a6360934c4c32ba318da" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "So here's a puzzle for you.\n", "\n", "You start with an empty string and clipboard. You can perform four operations:\n", "\n", "* **A**ppend a character to the string\n", "* **D**elete a character from the end of the string\n", "* **C**opy the entire string into the clipboard\n", "* **P**aste the contents of the clipboard at the end of the string\n", "\n", "Each operation takes a different amount of time:\n", "\n", "* **A**ppend takes 1 tick\n", "* **D**elete takes 1 tick\n", "* **C**opy takes 3 ticks (`CTRL-A-C`)\n", "* **P**aste takes 2 ticks (`CTRL-P`)\n", "\n", "The question is *\"What's the shortest string length that requires a **delete** to create most efficiently?\"*.\n", "\n", "If you wanted to figure this out for youself, well, you came to the wrong place." ] }, { "cell_type": "code", "collapsed": false, "input": [ "-- we'll need these later\n", "import Data.MemoCombinators (memo2, integral) \n", "import Data.Monoid (mempty) " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 52 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So let's encode what we know so far" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data Operation = Append | Delete | Copy | Paste\n", " deriving (Eq,Show)\n", "\n", "type Length = Int\n", "type Ticks = Int\n", "type Chain = [ Operation ]\n", "\n", "opCost :: Operation -> Int\n", "opCost Append = 1\n", "opCost Delete = 1\n", "opCost Copy = 3\n", "opCost Paste = 2\n", "\n", "chainCost :: Chain -> Int\n", "chainCost = sum . map opCost" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's simplify our problem space. As the problem is stated, we're considering all chains of operations:\n", "\n", " AACAAPAAADADCDDP\n", " PDPPCA\n", " AAAAACDP\n", "\n", "Note a couple things:\n", "\n", "1. **P**asting before we've **C**opied anything doesn't increase the size of the string.\n", "2. Since we're only considering length, not content, **P**asting and **A**ppending commute. Both `PA` and `AP` increase the size of the string by one plus the clipboard length. \n", "3. Similarly, **P**asting and **D**eleting commute. Both `PD` and `DP` increase the size of the string by one minus the clipboard length.\n", "\n", "This lets us restrict us to chains of a certain form. We can rule out all the **P**astes before a **C**opy since we're searching for the most efficient solution, and this only adds ticks. We can restrict pastes to only be after a copy. This gives these meta-operations:\n", "\n", "* **A**ppend a character to the string\n", "* **D**elete a character from the end of the string\n", "* **C**opy the entire string into the clipboard and **P**asting it zero or more times.\n", "\n", "Now all our chains look like this:\n", "\n", " AAACPPPACPPD\n", " AADCPADDD\n", " CPPPACPDD\n", "\n", "There are futher optimizations that could be made, but this turns out to be sufficient for our purposes.\n", "\n", "If we work backwards from a string of length `n` , it could have been made by\n", "\n", "* **A**ppending to a string of length `n-1` in `1` tick,\n", "* **D**eleting from a string of length `n+1` in `t-1` tick,\n", "* or, for all `d` and `q` such that `q * d = n`, by **C**opying a string of length `d` and **P**asting it `q -1` times in a total of `2*q + 1` ticks.\n", "\n", "That last bit deserves an example. A string of 12 could have been by \n", "\n", "* copying a string of length 3 and pasting it 3 times, in 9 ticks.\n", "* copying a string of length 4 and pasting it 2 times, in 7 ticks. \n", "* copying a string of length 6 and pasting it 1 time, in 5 ticks.\n", "\n", "So at this point a helper function to discover all the divisors of a number seems useful:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "-- find all the pairs that multiply to the given value\n", "divisors :: Int -> [ (Int, Int) ]\n", "divisors = integral $ \\n -> [ (d,q) | d <- [1..n], let (q,r) = n `quotRem` d, r == 0 ]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 54 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That `integral` above is my first use of the \n", "[`Data.MemoCombinators`](http://hackage.haskell.org/package/data-memocombinators-0.5.1/docs/Data-MemoCombinators.html)\n", "library. It [memoizes](http://en.wikipedia.org/wiki/Memoization) the function, so that the divisors for each input are only computed once (the first time they're requested) and then cached to be reused for future calls.\n", "\n", "This saves me some time. For example, if I'm used `divisors` inside an `m`-fold loop, the loop would be O(m * n) without memoization, but only O(m + n) with memoization.\n", "\n", "Consider again working backwards from a string of length `n`. Though the last meta-operation may have operated upon a shorter (**A**ppend, **C**opy/**P**aste) or longer (**D**elete) string, if were looking at the chain operations that leads\n", "most efficiently to our string of length `n`, than the string that the meta-operation acted upon must be, by definition, more efficient, since we had to spend further ticks after reaching it. So if we know how to efficiently construct strings in less than `t` ticks, we can use that to efficiently construct strings in exactly `t` ticks.\n", "\n", "This opens us up to [Dynamic Programming](http://en.wikipedia.org/wiki/Dynamic_programming), which is the real reason I broke out `Data.MemoCombinators`. Below we use `memo2` to memoize a two argument function that calculates the chains required to generate a string of length `n` in exactly `t` ticks using the exact working backward method discussed above.\n", "\n", "Note that `chainsToIn` calls itself at least four times, so if we used straight recursion, this would lead to a combinatorial explosion of work. Memoization saves that, letting us calculate a single `chainsToIn n t` in O(t2) time and all `chainsToIn n t` for `0 <= n < N` and `0 <= t < T` in O(NT + T2) time." ] }, { "cell_type": "code", "collapsed": false, "input": [ "-- figure out how to get a string of the given length\n", "-- in exactly the given amount of ticks\n", "chainsToIn :: Length -> Ticks -> [ Chain ]\n", "chainsToIn = memo2 integral integral $ \\n t -> case (n,t) of\n", " (0,0) -> return []\n", " (_,t) | t <= 0 -> mempty\n", " (c,_) | c <= 0 -> mempty\n", " _ -> -- helper function to extend chains that generate\n", " -- the given length by the given suffix\n", " -- to create chains to the current location\n", " let moveFrom :: Length -> Chain -> [ Chain ]\n", " moveFrom n' c = map (++c) . chainsToIn n' $ t - chainCost c\n", " -- find all the ways to get to the current location\n", " in concat [ moveFrom (n-1) [Append]\n", " , moveFrom (n+1) [Delete]\n", " , do\n", " (d,q) <- divisors n\n", " moveFrom d $ Copy : replicate (q-1) Paste\n", " ]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 55 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we can find all the ways (if any) of generating a string of length `n` in `t` ticks, we can determine the most efficient ways to generate a string of length `n` by simply iterating from `t=0` up until we find a value of `t` that gives us a non-empty set of ways to find a string of that length." ] }, { "cell_type": "code", "collapsed": false, "input": [ "-- find the optimal chain to e\n", "optimalChainsTo :: Length -> [ Chain ]\n", "optimalChainsTo = integral $ \\n ->\n", " head . dropWhile null $ map (chainsToIn n) [ 0.. ]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our solution, then, is shortest length where all the chains of operations that generate that length in optimal time include a **D**elete." ] }, { "cell_type": "code", "collapsed": false, "input": [ "-- our solution only has chains with a Delete in it\n", "solution :: Length\n", "solution = head $ filter (all (elem Delete) . optimalChainsTo) [ 0.. ]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 57 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see what it is!" ] }, { "cell_type": "code", "collapsed": false, "input": [ "solution" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "text": [ "53" ] } ], "prompt_number": 58 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And what was so special about the chain of operations that generate it so efficiently?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "optimalChainsTo solution" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "text": [ "[[Append,Append,Append,Append,Append,Append,Copy,Paste,Paste,Copy,Paste,Paste,Delete]]" ] } ], "prompt_number": 59 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And how many ticks did it require?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "map chainCost it" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "text": [ "[21]" ] } ], "prompt_number": 60 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Out of curiousity, lets see what the optimal chains look like for strings up to length 64." ] }, { "cell_type": "code", "collapsed": false, "input": [ "-- need a few more libraries\n", "import Text.Printf (printf)\n", "import Control.Monad (forM_)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 61 }, { "cell_type": "code", "collapsed": false, "input": [ "putStrLn \"num ticks chains\"\n", "forM_ [ 0 .. 64 ] $ \\i ->\n", " let cs@(c:_) = optimalChainsTo i\n", " in printf \" %2d %2d %s\\n\" i (chainCost c) (unwords $ map (map $ head . show) cs)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "text": [ "num ticks chains" ] }, { "metadata": {}, "output_type": "display_data", "text": [ " 0 0 \n", " 1 1 A\n", " 2 2 AA\n", " 3 3 AAA\n", " 4 4 AAAA\n", " 5 5 AAAAA\n", " 6 6 AAAAAA\n", " 7 7 AAAAAAA\n", " 8 8 AAAAAAAA\n", " 9 9 AAAAAAAAA\n", " 10 10 AAAAAAAAAA AAAAACP\n", " 11 11 AAAAAAAAAAA AAAAACPA\n", " 12 11 AAAACPP AAAAAACP\n", " 13 12 AAAACPPA AAAAAACPA\n", " 14 12 AAAAAAACP\n", " 15 12 AAAAACPP\n", " 16 13 AAAAACPPA AAAACPPP AAAAAAAACP\n", " 17 14 AAAAACPPAA AAAACPPPA AAAAAAAACPA AAAAAACPPD\n", " 18 13 AAAAAACPP\n", " 19 14 AAAAAACPPA\n", " 20 14 AAAAACPPP\n", " 21 14 AAAAAAACPP\n", " 22 15 AAAAAAACPPA\n", " 23 16 AAAAAAACPPAA AAAAAACPPPD AAAAAAAACPPD\n", " 24 15 AAAAAACPPP AAAAAAAACPP\n", " 25 16 AAAAAACPPPA AAAAAAAACPPA AAAAACPPPP\n", " 26 17 AAAAAACPPPAA AAAAAAAACPPAA AAAAACPPPPA AAAAAAAAACPPD AAAACPPACP AAAAAACPACP\n", " 27 16 AAAAAAAAACPP\n", " 28 16 AAAAAAACPPP\n", " 29 17 AAAAAAACPPPA\n", " 30 17 AAAAAACPPPP AAAAAAAAAACPP AAAAACPCPP AAAAACPPCP\n", " 31 18 AAAAAACPPPPA AAAAAAAAAACPPA AAAAACPCPPA AAAAACPPCPA AAAAAAAACPPPD\n", " 32 17 AAAAAAAACPPP\n", " 33 18 AAAAAAAACPPPA AAAAAAAAAAACPP AAAAACPACPP\n", " 34 19 AAAAAAAACPPPAA AAAAAAAAAAACPPA AAAAACPACPPA AAAAAAACPPPPD AAAAACPPAACP AAAACPPPACP AAAAAAAACPACP AAAAAACPPDCP\n", " 35 18 AAAAAAACPPPP\n", " 36 18 AAAAAAAAACPPP AAAACPPCPP AAAAAACPCPP AAAAAACPPCP\n", " 37 19 AAAAAAAAACPPPA AAAACPPCPPA AAAAAACPCPPA AAAAAACPPCPA\n", " 38 19 AAAAAACPPACP\n", " 39 19 AAAACPPACPP AAAAAACPACPP\n", " 40 19 AAAAAAAACPPPP AAAAAAAAAACPPP AAAAACPCPPP AAAAACPPPCP\n", " 41 20 AAAAAAAACPPPPA AAAAAAAAAACPPPA AAAAACPCPPPA AAAAACPPPCPA AAAAAAACPCPPD AAAAAAACPPCPD\n", " 42 19 AAAAAAACPCPP AAAAAAACPPCP\n", " 43 20 AAAAAAACPCPPA AAAAAAACPPCPA\n", " 44 20 AAAAACPPCPPD AAAAAAAAAAACPPP AAAAACPACPPP AAAAAAACPPACP\n", " 45 19 AAAAACPPCPP\n", " 46 20 AAAAACPPCPPA\n", " 47 21 AAAAACPPCPPAA AAAACPPCPPPD AAAAAACPCPPPD AAAAACPPACPPD AAAACPPPCPPD AAAAAAAACPCPPD AAAAAACPPPCPD AAAAAAAACPPCPD\n", " 48 20 AAAACPPCPPP AAAAAACPCPPP AAAAACPPACPP AAAACPPPCPP AAAAAAAACPCPP AAAAAACPPPCP AAAAAAAACPPCP\n", " 49 21 AAAACPPCPPPA AAAAAACPCPPPA AAAAACPPACPPA AAAACPPPCPPA AAAAAAAACPCPPA AAAAAACPPPCPA AAAAAAAACPPCPA\n", " 50 21 AAAAAAAAAACPPPP AAAAACPCPPPP AAAAAACPPPACP AAAAAAAACPPACP AAAAACPPPPCP\n", " 51 21 AAAAACPPAACPP AAAACPPPACPP AAAAAAAACPACPP AAAAAACPPDCPP\n", " 52 21 AAAACPPACPPP AAAAAACPACPPP\n", " 53 21 AAAAAACPPCPPD\n", " 54 20 AAAAAACPPCPP\n", " 55 21 AAAAAACPPCPPA\n", " 56 21 AAAAAAACPCPPP AAAAAAACPPPCP\n", " 57 21 AAAAAACPPACPP\n", " 58 22 AAAAAACPPACPPA AAAAAAACPPPACP\n", " 59 22 AAAAACPPCPPPD AAAAACPPPCPPD\n", " 60 21 AAAAACPPCPPP AAAAACPPPCPP\n", " 61 22 AAAAACPPCPPPA AAAAACPPPCPPA\n", " 62 22 AAAAAAACPPCPPD\n", " 63 21 AAAAAAACPPCPP\n", " 64 22 AAAAAAACPPCPPA AAAAACPPACPPP AAAACPPPCPPP AAAAAAAACPCPPP AAAAAAAACPPPCP" ] } ], "prompt_number": 66 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This post was written using [IHaskell](https://github.com/gibiansky/IHaskell). Feel free to \n", "[download and play with the IHaskell notebook version of this post](\n", "https://raw.github.com/rampion/CopyPastePuzzle/master/CopyPastePuzzle.ipynb), or [join the discussion on Reddit](http://www.reddit.com/r/haskell/comments/1zy3g8/whats_the_shortest_string_length_that_requires_a/)." ] } ], "metadata": {} } ] }