=encoding utf8 =head1 TITLE Synopsis 7: Lists and Iteration =head1 VERSION Created: 07 September 2015 Last Modified: 07 September 2015 Version: 1 =head1 Design Overview Perl 6 provides a wide array of list-related features, including eager, lazy, and parallel evaluation of sequences of values, and arrays offering compact and multi-dimensional storage. Laziness in particular needs careful handling, so as to provide the power advanced users desire while not creating surprises for typical language users who have the (reasonable) expectation that an assignment into an array will have immediate effect. Additionally, it is important to give the programmer control of when values will and won't be retained. Finally, all of this needs to be done in a way that provides the convenience that a Perl is expected to provide, while still having a model that can be understood through understanding a small number of rules. =head2 Sequences vs. Lists In Perl 6, we use the term "sequence" to refer to something that can, when requested, produce a sequence of values. Of note, it can only be asked to produce them once. We use the term "list" to refer to things that hold (and so remember) values. There are various concrete types that represent various kinds of list and sequence with different semantics: (1, 2, 3) # a List, the simplest kind of list [1, 2, 3] # an Array, a list of (assignable) Scalar containers |(1, 2) # a Slip, a list which flattens into a surrounding List $*IN.lines # a Seq, a sequence that can be processed serially (^1000).race # a HyperSeq, a sequence that can be processed in parallel =head2 The single argument rule The C<@> sigil in Perl indicates "these", while C<$> indicates "the". This kind of plural/single distinction shows up in various places in the language, and much convenience in Perl comes from it. Flattening is the idea that an C<@>-like thing will, in certain contexts, have its values automatically incorporated into the surrounding list. Traditionally this has been a source of both great power and great confusion in Perl. Perl 6 has been through a number of models relating to flattening during its evolution, before settling on a straightforward one known as the "single argument rule". The single argument rule is best understood by considering the number of iterations that a C loop will do. The thing to iterate over is always treated as a single argument to the C loop, thus the name of the rule. for 1, 2, 3 { } # List of 3 things; 3 iterations for (1, 2, 3) { } # List of 3 things; 3 iterations for [1, 2, 3] { } # Array of 3 things (put in Scalars); 3 iterations for @a, @b { } # List of 2 things; 2 iterations for (@a,) { } # List of 1 thing; 1 iteration for (@a) { } # List of @a.elems things; @a.elems iterations for @a { } # List of @a.elems things; @a.elems iterations The first two are equivalent because parentheses do not actually construct a list, but only group. It is the C<< infix:<,> >> operator that forms a list. The third also performs three iterations, since in Perl 6 C<[...]> constructs an C but does not wrap it into a C container. The fourth will do two iterations, since the argument is a list of two things; that they both happen to have the C<@>-sigil does not, alone, lead to any kind of flattening. The same goes for the fifth; C<< infix:<,> >> will happily form a list of one thing. The single argument rule does respect C containers. Therefore: for $(1, 2, 3) { } # List in a Scalar; 1 iteration for $[1, 2, 3] { } # Array in a Scalar; 1 iteration for $@a { } # Array in a Scalar; 1 iteration The single argument rule is implemented consistently throughout the language. For example, consider the C method: @a.append: 1, 2, 3; # appends 3 values to @a @a.append: [1, 2, 3]; # appends 3 values to @a @a.append: @b; # appends @b.elems values to @a @a.append: @b,; # same, trailing comma doesn't make > 1 argument @a.append: $(1, 2, 3); # appends 1 value (a List) to @a @a.append: $[1, 2, 3]; # appends 1 value (an Array) to @a Additionally, the list constructor (the C<< infix:<,> >> operator) and the array composer (the C<[...]> circumfix) follow the rule: [1, 2, 3] # Array of 3 elements [@a, @b] # Array of 2 elements [@a, 1..10] # Array of 2 elements [@a] # Array with the elements of @a copied into it [1..10] # Array with 10 elements [$@a] # Array with 1 element (@a) [@a,] # Array with 1 element (@a) [[1]] # Same as [1] [[1],] # Array with a single element that is [1] [$[1]] # Array with a single element that is [1] The only one of these that is likely to provide a surprise is C<[[1]]>, but it is deemed sufficiently rare that it does not warrant an exception to the very general single argument rule. =head1 User-level Types =head2 List A C is an immutable, potentially infinite, list of values. The simplest way to form a List is with the C<< infix:<,> >> operator: 1, 2, 3 A C can be indexed and, provided it is finite, asked for its number of elements: say (1, 2, 3)[1]; # 2 say (1, 2, 3).elems; # 3 As it is immutable, it is not possible to C, C, C, C, or C a C. The C and C operations return new Cs. While a C itself is immutable, it may freely contain mutable things, including C containers. Thus: my $a = 2; my $b = 4; ($a, $b)[0]++; ($a, $b)[1] *= 2; say $a; # 3 say $b; # 8 Trying to assign to an immutable value in a C is an error, however. (1, 2, 3)[0]++; # Dies: Cannot assign to an immutable value =head2 Slip The C type is a subclass of C. A C will have its values incorporated into a surrounding C. (1, (2, 3), 4).elems # 3 (1, slip(2, 3), 4).elems # 4 It is possible to coerce a C to a C, so the above can also be written as: (1, (2, 3).Slip, 4).elems # 4 This is a common way to get flattening in places it will not magically take place: my @a = 1, 2, 3; my @b = 4, 5; for @a.Slip, @b.Slip { } # 5 iterations It's also a bit verbose, which is why the C<< prefix:<|> >> operator will, anywhere other than a function call argument list, do a C coercion: my @a = 1, 2, 3; my @b = 4, 5; for |@a, |@b { } # 5 iterations It can also be useful in forms such as: my @prefixed-values = 0, |@values; Where the single argument rule would otherwise make C<@prefixed-values> have two elements, the zero and C<@values>. The C type is also respected by C, C/C, and lazy loops. It is the way a C can place multiple values into its result stream: my @a = 1, 2; say @a.map({ $_ xx 2 }).elems; # 2 say @a.map({ |($_ xx 2) }).elems; # 4 =head2 Array An C is a subclass of C that places values assigned to it into C containers, meaning they can be mutated. It is the default type that an C<@>-sigil variable gets. my @a = 1, 2, 3; say @a.WHAT; # (Array) @a[1]++; say @a; # 1 3 3 In the absence of a shape specification, it will grow automatically. my @a; @a[5] = 42; say @a.elems; # 6 An C supports C, C, C, C, and C. Assignment to an array is eager by default, and creates a new set of Scalar containers: my @a = 1, 2, 3; my @b = @a; @a[1]++; say @b; # 1, 2, 3 Note that the C<[...]> C constructor is equivalent to creating and then assigning to an anonymous C, and so has the same semantics with regard to eagerness and fresh containers. =head2 Seq A C is a one-shot producer of values. Most list processing operations return a C, as do most synchronous sources of multiple values. say (1, 2, 3).map(* + 1).^name; # Seq say (1, 2 Z 'a', 'b').^name; # Seq say (1, 1, * + * ... *).^name; # Seq say $*IN.lines.^name; # Seq Since a C will not by default remember its values, it can only be consumed once. For example, if a C is stored: my \seq = (1, 2, 3).map(* + 1); Then only one attempt to iterate it will work; subsequent attempts will die as the values have already been consumed: for seq { .say } # 2\n3\n4\n for seq { .say } # Dies: This Seq has already been iterated This means you can be confident that a loop going over a file's lines: for open('data').lines { .say if /beer/; } Will not be retaining the lines of the file in memory. Additionally, it is easy to set up processing pipelines that also will not retain all of the lines in memory: my \lines = open('products').lines; my \beer = lines.grep(/beer/); my \excited = beer.map(&uc); .say for excited; However, any attempt to re-use C, C, or C will result in an error. This program is equivalent in performance to: .say for open('products').lines.grep(/beer/).map(&uc); But provides a chance to name the stages. Note that it's possible to use C variables instead, but the single argument rule means that the final loop would have to be: .say for |$excited; Assigning a C to an C will - so long as the sequence is not marked lazy - eagerly perform the operation and store the results into the C. Therefore, there are no surprises to anyone writing: my @lines = open('products').lines; my @beer = @lines.grep(/beer/); my @excited = @beer.map(&uc); .say for @excited; Re-using any of these arrays will work out fine. Of course, the memory behavior of this program is radically different, and it will be slower due to all of the extra C containers created (resulting in extra garbage collection) and poor locality of reference (we have to talk about the same string many times over the programs lifetime). Occasionally it can be useful to request that a C cache itself. This can be done by calling the C method on a C, which makes a lazy C from the C and returns it. Subsequent calls to C will return the same lazy list. Note that the first call to C counts as consuming the C, and so it will not work out if any prior iteration has taken place, and any later attempt to iterate the C after calling C will also fail. It is only C<.cache> which may be called more than once. A C does not do the C role like a C. Therefore, it can not be bound to an C<@>-sigil variable: my @lines := $*IN.lines; # Dies One consequence of this is that, naively, you could not pass a C as an argument to be bound to an C<@>-sigil parameter: sub process(@data) { } process($*IN.lines); This would be rather too inconvenient. Therefore, the signature binder (which actually uses C<::=> assignment semantics rather than C<:=>) will spot failure to bind the <@>-sigil parameter, and then check if the argument does the C role. If it does, then it will call the C method on the argument and bind the result of that instead. =head2 Iterable Both C and C, along with various other types in Perl 6, do the C role. The primary purpose of this role is to promise that an C method is available. The average Perl 6 user will rarely need to care about the C method and what it returns. The secondary purpose of C is to mark out things that will flatten on demand, when the C method or function is used on them. my @a = 1, 2, 3; my @b = 4, 5; for flat @a, @b { } # 5 iterations say [flat @a, @b].elems; # 5 Another use of C is to flatten nested C structure. For example, the C (zip) operator produces a C of C: say (1, 2 Z 'a', 'b').perl; # ((1, "a"), (2, "b")).Seq A C can be used to flatten these, which is useful in conjunction with a C loop using a pointy block with multiple parameters: for flat 1, 2 Z 'a', 'b' -> $num, $letter { } Note that C respects C containers, and so: for flat $(1, 2) { } Will only do one iteration. Remember that an C stores everything in a C container, and so C on an C - short of weird tricks with binding - will always be the same as iterating over the C itself. In fact, the C method on an C returns identity. =head2 HyperSeq =head2 array =head1 The Iterator API and Implementation Types =head2 The Iterator role =head2 The IterationBuffer class =head1 Parallelism =head1 AUTHORS Jonathan Worthington =for vim:set expandtab sw=4: