=encoding utf8

=head1 TITLE

Synopsis 6: Subroutines

=head1 VERSION

    Created: 21 Mar 2003

    Last Modified: 16 Oct 2015
    Version: 169

This document summarizes Apocalypse 6, which covers subroutines and the new type system.

=head1 Subroutines and other code objects

C<Routine> is the parent type of all keyword-declared code blocks. All routines are born with undefined values of C<$_>, C<$!>, and C<$/>, unless the routine declares them otherwise explicitly. A compilation unit, such as a module file or an C<EVAL> string, is also considered a routine, or you would not be able to reference C<$!> or C<$/> in them.

Non-routine code C<Block>s, declared with C<< -> >> or with bare curlies, are born only with C<$_>, which is aliased to its OUTER::<$_> unless bound as a parameter. A block generally uses the C<$!> and C<$/> defined by the innermost enclosing routine, unless C<$!> or C<$/> is explicitly declared in the block.

A thunk is a piece of code that may not execute immediately, for instance because it is part of a conditional operator, or a default initialization of an attribute. It has no scope of its own, so any new variables defined in a thunk will leak into the scope that contains it. Note however that any and all lazy constructs, whether block-based or thunk-based, such as gather or start or C<< ==> >> should declare their own C<$/> and C<$!> so that the user's values for those variables cannot be clobbered asynchronously.

B<Subroutines> (keyword: C<sub>) are non-inheritable routines with parameter lists.

B<Methods> (keyword: C<method>) are inheritable routines which always have an associated object (known as their invocant) and belong to a particular kind or class.

B<Submethods> (keyword: C<submethod>) are non-inheritable methods, or subroutines masquerading as methods. They have an invocant and belong to a particular kind or class.

B<Regexes> (keyword: C<regex>) are methods (of a grammar) that perform pattern matching. Their associated block has a special syntax (see Synopsis 5). (We also use the term "regex" for anonymous patterns of the traditional form.)

B<Tokens> (keyword: C<token>) are regexes that perform low-level non-backtracking (by default) pattern matching.

B<Rules> (keyword: C<rule>) are regexes that perform non-backtracking (by default) pattern matching (and also enable rules to do whitespace dwimmery).

B<Macros> (keyword: C or C) are routines or methods that are installed such that they will be called as part of the compilation process, and which can therefore take temporary control of the subsequent compilation to cheat in any of the ways that a compiler might cheat.

=head1 Routine modifiers

B<Multis> (keyword: C<multi>) are routines that can have multiple variants that share the same name, selected by arity, types, or some other constraints.

B<Prototypes> (keyword: C<proto>) specify the commonalities (such as parameter names, fixity, and associativity) shared by all multis of that name in the scope of the C<proto> declaration. Abstractly, the C<proto> is a generic wrapper around the dispatch to the C<multi>s. Each C<proto> is instantiated into an actual dispatcher for each scope that needs a different candidate list.

B<Only> (keyword: C<only>) routines do not share their short names with other routines. This is the default modifier for all routines, unless a C<proto> of the same name was already in scope. (For subs, the governing C<proto> must have been declared in the same file, so C<proto> declarations from the setting or other modules don't have this effect unless explicitly imported.)

A modifier keyword may occur before the routine keyword in a named routine:

    only sub foo {...}
    proto sub foo {...}
    dispatch sub foo {...}      # internal
    multi sub foo {...}

    only method bar {...}
    proto method bar {...}
    dispatch method bar {...}   # internal
    multi method bar {...}

If the routine keyword is omitted, it defaults to C<sub>.

Modifier keywords cannot apply to anonymous routines.

A C<proto> is a generic dispatcher, which any given scope with a unique candidate list will instantiate into a C<dispatch> routine. Hence a C<proto> is never called directly, much like a C<role> can't be used as an instantiated object.

When you call any routine (or method, or rule) that may have multiple candidates, the basic dispatcher is really only calling an "only" sub or method--but if there are multiple candidates, the "only" that will be found is really a dispatcher. This instantiated C<dispatch> is always called first (at least in the abstract--this can often be optimized away). In essence, a C<dispatch> is dispatched exactly like an C<only> sub, but the C<dispatch> itself may delegate to any of the candidates it is "managing".

It is the C<dispatch>'s responsibility to first vet the arguments for all the candidates; any call that does not successfully bind the C<dispatch>'s signature fails outright. (Its signature is a copy of one belonging to the C<proto> from which it was instantiated.) The C<dispatch> does not necessarily send the original capture to its candidates, however. Named arguments that bind to positionals in the C<dispatch> sig will become positionals for all subsequent calls to its managed multis.

The dispatch then considers its list of managed candidates from the viewpoint of the caller or object, sorts them into some order, and dispatches them according to the rules of multiple dispatch as defined for each of the various dispatchers. In the case of multi subs, the candidate list is known at compile time. In the case of multi methods, it may be necessary to generate (or regenerate) the candidate list at run time, depending on what is known when about the inheritance tree.

This default dispatch behavior is symbolized within the original C<proto> by a block consisting of a single C<*> (that is, a "whatever"). Hence the typical C<proto> will simply have a body of C<{*}>.

    proto method bar {*}

(We don't use C<...> for that because it would fail at run time, and the proto's instantiated C<dispatch> blocks are not stubs, but are intended to be executed.)

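As an illustrative sketch (not part of the specification), the following shows a C<proto> with the default C<{*}> body and three C<multi> candidates selected by type and by arity. The routine name C<describe> is hypothetical, and the behavior shown is what we would expect from an implementation such as Rakudo; the internal C<dispatch> keyword is not written by the user.

```raku
# Hypothetical routine name; chosen only for illustration.
proto sub describe(|) {*}              # {*} hands control to the multi candidates

multi sub describe(Int $n) { "integer $n" }             # selected by type
multi sub describe(Str $s) { "string $s" }              # selected by type
multi sub describe($a, $b) { "two values: $a and $b" }  # selected by arity

say describe(42);       # integer 42
say describe("hi");     # string hi
say describe(1, 2);     # two values: 1 and 2
```

The call site always names C<describe>; which candidate runs is decided by the dispatcher from the arguments supplied.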
Other statements may be inserted before and after the C<{*}> statement to capture control before or after the multi dispatch:

    proto foo ($a,$b) { say "Called with $a $b"; {*}; say "Returning"; }

(That C<proto> is only good for C<multi>s with side effects and no return value, since it returns the result of C<say>, which might not be what you want. See below for how to fix that.)

The syntactic form C<&foo> (without a modifying signature) can never refer to a C<multi> candidate or a generic C<proto>. It may only refer to the single C<only> or C<dispatch> routine that would first be called by C<foo()>. Individual C<multi>s may be named by appending a signature to the noun form: C<&foo:($,$,*@)>.

We used the term "managed" loosely above to indicate the set of C<multi>s in question; the "managed set" is more accurately defined as the intersection of all the C<multi>s in the C<proto>'s downward scope with all the C<multi>s that are visible to the caller's upward-looking scope. For ordinary routines this means looking down lexical scopes and looking up lexical scopes. [This is more or less how C<multi>s already behave.]

For methods this means looking down or up the inheritance tree; "managed set" in this case translates to the intersection of all methods in the C<proto>'s class or its subclasses with all C<multi> methods visible to the object in its parent classes, that is, the parent classes of the object's actual type on whose behalf the method was called. [Note, this is a change from prior multi method semantics, which restricted multimethods to a single class; the old semantics is equivalent to defining a C<proto> in every class that has multimethods. The new way gives the user the ability to intermix C<multi>s at different inheritance levels].

Also, the old semantics of C<proto> providing the most-default C<multi> body is hereby deprecated. Default C<multi>s should be marked with "C<is default>".

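A minimal sketch of marking a default candidate, assuming an implementation (such as Rakudo) that uses C<is default> to break a tie between otherwise equally good candidates. The name C<pick> is hypothetical.

```raku
# Two candidates with identical signatures would normally be ambiguous;
# the one marked "is default" is preferred.
multi sub pick()            { "ordinary candidate" }
multi sub pick() is default { "default candidate" }

say pick();   # default candidate
```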
It is still possible to provide default behavior in the C<proto>, however, by using it as a wrapper:

    my proto sub foo (@args) {
        do-something-before(@args);
        {*}  # call into the managed set, then come back
        do-something-after(@args);
    }

Note that this returns the value of do-something-after(), not the C<multi>. There are two ways to get around that. Here's one way:

    my proto sub foo (@args) {
        ENTER do-something-before(@args);
        {*}
        LEAVE do-something-after(@args);
    }

Alternately, you can spell out what C<{*}> is actually sugar for, which would be some dispatcher macro such as:

    my proto sub foo (|cap (@args)) {
        do-something-before(@args);
        my \retcap = MULTI-DISPATCH-CALLWITH(&?ROUTINE, cap);
        do-something-after(@args);
        return retcap;
    }

which optimizes (we hope) to an inlined multidispatcher to locate all the candidates for these arguments (hopefully memoized), create the dynamic scope of a dispatch, start the dispatch, manage C<callnext> and C<lastcall> semantics, and return the result of whichever C<multi> succeeded, if any. Which is why we have C<{*}> instead.

Another common variant would be to propagate control to the outer/higher routine that would have been found if this one didn't exist:

    my proto method foo { {*}; UNDO nextsame; } # failover to super foo

Note that, in addition to making C<multi>s work similarly to each other, the new C<proto> semantics greatly simplify top-level dispatchers, which never have to worry about C<multi>s, because C<multi>s are always in the second half of the double dispatch (again, just in the abstract, since the first dispatch can often be optimized away, as if the C<proto> were inlined). So in the abstract, C<foo()> only ever calls a single C<only>/C<proto> routine, and we know which one it is at compile time.

This is less of a shift for method dispatch, which already assumed that there is something like a single proto in each class that redispatches inside the class. Here the change is that the multi-method dispatcher needs to look more widely for its candidates than the current class.
But note that our semantics were inconsistent before, insofar as regex methods already had to look for this larger managed set in order to do transitive LTM correctly. Now the semantics of normal method C<multi>s and regex C<multi>s are nearly identical, apart from the fact that regex candidate lists naturally have fancier tiebreaking rules involving longest token matching.

A C<dispatch> must be generated for every scope that contains one or more C<multi> declarations. This is done by searching backwards and outwards (or up the inheritance chain for methods) for a C<proto> to instantiate. If no such C<proto> is found, a "most generic" C<proto> will be generated, something like:

    proto sub foo (*@, *%) {*}
    proto method foo (*@, *%) {*}

Obviously, no named-to-positional remapping can be done in this case.

[Conjecture: we could instead autogen a more specific signature for each such autogenerated C<proto> once we know its exact candidate set, such that consistent use of positional parameter names is rewarded with positional names in the generated signature, which could remap named parameters.]

=head2 Named subroutines

The general syntax for named subroutines is any of:

     my RETTYPE sub NAME ( PARAMS ) TRAITS {...}    # lexical only
                sub NAME ( PARAMS ) TRAITS {...}    # same as "my"
    our RETTYPE sub NAME ( PARAMS ) TRAITS {...}    # package-scoped

The return type may also be put inside the parentheses:

    sub NAME (PARAMS --> RETTYPE) {...}

Unlike in Perl 5, named subroutines are considered expressions, so this is valid Perl 6:

    my @subs = (sub foo { ... }, sub bar { ... });

Another difference is that subroutines default to C<my> scope rather than C<our> scope. However, subroutine dispatch searches lexical scopes outward, and subroutines are also allowed to be I<postdeclared> after their use, so you won't notice this much.

A subroutine that is not declared yet may be called using parentheses around the arguments. In the absence of parentheses, the subroutine call is assumed to take multiple arguments in the form of a list operator.

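The following sketch exercises the points above: a call that precedes its declaration, a return type placed inside the signature, named subs used as expressions, and an C<our> sub reached through its package. All routine and module names are hypothetical, and the behavior is assumed to match an implementation such as Rakudo.

```raku
say triple(4);                           # 12; parentheses allow the call before the declaration

sub triple(Int $n --> Int) { 3 * $n }    # return type inside the signature

# Named subs are expressions, so they can be collected in a list.
my @subs = (sub add-one($x) { $x + 1 }, sub take-one($x) { $x - 1 });
say @subs[0](10);                        # 11
say @subs[1](10);                        # 9

# "our" installs the sub in its package as well as lexically.
module Greeter { our sub hello() { "hello from the package" } }
say Greeter::hello();                    # hello from the package
```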
=head2 Anonymous subroutines

The general syntax for anonymous subroutines is:

    sub ( PARAMS ) TRAITS {...}

But one can also use the C<anon> scope modifier to introduce the return type first:

    anon RETTYPE sub ( PARAMS ) TRAITS {...}

When an anonymous subroutine will be assigned to a scalar variable, the variable can be declared with the signature of the routines that will be assigned to it:

    my $grammar_factory:(Str, int, int --> Grammar);
    $grammar_factory = sub (Str $name, int $n, int $x --> Grammar) { ... };

Covariance allows a routine (that has a more derived return type than what is defined in the scalar's signature) to be assigned to that scalar. Contravariance allows a routine (with parameter types that are less derived than those in the scalar's signature) to be assigned to that scalar. The compiler may choose to enforce (by type-checking) such assignments at compile-time, if possible. Such type annotations are intended to help the compiler optimize code to the extent such annotations are included and/or to the extent they aid in type inference.

The same signature can be used to mark the type of a closure parameter to another subroutine:

    sub (int $n, &g_fact:(Str, int, int --> Grammar) --> Str) { ... }

B<Trait> is the name for a compile-time (C<is>) property. See L<"Properties and traits">.

=head2 Perl5ish subroutine declarations

You can declare a sub without a parameter list, as in Perl 5:

    sub foo {...}

This is equivalent to one of:

    sub foo () {...}
    sub foo (*@_) {...}
    sub foo (*%_) {...}
    sub foo (*@_, *%_) {...}

depending on whether either or both of those variables are used in the body of the routine.

Positional arguments implicitly come in via the C<@_> array, but unlike in Perl 5 they are C<readonly> aliases to actual arguments:

    sub say { print qq{"@_[]"\n}; }   # args appear in @_

    sub cap { $_ = uc $_ for @_ }     # Error: elements of @_ are read-only

Also unlike in Perl 5, Perl 6 has true named arguments, which come in via C<%_> instead of C<@_>.

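A small sketch of signature-less subs, assuming (as in Rakudo) that mentioning C<@_> or C<%_> in the body adds the corresponding slurpy parameter. The sub names are hypothetical.

```raku
sub count-args { @_.elems }                  # mentions @_, so behaves like (*@_)
sub named-keys { %_.keys.sort.join(",") }    # mentions %_, so behaves like (*%_)
sub both { "{@_.elems} positional, {%_.elems} named" }   # behaves like (*@_, *%_)

say count-args(1, 2, 3);          # 3
say named-keys(:b(2), :a(1));     # a,b
say both(1, 2, :x);               # 2 positional, 1 named
```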
If you need to modify the elements of C<@_> or C<%_>, declare the array or hash explicitly with the C<is rw> trait:

    sub swap (*@_ is rw, *%_ is rw) { @_[0,1] = @_[1,0]; %_<status> = "Q:S"; }

Note: the C<rw> container trait is automatically distributed to the individual elements by the slurpy star even though there is no actual array or hash passed in. More precisely, the slurpy star means the declared formal parameter is I<not> considered readonly; only its elements are. See L</Parameters and arguments> below.

Note also that if the sub's block contains placeholder variables (such as C<$^foo> or C<$:bar>), those are considered to be formal parameters already, so in that case C<@_> or C<%_> fill the role of sopping up unmatched arguments. That is, if those containers are explicitly mentioned within the body, they are added as slurpy parameters. This allows you to easily customize your error message on unrecognized parameters. If they are not mentioned in the body, they are not added to the signature, and normal dispatch rules will simply fail if the signature cannot be bound.

=head2 Blocks

Raw blocks are also executable code structures in Perl 6.

Every block defines an object of type C<Block> (which C<does Callable>), which may either be executed immediately or passed on as a C<Block> object. How a block is parsed is context dependent.

A bare block where an operator is expected terminates the current expression and will presumably be parsed as a block by the current statement-level construct, such as an C<if> or C<while>. (If no statement construct is looking for a block there, it's a syntax error.) This form of bare block requires leading whitespace because a bare block where a postfix is expected is treated as a hash subscript.

A bare block where a term is expected merely produces a C<Block> object. If the term bare block occurs in a list, it is considered the final element of that list unless followed immediately by a comma or colon (intervening C<\h*> or "unspace" is allowed).

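To make the last two points concrete, here is a short sketch of placeholder parameters and of a bare block used as a term. The variable names are hypothetical; the behavior is assumed to match an implementation such as Rakudo.

```raku
# Placeholder variables become positional parameters, ordered by name.
my &subtract = { $^a - $^b };
say subtract(10, 3);              # 7

# A bare block in term position is just a Block object until it is called.
my $doubler = { $_ * 2 };
say $doubler ~~ Block;            # True
say $doubler(21);                 # 42
```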
=head2 "Pointy blocks"

Semantically the arrow operator C<< -> >> is almost a synonym for the C<sub> keyword as used to declare an anonymous subroutine, insofar as it allows you to declare a signature for a block of code. However, the parameter list of a pointy block does not require parentheses, and a pointy block may not be given traits. In most respects, though, a pointy block is treated more like a bare block than like an official subroutine. Syntactically, a pointy block may be used anywhere a bare block could be used:

    my $sq = -> $val { $val**2 };
    say $sq(10); # 100

    my @list = 1..3;
    for @list -> $elem {
        say $elem; # prints "1\n2\n3\n"
    }

It also behaves like a block with respect to control exceptions. If you C<return> from within a pointy block, the block is transparent to the return; it will return from the innermost enclosing C<sub> or C<method> (et al.), not from the block itself. It is referenced by C<&?BLOCK>, not C<&?ROUTINE>.

A normal pointy block's parameters default to C<readonly>, just like parameters to a normal sub declaration. However, the double-pointy variant defaults parameters to C<rw>:

    for @list <-> $elem {
        $elem++;
    }

This form applies C<rw> to all the arguments:

    for @kv <-> $key, $value {
        $key ~= ".jpg";
        $value *= 2 if $key ~~ :e;
    }

=head2 Stub declarations

To predeclare a subroutine without actually defining it, use a "stub block":

    sub foo {...}     # Yes, those three dots are part of the actual syntax

The old Perl 5 form:

    sub foo;

is a compile-time error in Perl 6 (because it would imply that the body of the subroutine extends from that statement to the end of the file, as C<class> and C<module> declarations do). The only allowed use of the semicolon form is to declare a C<MAIN>
sub--see L</Declaring a C<MAIN> subroutine> below. (And this form requires the C<unit> declarator in front.)

Redefining a stub subroutine does not produce an error, but redefining an already-defined subroutine does. If you wish to redefine a defined sub, you must explicitly use the "C<supersede>" declarator. (The compiler may refuse to do this if it has already committed to the previous definition.)

The C<...> is the "yadayadayada" operator, which is executable but returns a failure. You can also use C<???> to fail with a warning (a lazy one, to be issued only if the value is actually used), or C<!!!> to always die. These also officially define stub blocks. Any of these yada operators will be taken as a stub if used as the main operator of the first statement in the block. (Statement modifiers are allowed on that statement.) The yada operators differ from their respective named functions in that they all default to a message such as: "Unimplemented stub of sub foo was executed".

It has been argued that C<...> as literal syntax is confusing when you might also want to use it for metasyntax within a document. Generally this is not an issue in context; it's never an issue in the program itself, and the few places where it could be an issue in the documentation, a comment will serve to clarify the intent, as above. The rest of the time, it doesn't really matter whether the reader takes C<...> as literal or not, since the purpose of C<...> is to indicate that something is missing whichever way you take it.

=head2 Globally scoped subroutines

Subroutines and variables can be declared in the global namespace (or any package in the global namespace), and are thereafter visible everywhere in the program via the GLOBAL package (or one of its subpackages). They may be made directly visible by importation, but may not otherwise be called with a bare identifier, since subroutine dispatch only looks in lexical scopes.

Global subroutines and variables are normally referred to by prefixing their identifiers with the C<*> twigil, to allow dynamically scoped overrides.

    GLOBAL::<$next_id> = 0;
    sub GLOBAL::saith($text)  { say "Yea verily, $text" }

    module A {
        my $next_id = 2;      # hides any global or package $next_id
        &*saith($next_id);    # print the lexical $next_id;
        &*saith($*next_id);   # print the dynamic $next_id;
    }

To disallow dynamic overrides, you must access the globals directly:

    GLOBAL::saith($GLOBAL::next_id);

The fact that this is verbose is construed to be a feature. Alternately, you may play aliasing tricks like this:

    module B {
        import GLOBAL <&saith $next_id>;
        saith($next_id);    # Unambiguously the global definitions
    }

Despite the fact that subroutine dispatch only looks in lexical scopes, you can always call a package subroutine directly if there's a lexical alias to it, as the C<our> declarator does:

    unit module C;
    our sub saith($text)  { say "Yea verily, $text" }
    saith("I do!")      # okay
    C::saith("I do!")   # also okay

=head2 Dynamically scoped subroutines

Similarly, you may define dynamically scoped subroutines:

    my sub myfunc ($x) is dynamic { ... }
    my sub &*myfunc ($x) { ... }        # same thing

This may then be invoked via the syntax for dynamic variables:

    &*myfunc(42);

=head2 Lvalue subroutines

Lvalue subroutines return a "proxy" object that can be assigned to. It's known as a proxy because the object usually represents the purpose or outcome of the subroutine call.

Subroutines are specified as being lvalue using the C<is rw> trait.

An lvalue subroutine may return a variable:

    my $lastval;
    sub lastval () is rw { return $lastval }

or the result of some nested call to an lvalue subroutine:

    sub prevval () is rw { return lastval() }

or a specially tied proxy object, with suitably programmed C<FETCH> and C<STORE> methods:

    sub checklastval ($passwd) is rw {
        return Proxy.new:
                FETCH => method {
                            return lastval();
                         },
                STORE => method ($val) {
                            die unless check($passwd);
                            lastval() = $val;
                         };
    }

Other methods may be defined for specialized purposes such as temporizing the value of the proxy.

=head2 Raw subroutines

If the subroutine doesn't care whether the returned value is a container or not, it may declare this with C<is raw>, to indicate that the return value should be returned raw, without attempting any decontainerization. This can be useful for routines that wish to process mixed containers and non-containers without distinction.

=head2 Operator overloading

Operators are just subroutines with special names and scoping. An operator name consists of a grammatical category name followed by a single colon followed by an operator name specified as if it were one or more strings. So any of these indicates the same binary addition operator:

    infix:<+>
    infix:«+»
    infix:<<+>>
    infix:['+']
    infix:["+"]

Use the C<&> sigil just as you would on ordinary subs.

Unary operators are defined as C<prefix> or C<postfix>:

    sub prefix:<OPNAME>  ($operand) {...}
    sub postfix:<OPNAME> ($operand) {...}

Binary operators are defined as C<infix>:

    sub infix:<OPNAME> ($leftop, $rightop) {...}

Bracketing operators are defined as C<circumfix> where a term is expected or C<postcircumfix> where a postfix is expected. A two-element slice containing the leading and trailing delimiters is the name of the operator.

    sub circumfix:<LEFTDELIM RIGHTDELIM> ($contents) {...}
    sub circumfix:['LEFTDELIM','RIGHTDELIM'] ($contents) {...}

Contrary to Apocalypse 6, there is no longer any rule about splitting an even number of characters. You must use a two-element slice.

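As a hedged illustration of the declarations above, the following defines one binary and one unary operator and then refers to the binary one through the C<&> sigil. The operator names C<avg> and C<!> are chosen for illustration only, and the behavior is assumed to match an implementation such as Rakudo.

```raku
sub infix:<avg>($a, $b) { ($a + $b) / 2 }    # binary operator
sub postfix:<!>(Int $n) { [*] 1..$n }        # unary postfix operator (factorial)

say 4 avg 10;              # 7
say 5!;                    # 120
say &infix:<avg>(1, 2);    # 1.5; the & sigil names the operator like any other sub
```

Both declarations are lexically scoped, so the new syntax is available only in the scope where the operators were declared (or imported).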
Such names are canonicalized to a single form within the symbol table, so you must use the canonical name if you wish to subscript the symbol table directly (as in C<< PKG::{'infix:<+>'} >>). Otherwise any form will do. (Symbolic references do not count as direct subscripts since they go through a parsing process.) The canonical form always uses angle brackets and a single space between slice elements. The elements are escaped on brackets, so C<< PKG::circumfix:['<','>'] >> is canonicalized to C<<< PKG::{'circumfix:<\< \>>'} >>>, and decanonicalizing may always be done left-to-right.

Operator names can be any sequence of non-whitespace characters including Unicode characters. For example:

    sub infix:<(c)> ($text, $owner) { return $text but Copyright($owner) }
    method prefix:<±> (Num $x --> Num) { return +$x | -$x }
    multi sub postfix:<!> (Int $n) { $n < 2 ?? 1 !! $n*($n-1)! }

    my $document = $text (c) $me;

    my $tolerance = ±7!;

Whitespace may never be part of the name (except as separator within a C<< <...> >> or C<«...»> slice subscript, as in the example above).

A null operator name does not define a null or whitespace operator, but a default matching subrule for that syntactic category, which is useful when there is no fixed string that can be recognized, such as tokens beginning with digits. Such an operator I<must> supply an C<is parsed> trait. The Perl grammar uses a default subrule for the C<:1st>, C<:2nd>, C<:3rd>, etc. regex modifiers, something like this:

    sub regex_mod_external:<> ($x) is parsed(token { \d+[st|nd|rd|th] }) {...}

Such default rules are attempted in the order declared. (They always follow any rules with a known prefix, by the longest-token-first rule.)

Although the name of an operator can be installed into any package or lexical namespace, the syntactic effects of an operator declaration are always lexically scoped. Operators other than the standard ones should not be installed into the C<GLOBAL> namespace.
Always use exportation to make non-standard syntax available to other scopes.

=head1 Calling conventions

In Perl 6 culture, we distinguish the terms I<parameter> and I<argument>; a parameter is the formal name that will attach to an incoming argument during the course of execution, while an argument is the actual value that will be bound to the formal parameter. The process of attaching these values (arguments) to their temporary names (parameters) is known as I<binding>. (Some C.S. literature uses the terms "formal argument" and "actual argument" for these two concepts, but here we try to avoid using the term "argument" for formal parameters.)

Various Perl 6 code objects (either routines or blocks) may be declared with parameter lists, either explicitly by use of a signature declaration, or implicitly by use of placeholder variables within the body of code. (Use of both for the same code block is not allowed.)

=head1 Signatures

A signature consists of a list of zero or more parameter declarations, separated by commas. (These are described below.) Signatures are usually found inside parentheses (within routine declarations), or after an arrow C<< -> >> (within block declarations), but other forms are possible for specialized cases. A signature may also indicate what the code returns, either generally or specifically. This is indicated by placing the return specification after a C<< --> >> token. If the return specification names a type (that is, an indefinite object), then a successful call to the code must always return a value of that type. If the return specification returns a definite object, then that value is always returned from a successful call. (For this purpose the C<Nil> value is treated as definite.) An unsuccessful call may always call C<fail> to return a C<Failure> object regardless of the return specification.

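The three cases just described (a type, a definite value, and an unsuccessful call) can be sketched as follows. The routine names and the C<@seen> array are hypothetical, and the behavior is assumed to match an implementation such as Rakudo.

```raku
# Return specification names a type: the final statement supplies the value.
sub half(Int $n --> Int) { $n div 2 }

# Return specification is a definite value: the body runs only for its
# side effects, and 0 is always what a successful call returns.
my @seen;
sub record(Str $msg --> 0) { @seen.push: $msg }

# An unsuccessful call may fail regardless of the declared return type.
sub parse-age(Str $s --> Int) {
    $s ~~ /^ \d+ $/ ?? $s.Int !! fail "not a number: $s"
}

say half(9);                     # 4
say record("dinner");            # 0
say @seen.elems;                 # 1
say parse-age("42");             # 42
say parse-age("old").defined;    # False
```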
Ordinarily, if the return is specified as a type (or is unspecified), the final statement of the block will be evaluated for its return value, and this will be the return value of the code block as a whole. (It must conform to the return type specification, if provided.) An explicit C<return> may be used instead to evaluate the C<return>'s arguments as the code block's return value, and leave the code block immediately, short-circuiting the rest of the block's execution.

If the return specification is a definite immutable value (or C<Nil>) rather than a type, then all top-level statements in the code block are evaluated only for their side effects; in other words, all of the statements are evaluated in sink context, including the final statement. An explicit C<return> statement is allowed, but only in argumentless form, to indicate that execution is to be short-circuited and the I<declared> return value is to be returned. No other value may be returned in its place.

If the return specification is definite but not an immutable value, then it must be a mutable container (variable) of some sort. The container variable is declared as any other parameter would be, but no incoming argument will ever be bound to it. It is permitted to supply a default value, in which case the return variable will always be initialized with that default value. Like other variables declared in a signature, a new variable will B<always> be created; any existing variable will automatically be shadowed. If you want to have the return variable reference an existing variable, you must resort to C<< OUTER:: >> hackery. As with value return, all top-level statements are evaluated in sink context, and only argumentless C<return> is allowed, indicating that the current contents of the return value should be returned.

Note that the default return policy assumes functional semantics, with the result that a loop as the final statement would be evaluated as a map, which may surprise some people.
An implementation is allowed to warn when it finds such a loop; this warning may be suppressed by supplying a return specification, which will also determine whether the final loop statement is evaluated in sink context.

=head1 Parameters and arguments

By default, all Scalar parameters are readonly. When a value is passed, it is simply directly bound to the parameter name. When a scalar is passed, the value held in the scalar is obtained. It is then assigned into another scalar container that will, from that point on, be readonly (that is, no further assignments can be made to it). Implementations may, as an optimization, also simply bind the value obtained from a passed Scalar if they can prove it is not Iterable (and therefore elimination of the container would not affect flattening behavior).

Array and hash parameters are simply bound "as is". (Conjectural: future versions of Perl 6 may do static analysis and forbid assignments to array and hash parameters that can be caught by it. This will, however, only happen with the appropriate "use" declaration to opt in to that language version.)

To allow modification, use the C<is rw> trait. This requires a mutable object or container as an argument (or some kind of type object that can be converted to a mutable object, such as might be returned by an array or hash that knows how to autovivify new elements). Otherwise the signature fails to bind, and this candidate routine cannot be considered for servicing this particular call. (Other multi candidates, if any, may succeed if they don't require C<rw> for this parameter.) In any case, failure to bind does not by itself cause an exception to be thrown; that is completely up to the dispatcher.

To pass-by-copy, use the C<is copy> trait. An object container will be cloned whether or not the original is mutable, while an (immutable) value will be copied into a suitably mutable container. The parameter may bind to any argument that meets the other typological constraints of the parameter.

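A brief sketch contrasting the two traits, with hypothetical routine names and behavior assumed to match an implementation such as Rakudo:

```raku
sub bump($n is rw)     { $n++ }        # modifies the caller's container
sub bumped($n is copy) { $n++; $n }    # modifies a private copy only

my $count = 1;
bump($count);
say $count;             # 2
say bumped($count);     # 3
say $count;             # 2; the caller's variable is untouched by bumped()
```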
If you have a readonly scalar parameter C<$ro>, it may never be passed on to a C<rw> scalar parameter of a subcall, since the rw-ness was already eliminated. A C<$ro> parameter may also not be rebound; trying to do so results in a compile time error. Aliases of C<$ro> are also readonly, whether generated explicitly with C<:=> or implicitly within a C<Capture> object (which are themselves immutable).

Also, C<$ro> may not be returned from an lvalue subroutine or method.

Parameters may be required or optional. They may be passed by position, or by name. Individual parameters may confer an item or list context on their corresponding arguments, but unlike in Perl 5, this is decided lazily at parameter binding time.

Arguments destined for required positional parameters must come before those bound to optional positional parameters. Arguments destined for named parameters may come before and/or after the positional parameters. (To avoid confusion it is highly recommended that all positional parameters be kept contiguous in the call syntax, but this is not enforced, and custom arg list processors are certainly possible on those arguments that are bound to a final slurpy or arglist variable.)

A signature containing a name collision is considered a compile time error. A name collision can occur between positional parameters, between named parameters, or between a positional parameter and a named one. The sigil is not considered in such a comparison, except in the case of two positional parameters -- in other words, a signature in which two or more parameters are identical except for the sigil is still OK (but you won't be able to pass values by that name).

    :($a, $a)       # wrong, two $a
    :($a, @a)       # OK (but don't do that)
    :($a, :a($b))   # wrong, one $a from positional, one $a from named parameter
    :($a, :a(@b))   # wrong, same
    :(:$a, :@a)     # wrong, can only have one named parameter "a"

=head2 Named arguments

Named arguments are recognized syntactically at the "comma" level.
Since parameters are identified using identifiers, the recognized syntaxes are those where the identifier in question is obvious. You may use either the adverbial form, C<:name($value)>, or the autoquoted arrow form, C<< name => $value >>. These must occur at the top "comma" level, and no other forms are taken as named pairs by default. Pairs intended as positional arguments rather than named arguments may be indicated by extra parens or by explicitly quoting the key to suppress autoquoting:

    doit :when<now>,1,2,3;      # always a named arg
    doit (:when<now>),1,2,3;    # always a positional arg

    doit when => 'now',1,2,3;   # always a named arg
    doit (when => 'now'),1,2,3; # always a positional arg
    doit 'when' => 'now',1,2,3; # always a positional arg

Only bare keys with valid identifier names are recognized as named arguments:

    doit when => 'now';         # always a named arg
    doit 'when' => 'now';       # always a positional arg
    doit 123  => 'now';         # always a positional arg
    doit :123<now>;             # always a positional arg

Going the other way, pairs intended as named arguments that don't look like pairs must be introduced with the C<|> prefix operator:

    $pair = :when<now>;
    doit $pair,1,2,3;                # always a positional arg
    doit |$pair,1,2,3;               # always a named arg
    doit |get_pair(),1,2,3;          # always a named arg
    doit |('when' => 'now'),1,2,3;   # always a named arg

Note the parens are necessary on the last one due to precedence.

Likewise, if you wish to pass a hash and have its entries treated as named arguments, you must dereference it with a C<|>:

    %pairs = (:when<now>, :what<any>);
    doit %pairs,1,2,3;               # always a positional arg
    doit |%pairs,1,2,3;              # always named args
    doit |%(get_pair()),1,2,3;       # always a named arg
    doit |%('when' => 'now'),1,2,3;  # always a named arg

Variables with a C<:> prefix in rvalue context autogenerate pairs, so you can also say this:

    $when = 'now';
    doit $when,1,2,3;   # always a positional arg of 'now'
    doit :$when,1,2,3;  # always a named arg of :when<now>

In other words C<:$when> is shorthand for C<:when($when)>.
This works for any sigil:

    :$what      :what($what)
    :@what      :what(@what)
    :%what      :what(%what)
    :&what      :what(&what)

Ordinary hash notation will just pass the value of the hash entry as a positional argument regardless of whether it is a pair or not. To pass both key and value out of a hash as a positional pair, use C<:p> instead:

    doit %hash<a>:p,1,2,3;
    doit %hash{'b'}:p,1,2,3;

The C<:p> stands for "pairs", not "positional"--the C<:p> adverb may be placed on any C<Hash> access subscript to make it mean "pairs" instead of "values". If you want the pair (or pairs) to be interpreted as named arguments, you may do so by prefixing with the C<< prefix:<|> >> operator:

    doit |(%hash<a>:p),1,2,3;
    doit |(%hash{'b'}:p),1,2,3;

(The parens are required to keep the C<:p> adverb from attaching to the C<< prefix:<|> >> operator.)

C<Pair> constructors are recognized syntactically at the call level and put into the named slot of the C<Capture> structure. Hence they may be bound to positionals only by name, not as ordinary positional C<Pair> objects. Leftover named arguments can be slurped into a slurpy hash.

Because named and positional arguments can be freely mixed, the programmer always needs to disambiguate pair literals from named arguments with parentheses or quotes:

    # Named argument "a"
    push @array, 1, 2, :a<b>;

    # Pair object (a=>'b')
    push @array, 1, 2, (:a<b>);
    push @array, 1, 2, 'a' => 'b';

Perl 6 allows multiple same-named arguments, and records the relative order of arguments with the same name. When there is more than one argument, the C<@> sigil in the parameter list causes the arguments to be appended:

    sub fun (Int :@x) { ... }
    fun( x => 1, x => 2 );              # @x := (1, 2)
    fun( x => (1, 2), x => (3, 4) );    # @x := (1, 2, 3, 4)

Other sigils bind only to the I<last> argument with that name:

    sub fun (Int :$x) { ...
    }
    fun( x => 1, x => 2 );              # $x := 2
    fun( x => (1, 2), x => (3, 4) );    # $x := (3, 4)

This means a hash holding default values must come I<before> known named parameters, similar to how hash constructors work:

    # Allow "x" and "y" in %defaults to be overridden
    f( |%defaults, x => 1, y => 2 );

=head2 Invocant parameters

A method invocant may be specified as the first parameter in the parameter list, with a colon (rather than a comma) immediately after it:

    method get_name ($self:) {...}
    method set_name ($_: $newname) {...}

The corresponding argument (the invocant) is evaluated in item context and is passed as the left operand of the method call operator:

    print $obj.get_name();
    $obj.set_name("Sam");

The invocant is actually stored as the first positional argument of a C<Capture> object. It is special only to the dispatcher, otherwise it's just a normal positional argument.

Single-dispatch semantics may also be requested by using the indirect object syntax, with a colon after the invocant argument. The colon is just a special form of the comma, and has the same precedence:

    set_name $obj: "Sam";
    $obj.set_name("Sam");   # same as the above

An invocant is the topic of the corresponding method if that formal parameter is declared with the name C<$_>.

If you have a call of the form:

    foo(|$capture)

the compiler must defer the decision on whether to treat it as a method or function dispatch based on whether the supplied C<Capture>'s first argument is marked as an invocant. For ordinary calls this can always be determined at compile time, however.

=head2 Parameters with type constraints

Parameters can be constrained to types other than the default simply by putting the type name in front of the parameter:

    sub double(Numeric $x) { 2 * $x }

If no explicit type constraint is given, it defaults to the type of the surrounding package for method invocants, and to C<Any> everywhere else.
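
The following is a minimal sketch of the type-constraint rules above, assuming behaviour along the lines of a current Rakudo implementation rather than anything this synopsis guarantees. The names C<untyped>, C<&d>, and C<$bound> are illustrative only. The call goes through a variable so that the type mismatch is reported at bind time.

    sub double(Numeric $x) { 2 * $x }
    sub untyped($x) { $x }

    # Call through a variable so the mismatch surfaces when binding.
    my &d = &double;
    say d(21);                                      # 42

    # A Str argument does not satisfy the Numeric constraint.
    my $bound = (try { d("twenty-one"); True }) // False;
    say $bound;                                     # False

    # Introspect the declared constraints of the first parameter.
    say &double.signature.params[0].type.^name;     # Numeric
    say &untyped.signature.params[0].type.^name;    # Any

The last line shows the default constraint that applies when no type is written.
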
A bare C<:D>, C<:U> or C<:_> instead of a type constraint limits the default type to definite objects (aka instances), undefined objects (aka type objects), or any object, respectively. The default still applies, so in

    class Con { method man(:U: :D $x) }

the signature is equivalent to C<(Con:U: Any:D $x)>.

=head2 Longname parameters

A routine marked with C<multi> can mark part of its parameters to be considered in the multi dispatch. These are called I<longnames>; see S12 for more about the semantics of multiple dispatch.

You can choose part of a C<multi>'s parameters to be its longname, by putting a double semicolon after the last one:

    multi sub handle_event ($window, $event;; $mode) {...}
    multi method set_name ($self: $name;; $nick) {...}

A parameter list may have at most one double semicolon; parameters after it are never considered for multiple dispatch (except of course that they can still "veto" if their number or types mismatch).

[Conjecture: It might be possible for a routine to advertise multiple long names, delimited by single semicolons. See S12 for details.]

If the parameter list for a C<multi> contains no semicolons to delimit the list of important parameters, then all positional parameters are considered important. If it's a C<multi method> or C<multi submethod>, an additional implicit unnamed C<self> invocant is added to the signature list unless the first parameter is explicitly marked with a colon.

=head2 Required parameters

Required parameters are specified at the start of a subroutine's parameter list:

    sub numcmp ($x, $y) { return $x <=> $y }

Required parameters may optionally be declared with a trailing C<!>, though that's already the default for positional parameters:

    sub numcmp ($x!, $y!) { return $x <=> $y }

Not passing all of the required arguments to a normal subroutine is a fatal error. Passing a named argument that cannot be bound to a normal subroutine is also a fatal error. (Methods are different.)
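
As a hedged illustration of these failure modes, the sketch below assumes Rakudo-like behaviour, in which methods quietly accept unexpected named arguments. The C<Greeter> class and the C<&nc> alias are hypothetical names introduced for the example. Calls go through a variable so that the errors appear at run time and can be trapped with C<try>.

    sub numcmp ($x, $y) { $x <=> $y }

    class Greeter {
        method greet ($name) { "Hello, $name" }
    }

    my &nc = &numcmp;
    say nc(2, 10);                                      # Less

    # Too few positional arguments: binding fails.
    say (try { nc(2); True }) // False;                 # False

    # A named argument that no parameter can accept: binding fails.
    say (try { nc(2, 10, :verbose); True }) // False;   # False

    # A method tolerates the stray named argument.
    say Greeter.greet('Sam', :verbose);                 # Hello, Sam
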
The number of required parameters a subroutine has can be determined by calling its C<.arity> method:

    $args_required = &foo.arity;

=head2 Optional parameters

Optional positional parameters are specified after all the required parameters and each is marked with a C<?> after the parameter:

    sub my_substr ($str, $from?, $len?) {...}

Alternately, optional fields may be marked by supplying a default value. The C<=> sign introduces a default value:

    sub my_substr ($str, $from = 0, $len = Inf) {...}

Default values can be calculated at run-time. They may even use the values of preceding parameters:

    sub xml_tag ($tag, $endtag = matching_tag($tag) ) {...}

Arguments that correspond to optional parameters are evaluated in item context. They can be omitted, passed positionally, or passed by name:

    my_substr("foobar");            # $from is 0, $len is infinite
    my_substr("foobar",1);          # $from is 1, $len is infinite
    my_substr("foobar",1,3);        # $from is 1, $len is 3
    my_substr("foobar",len=>3);     # $from is 0, $len is 3

Missing optional arguments default to their default values, or to an undefined value if they have no default. (A supplied argument that is undefined is not considered to be missing, and hence does not trigger the default. Use C<//=> within the body for that.)

You may check whether an optional parameter was bound to anything by calling C.

=head2 Named parameters

Named-only parameters follow any required or optional parameters in the signature. They are marked by a prefix C<:>:

    sub formalize($text, :$case, :$justify) {...}

This is actually shorthand for:

    sub formalize($text, :case($case), :justify($justify)) {...}

If the longhand form is used, the label name and variable name can be different:

    sub formalize($text, :case($required_case), :justify($justification)) {...}

so that you can use more descriptive internal parameter names without imposing inconveniently long external labels on named arguments.
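
A small sketch may make the label/variable distinction concrete. It assumes Rakudo-like behaviour. The function bodies, the default values, and the bracketed output format are invented for illustration; only the signatures follow the examples above.

    sub formalize($text, :case($required_case) = 'lower', :justify($justification) = 'left') {
        # Inside the body only the long internal names exist.
        my $cased = $required_case eq 'upper' ?? $text.uc !! $text.lc;
        $cased ~ ' [' ~ $justification ~ ']';
    }

    # Callers use the short external labels.
    say formalize('Title', case => 'upper');    # TITLE [left]
    say formalize('Title', :justify<right>);    # title [right]

    # A default may be computed from a parameter to its left.
    sub xml_tag ($tag, $endtag = "</$tag>") { "<$tag>" ~ $endtag }
    say xml_tag('b');                           # <b></b>
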
Multiple name wrappings may be given; this allows you to give both a short and a long external name:

    sub globalize (:g(:global($gl))) {...}

Or equivalently:

    sub globalize (:g(:$global)) {...}

Arguments that correspond to named parameters are evaluated in item context. They can only be passed by name, so it doesn't matter what order you pass them in:

    $formal = formalize($title, case=>'upper');
    $formal = formalize($title, justify=>'left');
    $formal = formalize($title, :justify<right>, :case<title>);

See S02 for the correspondence between adverbial form and arrow notation.

While named and positional arguments may be intermixed, it is suggested that you keep all the positionals in one place for clarity unless you have a good reason not to. This is likely bad style:

    $formal = formalize(:justify<right>, $title, :case<title>, $date);

Named parameters are optional unless marked with a following C<!>. Default values for optional named parameters are defined in the same way as for positional parameters, but may depend only on existing values, including the values of parameters that have already been bound. Named optional parameters default to C<Nil> (that is, they set the default of the container) if they have no default. Named required parameters fail unless an argument pair of that name is supplied.

Bindings logically happen in declaration order, not call order, so any default may reliably depend on formal parameters to its left in the signature.

=head2 List parameters

List parameters capture a variable length list of data. They're used in subroutines like C<print>, where the number of arguments needs to be flexible. They're also called "variadic parameters", because they take a I<variable> number of arguments. But generally we call them "slurpy" parameters because they slurp up arguments.

Slurpy parameters follow any required or optional parameters.
They are marked by a C<*> before the parameter: sub duplicate($n, *%flag, *@data) {...} Named arguments are bound to the slurpy hash (C<*%flag> in the above example). Such arguments are evaluated in item context. Any remaining variadic arguments at the end of the argument list are bound to the slurpy array (C<*@data> above) and are evaluated in list context. For example: duplicate(3, reverse => 1, collate => 0, 2, 3, 5, 7, 11, 14); duplicate(3, :reverse, :!collate, 2, 3, 5, 7, 11, 14); # same # The @data parameter receives [2, 3, 5, 7, 11, 14] # The %flag parameter receives { reverse => 1, collate => 0 } Slurpy scalar parameters capture what would otherwise be the first elements of the variadic array: sub head(*$head, *@tail) { return $head } sub neck(*$head, *$neck, *@tail) { return $neck } sub tail(*$head, *@tail) { return @tail } head(1, 2, 3, 4, 5); # $head parameter receives 1 # @tail parameter receives [2, 3, 4, 5] neck(1, 2, 3, 4, 5); # $head parameter receives 1 # $neck parameter receives 2 # @tail parameter receives [3, 4, 5] Slurpy scalars still impose list context on their arguments. Single slurpy parameters are treated lazily -- the list is only flattened into an array when individual elements are actually accessed: @fromtwo = tail(1..Inf); # @fromtwo contains a lazy [2..Inf] [Conjecture: However, if you use two or more slurpy arrays in a signature, the list is instead evaluated in hyper context, and will be asked to split itself into the number of lists corresponding to the number of slurpies so declared. A non-hyperable list will return failure for this splitting operation, so the signature should only bind on parallelizable list operations. Likewise a list that is "too short to split" fails to bind, so a separate signature may match empty lists, and perhaps singletons, if we define "too short" that way.] You can't bind to the name of a slurpy parameter: the name is just there so you can refer to it within the body. 
sub foo(*%flag, *@data) {...} foo(:flag{ a => 1 }, :data[ 1, 2, 3 ]); # %flag has elements (flag => (a => 1)) and (data => [1,2,3]) # @data has nothing [Conjecture: a future Perl 6 version will allow typed slurpy parameters, which will validate the types of the passed arguments.] =head2 Slurpy block It's also possible to declare a slurpy block: C<*&block>. It slurps up any nameless block, specified by C<{...}>, at either the current positional location or the end of the syntactic list. Put it first if you want the option of putting a block either first or last in the arguments. Put it last if you want to force it to come in as the last argument. =head2 Argument list binding The underlying C<Capture> object may be bound to a single name marked with a C<|>. sub bar ($a,$b,$c,:$mice) { say $mice } sub foo (|args) { say args.perl; &bar.nextwith(|args); } This prints: foo 1,2,3,:mice<blind>; # says "\(1,2,3,:mice<blind>)" then "blind" As demonstrated above, the capture may be interpolated into another call's arguments. (The C<|> prefix is described below.) Use of C<nextwith> allows the routine to be called without introducing an official C<CALLER> frame. For more see "Wrapping" below. The C<|> parameter takes a snapshot of the current binding state, but does not consume any arguments from it. It is allowed to have more parameters within the signature: sub compare (|args, Num $x, Num $y --> Bool) { ... } For all normal declarative purposes (invocants and multiple dispatch types, for instance), capture parameters are ignored. method addto (|args, $self: @x) { trace(args); $self += [+] @x } The extra signature is not required for non-C<multi>s since there can only be one candidate, but for multiple dispatch the extra signature is required at least for its types, or the declaration would not know what signature to match against. 

    multi foo (|args, Int, Bool?, *@, *%) { reallyintfoo(args) }
    multi foo (|args, Str, Bool?, *@, *%) { reallystrfoo(args) }

=head2 Term binding

When you bind an argument to a sigiled variable, it enforces the contract of that sigil, but sometimes you don't want that. It is possible to bind an argument to a simple name instead, which represents that argument in its rawest form, with no commitment to structure or mutability.

    sub foo (\x, \y) { x = y; }   # might or might not succeed

A C<\> parameter effectively declares a new term in the language for the rest of the current scope, so when you use that term, it is not parsed as a list operator and will not look for any subsequent arguments.

    sub foo (\x) { x 42; }    # syntax error; two terms in a row

Raw parameters make it relatively easy to program in a "sigilless" style, if you desire:

    sub say-sins (\angles) {
        for angles -> \𝜃 { say sin 𝜃 }
    }

or

    my \𝑖 = some-integer;
    say 𝑖 + 2;

Note how C<𝑖> would be misinterpreted if it treated C<+ 2> as an argument, but since it's a simple term, it doesn't.

The term does act like a function call in one way, however. Since it returns a raw value, whether it flattens or not depends on the context in which it is eventually used. You can use C<< prefix:<|> >> to force flattening into an outer argument list, if needed.

It is possible to alias to a non-identifier by using the C<term> syntactic category:

    my \term:<∞> = Inf;

We sometimes call these "sigilless variables", but they differ from normal variables in one significant way. Unlike a variable binding, a term binding is fixed for the rest of the lexical scope. The term may only be rebound by re-entering the scope. This is useful to enforce a programming style known as SSA, Static Single Assignment (really binding), which, among other benefits, gives the optimizer more guarantees about which symbols can be considered "temporarily immutable".
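
The "might or might not succeed" remark above can be sketched as follows, assuming Rakudo-like behaviour. The names C<assign-to>, C<$n>, C<$ok>, and C<base> are illustrative. Whether the assignment works depends entirely on whether the raw argument is a container.

    sub assign-to (\target, \value) { target = value }

    my $n = 1;
    assign-to($n, 42);      # target is the container $n itself
    say $n;                 # 42

    # A literal is not a container, so the same assignment fails.
    my $ok = (try { assign-to(3, 42); True }) // False;
    say $ok;                # False

    # A sigilless term is not a list operator, so "+ 2" is an
    # ordinary infix expression, not an argument to the term.
    my \base = 40;
    say base + 2;           # 42
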
=head2 Flattening argument lists The unary C<|> operator casts its argument to a C<Capture> object, then splices that capture into the argument list it occurs in. To get the same effect on multiple arguments you can use the C<< |« >> hyperoperator. C<Pair> and C<Hash> become named arguments: |(x=>1); # Pair, becomes \(x=>1) |{x=>1, y=>2}; # Hash, becomes \(x=>1, y=>2) Anything else that is C<Iterable> is simply turned into positional arguments: |(1,2,3); # List, becomes \(1,2,3) |(1..3); # Range, becomes \(1,2,3) |(1..2, 3); # List, becomes \(1..2,3) |([x=>1, x=>2]); # List (from an Array), becomes \((x=>1), (x=>2)) For example: sub foo($x, $y, $z) {...} # expects three scalars @onetothree = 1..3; # array stores three scalars foo(1,2,3); # okay: three args found foo(@onetothree); # error: only one arg foo(|@onetothree); # okay: @onetothree flattened to three args The C<|> operator flattens lazily -- the array is flattened only if flattening is actually required within the subroutine. To flatten before the list is even passed into the subroutine, use the C<flat> list operator: foo(|flat 1,2,3 Z 4,5,6); # zip list flattened before interpolation foo |(1,2,3 Z 4,5,6).flat # same thing =head2 Multidimensional argument list binding Some functions take more than one list of positional and/or named arguments, that they wish not to be flattened into one list. For instance, C<zip()> wants to iterate several lists in parallel, while array and hash subscripts want to process a multidimensional slice. The set of underlying argument lists may be bound to a single array parameter declared with a double C<**> marker: sub foo (**@slice) { ... } Note that this is different from sub foo (|slice) { ... 
} insofar as C<|slice> is bound to a single argument-list object that makes no commitment to processing its structure (and maybe doesn't even know its own structure yet), while C<**@slice> has to create an array that binds the incoming dimensional lists to the array's dimensions, and make that commitment visible to the rest of the scope via the sigil so that constructs expecting multidimensional lists know that multidimensionality is the intention. It is allowed to specify a return type: sub foo (**@slice --> Num) { ... } The invocant does not participate in multi-dimensional argument lists, so C<self> is not present in the C<**@slice> below: method foo (**@slice) { ... } The C<**> marker is just a variant of the C<*> marker that ends up requesting the arguments at the comma-separated syntax level, rather than requesting individual elements as the flattening C<*> does. =head2 Zero-dimensional argument list If you call a function without parens and supply no arguments, the argument list becomes a zero-dimensional slice. It differs from C<\()> in several ways: sub foo (**@slice) {...} foo; # +@slice == 0 foo(); # +@slice == 1 sub bar (|args = \(1,2,3)) {...} bar; # $args === \(1,2,3) bar(); # $args === \() =head2 One-argument slurpy binding If you use a C<+> character to indicate a slurpy parameter, then it is assumed that the rest of the positional arguments are intended to be iterated with "single argument" semantics. That is, we will take whatever is passed in and iterate the top level of that. If the top level is a comma list, then it works just as a C<**> slurpy. If the top level is a single argument, then that will be iterated. 
sub foo (+args) {...} foo; # +args == 0 foo(); # +args == 0 foo((1,2),3,4) # +args == 3 foo((1,2)) # +args == 2 foo($(1,2)) # +args == 1 foo((1,2).item) # +args == 1 foo([1,2],3,4) # +args == 3 foo([1,2]) # +args == 2 foo($[1,2]) # +args == 1 foo([1,2].item) # +args == 1 Basically, this slurpy doesn't care whether the list you hand it was made by syntactic commas or by some other means. The other slurpies treat syntactic commas as dominant even when there aren't any! =head2 Feed operators The variadic list of a subroutine call can be passed in separately from the normal argument list, by using either of the I<feed> operators: C<< <== >> or C<< ==> >>. Syntactically, feed operators expect to find a statement on either end. Any statement can occur on the source end; however not all statements are suitable for use on the sink end of a feed. Each operator expects to find a call to a variadic receiver on its "sharp" end, and a list of values on its "blunt" end: grep { $_ % 2 } <== @data; @data ==> grep { $_ % 2 }; It binds the (potentially lazy) list from the blunt end to the slurpy parameter(s) of the receiver on the sharp end. In the case of a receiver that is a variadic function, the feed is received as part of its slurpy list. So both of the calls above are equivalent to: grep { $_ % 2 }, @data; Note that all such feeds (and indeed all lazy argument lists) supply an implicit promise that the code producing the lists may execute in parallel with the code receiving the lists. (Feeds, hyperops, and junctions all have this promise of parallelizability in common, but differ in interface. Code which violates these promises is erroneous, and will produce undefined results when parallelized.) However, feeds go a bit further than ordinary lazy lists in enforcing the parallel discipline: they explicitly treat the blunt end as a cloned closure that starts a subthread (presumably cooperative). 
The only variables shared by the inner scope with the outer scope are those lexical variables declared in the outer scope that are visible at the time the closure is cloned and the subthread spawned. Use of such shared variables will automatically be subject to transactional protection (and associated overhead). Package variables are not cloned unless predeclared as lexical names with C<our>. Variables declared within the blunt end are not visible outside, and in fact it is illegal to declare a lexical on the blunt end that is not enclosed in curlies somehow.

Because feeds are defined as lazy pipes, a chain of feeds may not begin and end with the same array without some kind of eager sequence point. That is, this isn't guaranteed to work:

    @data <== grep { $_ % 2 } <== @data;

but either of these does:

    @data <== grep { $_ % 2 } <== eager @data;
    @data <== eager grep { $_ % 2 } <== @data;

Conjecture: if the cloning process eagerly duplicates C<@data>, it could be forced to work. Not clear if this is desirable, since ordinary clones just clone the container, not the value.

Leftward feeds are a convenient way of explicitly indicating the typical right-to-left flow of data through a chain of operations:

    @oddsquares = map { $_**2 }, sort grep { $_ % 2 }, @nums;

    # perhaps more clearly written as...

    @oddsquares = do {
        map { $_**2 } <== sort <== grep { $_ % 2 } <== @nums;
    }

Rightward feeds are a convenient way of reversing the normal data flow in a chain of operations, to make it read left-to-right:

    @oddsquares = do {
        @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 };
    }

Note that something like the C<do> is necessary because feeds operate at the statement level.
Parens would also work, since a statement is expected inside: @oddsquares = ( @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 }; ); But as described below, you can also just write: @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 } ==> @oddsquares; If the operand on the sharp end of a feed is not a call to a variadic operation, it must be something else that can be interpreted as a list receiver, or a scalar expression that can be evaluated to produce an object that does the C<KitchenSink> role, such as an C<IO> object. Such an object provides C<.clear> and C<.push> methods that will be called as appropriate to send data. (Note that an C<IO> object used as a sink will force eager evaluation on its pipeline, so the next statement is guaranteed not to run till the file is closed. In contrast, an C<Array> object used as a sink turns into a lazy array.) Any non-variadic object (such as an C<Array> or C<IO> object) used as a filter between two feeds is treated specially as a I<tap> that merely captures data I<en passant>. You can safely install such a tap in an extended pipeline without changing the semantics. An C<IO> object used as a tap does not force eager evaluation since the eagerness is controlled instead by the downstream feed. Any prefix list operator is considered a variadic operation, so ordinarily a list operator adds any feed input to the end of its list. But sometimes you want to interpolate elsewhere, so any contextualizer with C<*> as an argument may be used to indicate the target of a feed without the use of a temporary array: foo() ==> say @(*), " is what I meant"; bar() ==> @(*).baz(); Likewise, an C<Array> used as a tap may be distinguished from an C<Array> used as a translation function: numbers() ==> @array ==> bar() # tap numbers() ==> @array[@(*)] ==> bar() # translation To append multiple sources to the next sink, double the angle: my $sink; 0..* ==> $sink; 'a'..* ==>> $sink; pidigits() ==>> $sink; # outputs "(0, 'a', 3)\n"... 
    for $sink.zip { .perl.say }

Each such append adds another slice element to the sink.

You may use a variable (or variable declaration) as a receiver, in which case the list value is bound as the "todo" of the variable. (The append form binds additional todos to the receiver's todo list.) Do not think of it as an assignment, nor as an ordinary binding. Think of it as iterator creation. In the case of a scalar variable, that variable contains the newly created iterator itself. In the case of an array, the new iterator is installed as the method for extending the array. As with assignment, the old todo list is clobbered; use the append form to avoid that and get push semantics. In any case, feeding an array always flattens. You must use the scalar form to preserve slice information.

In general you can simply think of a receiver array as representing the results of the chain, so you can equivalently write any of:

    my @oddsquares <== map { $_**2 } <== sort <== grep { $_ % 2 } <== @nums;

    my @oddsquares
        <== map { $_**2 }
        <== sort
        <== grep { $_ % 2 }
        <== @nums;

    @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 } ==> my @oddsquares;

    @nums
        ==> grep { $_ % 2 }
        ==> sort
        ==> map { $_**2 }
        ==> my @oddsquares;

Since the feed iterator is bound into the final variable, the variable can be just as lazy as the feed that is producing the values.

When feeds are bound to arrays with "push" semantics, you can have a receiver for multiple feeds:

    my @foo;
    0..2       ==>  @foo;
    'a'..'c'   ==>> @foo;
    say @foo;   # 0,1,2,'a','b','c'

Note how the feeds are concatenated in C<@foo> so that C<@foo> is a list of 6 elements. This is the default behavior. However, sometimes you want to capture the outputs as a list of two iterators, namely the two iterators that represent the two input feeds.
You can get at those two iterators by using a scalar instead, which will preserve the slice structure, which can be fed to any operation that knows how to deal with a list of values as a slice, such as C<zip>: 0..* ==> $foo; 'a'..* ==>> $foo; pidigits() ==>> $foo; for $foo.zip { .say } [0,'a',3] [1,'b',1] [2,'c',4] [3,'d',1] [4,'e',5] [5,'f',9] ... Here C<$foo> is a list of three lists, so $foo.zip is equivalent to my (@a,@b,@c) := |$foo; zip(@a; @b; @c) A named receiver array is useful when you wish to feed into an expression that is not an ordinary list operator, and you wish to be clear where the feed's destination is supposed to be: picklist() ==> my @baz; my @foo = @bar[@baz]; Various contexts may or may not be expecting multi-dimensional slices or feeds. By default, ordinary arrays are flattened in slurpy context, that is, they have "list" semantics. If you say zip(0..2; 'a'..'c') ==> my @tmp; for @tmp { .say } then you get 0,1,2,'a','b','c'. If you have a multidimensional array, you can ask for flattening semantics explicitly with C<flat>: zip(0..2; 'a'..'c') ==> my $tmp; for $tmp.flat { .say } As we saw earlier, "zip" produces an interleaved result by taking one element from each list in turn, so zip(0..2; 'a'..'c') ==> my $tmp; for $tmp.zip { .say } produces 0,'a',1,'b',2,'c'. If you want the zip's result as a list of subarrays, then you need to put the zip itself into a "chunky" C<LoL> context instead: zip(0..2; 'a'..'c') ==> my $tmp; for $tmp.zip.lol { .say } This produces two values on each line. But usually you want the flat form so you can just bind it directly to a signature: for $tmp.zip -> $i, $a { say "$i: $a" } Otherwise you'd have to say this: for $tmp.zip.lol -> [$i, $a] { say "$i: $a" } Note that with the current definition, the order of feeds is preserved left to right in general regardless of the position of the receiver. So ('a'..*; 0..*) ==> $feed; for $feed.zip <== @foo) -> $a, $i, $x { ... 
} is the same as 'a'..* ==> $feed; 0..* ==>> $feed; for $feed.zip <== @foo) -> $a, $i, $x { ... } which is the same as for zip('a'..*; 0..*; @foo) -> $a, $i, $x { ... } Also note that these come out to be identical for ordinary arrays: @foo.zip @foo.cat =head2 Closure parameters Parameters declared with the C<&> sigil take blocks, closures, or subroutines as their arguments. Closure parameters can be required, optional, named, or slurpy. sub limited_grep (Int $count, &block, *@list) {...} # and later... @first_three = limited_grep 3, {$_<10}, @data; (The comma is required after the closure.) Within the subroutine, the closure parameter can be used like any other lexically scoped subroutine: sub limited_grep (Int $count, &block, *@list) { ... if block($nextelem) {...} ... } The closure parameter can have its own signature in a type specification written with C<:(...)>: sub limited_Dog_grep ($count, &block:(Dog), *@list) {...} and even a return type: sub limited_Dog_grep ($count, &block:(Dog --> Bool), *@list) {...} When an argument is passed to a closure parameter that has this kind of signature, the argument must be a C<Code> object with a compatible parameter list and return type. =head2 En passant type capture Unlike normal parameters, type parameters often come in piggybacked on the actual value as "kind", and you'd like a way to capture both the value and its kind at once. (A "kind" is a storage type, that is, a class or type that an object is allowed to be. An object is not officially allowed to take on a constrained or contravariant type.) A type variable can be used anywhere a type name can, but instead of asserting that the value must conform to a particular type, it captures the actual "kind" of the object and also declares a package/type name by which you can refer to that kind later in the signature or body. In addition, it captures the nominal typing of any associated nominal type. 
For instance, if you want to match any two Dogs as long as they are of the same kind, you can say:

    sub matchedset (Dog ::T $fido, T $spot) {...}

This actually turns into something more like

    sub matchedset (Dog ::T $fido, Dog $spot where T) {...}

Note that C<::T> is not required to contain C<Dog>, only a type that is compatible with C<Dog>. Note also that the nominal type, C<Dog>, is also included in the meaning of C<T>, along with the notion that the actual type must match the storage type of C<$fido>.

The C<::> quasi-sigil is short for "subset" in much the same way that C<&> is short for "sub". Just as C<&> can be used to name any kind of code, so too C<::> can be used to name any kind of type. Both of them insert a bare identifier into the symbol table, though they fill different syntactic spots.

Note that it is not required to capture the object associated with the class unless you want it. The sub above could be written as

    sub matchedset (Dog ::T, T) {...}

if we're not interested in C<$fido> or C<$spot>. Or just

    sub matchedset (::T, T) {...}

if we don't care about anything but the matching. Note here that the second parameter may be more derived than the first. If you need them to be identical, you must say something like

    sub matchedset (::T, $ where { $_.WHAT === T }) {...}

=head2 Unpacking array parameters

Instead of specifying an array parameter as an array:

    sub quicksort (@data, $reverse?, $inplace?) {
        my $pivot := shift @data;
        ...
    }

it may be broken up into components in the signature, by specifying the parameter as if it were an anonymous array of parameters:

    sub quicksort ([$pivot, *@data], $reverse?, $inplace?) {
        ...
    }

This subroutine still expects an array as its first argument, just like the first version.
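
To show the unpacked form doing real work, here is a minimal sketch that assumes Rakudo-like behaviour. It drops the C<$reverse> and C<$inplace> options, adds an empty-array candidate so that the recursion terminates, and uses the illustrative names C<@before> and C<@after>; none of those choices come from this synopsis.

    # An empty array sorts to the empty list.
    multi quicksort ([]) { () }

    # Otherwise the first element is unpacked as the pivot.
    multi quicksort ([$pivot, *@data]) {
        my @before = @data.grep(* < $pivot);
        my @after  = @data.grep(* >= $pivot);
        flat quicksort(@before), $pivot, quicksort(@after)
    }

    say quicksort([3, 1, 4, 1, 5, 9, 2, 6]).join(' ');    # 1 1 2 3 4 5 6 9

Each candidate still receives a single array argument; only the binding inside the brackets differs.
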
=head2 Unpacking a single list argument To match the first element of the slurpy list, use a "slurpy" scalar: sub quicksort (:$reverse, :$inplace, *$pivot, *@data) =head2 Unpacking tree node parameters You can unpack hash values and tree nodes in various dwimmy ways by enclosing the bindings of child nodes and attributes in parentheses following the declaration of the node itself: sub traverse ( BinTree $top ( $left, $right ) ) { traverse($left); traverse($right); } In this, C<$left> and C<$right> are automatically bound to the left and right nodes of the tree. If C<$top> is an ordinary object, it binds the C<$top.left> and C<$top.right> attributes. If it's a hash, it binds C<< $top<left> >> and C<< $top<right> >>. If C<BinTree> is a signature type and $top is a C<Capture> (argument list) object, the child types of the signature are applied to the actual arguments in the argument list object. (Signature types have the benefit that you can view them inside-out as constructors with positional arguments, such that the transformations can be reversible.) However, the full power of signatures can be applied to pattern match just about any argument or set of arguments, even though in some cases the reverse transformation is not derivable. For instance, to bind to an array of children named C<.kids> or C<< .<kids> >>, use something like: proto traverse ($) {*} multi traverse ( NAry $top ( :kids [$eldest, *@siblings] ) ) { traverse($eldest); traverse(:kids(@siblings)); # (binds @siblings to $top) } multi traverse ( $leaf ) {...} The second candidate is called only if the parameter cannot be bound to both C<$top> and to the "kids" parsing subparameter. 
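
The sketch below illustrates node unpacking on an ordinary object. It assumes a Rakudo-like implementation which, as far as we understand it, unpacks an object through its public attributes by name, so the children are bound with named subparameters rather than the positional form shown earlier. The C<value> attribute and the C<total> routine are hypothetical additions.

    class BinTree {
        has $.value;
        has BinTree $.left;
        has BinTree $.right;
    }

    # A missing child is the undefined type object and adds nothing.
    multi total (BinTree:U) { 0 }

    # A defined node is unpacked into its public attributes.
    multi total (BinTree:D $top (:$value, :$left, :$right)) {
        $value + total($left) + total($right)
    }

    my $tree = BinTree.new(
        value => 1,
        left  => BinTree.new(value => 2),
        right => BinTree.new(value => 3, left => BinTree.new(value => 4)),
    );
    say total($tree);    # 10
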
Likewise, to bind to a hash element of the node and then bind to
keys in that hash by name:

    sub traverse ( AttrNode $top ( :%attr{ :$vocalic, :$tense } ) ) {
        say "Has {+%attr} attributes, of which";
        say "vocalic = $vocalic";
        say "tense = $tense";
    }

You may omit the top variable if you prefix the parentheses with a colon
to indicate a signature. Otherwise you must at least put the sigil of
the variable, or we can't correctly differentiate:

    my Dog ($fido, $spot)   := twodogs();       # list of two dogs
    my Dog $ ($fido, $spot) := twodogs();       # one twodog object
    my Dog :($fido, $spot)  := twodogs();       # one twodog object

Sub signatures can be matched directly within regexes by using C<:(...)>
notation.

    push @a, "foo";
    push @a, \(1,2,3);
    push @a, "bar";
    ...
    my ($i, $j, $k);
    @a ~~ rx/
            <,>                         # match initial elem boundary
            :(Int $i,Int $j,Int? $k)    # match lists with 2 or 3 ints
            <,>                         # match final elem boundary
          /;
    say "i = $<i>";
    say "j = $<j>";
    say "k = $<k>" if defined $<k>;

If you want a parameter bound into C<$/>, you have to say C<< $<i> >>
within the signature. Otherwise it will try to bind an external C<$i>
instead, and fail if no such variable is declared.

Note that unlike a sub declaration, a regex-embedded signature has no
associated "returns" syntactic slot, so you have to use C<< --> >>
within the signature to specify the C<of> type of the signature, or
match as an arglist:

    :(Num, Num --> Coord)
    :(\Coord(Num, Num))

A consequence of the latter form is that you can match the type of
an object with C<:(\Dog)> without actually breaking it into its
components. Note, however, that it's not equivalent to say

    :(--> Dog)

which would be equivalent to

    :(\Dog())

that is, match a nullary function of type C<Dog>. Nor is it equivalent
to

    :(Dog)

which would be equivalent to

    :(\Any(Dog))

and match a function taking a single parameter of type Dog.

Note also that bare C<\(1,2,3)> is never legal in a regex since the
first (escaped) paren would try to match literally.

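
The tree-node unpacking described in this section can be sketched with
named subparameters, which is the form that current implementations such
as Rakudo are assumed to support for ordinary objects (the positional
C<($left, $right)> form shown earlier may not be available). The
C<BinTree> class and the C<count-nodes> routine below are hypothetical and
exist only for this illustration.

    # Hypothetical class used only for this sketch.
    class BinTree {
        has $.left;
        has $.right;
    }

    # The subsignature binds $left and $right from the node's public
    # attributes of the same names.
    sub count-nodes ( BinTree $top ( :$left, :$right ) ) {
        1 + ($left.defined  ?? count-nodes($left)  !! 0)
          + ($right.defined ?? count-nodes($right) !! 0);
    }

    my $tree = BinTree.new(
        left  => BinTree.new,
        right => BinTree.new( left => BinTree.new ),
    );
    say count-nodes($tree);
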
=head2 Attributive parameters

If a submethod's parameter is declared with a C<.> or C<!> after the
sigil (like an attribute):

    submethod initialize($.name, $!age) {}

then the argument is bound directly to the object's attribute of the
same name. This avoids the frequent need to write code like:

    submethod initialize($name, $age) {
        $.name = $name;
        $!age  = $age;
    }

The initialization of attributes requires special care to preserve
encapsulation; therefore the default for attributive parameters is
value semantics, that is, as if specified with C<is copy>. Hence, the
submethod above is really more like:

    submethod initialize($name is copy, $age is copy) {
        $.name := $name;    # or maybe = here, since it's a parent's attr
        $!age  := $age;     # or maybe only $! parameters work really
    }

If you wish to allow the user to initialize an attribute by reference,
you may either write your own initializer submethod explicitly, or
simply mark the attributes you want to work that way with C<is raw>:

    has $!age is raw;   # BUILD will automatically use binding, not copy

To rename an attribute parameter you can use the explicit pair form:

    submethod initialize(:moniker($.name), :youth($!age)) {}

The C<:$name> shortcut may be combined with the C<$.name> shortcut,
but the twigil is ignored for the parameter name, so

    submethod initialize(:$.name, :$!age) {}

is the same as:

    submethod initialize(:name($.name), :age($!age)) {}

Note that C<$!age> actually refers to the private "C<has>" variable that
can be referred to as either C<$age> or C<$!age>.

=head2 Placeholder variables

Even though every bare block is a closure, bare blocks can't have
explicit parameter lists. Instead, they use "placeholder" variables,
marked by a caret (C<^>) or a colon (C<:>) after their sigils.
The caret marks positional placeholders, while the colon marks named
placeholders.

Using placeholders in a block defines an implicit parameter list.
The signature is the list of distinct positional placeholder names,
sorted in Unicode order, followed by the named placeholder names in
any order. So:

    { say "woof" if $:dog; $^y < $^z && $^x != 2 }

is a shorthand for:

    -> $x,$y,$z,:$dog { say "woof" if $dog; $y < $z && $x != 2 }

Note that placeholder variables syntactically cannot have type
constraints. Also, it is illegal to use placeholder variables in a
block that already has a signature, because the autogenerated signature
would conflict with that. Positional placeholder names consisting of a
single uppercase letter are disallowed, not because we're mean, but
because it helps us catch references to obsolete Perl 5 variables such
as C<$^O>.

The C<$_> variable functions as a placeholder in a block without any
other placeholders or signature. Any bare block without placeholders
really has a parameter like this:

    -> $_ is raw = OUTER::<$_> { .mumble }

(However, statement control C<if> notices this and passes no argument,
so C<$_> ends up being bound to the outer C<$_> anyway.)

A block may also refer to either C<@_> or C<%_> or both, each of which
will be added to the generated signature as a normal readonly slurpy
parameter:

    { say $:what; warn "bad option: $_\n" for keys %_; }

turns into

    -> :$what, *%_ { say $what; warn "bad option: $_\n" for keys %_; }

If not used, they are not added, and a dispatch with mismatched
parameters will fail.

The use of a P5ish C<@_> in a signatureless sub falls naturally out of
this:

    sub sayall { .say for @_ }

Note that in this case, C<$_> is not treated as a placeholder because
there is already the C<@_> placeholder. And C<@_> is a placeholder only
because the sub has no official signature. Otherwise it would be
illegal (unless explicitly declared).

Placeholders may also be used in method bodies that have no formal
signature. The invocant is always removed first, so the first
placeholder argument always refers to the first non-invocant argument.
C<@_> will never contain the invocant.
The invocant is always available via C<self>, of course.

Since the placeholder declares a parameter variable without the twigil,
the twigil is needed only on the first occurrence of the variable within
the block. Subsequent mentions of that variable may omit the twigil.
Within an internal nested block the twigil I<must> be omitted, since
it would wrongly attach to the inner block.

Note that, unlike in Perl 5, C<@_> may not be used within an inner block
to refer to the outer block's arguments:

    sub say-or-print {
        if $SAYIT {
            say @_;     # WRONG
        }
        else {
            print @_;   # WRONG
        }
    }

because this desugars to:

    sub say-or-print {
        if $SAYIT -> *@_ {
            say @_;
        }
        else -> *@_ {
            print @_;
        }
    }

Translators of Perl 5 will need to bear this in mind.

=head1 Properties and traits

Compile-time properties are called "traits". The
C<is I<NAME> (I<DATA>)> syntax defines traits on containers and
subroutines, as part of their declaration:

    constant $pi is Approximated = 3;   # variable $pi has Approximated trait

    my $key is Persistent(:file<.key>);

    sub fib is cached {...}

The C<will I<NAME> I<BLOCK>> syntax is a synonym for
C<is I<NAME> (I<BLOCK>)>:

    my $fh will undo { close $fh };    # Same as: my $fh is undo({ close $fh });

The C<but I<NAME> (I<DATA>)> syntax specifies run-time properties on
values:

    constant $pi = 3 but Inexact;       # value 3 has Inexact property

    sub system {
        ...
        return $error but False if $error;
        return 0 but True;
    }

Properties are predeclared as roles and implemented as mixins--see S12.

=head2 Subroutine traits

These traits may be declared on the subroutine as a whole (individual
parameters take other traits). Trait syntax depends on the particular
auxiliary you use, but for C<is>, the subsequent syntax is identical to
adverbial syntax, except that the colon may be omitted or doubled
depending on the degree of ambiguity desired:

    sub x() is ::Foo[...]       # definitely a parameterized typename
    sub x() is :Foo[...]        # definitely a pair with a list
    sub x() is Foo[...]         # depends on whether Foo is predeclared as type

=over

=item C<is signature>

The signature of a subroutine. Normally declared implicitly, by
providing a parameter list and/or return type.

=item C<as>/C<is as>

The C<inner> type constraint that a routine imposes on its return value.

=item C<of>/C<is of>

The C<of> type that is the official return type of the routine. Or you
can think of "of" as outer/formal. If there is no inner type, the outer
type also serves as the inner type to constrain the return value.

=item C<will do>

The block of code executed when the subroutine is called. Normally
declared implicitly, by providing a block after the subroutine's
signature definition.

=item C<is rw>

Marks a subroutine as returning an lvalue.

=item C<is raw>

Marks a subroutine as returning a raw alias.

=item C<is parsed>

Specifies the subrule by which a macro call is parsed. The parse
always starts after the macro's initial token. If the operator has
two parts (circumfix or postcircumfix), the final token is also
automatically matched, and should not be matched by the supplied regex.

[This trait and the following are likely to be deprecated in favor of
slang macros that are aware of the grammar and category in which they
are installed, and that therefore already know how to parse like normal
grammar rules. The actions of slang macros, however, will be more
targeted toward user-level AST production and manipulation through use
of quasi quoting and unquoting, as well as through direct access to
some as-yet-unspecified high-level, VM-independent AST representation.]

=item C<is reparsed>

Also specifies the subrule by which a macro call is parsed, but restarts
the parse before the macro's initial token, usually because you want
to parse using an existing rule that expects to traverse the initial
token. If the operator has two parts (circumfix or postcircumfix), the
final token must also be explicitly matched by the supplied regex.

=item C<is cached>

Marks a subroutine as being memoized, or at least memoizable.
In the abstract, this cache is just a hash where incoming argument
C<Capture>s are mapped to return values. If the C<Capture> is found in
the hash, the return value need not be recalculated. If you use
this trait, the compiler will assume two things:

=over

=item *

A given C<Capture> would always calculate the same return value. That
is, there is no state hidden within the dynamic scope of the call.

=item *

The cache lookup is likely to be more efficient than recalculating
the value in at least some cases, because either most uncached calls
would be slower (and reduce throughput), or you're trying to avoid a
significant number of pathological cases that are unacceptably slow
(and increase latency).

=back

This trait is a suggestion to the compiler that caching is okay. The
compiler is free to choose any kind of caching algorithm (including
non-expiring, random, lru, pseudo-lru, or adaptive algorithms, or
even no caching algorithm at all). The run-time system is free to
choose any kind of maximum cache size depending on the availability
of memory and trends in usage patterns. You may suggest a particular
cache size by passing a numeric argument (representing the maximum
number of unique C<Capture> values allowed), and some of the possible
algorithms may pay attention to it. You may also pass C<*> for the
size to request a non-expiring cache (complete memoization). The
compiler is free to ignore this too.

The intent of this trait is to specify performance hints without
mandating any exact behavior. Proper use of this trait should not
change semantics of the program; it functions as a kind of "pragma".
This trait will not be extended to reinvent other existing ways of
achieving the same effect. To gain more control, write your own
trait handler to allow the use of a more specific trait, such as
"C<is lru(42)>".
Alternately, just use a state hash keyed on the sub's
argument capture to write your own memoization with complete control
from within the subroutine itself, or from within a wrapper around
your subroutine.

=item C<is tighter>/C<is looser>/C<is equiv>

Specifies the precedence of an operator relative to an existing
operator. C<tighter> and C<looser> precedence levels default to
being left associative. They define a new precedence level slightly
tighter or looser than the precedence level on which they're based.
Both C<tighter> and C<looser> may be specified, in which case the new
precedence level is generated midway between the specified levels.

Two different declarations using the same precedence derivation end up
at the same precedence level, as if C<equiv> was specified instead of
C<tighter>/C<looser>, and the second will clone the associativity of the
first. If the second explicitly specifies an associativity that differs
from the first, unexpected parsing conflicts may result. (See S03.)

In addition to cloning the precedence level, C<equiv> also clones other
traits, so it specifies the default associativity to be the same as the
operator to which the new operator is equivalent. The following are
the default equivalents for various syntactic categories if neither
C<equiv> nor C<assoc> is specified. (Many of these have no need
of precedence or associativity because they are parsed specially.
Nevertheless, C<equiv> may be useful for cloning other traits of these
operators.)

    category:<prefix>
    circumfix:<( )>
    dotty:<.>
    infix:<+>
    infix_circumfix_meta_operator:['»','«']
    infix_postfix_meta_operator:<=>
    infix_prefix_meta_operator:<!>
    package_declarator:<class>
    postcircumfix:<( )>
    postfix:<++>
    postfix_prefix_meta_operator:['»']
    prefix:<++>
    prefix_circumfix_meta_operator:['[',']']
    prefix_postfix_meta_operator:['«']
    q_backslash:<\\>
    qq_backslash:<n>
    quote_mod:<c>
    quote:<q>
    regex_assertion:<?>
    regex_backslash:<w>
    regex_metachar:<.>
    regex_mod_internal:<i>
    routine_declarator:<sub>
    scope_declarator:<my>
    sigil:<$>
    special_variable:<$!>
    statement_control:<if>
    statement_mod_cond:<if>
    statement_mod_loop:<while>
    statement_prefix:<do>
    term:<*>
    trait_mod:<is>
    trait_verb:<of>
    twigil:<?>
    type_declarator:<subset>
    version:<v>

The existing operator may be specified either as a function object
or as a string argument equivalent to the one that would be used in
the complete function name. In string form the syntactic category
will be assumed to be the same as the new declaration. Therefore
these all have the same effect:

    sub postfix:<!> ($x) is equiv(&postfix:<++>) {...}
    sub postfix:<!> ($x) is equiv<++> {...}
    sub postfix:<!> ($x) {...}      # since equiv<++> is the default

Prefix operators that are identifiers are handled specially. The
form with one argument defaults to named unary precedence instead of
autoincrement precedence:

    sub prefix:<foo> ($x) {...}
    foo 1, 2, 3;        # means foo(1), 2, 3

Likewise postfix operators that look like method calls are forced to
default to the precedence of method calls. Any prefix operator that
requires multiple arguments defaults to listop precedence, even if it
is not an identifier:

    sub prefix:<☎> ($x,$y) {...}
    ☎ 1;                # ERROR, too few arguments
    ☎ 1, 2;             # okay
    ☎ 1, 2, 3;          # ERROR, too many arguments

You must use the C<< prefix:<foo> >> form in order to mutate the grammar
to parse as a named unary operator.
Normal function definitions never change the grammar, and when called
always parse as listops, even if defined with a single argument:

    sub foo ($x) {...}  # a listop
    foo(1), 2, 3;       # okay
    (foo 1), 2, 3;      # okay
    foo 1, 2, 3;        # ERROR, too many arguments

Likewise 0-ary functions parse as listops. Use C<< term:<foo> >>
(or a constant or enum declaration) to declare a term that expects
no arguments.

Because these traits have an immediate declarative effect, it is
illegal to apply them to a C<multi>, or to any post-declared function.
More generally, any such language-bending declaration must follow the
same lexical scoping rules that a macro does.

=item C<is assoc>

Specifies the associativity of an operator explicitly. Valid values
are:

    Tag         Examples        Meaning of $a op $b op $c       Default equiv
    ===         ========        =========================       =============
    left        + - * / x       ($a op $b) op $c                +
    right       ** =            $a op ($b op $c)                **
    non         cmp <=> ..      ILLEGAL                         cmp
    chain       == eq ~~        ($a op $b) and ($b op $c)       eqv
    list        | & ^ Z         op($a; $b; $c)                  |

Note that operators "C<equiv>" to relationals are automatically
considered chaining operators. When creating a new precedence level,
the chaining is determined by the presence or absence of
"C<< is assoc<chain> >>", and other operators defined at that level are
required to be the same.

Specifying an C<assoc> without an explicit C<equiv> substitutes a
default C<equiv> consistent with the associativity, as shown in the
final column above.

Because this trait has an immediate declarative effect, it is illegal
to apply it to a C<multi>, or to any post-declared function.

=item C<PRE>/C<POST>

These phasers declare statements or blocks that are to be
unconditionally executed before/after the subroutine's C<do> block.
They must return a true value, otherwise an exception is thrown.
When applied to a method, the semantics provide support for the
"Design by Contract" style of OO programming: a precondition of a
particular method is met if all the C<PRE> phasers associated with that
method return true. Otherwise, the precondition is met if C<all> of the
parent classes' preconditions are met (which may include the
preconditions of I<their> parent classes if they fail, and so on
recursively.)

In contrast, a method's postcondition is met if all the method's C<POST>
phasers return true I<and> all its parents' postconditions are also met
recursively.

C<POST> phasers (and "C<will post>" phaser traits) declared within a
C<PRE> or C<ENTER> block are automatically hoisted outward to be called
at the same time as other C<POST> phasers. This conveniently gives
"circum" semantics by virtue of wrapping the post lexical scope within
the pre lexical scope. That is, the C<POST> closes over its outer
scope, even if that scope is gone by the time the C<POST> is run.

    method push ($new_item) {
        ENTER {
            my $old_height = self.height;
            POST { self.height == $old_height + 1 }
        }

        $new_item ==> push @.items;
    }

    method pop () {
        ENTER {
            my $old_height = self.height;
            POST { self.height == $old_height - 1 }
        }

        return pop @.items;
    }

Note that C<self> is available in phasers defined within methods.

Class invariants are declared with C<PRE>/C<POST> submethods instead of
phasers. Module invariants are declared with C<PRE>/C<POST> subs or
protos.

[Conjecture: class and module invariants can be applied more selectively
by marking C<PRE>/C<POST> declarations with a C<selective> trait that
stops it from running on internal calls (which might allow temporary
violations of invariants), but enforces the invariants when any routine
of this module is called from "outside" the current module or type,
however that's defined. There could be arguments to this trait that
could refine the concept of what is foreign.]

=item C<ENTER>/C<LEAVE>/C<KEEP>/C<UNDO>/etc.

These phasers supply code that is to be conditionally executed before
or after the subroutine's C<do> block (only if used at the outermost
level within the subroutine; technically, these are added to the block
traits on the C<do> block, not the subroutine object). These phasers
are generally used only for their side effects, since most return
values will be ignored. (Phasers that run before normal execution may
be used for their values, however.)

=back

=head2 Parameter traits

The following traits can be applied to many types of parameters.

=over

=item C<is readonly>

Specifies that the parameter cannot be modified (e.g. assigned to,
incremented). It is the default for parameters. On arguments which
are already immutable values it is a no-op at run time; on mutable
containers it may need to create an immutable alias to the mutable
object if the constraint cannot be enforced entirely at compile time.
Binding to a readonly parameter never triggers autovivification.

=item C<is rw>

Specifies that the parameter can be modified (assigned to, incremented,
etc). Requires that the corresponding argument is an lvalue or can be
converted to one. Since this option forces an argument to be required,
it cannot coexist with the C<?> mark to make an argument optional.
(It may, however, be used with C<=> indicating a default, but only if
the default expression represents something that is nameable at compile
time and that can bind as an lvalue, such as C<< CALLER::<$/> >> or
C<< OUTER::<$_> >>.)

When applied to a variadic parameter, the C<rw> trait applies to each
element of the list:

    sub incr (*@vars is rw) { $_++ for @vars }

(The variadic array as a whole is always modifiable, but such
modifications have no effect on the original argument list.)

=item C<is raw>

Specifies that the parameter is passed as a raw alias, an argument
object that has not yet had a context imposed. In other words, this
provides for lazy contextualization even through function calls.
This is important if you wish to pass the parameter onward to something
else that will determine its context later.

You may modify the argument, but only if the argument is already a
suitable lvalue since, unlike C<rw>, no attempt at autovivification is
made, so unsuitable lvalues will throw an exception if you try to modify
them within the body of the routine. That is, if autovivification
happens, it happens at the point of use, not at the point of binding.

For better visual distinction, such a parameter is declared by
prefixing with a backslash rather than by using C<is raw> directly. The
backslash is also more succinct; the trait is there primarily for
introspection.

=item C<is copy>

Specifies that the parameter receives a distinct, read-writable copy of
the original argument. This is commonly known as "pass-by-value".

    sub reprint ($text, $count is copy) {
        print $text while $count-- > 0;
    }

Binding to a copy parameter never triggers autovivification.

=item C<is dynamic>

Specifies that the parameter is to be treated as an "environmental"
variable, that is, a lexical that is accessible from the dynamic
scope (see S02).

=item C<as>

[DEPRECATED] Specifies that the parameter is to be coerced to the given
type.

    method link(IO::File:D: $name as Str) { ... }
    sub homedir($path as Str, :$test = <r w x>) { ... }

=back

=head2 Signature Introspection

A C<Signature> object can be introspected to find out the details of
the parameters it is defined as expecting.
The C<.params> method will return a C<List> of C<Parameter> objects,
which have the following readonly properties:

    name           The name of the lexical variable to bind to, if any
    type           The main type (the one multi-dispatch sorts by)
    constraints    Any further type constraints
    type_captures  List of names the argument type is captured into
    readonly       True if the parameter has C<is readonly> trait
    rw             True if the parameter has C<is rw> trait
    copy           True if the parameter has C<is copy> trait
    named          True if the parameter is to be passed named
    named_names    List of names a named parameter can be passed as
    capture        True if the parameter binds the caller's Capture
    raw            True if the parameter is too lazy to contextualize
    slurpy         True if the parameter is slurpy
    optional       True if the parameter is optional
    default        A closure returning the default value
    invocant       True if the parameter is a method invocant
    multi_invocant True if the parameter is a multi invocant
    signature      A nested signature to bind the argument against

Note that C<constraints> will be something that can be smart-matched
against if it is defined; if there are many constraints it may be a
C<Junction> of some sort, but if there is just one it may be simply that
one thing.

Further, various things that appear in an original written signature
will have been deconstructed a bit. For example, a signature like:

    :(1)

will introspect the same way as:

    :(Int $ where 1)

And if we have:

    subset Odd of Int where { $^n % 2 };
    sub foo(Odd $x) { ... }

Then the signature of foo will be equivalent to something like:

    :(Int $x where { $^n % 2 })

That is, the refinement type will have been deconstructed into the
nominal type that multiple dispatch uses for sorting the candidates and
an additional constraint.

=head1 Advanced subroutine features

=head2 Processing of returned values

It is a general policy that lvalues should only be returned up the
dynamic call stack if specifically requested.
Therefore, by default the returned arguments are processed to enforce
this, both for the implicit return from the last statement of any
block, as well as the explicit return done by operators such as:

    return
    leave
    take

Specifically, this processing involves examining the returned value's
arguments and dereferencing any container that could be used as an
lvalue, replacing it with the container's value.

To override this processing for a routine, it must be declared C<rw>,
or the form C<return-rw> must be used. To override for a C<gather>,
use C<gather-rw> instead, or C<take-rw> on the individual take. Since
blocks don't generally have traits, you must use C<leave-rw> to pass an
lvalue out of a block.

=head2 The C<return> function

The C<return> function notionally throws a control exception that is
caught by the current lexically enclosing C<Routine> to force a return
through the control logic code of any intermediate block constructs.
(That is, it must unwind the stack of dynamic scopes to the proper
lexical scope belonging to this routine.) With normal blocks
(those that are autoexecuted in place because they're known to the
compiler) this unwinding can likely be optimized away to a "goto".
All C<Routine> declarations have an explicit declarator such as C<sub>
or C<method>; bare blocks and "pointy" blocks are never considered
to be routines in that sense. To return from a block, use C<leave>
instead--see below.

The C<return> function preserves its argument list as a C<Capture>
object, and responds to the left-hand C<Signature> in a binding.
This allows named return values if the caller expects one:

    sub f () { return :x<1> }
    sub g ($x) { print $x }

    my $x := |(f);  # binds 1 to $x, via a named argument
    g(|(f));        # prints 1, via a named argument

To return a literal C<Pair> object, always put it in an additional set
of parentheses:

    return( (:x<1>), (:y<2>) ); # two positional Pair objects

Note that the postfix parentheses on the function call don't count as
being "additional". However, as with any function, whitespace after the
C<return> keyword prevents that interpretation and turns it instead
into a list operator:

    return :x<1>, :y<2>;        # two named arguments (if caller uses |)
    return ( :x<1>, :y<2> );    # two positional Pair objects

If the function ends with an expression without an explicit C<return>,
that expression is also taken to be a C<Capture>, just as if the
expression were the argument to a C<return> list operator (with
whitespace):

    sub f { :x<1> }     # named-argument binding (if caller uses |)
    sub f { (:x<1>) }   # always just one positional Pair object

On the caller's end, the C<Capture> is interpolated into any new
argument list much like an array would be, that is, as an item in item
context, and as a list in list context. This is the default behavior,
but the caller may use C<< prefix:<|> >> to inline the returned values
as part of the new argument list. The caller may also bind the returned
C<Capture> directly.

A function is called only once at the time the C<Capture> object is
generated, not when it is later bound (which could happen more than
once).

=head2 The C<callframe> and C<caller> functions

The C<callframe> function takes a list of matchers and interprets them
as a navigation path from the current call frame to a location in the
call stack, either the current call frame itself or some frame from
which the current frame was called. It returns an object that describes
that particular call frame, or a false value if there is no such scope.
Numeric arguments are interpreted as number of frames to skip, while
non-numeric arguments scan outward for a frame matching the argument as
a smartmatch.

The current frame is accessed with a null argument list.

    say " file ", callframe().file,
        " line ", callframe().line;

which is equivalent to:

    say " file ", DYNAMIC::<$?FILE>,
        " line ", DYNAMIC::<$?LINE>;

The immediate caller of this frame is accessed by skipping one level:

    say " file ", callframe(1).file,
        " line ", callframe(1).line;

You might think that that must be the current function's caller,
but that's not necessarily so. This might return an outer block in
our own routine, or even some function elsewhere that implements a
control operator on behalf of our block. To get outside your current
routine, see C<caller> below.

The C<callframe> function may be given arguments telling it which
higher scope to look for. Each argument is processed in order, left to
right. Note that C<Any> and C<0> are no-ops:

    $ctx = callframe();        # currently running frame for &?BLOCK
    $ctx = callframe(Any);     # currently running frame for &?BLOCK
    $ctx = callframe(Any,Any); # currently running frame for &?BLOCK
    $ctx = callframe(1);       # my frame's caller
    $ctx = callframe(2);       # my frame's caller's caller
    $ctx = callframe(3);       # my frame's caller's caller's caller
    $ctx = callframe(1,0,1,1); # my frame's caller's caller's caller
    $ctx = callframe($i);      # $i'th caller

Note also that negative numbers are allowed as long as you stay within
the existing call stack:

    $ctx = callframe(4,-1);    # my frame's caller's caller's caller

Repeating any smartmatch just matches the same frame again unless you
intersperse a 1 to skip the current level:

    $ctx = callframe(Method);              # nearest frame that is method
    $ctx = callframe(Method,Method);       # nearest frame that is method
    $ctx = callframe(Method,1,Method);     # 2nd nearest method frame
    $ctx = callframe(Method,1,Method,1);   # caller of that 2nd nearest method
    $ctx = callframe(1,Block);             # nearest outer frame that is block
    $ctx = callframe(Sub,1,Sub,1,Sub);     # 3rd nearest sub frame
    $ctx = callframe({ .labels.any eq 'Foo' }); # nearest frame labeled 'Foo'

Note that this last potentially differs from the answer returned by

    Foo.callframe

which returns the frame of the innermost C<Foo> block in the lexical
scope rather than the dynamic scope. A call frame also responds to the
C<.callframe> method, so a given frame may be used as the basis for
further navigation:

    $ctx = callframe(Method,1,Method);
    $ctx = callframe(Method).callframe(1).callframe(Method);   # same

You must supply args to get anywhere else, since C<.callframe> is
the identity operator when called on something that is already
a C<Context>:

    $ctx = callframe;
    $ctx = callframe.callframe.callframe.callframe;            # same

The C<caller> function is special-cased to go outward just far enough
to escape from the current routine scope, after first ignoring any
inner blocks that are embedded, or are otherwise pretending to be
"inline":

    &caller ::= &callframe.assuming({ !.inline }, 1);

Note that this is usually the same as C<callframe(&?ROUTINE,1)>, but not
always. A call to a returned closure might not even have C<&?ROUTINE>
in its dynamic scope anymore, but it still has a caller.

So to find where the current routine was called you can say:

    say " file ", caller.file,
        " line ", caller.line;

which is equivalent to:

    say " file ", CALLER::<$?FILE>,
        " line ", CALLER::<$?LINE>;

Additional arguments to C<caller> are treated as navigational from the
calling frame. One frame out from your current routine is I<not>
guaranteed to be a C<Routine> frame. You must say C<caller(Routine)>
to get to the next-most-inner routine.

Note that C<caller(Routine).line> is not necessarily going to give you
the line number that your current routine was called from; you're
rather likely to get the line number of the topmost block that is
executing within that outer routine, where that block contains the call
to your routine.

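
As a rough sketch of the simplest case above, the routine below reports
where it was called from by skipping one frame. It assumes only the
numeric form of C<callframe> and the C<.file> and C<.line> methods; the
matcher-based navigation and the C<caller> function described here may
not be available in a given implementation. The name C<where-called> is
hypothetical.

    # Sketch: only the numeric form of callframe is assumed here.
    sub where-called () {
        my $frame = callframe(1);    # one level out from this sub's frame
        return "{$frame.file}:{$frame.line}";
    }

    say "where-called was invoked at ", where-called();

Because the call to C<callframe> sits directly in the routine body and
not in a nested block, skipping one frame is expected to land on the
frame that contains the call.
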
For either C<callframe> or C<caller>, the returned CallFrame object
supports at least the following methods:

    .callframe
    .caller
    .leave
    .inline
    .package
    .file
    .line
    .my
    .hints
    .args

The C<.callframe> and C<.caller> methods work the same as the functions
except that they are relative to the frame supplied as invocant. The
C<.leave> method can force an immediate return from the specified call
frame.

The C<.inline> method says whether this block was entered implicitly
by some surrounding control structure. Any time you invoke a block or
routine explicitly with C<.()> this is false. However, it is defined
to be true for any block entered using dispatcher-level primitives
such as C<.callwith>, C<.callsame>, C<.nextwith>, or C<.nextsame>.

The C<.my> method provides access to the lexical namespace associated
with the given call frame's current position. It may be used to look
up ordinary lexical variables in that lexical scope. It must not be
used to change any lexical variable that is marked as readonly.

The C<.hints> method gives access to a snapshot of compiler symbols in
effect at the point of the call when the call was originally compiled.
(For instance, C<caller.hints('&?ROUTINE')> will give you the caller's
routine object.) Such values are always read-only, though in the case
of some (like the caller's routine above) may return a fixed object
that is nevertheless mutable.

=head2 The C<want> function

The C<want> function is gone. If you want context specific behavior,
return an object instead that responds accordingly to the various
contextual methods.

(Conjecture: in future we might want to provide some syntactic sugar
that makes it easier to create such objects. Or maybe a type that takes
values or code references for the various contexts, so that you can
write

    return ContextProxy.new:
        Int  => 3,
        item => { @.list.join(', ') },
        list => { ... },
        ;

or something similar.)

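
Without any sugar, the advice above can be sketched by returning an
object whose coercion methods answer for each context. The C<Report>
class and C<inventory> routine are hypothetical names used only for this
illustration, and the choice of C<Numeric>, C<Str>, and C<list> as the
contextual methods is an assumption about which contexts a caller cares
about.

    # Hypothetical result type: one object, several contextual views.
    class Report {
        has @.items;
        method Numeric { @!items.elems }        # used in numeric context
        method Str     { @!items.join(', ') }   # used in string context
        method list    { @!items.list }         # used when a list is wanted
    }

    sub inventory () { Report.new(items => <apple pear plum>) }

    my $r = inventory();
    say +$r;          # numeric view
    say ~$r;          # string view
    say $r.list;      # list view

The routine itself never needs to know how its result will be used; the
caller's context selects the method.
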
=head2 The C<leave> function

As mentioned above, a C<return> call causes the innermost surrounding subroutine, method, rule, token, regex (as a keyword) or macro to return. Only declarations with an explicit declarator keyword (C<sub>, C<submethod>, C<method>, C<macro>, C<regex>, C<token>, and C<rule>) may be returned from. Statement prefixes such as C<do> and C<try> do not fall into that category. You cannot use C<return> to escape directly into the surrounding context from loops, bare blocks, pointy blocks, or quotelike operators such as C<rx//>; a C<return> within one of those constructs will continue searching outward for a "proper" routine to return from. Nor may you return from property blocks such as C<BEGIN> or C<CATCH> (though blocks executing within the lexical and dynamic scope of a routine can of course return from that outer routine, which means you can always return from a C<CATCH> or a C<FIRST>, but never from a C<BEGIN> or C<INIT>).

To return from blocks that aren't routines, the C<leave> method is used instead. (It can be taken to mean either "go away from" or "bequeath to your successor" as appropriate.) The object specifies the scope to exit, and the method's arguments specify the return value. If the object is omitted (by use of the function or listop forms), the innermost block is exited. Otherwise you must use something like C<callframe> or C<&?BLOCK> or a dynamic variable to specify the scope you want to exit. A label (such as a loop label) previously seen in the lexical scope also works as a kind of singleton dynamic object: it names a statement that is serving both as an outer lexical scope and as a frame in the current dynamic scope.

As with C<return>, the arguments are taken to be a C<Capture> holding the return values.

    leave;                      # return from innermost block of any kind
    callframe(Method).leave;    # return from innermost calling method
    &?ROUTINE.leave(1,2,3);     # Return from current sub. Same as: return 1,2,3
    &?ROUTINE.leave <== 1,2,3;  # same thing, force return as feed
    OUTER.leave;                # Return from OUTER label in lexical scope
    &foo.leave: 1,2,3;          # Return from innermost surrounding call to &foo

Note that these are equivalent in terms of control flow:

    COUNT.leave;
    last COUNT;

However, the first form explicitly sets the return value for the entire loop, while the second implicitly returns all the previous successful loop iteration values as a list comprehension. (It may, in fact, be too late to set a return value for the loop if it is being evaluated lazily!) A C<leave> from the inner loop block, however, merely specifies the return value for that iteration:

    for 1..10 { leave $_ * 2 }   # 2, 4, 6 ... 20

Note that this:

    leave COUNT;

will always be taken as the function, not the method, so it returns the C<COUNT> object from the innermost block. The indirect object form of the method always requires a colon:

    leave COUNT: ;

=head2 Temporization

The C<temp> macro temporarily replaces the value of an existing variable, subroutine, context of a function call, or other object in a given scope:

    {
        temp $*foo = 'foo';      # Temporarily replace global $foo
        temp &bar := sub {...};  # Temporarily replace sub &bar
        ...
    } # Old values of $*foo and &bar reinstated at this point

C<temp> invokes its argument's C<.TEMP> method. The method is expected to return a C<Callable> object that can later restore the current value of the object. At the end of the lexical scope in which the C<temp> was applied, the subroutine returned by the C<.TEMP> method is executed.

The default C<.TEMP> method for variables simply creates a closure that assigns the variable's pre-C<temp> value back to the variable.
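The default behaviour for variables can be seen in a minimal sketch like the following. The names C<$level> and C<current-level> are hypothetical, and the example only assumes that the implementation restores the old value at scope exit; it says nothing about whether a C<.TEMP> method is used internally.

```raku
my $level = 'info';
sub current-level { $level }     # reads the variable from outside the block

{
    temp $level = 'debug';       # replaced only until this block exits
    say current-level();         # debug
}

say current-level();             # info
```

The routine sees the temporary value even though it is called from inside the block rather than defined there, which is the point of temporizing instead of declaring a fresh lexical.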
New kinds of temporization can be created by writing storage classes with their own C<.TEMP> methods:

    class LoudArray is Array {
        method TEMP {
            print "Replacing $.WHICH() at {caller.location}\n";
            my $restorer = callsame;
            return {
                print "Restoring $.WHICH() at {caller.location}\n";
                $restorer();
            };
        }
    }

You can also modify the behaviour of temporized code structures by giving them a C<TEMP> block. As with C<.TEMP> methods, this block is expected to return a closure, which will be executed at the end of the temporizing scope to restore the subroutine to its pre-C<temp> state:

    my $next = 0;
    sub next {
        my $curr = $next++;
        TEMP {{ $next = $curr }}  # TEMP block returns the closure { $next = $curr }
        return $curr;
    }

    # and later...

    say next();     # prints 0; $next == 1
    say next();     # prints 1; $next == 2
    say next();     # prints 2; $next == 3
    if ($hiccough) {
        say temp next();  # prints 3; closes $curr at 3; $next == 4
        say next();       # prints 4; $next == 5
        say next();       # prints 5; $next == 6
    }                     # $next = 3
    say next();     # prints 3; $next == 4
    say next();     # prints 4; $next == 5

Note that C<temp> must be a macro rather than a function because the temporization must be arranged before the function causes any state changes, and if it were a normal argument to a normal function, the state change would happen before C<temp> got control.

Hypothetical variables use the same mechanism, except that the restoring closure is called only on failure.

Note that dynamic variables may be a better solution than temporized globals in the face of multithreading.

=head2 Wrapping

Every C<Routine> object has a C<.wrap> method. This method expects a single C<Callable> argument. Within the code, the special C<callsame>, C<callwith>, C<nextsame> and C<nextwith> functions will invoke the original routine, but do not introduce an official C<CALLER> frame:

    sub thermo ($t) {...}   # set temperature in Celsius, returns old value

    # Add a wrapper to convert from Fahrenheit...
    $handle = &thermo.wrap( { callwith( ($^t-32)/1.8 ) } );

The C<callwith> function lets you pass your own arguments to the wrapped function. The C<callsame> function takes no argument; it implicitly passes the original argument list through unchanged.

The C<callsame> and C<nextsame> functions are really short for:

    callwith( |callframe(Routine).args )
    nextwith( |callframe(Routine).args )

The call to C<.wrap> replaces the original C<Routine>'s C<do> property with the C<Callable> argument, and arranges that any call to C<callsame>, C<callwith>, C<nextsame> or C<nextwith> invokes the previous version of the routine. In other words, the call to C<.wrap> has more or less the same effect as:

    my &old_thermo := &thermo;
    &thermo = sub ($t) { old_thermo( ($t-32)/1.8 ) }

Note that C<&thermo.WHICH> stays the same after the C<.wrap>, as it does with the equivalent assignment shown, since assignment to a C<Routine> works like a container, changing the contained C<do> property but not the container itself.

The call to C<.wrap> returns a unique handle that has a C<restore> method that will undo the wrapping:

    $handle.restore;

This does not affect any other wrappings placed on the routine.

A wrapping can also be restricted to a particular dynamic scope with temporization:

    # Add a wrapper to convert from Kelvin
    # wrapper self-unwraps at end of current scope
    temp &thermo.wrap( { callwith($^t - 273.15) } );

The entire argument list may be captured by binding to a C<Capture> parameter. It can then be passed to C<callwith> using that name:

    # Double the return value for &thermo
    &thermo.wrap( -> |args { callwith(|args) * 2 } );

In this case only the return value is changed.
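To make the wrap-and-restore cycle concrete, here is a small self-contained sketch. The body of C<thermo> and the C<$setting> variable are hypothetical stand-ins for the routine that the text leaves elided. The wrapper is written as a pointy block rather than with a placeholder variable purely for readability.

```raku
# Hypothetical stand-in: stores a Celsius setting, returns the old one.
my $setting = 20;
sub thermo($t) {
    my $old = $setting;
    $setting = $t;
    return $old;
}

# Wrap it so that callers may pass Fahrenheit instead.
my $handle = &thermo.wrap(-> $t { callwith( ($t - 32) / 1.8 ) });

thermo(212);          # the wrapper converts 212 F to 100 C
say $setting;         # 100

$handle.restore;      # undo this one wrapping only
thermo(25);           # plain Celsius again
say $setting;         # 25
```

Keeping the handle around is what allows this particular wrapper to be removed later without disturbing any other wrappers installed on the same routine.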
The wrapper is not required to call the original routine; it can call another C<Callable> object by passing the C<Capture> to its C<callwith> method:

    # Transparently redirect all calls to &thermo to &other_thermo
    &thermo.wrap( sub (|args) { &other_thermo.callwith(|args) } );

or more briefly:

    &thermo.wrap( { &other_thermo.callsame } );

Since the method versions of C<callsame>, C<callwith>, C<nextsame>, and C<nextwith> specify an explicit destination, their semantics do not change outside of wrappers. However, the corresponding functions have no explicit destination, so instead they implicitly call the next-most-likely method or multi-sub; see S12 for details.

As with any return value, you may capture the returned C<Capture> of C<callwith> by binding:

    my |retval := callwith(|args);
    ... # postprocessing
    return |retval;

Alternately, you may prevent any return at all by using the variants C<nextsame> and C<nextwith>. Arguments are passed just as with C<callsame> and C<callwith>, but a tail call is explicitly enforced; any code following the call will be unreachable, as if a return had been executed there before calling into the destination routine.

Within an ordinary method dispatch these functions treat the rest of the dispatcher's candidate list as the wrapped function, which generally works out to calling the same method in one of our parent (or older sibling) classes. Likewise within a multiple dispatch the current routine may defer to candidates further down the candidate list. Although not necessarily related by a class hierarchy, such later candidates are considered more generic and hence likelier to be able to handle various unforeseen conditions (perhaps).

Note that all routines are (by default) considered to be candidates for inlining and constant folding. The optimizer is allowed to start making these optimizations after the main program's C<LINK> time, but not before.
After any routine is "hard" inlined or constant folded, it is explicitly retyped as immutable; any attempt to wrap an immutable routine will result in failure of the wrap call. An immutable routine is so marked by recasting to type C<HardRoutine>, a subclass of C<Routine>.

On the other hand, it is also possible to explicitly mark a routine as mutable, and then the ability to wrap it must be preserved even after C<LINK> time. This is done by recasting to C<SoftRoutine>.

Explicitly marking a routine as either mutable or immutable should be considered permanent. It is still possible to inline soft routines, but only if the possibility of indirection is detected inline as well, and provision made (either inline or via external rewriting) for dealing with any wrappers. Hence, any routine marked as soft before C<LINK> time is exempt from hard inlining or folding. There are several methods of marking a routine as soft; however, no method is provided for marking routines as hard, since that is the job of the optimizer. A routine may be marked as soft:

=over

=item *

if it is declared using "our" explicitly,

=item *

if it is mentioned in the argument list of a C<use soft> pragma,

=item *

if its name matches any wildcard pattern (TBD) in a C<use soft>,

=item *

if the module or class in which it is defined is mentioned in a C<use soft>,

=item *

or if there is a general C<use soft *;> declaration, which basically turns on AOP for everything at run time. Be aware that this may turn your optimizer into more of a "pessimizer".

=back

For any normal standalone application, any C<use soft> pragma applies to the entire program in which it participates, provided it is performed before C<LINK> time. The optimizer may then harden anything that was not requested to remain soft.
A plug-in system, such as for a web server, may choose either to allow individual plug-ins to behave as independent programs by letting the optimizer harden individual plug-ins independently, or treat all plug-ins as a part of the same program by softening all plug-ins. (Similar considerations apply to optimizing classes to closed/final.) Note that installing a wrapper before C<LINK> time is specifically I<not> one of the ways to mark a routine as soft. Such a routine may still be hardened at C<LINK> time despite being wrapped during compile time. =head2 The C<&?ROUTINE> object C<&?ROUTINE> is always an alias for the lexically innermost C<Routine> (which may be a C<Sub>, C<Method>, or C<Submethod>), so you can specify recursion on an anonymous sub: my $anonfactorial = sub (Int $n) { return 1 if $n<2; return $n * &?ROUTINE($n-1); }; You can get the current routine name by calling C<&?ROUTINE.name>. Outside of any sub declaration, this call returns failure. Note that C<&?ROUTINE> refers to the current single sub, even if it is declared C<multi>. To redispatch to the entire suite under a given short name, just use the named form to call the C<proto>, since there are no anonymous C<multi>s. =head2 The C<&?BLOCK> object C<&?BLOCK> is always an alias for the current block, so you can specify recursion on an anonymous block: my $anonfactorial = -> Int $n { $n < 2 ?? 1 !! $n * &?BLOCK($n-1) }; C<&?BLOCK.labels> contains a list of all labels of the current block. This is typically matched by saying if &?BLOCK.labels.any eq 'Foo' {...} If the innermost lexical block happens to be the main block of a C<Routine>, then C<&?BLOCK> just returns the C<Block> object, not the C<Routine> object that contains it. [Note: to refer to any C<$?> or C<&?> variable at the time the sub or block is being compiled, use the C<< COMPILING:: >> pseudopackage.] =head2 Priming Every C<Callable> object has a C<.assuming> method, which does partial function application, aka I<priming>. 
This method does a partial binding of a set of arguments to a signature and returns a new function that takes only the remaining arguments.

    &textfrom := &substr.assuming(str=>$text, len=>Inf);

or equivalently:

    &textfrom := &substr.assuming(:str($text) :len(Inf));

or even:

    &textfrom := &substr.assuming :str($text):len(Inf);

It returns a C<Callable> object that implements the same behaviour as the original subroutine, but has the values passed to C<.assuming> already bound to the corresponding parameters:

    $all  = textfrom(0);    # same as: $all  = substr($text,0,Inf);
    $some = textfrom(50);   # same as: $some = substr($text,50,Inf);
    $last = textfrom(-1);   # same as: $last = substr($text,-1,Inf);

Positional parameters may also be primed. To skip a positional argument, pass a C<*>, which is handled specially by C<.assuming>. Passing a C<Nil> causes priming with the default argument supplied with the parameter at that position in the signature, or the (presumably undefined) default value for the container associated with the parameter if the parameter in question has no default. C<Nil> may also be passed to a named argument to force priming to the default.

The result of a C<use> statement is a (compile-time) object that also has a C<.assuming> method, allowing the user to bind parameters in all the module's subroutines/methods/etc. simultaneously:

    (use IO::Logging).assuming(logfile => ".log");

This special form should generally be restricted to named parameters.

To prime a particular C<multi> variant, it may be necessary to specify the type for one or more of its parameters to pick out a single function:

    &woof ::= &bark:(Dog).assuming :pitch<low>;
    &pine ::= &bark:(Tree).assuming :pitch<yes>;

=head2 Macros

Macros are functions or operators that are called by the compiler as soon as their arguments are parsed (if not sooner). The syntactic effect of a macro declaration or importation is always lexically scoped, even if the name of the macro is visible elsewhere.
As with ordinary operators, macros may be classified by their grammatical category. For a given grammatical category, a default parsing rule or set of rules is used, but those rules that have not yet been "used" by the time the macro keyword or token is seen can be replaced by use of the C<is parsed> trait. (This means, for instance, that an infix operator can change the parse rules for its right operand but not its left operand.)

In the absence of a signature to the contrary, a macro is called as if it were a method on the current match object returned from the grammar rule being reduced; that is, all the current parse information is available by treating C<self> as if it were a C<$/> object. [Conjecture: alternate representations may be available if arguments are declared with particular AST types.]

Macros may return either a string to be reparsed, or a syntax tree that needs no further parsing. The textual form is handy, but the syntax tree form is generally preferred because it allows the parser and debugger to give better error messages. Textual substitution on the other hand tends to yield error messages that are opaque to the user. Syntax trees are also better in general because they are reversible, so things like syntax highlighters can get back to the original language and know which parts of the derived program come from which parts of the user's view of the program. Nevertheless, it's difficult to return a syntax tree for an unbalanced construct, and in such cases a textual macro may be a clearer expression of the evil thing you're trying to do.

If you call a macro at runtime, the result of the macro is automatically evaluated again, so the two calls below print the same thing:

    macro f { '1 + 1' }
    say f();    # compile-time call to &f
    say &f();   # runtime call to &f

A compile-time call to a macro before its definition is erroneous.
=head2 Quasiquoting

In aid of returning a syntax tree, Perl provides a "quasiquoting" mechanism using the quote C<quasi>, followed by a block intended to represent an AST:

    return quasi { say "foo" };

Modifiers to the C<quasi> can modify the operation:

    :ast(MyAst)      # Default :ast(AST)
    :lang(Ruby)      # Default :lang($?PARSER)
    :unquote<[: :]>  # Default "triple rule"

Within a quasiquote, variable and function names resolve according to the lexical scope of the macro definition. Unrecognized symbols raise errors when the macro is being compiled, I<not> when it's being used.

Use of a macro argument in a quasiquote without unquoting should provide a warning, as this is very likely to be an error.

    # Oops; size of the AST of the argument
    macro mouse ($arg) { quasi { $arg.elems } }

To make a symbol resolve to the (partially compiled) scope of the macro call, use the C<COMPILING::> pseudo-package:

    macro moose () { quasi { $COMPILING::x } }

    moose(); # macro-call-time error
    my $x;
    moose(); # resolves to 'my $x'

If you want to mention symbols from the scope of the macro call, use the import syntax as modifiers to C<quasi>:

    :COMPILING<$x>   # $x always refers to $x in caller's scope
    :COMPILING       # All free variables fall back to caller's scope

If those symbols do not exist in the compiling scope, a compile-time exception is thrown at macro call time.

Similarly, in the macro body you may either refer to the C<$x> declared in the scope of the macro call as C<$COMPILING::x>, or bind to it explicitly:

    my $x := $COMPILING::x;

You may also use an import list to bind multiple symbols into the macro's lexical scope:

    require COMPILING <$x $y $z>;

Note that you need to use the run-time C<require> form, not C<use>, because the macro caller's compile-time is the macro's runtime.

=head2 Splicing

Bare AST variables (such as the arguments to the macro) may not be spliced directly into a quasiquote because they would be taken as normal bindings.
Likewise, program text strings to be inserted need to be specially marked or they will be bound normally. To insert an "unquoted" expression of either type within a quasiquote, use the quasiquote delimiter tripled, typically a bracketing quote of some sort: return quasi { say $a + {{{ $ast }}} } return quasi [ say $a + [[[ $ast ]]] ] return quasi < say $a + <<< $ast >>> > return quasi ( say $a + ((( $ast ))) ) The delimiters don't have to be bracketing quotes, but the following is probably to be construed as Bad Style: return quasi / say $a + /// $ast /// / (Note to implementors: this must not be implemented by finding the final closing delimiter and preprocessing, or we'll violate our one-pass parsing rule. Perl 6 parsing rules are parameterized to know their closing delimiter, so adding the opening delimiter should not be a hardship. Alternately the opening delimiter can be deduced from the closing delimiter. Writing a rule that looks for three opening delimiters in a row should not be a problem. It has to be a special grammar rule, though, not a fixed token, since we need to be able to nest code blocks with different delimiters. Likewise when parsing the inner expression, the inner parser subrule is parameterized to know that C<}}}> or whatever is its closing delimiter.) Unquoted expressions are inserted appropriately depending on the type of the variable, which may be either a syntax tree or a string. (Again, syntax tree is preferred.) The case is similar to that of a macro called from within the quasiquote, insofar as reparsing only happens with the string version of interpolation, except that such a reparse happens at macro call time rather than macro definition time, so its result cannot change the parser's expectations about what follows the interpolated variable. 
Hence, while the quasiquote itself is being parsed, the syntactic interpolation of an unquoted expression into the quasiquote always results in the expectation of an operator following the unquote. (You must use a call to a submacro if you want to expect something else.) Of course, the macro definition as a whole can expect whatever it likes afterwards, according to its syntactic category. (Generally, a term expects a following postfix or infix operator, and an operator expects a following term or prefix operator. This does not matter for textual macros, however, since the reparse of the text determines subsequent expectations.) Quasiquotes default to hygienic lexical scoping, just like closures. The visibility of lexical variables is limited to the quasi expression by default. A variable declaration can be made externally visible using the C<COMPILING::> pseudo-package. Individual variables can be made visible, or all top-level variable declarations can be exposed using the C<quasi :COMPILING> form. Both examples below will add C<$new_variable> to the lexical scope of the macro call: quasi { my $COMPILING::new_variable; my $private_var; ... } quasi :COMPILING { my $new_variable; { my $private_var; ... } } (Note that C<:COMPILING> has additional effects described in L</Macros>.) =head1 Other matters =head2 Anonymous hashes vs blocks C<{...}> is always a block. However, if it is completely empty or consists of a single list, the first element of which is either a hash or a pair, it is executed immediately to compose a C<Hash> object. 
The standard C<pair> list operator is equivalent to: sub pair (*@LIST) { my @pairs; for @LIST -> $key, $val { push @pairs, $key => $val; } return @pairs; } or more succinctly (and lazily): sub pair (*@LIST) { gather for @LIST -> $key, $val { take $key => $val; } } The standard C<hash> list operator is equivalent to: sub hash (*@LIST) { return { pair @LIST }; } So you may use C<sub> or C<hash> or C<pair> to disambiguate: $obj = sub { 1, 2, 3, 4, 5, 6 }; # Anonymous sub returning list $obj = { 1, 2, 3, 4, 5, 6 }; # Anonymous sub returning list $obj = { 1=>2, 3=>4, 5=>6 }; # Anonymous hash $obj = { 1=>2, 3, 4, 5, 6 }; # Anonymous hash $obj = hash( 1, 2, 3, 4, 5, 6 ); # Anonymous hash $obj = hash 1, 2, 3, 4, 5, 6 ; # Anonymous hash $obj = { pair 1, 2, 3, 4, 5, 6 }; # Anonymous hash =head2 Pairs as lvalues Since they are immutable, Pair objects may not be directly assigned: (key => $var) = "value"; # ERROR However, when binding pairs, names can be used to "match up" lvalues and rvalues, provided you write the left side as a signature using C<:(...)> notation: :(:who($name), :why($reason)) := (why => $because, who => "me"); (Otherwise the parser doesn't know it should parse the insides as a signature and not as an ordinary expression until it gets to the C<:=>, and that would be bad. Alternately, the C<my> declarator can also force treatment of its argument as a signature.) =head2 Out-of-scope names C<< GLOBAL::<$varname> >> specifies the C<$varname> declared in the C<*> namespace. Or maybe it's the other way around... C<< CALLER::<$varname> >> specifies the C<$varname> visible in the dynamic scope from which the current block/closure/subroutine was called, provided that variable carries the "C<dynamic>" trait. (All variables with a C<*> twigil are automatically marked with the trait. Likewise certain implicit lexicals (C<$_>, C<$/>, and C<$!>) are so marked.) 
C<< DYNAMIC::<$varname> >> specifies the C<$varname> visible in the innermost dynamic scope that declares the variable with the "C<is dynamic>" trait or with a name that has the C<*> twigil.

C<< MY::<$varname> >> specifies the lexical C<$varname> declared in the current lexical scope.

C<< OUR::<$varname> >> specifies the C<$varname> declared in the current package's namespace.

C<< COMPILING::<$varname> >> specifies the C<$varname> declared (or about to be declared) in the lexical scope currently being compiled.

C<< OUTER::<$varname> >> specifies the C<$varname> declared in the lexical scope surrounding the current lexical scope (i.e. the scope in which the current block was defined).

=head2 Declaring a C<MAIN> subroutine

Ordinarily a top-level Perl "script" just evaluates its anonymous mainline code and exits. During the mainline code, the program's arguments are available in raw form from the C<@*ARGS> array. At the end of the mainline code, however, a C<MAIN> subroutine will be called with whatever command-line arguments remain in C<@*ARGS>. This call is performed if and only if:

=over

=item a)

the compilation unit was directly invoked rather than by being required by another compilation unit, and

=item b)

the compilation unit declares a C<Routine> named "C<MAIN>", and

=item c)

the mainline code is not terminated prematurely, such as with an explicit call to C<exit>, or an uncaught exception.

=back

The command line arguments (or what's left of them after mainline processing) are magically converted into a C<Capture> and passed to C<MAIN> as its arguments, so switches may be bound as named args and other arguments to the program may be bound to positional parameters or the slurpy array:

    sub MAIN ($directory, :$verbose, *%other, *@filenames) {
        for @filenames { ... }
    }

Each incoming argument is automatically passed through the C<val()> function, which will attempt to intuit the types of the textual arguments such that they may be used in multimethod dispatch.
If C<MAIN> is declared as a set of C<multi> subs, multi dispatch is performed, and the type information intuited by C<val()> may be used to distinguish the different signatures: multi MAIN (Int $i) {...} # foo 1 multi MAIN (Rat $i) {...} # foo 1/2 multi MAIN (Num $i) {...} # foo 1e6 multi MAIN ($i) {...} # foo bar As with module and class declarations, a sub declared with the C<unit> declarator (and ending in semicolon) is allowed at the outermost file scope if it is the first such declaration, in which case the rest of the file is the body: unit sub MAIN ($directory, :$verbose, *%other, *@filenames); for @filenames { ... } This form is allowed only for simple subs named C<MAIN> that are intended to be run from the command line. A C<proto> or C<multi> definition may not be written in semicolon form, nor may C<MAIN> subs within a module or class be written in semicolon form. (A C<MAIN> routine is allowed in a module or class, but is not usually invoked unless the file is run directly (see a above). This corresponds to the "unless caller" idiom of Perl 5.) In general, you may have only one semicolon-style declaration that controls the whole file. If an attempted dispatch to C<MAIN> fails, the C<USAGE> routine is called. If there is no C<USAGE> routine, a default message is printed to standard error. If C<--help> is passed as a command line option to the program, the usage message is printed to standard output instead. This usage message is automatically generated from the signature (or signatures) of C<MAIN>. This message is generated at compile time, and hence is available at any later time as C<$?USAGE> (EDIT: variable renamed to C<$*USAGE> and message is generated on-demand, at runtime). Common Unix command-line conventions are mapped onto the capture as follows: Assuming C<-n> is the short name for C<--name>, On command line... $*ARGS capture gets... 
# Short names -n :name -n=value :name<value> -n="spacey value" :name«'spacey value'» -n='spacey value' :name«'spacey value'» -n=val1,'val 2',etc :name«val1 'val 2' etc» # Long names --name :name --name=value :name<value> --name="spacey value" :name«'spacey value'» --name "spacey value" :name«'spacey value'» --name='spacey value' :name«'spacey value'» --name=val1,'val 2',etc :name«val1 'val 2' etc» -- # end named argument processing # Negation --/name :!name --/name=value :name<value> but False --/name="spacey value" :name«'spacey value'» but False --/name='spacey value' :name«'spacey value'» but False --/name=val1,'val 2',etc :name«val1 'val 2' etc» but False # Native :name :name :/name :!name :name=value :name<value> :name="spacey value" :name«'spacey value'» :name='spacey value' :name«'spacey value'» :name=val1,'val 2',etc :name«val1 'val 2' etc» Exact Perl 6 forms are okay if quoted from shell processing: ':name<value>' :name<value> ':name(42)' :name(42) For security reasons, only constants are allowed as arguments, however. The default C<Capture> mapper pays attention to declaration of C<MAIN>'s parameters to resolve certain ambiguities. A C<--foo> switch needs to know whether to treat the next word from the command line as an argument. (Allowing the spacey form gives the shell room to do various things to the argument.) The short C<-foo> form never assumes a separate argument, and you must use C<=>. For the C<--foo> form, if there is a named parameter corresponding to the switch name, and it is of type C<Bool>, then no argument is expected. Otherwise an argument is expected. If the parameter is of a non-slurpy array type, all subsequent words up to the next command-line switch (or the end of the list) are bound to that parameter. As usual, switches are assumed to be first, and everything after the first non-switch, or any switches after a C<-->, are treated as positionals or go into the slurpy array (even if they look like switches). 
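A small script makes the mapping easier to try out. The sketch below is illustrative only: the parameter names and the file name C<scan.raku> are hypothetical, defaults are supplied so that it also runs with no arguments, and it sticks to the C<=> forms of value-taking switches, since support for the spacey form may vary between implementations.

```raku
# Hypothetical script, saved for example as scan.raku
sub MAIN(
    Str  $directory = '.',         # first positional argument
    Bool :v(:$verbose) = False,    # -v or --verbose; Bool, so no value expected
    Str  :$name = 'anon',          # --name=value
    *@filenames                    # everything after the first positional
) {
    say "directory: $directory";
    say "verbose:   $verbose";
    say "name:      $name";
    say "files:     @filenames[]";
}

# Possible invocations (switches first, then positionals):
#   raku scan.raku --verbose --name=foo /tmp a.txt b.txt
#   raku scan.raku -v /tmp
#   raku scan.raku --/verbose /tmp
```

Declaring C<$verbose> as C<Bool> is what tells the default mapper not to consume the following word as the switch's argument.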
Other policies may easily be introduced by calling C<MAIN> explicitly. For instance, you can parse your arguments with a grammar and pass the resulting C<Match> object as a C<Capture> to C<MAIN>:

    @*ARGS ~~ /<MyGrammar::top>/;
    MAIN(|$/);
    exit;

    sub MAIN ($frompart, $topart, *@rest) {
        if $frompart<foo> { ... }
        if $topart<bar><baz> { ... }
    }

This will conveniently bind top-level named matches to named parameters, but still give you access to nested matches through those parameters, just as any C<Match> object would. Of course, in this example, there's no particular reason the sub has to be named C<MAIN>.

To give both a long and a short switch name, you may use the pair notation to install several names for the same parameter. If any of the names is a single character, it will be considered a short switch name, while all other parameter names are considered long switch names. So if the previous declaration had been:

    sub MAIN (:f(:$frompart), :t(:$topart), *@rest)

then you could invoke the program with either C<-f> or C<--frompart> to specify the first parameter. Likewise you could use either C<-t> or C<--topart> for the second parameter.

=head2 Relationship of MAIN routine with lexical setting

The preceding section describes the use of C<MAIN> in the user's code. There may also be an implicit C<MAIN> routine supplied by the setting of the current compilation unit. (The C<-n> and C<-p> command-line switches are implemented this way.) In this case the user's mainline code is not automatically executed; instead, execution is controlled by the setting's C<MAIN> routine. That routine calls C<{YOU_ARE_HERE}> at the point where the user's code is to be lexically inserted (in the abstract). A setting may also call C<{YOU_ARE_HERE}> outside of a C<MAIN> routine, in which case it functions as a normal setting, and the C<{YOU_ARE_HERE}> merely indicates where the user's code goes logically.
(Or, from the compiler's point of view, it indicates which lexical scope to snapshot for later use by the compiler as the setting for a different compilation unit.) In this case the execution of the user code proceeds as normal. In fact, the C<CORE> setting ends with a C<{YOU_ARE_HERE}> to dump the C<CORE> lexical scope as the standard setting. In this sense, C<CORE> functions as an ordinary prelude.

If a C<MAIN> routine is declared both in the setting and in the user's code, the setting's C<MAIN> functions as the actual mainline entry point. The user's C<MAIN> functions in an embedded fashion; the setting's invocation of C<{YOU_ARE_HERE}> functions as the main invocation from the point of view of the user's code, and the user's C<MAIN> routine will be invoked at the end of each call to C<{YOU_ARE_HERE}>.

=head2 Implementation note on autothreading of only subs

The natural way to implement autothreading for C<multi> subs is to simply have the junctional signatures (the ones that can accept C<Mu> or junction as well as C<Any> parameters) match more loosely than the non-autothreading versions, and let multiple dispatch find the appropriate sub based on the signature. Those generic routines then end up redispatching to the more specific ones.

On the other hand, the natural implementation of C<only> subs is to call the sub in question directly for efficiency (and maybe even inline it in some cases). That efficiency is, after all, the main reason for not just making all subs C<multi>. However, this direct call conflicts with the desire to allow autothreading. It might be tempting to simply make everything multi dispatch underneath, and then say that the C<only> declaration merely means that you get an error if you redeclare. And maybe that is a valid approach if the multiple dispatch mechanism is fast enough.

However, a direct call still needs to bind its arguments to its parameters correctly, and it has to handle the case of failure to bind somehow.
So it is also possible to implement autothreading of C<only> subs based on failover from the binding failure. This could either be a one-shot failover followed by a conversion to a C<multi> call, or it could fail over every time you try to autothread. If we assume that junctional processing is likely to be fairly heavyweight most of the time compared to the cost of failing to bind, that tends to argue for failing over every time. This is also more conducive to inlining, since it's difficult to rewrite inlined calls.

In any case, nowadays a C<proto> declaration is considered to be a kind of C<only> sub, and needs to handle autothreading similarly if the signature of the C<proto> excludes junctions.

=head2 Introspection

This section describes the methods implemented by the routine objects that allow introspection of the inner workings of that routine.

=head3 Routine

=over

=item .candidates

This method returns a (potentially lazy) list of the candidates associated with the current routine. An "only" routine should return a list with itself as the single item.

=item .signature

This method returns the signature of the current routine.

=item .cando($capture)

This method returns a (potentially lazy) list of the candidates that match the given capture, ordered by goodness of match, with the best match first.

=item .push($candidate)

Adds C<$candidate> to the list of candidates for this C<proto>. Calling this method on an C<only> routine should result in a failure. It is also acceptable for C<multi>s declared in the source code to finalize the list of candidates and also return a failure here. But C<Proto>s created by calling C<Proto.new()> should be able to add candidates at run-time.

=back

=head3 Signature

See section L</Signature Introspection>.

=head1 AUTHORS

    Damian Conway <damian@conway.org>
    Allison Randal <al@shadowed.net>
    Larry Wall <larry@wall.org>
    Daniel Ruoso <daniel@ruoso.com>

=for vim:set expandtab sw=4: