=encoding utf8

=head1 TITLE

DRAFT: Synopsis 32: Setting Library - Str

=head1 VERSION

    Created: 19 Mar 2009 (extracted from S29-functions.pod)

    Last Modified: 2015-07-24
    Version: 13

The document is a draft.

=head1 Str

General notes about strings:

The C<Str> class contains strings encoded at the NFG level. Other
standard Unicode normalizations can be found in their
appropriately-named types: C<NFC>, C<NFD>, C<NFKC>, and C<NFKD>. The
C<Uni> type contains a string in a mixture of normalizations (i.e. not
normalized). S15 describes these in more detail.

The following are all provided by the C<Str> class, as well as related
classes:

=over

=item chop

    multi method chop(Str $string: $n = 1 --> Str) is export

Returns the string with C<$n> characters removed from the end. C<$n>
is optional and defaults to one character.

=item chomp

    multi method chomp(Str $string: --> Str) is export

Returns the string with one newline removed from the end. An arbitrary
terminator can be removed if the input filehandle has marked the
string for where the "newline" begins. (Presumably this is stored as a
property of the string.) Otherwise a standard newline is removed.

Note: Most users should just let their I/O handles autochomp instead.
(Autochomping is the default.)

=item lc

    multi method lc(Str $string: --> Str) is export

Returns the input string after forcing each character to its lowercase
form. Note that one-to-one mapping is not in general guaranteed;
different forms may be chosen according to context.

=item uc

    multi method uc(Str $string: --> Str) is export

Returns the input string after forcing each character to its uppercase
(not titlecase) form. Note that one-to-one mapping is not in general
guaranteed; different forms may be chosen according to context.

=item fc

    multi method fc(Str $string: --> Str) is export

Does a Unicode "fold case" operation suitable for doing caseless
string comparisons. (In general, the returned string is unlikely to be
useful for any purpose other than comparison.)

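As a rough illustration of the methods described so far, the following
sketch uses only the method forms. The variable names are hypothetical,
and the behavior shown is what an implementation following this draft
(for instance Rakudo) is assumed to produce.

```raku
# Hypothetical input line with a trailing newline.
my $line     = "Hello, World\n";

my $chomped  = $line.chomp;       # one trailing newline removed
my $chopped  = $chomped.chop;     # last character removed
my $chopped3 = $chomped.chop(3);  # last three characters removed

my $lower    = "Hello".lc;        # every character lowercased
my $upper    = "Hello".uc;        # every character uppercased

# fc is meant for comparison only: fold both sides, then compare.
my $same     = "Straße".fc eq "STRASSE".fc;

say $chomped;   # Hello, World
say $chopped;   # Hello, Worl
say $chopped3;  # Hello, Wo
say $lower;     # hello
say $upper;     # HELLO
say $same;      # True
```

The C<fc> comparison succeeds even though the two strings differ in
length, which is one reason the folded string is not generally useful
for anything other than comparison.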
=item tc

    multi method tc(Str $string: --> Str) is export

Converts the first character of a string to titlecase form, leaving
the rest of the characters unchanged, then returns the modified
string. If there is no titlecase mapping for the first character, the
entire string is returned unchanged. In any case, this function never
changes any character after the first. (It is like the old Perl 5
C<ucfirst> function in that respect.)

=item tclc

    multi method tclc(Str $string: --> Str) is export

Forces the first character of a string to titlecase and the rest of
the characters to lowercase, then returns the modified string.

=item wordcase

    multi method wordcase(Str $string: :&filter = &tclc, :$where = True --> Str) is export

Performs a substitutional mapping of each word in the string,
defaulting to the C<tclc> mapping. Words are defined as Perl 6
identifiers, hence admit hyphens and apostrophes when followed by a
letter. (Note that trailing apostrophes don't matter when
casemapping.) The following should have the same result:

    .wordcase;
    .subst(:g, / <ident>+ % <[ \- ' ]> /, *.Str.tclc)

The C<filter> function is always applied to the first and last word,
and additionally to any intermediate word that smartmatches with the
C<where> parameter. Assuming suitable definitions of word lists,
standard English capitalization might be handled with something like
this:

    my $where = none map *.fc, @conjunctions, @prepositions;
    .wordcase(:$where);

(Note that the "standard" authorities disagree on the prepositions!)

[XXX: Is case-insensitive matching on C<where>'s part necessary?]
The smartmatching is done case insensitively, so you should store
your exceptions in C<fc> form. If the C<where> smartmatch does not
match, then the word will be forced to lowercase.

There is no provision for an alternate regex; if you need a custom
word recognizer, you can write your own C<.subst> as above.

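The difference between C<tc>, C<tclc>, and the default C<wordcase>
mapping can be sketched as follows. The sample strings and variable
names are hypothetical. The C<:where> parameter is left out on purpose,
since implementations may not follow the draft wording for it exactly.

```raku
# Hypothetical sample strings.
my $mixed = "hELLO wORLD";
my $title = "the quick brown fox";

my $tc    = $mixed.tc;        # only the first character is touched
my $tclc  = $mixed.tclc;      # first titlecased, the rest lowercased
my $words = $title.wordcase;  # tclc applied to every word

say $tc;     # HELLO wORLD
say $tclc;   # Hello world
say $words;  # The Quick Brown Fox
```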
=item samecase

    multi method samecase(Str $string: Str $pattern --> Str) is export

Has the effect of making the case of the string match the case pattern
in C<$pattern>. (Used by s:ii/// internally, see L<S05>.)

=item samemark

    multi method samemark(Str $string: Str $pattern --> Str) is export

Has the effect of making the marks of the string match the marking
pattern in C<$pattern>. (Used by s:mm/// internally, see L<S05>.)

=item length

This method does not exist in Perl 6. You must use either C<chars> or
C<codes>, depending on what kind of count you need.

=item chars

    multi method chars(Str $string: --> Int) is export

Returns the number of characters in the string. For C<Str> this
corresponds to the number of graphemes; for other types this is
equivalent to C<codes>.

=item codes

    multi method codes(Str $string: --> Int) is export

Returns the number of codepoints in the string. For C<Str> this
corresponds to the number of characters as if it were an C<NFC> type
string.

=item bytes

Gone. Use C<$str.encode($encoding).bytes> instead.

=item encode

    multi method encode($encoding = $?ENC --> Buf)

Returns a C<Blob> which represents the original string in the given
encoding. The actual return type is as specific as possible, so
C<$str.encode('UTF-8')> returns a C<utf8> object,
C<$str.encode('ISO-8859-1')> a C<blob8>.

C<Str.encode> is functionally equivalent to C<NFC.encode>. If you mean
one of the other normalization forms, convert the C<Str> to the
appropriate type first.

=item index

    multi method index(Str $string: Str $substring, Int $pos) is export

C<index> searches for the first occurrence of C<$substring> in
C<$string>, starting at C<$pos>. If the substring is found, then the
value returned represents the position of the first character of the
substring. If the substring is not found, C<Nil> is returned. Do not
evaluate it as a number, because that will assume C<0> and issue a
warning.

[Note: if C<$substring> is not of the same string type as C<$string>,
should that cause an error, or should C<$substring> be converted to
C<$string>'s type?]

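The three kinds of count and the C<index> return values can be
sketched as follows. The sample strings and variable names are
hypothetical, and the results assume an implementation that follows
this draft (for instance Rakudo).

```raku
# One grapheme that has no single precomposed codepoint.
my $g = "\c[LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW]";
my $chars = $g.chars;   # graphemes
my $codes = $g.codes;   # codepoints, counted as if NFC

# Byte counts depend on the encoding, so encode first.
my $word  = "résumé";
my $bytes = $word.encode('UTF-8').bytes;

my $first = "hello".index("l");     # first occurrence
my $later = "hello".index("l", 3);  # search starts at position 3
my $none  = "hello".index("z");     # not found

say $chars;  # 1
say $codes;  # 2
say $bytes;  # 8
say $first;  # 2
say $later;  # 3

# Test the result for definedness rather than using it as a number.
say $none.defined ?? "found" !! "not found";  # not found
```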
=item pack

    multi pack(*@items where { all(@items) ~~ Pair } --> buf8)
    multi pack(Str $template, *@items --> buf8)

C<pack> takes a list of pairs and formats the values according to the
specification of the keys. Alternately, it takes a string C<$template>
and formats the rest of its arguments according to the specifications
in the template string. The result is a sequence of bytes.

Templates are strings of the form:

    grammar Str::PackTemplate {
        regex TOP { ^