---
title: Scala
published: October 12, 2013
excerpt: Promising mixture of OO and FP
comments: off
toc: left
---
I've been meaning to learn Scala for some time. Haskell has really left me wanting to program in a very functional style, and it seems like Scala is used more in the industry than Haskell. Scala seems to provide a decent compromise in a language that mixes object-oriented programming with functional programming characteristics. Furthermore, the JVM is battle-tested and probably the most robust virtual machine of any language out there at the moment.
*[JVM]: Java Virtual Machine
* toc
# Basics
The syntax for a function definition is:
``` scala
def max(a: Int, y: Int): Int = if (x > y) x else y
```
If a function only takes one parameter, it can be called without parentheses. Functions can use special characters such as `+` and `-`, and can transform:
``` scala
1 + 2
(1).+(2)
```
In fact, any function that takes at least two parameters can be called in infix operator notation. If it takes more than two parameters, the right hand side should apply those within parentheses.
A unary operator can be defined by defining a method with `unary_` prepending the name.
If the function ends with `:`, then the code is transformed in reverse:
``` scala
a :: b
b.::(a)
```
A class can be defined as:
``` scala
class MyClass(index: Int, name: String)
```
This also defines a constructor that takes integer and string parameters.
Values can be defined in two ways: `val` are immutable values and `var` are mutable variables.
Arrays are indexed using parentheses, not brackets as in other languages. They're defined as:
``` scala
val strings = new Array[String](3)
strings(0) = "Hello"
strings(1) = "World"
for (i <- 0 to 2)
print(strings(i))
```
Lambdas are of the form:
``` scala
args.foreach((arg: String) => println(arg))
```
Though Scala allows partial function application, so the above can be shortened to:
``` scala
args.foreach(println)
```
When parentheses are applied to a variable with one or more parameters, Scala transforms the code into an invocation of the `apply` method on that variable:
``` scala
strings(i)
strings.apply(i)
```
On the other hand, when a variable with applied parentheses and arguments is assigned to, the code is transformed to a call to the `update` method:
``` scala
strings(i) = "assignment"
strings.update(i, "assignment")
```
In Scala, Arrays are mutable while Lists are immutable. With Lists, `::` is cons and `:::` is concatenation.
Tuples are declared in the usual form, as in Haskell. However, elements are accessed using `_n` where n is the 1-based number of the element.
The `->` method returns a two-element tuple of the left and right parameters, and is used for example to insert items into a `Map`:
``` scala
val treasureMap = Map[Int, String]()
treasureMap += (1 -> "Go to island")
```
The `Unit` type is equivalent to Haskell's `()` or `void` in C++.
Scala has symbols similar to Ruby's `:symbols` which are defined as:
``` scala
val someKey = 'symbol
```
The `yield` keyword can be used in conjunction with `for` loops to generate new collections:
``` scala
val shouts = for (arg <- args) yield arg + "!"
```
Raw strings are possible by using three successive double quote delimiters:
``` scala
val rawstr = """\d+"""
```
# Classes and Objects
Objects have members consisting of fields (data) and methods (code). Fields can be declared private.
If you assign an object to a `val`, you can mutate the object but not reassign the `val` to another object. Unit methods can leave off the result type and the equals sign:
``` scala
def add(b: Byte): Unit = sum += b
def add(b: Byte) { sum += b }
```
The semicolon insertion rules state that a semicolon is inserted at the end of a line if:
1. the line ends in a word that would not be legal as the end of a statement (e.g. period or infix operator)
2. next line begins with a word that cannot start a statement
3. line ends while inside parentheses or brackets, because these cannot contain multiple statements anyway
Singleton objects are defined using the `object` keyword and their members are as if `static` members of C++ classes. These cannot be instantiated.
``` scala
object Singleton {
private val num = 0
}
```
When a singleton object is named after an existing class, it is referred to as the class' _companion object_ [^companion_ruby]. They must both be defined in the same source file. They can access each other's private members.
[^companion_ruby]: Reminds me of Ruby's 'EigenClasses', but I'm not quite sure yet if it's indeed similar, or if companion objects truly are just a separation for specifying class-wide values/methods.
A class is defined using the class keyword, and it can take parameters:
``` scala
class Rational(num: Int, den: Int)
```
Any code inside the class body will be put inside the _primary constructor_.
_Fields_, accessible publicly, are created by defining class-level values:
``` scala
class Rational(num: Int, den: Int) {
val numer: Int = num
val denom: Int = den
}
```
_Auxiliary constructors_ are additional constructors that simply directly or indirectly delegate object construction to the primary constructor. They are named after `this`, and their first action must be invoking another constructor:
``` scala
class Rational(num: Int, den: Int) {
def this(n: Int) = this(n, 1)
}
```
Fields and methods can be defined private using the `private` keyword.
_Literal identifiers_ are ones that are enclosed in backticks, so that any string that would be accepted by the runtime will result in an identifier.
Implicit conversions can be defined so that values of a certain time are implicitly converted to another type in order for an operation to go through. For example, if there's an addition function for `Rational` that takes a `Rational` and an Integer, we can do `r + 1` but not `1 + r` since that would attempt to invoke the function on the Integer itself. To solve this, we can define an implicit conversion to `Rational` on Integer:
``` scala
implicit def intToRational(x: Int) = new Rational(x)
```
# Functions
## Partial Application
It's not possible to assign methods or nested functions to values, or pass them as arguments to other functions. If this is intended, one can use the explicit partial application syntax:
``` scala
val partial = some.method _
```
This syntax creates an ephemeral class [^cpp_partialapplication] that defines an `apply` method that takes the appropriate amount of arguments. When the partially applied value is called, the arguments are forwarded to the underlying function and its result is returned.
[^cpp_partialapplication]: Reminds me of `std::bind` in C++11, which does the same thing by creating a functor, or [function object] (not to be confused with the category theory [Functor] more common in [Haskell]).
[function object]: http://en.wikipedia.org/wiki/Function_object#In_C_and_C.2B.2B
[Functor]: http://en.wikipedia.org/wiki/Functor
[Haskell]: http://www.haskell.org/haskellwiki/Functor
This can also be done for specific arguments, but the type must be stated explicitly, presumably to disambiguate overloads:
``` scala
val partial = func(1, _ : Int, 3)
```
If passing a partially applied function, where all parameters are unapplied, as an argument to another function, partial application can be written implicitly:
``` scala
someNumbers.foreach(println _)
someNumbers.foreach(println)
```
Closures capture values by reference.
Variable arguments can be specified with an asterisk `*`, and are treated as an `Array` inside the function:
``` scala
def echo(args: String*) = for (arg <- args) println(arg)
echo("one", "two", "three")
```
Conversely, an `Array` can be expanded into multiple arguments using the `_*` symbol:
``` scala
val arr = Array("one", "two", "three")
echo(arr: _*)
```
It's possible to pass named parameters to functions:
``` scala
def speed(distance: Float, time: Float): Float = distance / time
speed(time = 10, distance = 100)
```
It's possible to define default parameter values:
``` scala
def printTime(out: java.io.PrintStream) =
out.println("time = " + System.currentTimeMillis())
```
When used in combination with named parameters, it's possible to leave out all parameters except one.
Scala supports tail recursion.
Currying can be made explicit in a function definition by adding multiple parameter lists. This makes it possible to curry functions without having to specify the type of the holes as in partial application above.
``` scala
def curriedSum(x: Int)(y: Int) = x + y
val three = curriedSum(1)(2)
def equivalentSum(x: Int) = (y: Int) => x = y
val three_ = equivalentSum(1)(2)
```
Explicitly defining multiple parameter lists allows using braces for the last parameter:
``` scala
val braces = curriedSum(1) { 2 }
```
Parameters can be passed "by-name" so that parameters can be passed to a function and not evaluated until explicitly done so within the function. The syntax is simply to prepend the parameter with `=>`, I take this to mean that it's wrapping the parameter in a parameter-less lambda like `()` so that the expression passed as the parameter isn't evaluated until explicitly done so within the function:
``` scala
def byNameAssert(predicate: => Boolean) =
if (assertionsEnabled && !predicate)
throw new AssertionError
byNameAssert(5 > 3)
```
# Composition and Inheritance
Abstract classes are defined using the abstract keyword:
``` scala
abstract class Element {
def contents: Array[String]
def height: Int = contents.length
def width: Int = if (height == 0) 0 else contents(0).length
}
```
Methods themselves are abstract if they have no defined implementation.
Parameterless methods are those that take no parameters and their parentheses are omitted. The convention is to omit the parentheses when there are no parameters and the method accesses mutable state only by reading its class' fields. Further, empty parentheses in function calls can be omitted. The convention is to only omit them if they have no side-effects. I/O calls for example should always explicitly contain the parentheses.
Classes can be extended other classes. This allows the sub-class to inherit all non-private members of the parent class, and makes the sub-class a sub-type of the parent type. All classes implicitly extend from `scala.AnyRef` which is equivalent to Java's `java.lang.Object`:
``` scala
class ArrayElement(conts: Array[String]) extends Element {
def contents: Array[String] = conts
}
```
Class fields and methods belong to the same namespace, which makes it possible for a subclass to override one with the other:
``` scala
class ArrayElement(conts: Array[String]) extends Element {
val contents: Array[String] = conts
}
```
Parametric fields are fields that are initialized through the primary constructor:
``` scala
class ArrayElement(
val contents: Array[String]
) extends Element
```
It's possible to invoke superclass constructors by simply supplying the parameter in the extends section:
``` scala
class LineElement(s: String) extends ArrayElement(Array(s)) {
override def width = s.length
override def height = 1
}
```
Overriding must be explicitly denoted.
The `final` keyword can be used to prevent subclass overriding on a particular method:
``` scala
class A extends B {
final override def method() { println "and that's final" }
}
```
The `final` keyword can also be used to prevent subclassing of a particular class entirely:
``` scala
final class A extends B {
override def method() { println "final class" }
}
```
# Hierarchy
Every class is a direct or indirect subclass of `Any`, which defines a variety of "universal methods" available to all subclasses. The `==` method is final and simply calls `equals`, which is the method that subclasses should override.
`Any` has two direct subclasses. `AnyVal` is the parent class of every built-in value class in Scala: `Byte`, `Short`, `Char`, `Int`, `Long`, `Float`, `Double`, `Boolean`, and `Unit`. These can't be instantiated with `new` because they are defined as final and abstract. `AnyRef` is the base class of all reference classes, it's just an alias for `java.lang.Object`.
There are two "bottom" types, which are subclasses of every kind of type. `Null` is the type of the `null` reference and it inherits from any class that inherits from `AnyRef`. `Nothing` is a type that simply signifies abnormal termination. For example, the `error` function is of type `Nothing`. Since it's a subclass of any class, the `error` function can be called from within any other function regardless of its type.
# Traits
Traits encapsulate methods and fields which can be mixed into other classes [^ruby_mixins]. Traits are defined as:
[^ruby_mixins]: Reminds me of Ruby's modules that can be included.
``` scala
trait SomeTrait {
def someMethod() {
println("printing")
}
}
```
A trait can subclass a certain class, which essentially specifies a constraint on the trait such that it can only be mixed into classes that are or subclass the extended class.
Traits can be mixed in using either the `extends` or `with` keywords, multiple mixins are simply chained using `with`. They also defined types, so a value of a certain trait type can be set to any object whose class mixes-in the particular trait.
Traits can override methods implemented in the classes they're mixed into. This is accomplished with the `abstract override` directive. This combination of specifiers conveys the fact that it overrides a method that it's mixed into (hence `override`), and therefore that method must be defined in the class it's mixed into (hence `abstract`). Methods with these qualifiers can use `super` to access the class they're mixed into, specifically the same method they're overriding:
``` scala
class BasicIntQueue extends IntQueue {
private val buf = new ArrayBuffer[Int]
def get() = buf.remove(0)
def put(x: Int) { buf += x }
}
trait Doubling extends IntQueue {
abstract override def put(x: Int) { super.put(2 * x) }
}
```
Multiple mixins can be "stacked," in which case the same function is called in reverse order of the mixin list. That is, the right-most trait is treated as if a "superclass" of the one to its left, and `super` has that effect.
# Packages
The `package` statement is used as in Java to specify that what follows is to be part of the specified package. However, it can also be used with braces to only insert the contained code within the package.
The `import` statement can be used to import packages and symbols in different ways:
``` scala
def showFruit(fruit: Fruit) {
import fruit._
println(name + "s are " + color)
}
// only Apple and Orange
import Fruits.{Apple, Orange}
// everything, but rename Apple to McIntosh
import Fruits.{Apple => McIntosh, _}
// everything except Pear
import Fruits.{Pear => _, _}
```
Every Scala file implicitly imports the following packages:
``` scala
import java.lang._
import scala._
import Predef._
```
# Testing
Assertions can be made with `assert` and enabled in the JVM using the command options `-ea` (enable assertions) and `-da`. An alternative function is `ensuring`, which also takes a predicate. The predicate is a function that takes a so-called "result type" and returns a Boolean. If the predicate returns true, then `ensuring` returns the "result type," but if the predicate returns false, then `ensuring` throws an `AssertionError`.
Unit testing is possible with tools such as [ScalaTest](http://www.scalatest.org) and ScalaCheck. ScalaTest tests are functions with names prefixed by `test` inside classes that extend `Suite`:
``` scala
import org.scalatest.Suite
import Element.elem
class ElementSuite extends Suite {
def testUniformElement() {
val ele = elem('x', 2, 3)
assert(ele.width == 2)
}
}
```
ScalaTest has another style of testing with the FunSuite package:
``` scala
import org.scalatest.FunSuite
import Element.elem
class ElementSuite extends FunSuite {
test("elem result should have passed width") {
val ele = elem('x', 2, 3)
assert(ele.width == 2)
}
}
```
The assertions used in the previous two ScalaTest suites would only show an error that the assertion failed, without showing what the two values were. ScalaTest also provides a `===` operator that is more descriptive:
``` scala
assert(ele.width === 2) // "3 did not equal 2"
```
There is also an `expect` method that differentiates between the two values:
``` scala
expect(2) {
ele.width
} // "Expected 2, but got 3"
```
Method `intercept` can verify that an exception is thrown:
``` scala
intercept[IllegalArgumentException] {
elem('x', -2, 3)
}
```
## Behavior-Driven Development
The `FlatSpec` is one of many traits, such as [`FunSpec`](http://www.scalatest.org/getting_started_with_fun_spec), that allow for behavior-driven development (BDD). These traits can be mixed with other traits such as `ShouldMatchers` which allow the use of the `should` function to write tests very naturally as in RSpec:
*[BDD]: Behavior-Driven Development
``` scala
import org.scalatest.FlatSpec
import org.scalatest.matchers.ShouldMatchers
import Element.elem
class ElementSpec extends FlatSpec with ShouldMatchers {
"A UniformElement" should "have a width equal to the passed value" in {
val ele = elem('x', 2, 3)
ele.width should be (2)
}
it should "have a height equal to the passed value" in {
val ele = elem('x', 2, 3)
ele.height should be (3)
}
it should "throw an IAE if passed a negative width" in {
evaluating {
elem('x', 2, 3)
} should produce [IllegalArgumentException]
}
}
```
## Property Testing
[ScalaCheck](http://www.scalacheck.org/) is another testing tool that is similar to Haskell's [QuickCheck](http://hackage.haskell.org/package/QuickCheck), which essentially randomly generates test input data to test with. There is an implication operator `==>` that takes a function that places a constraint on the test data and a condition that must hold true for all data that fits that constraint:
``` scala
import org.scalatest.WordSpec
import org.scalatest.prop.Checkers
import org.scalacheck.Prop._
import Element.elem
class ElementSpec extends WordSpec with Checkers {
"elem result" must {
"have passed width" in {
check((w: Int) => w > 0 ==> (elem('x', w, 3).width == w))
}
}
}
```
# Pattern Matching
_Case Classes_ are like stub classes which contain and are mainly about publicly accessible data. They're very similar to Haskell's algebraic data types (ADT). Consider the following domain specific language (DSL):
*[ADT]: Algebraic Data Type
*[DSL]: Domain Specific Language
``` scala
abstract class Expr
case class Var(name: String) extends Expr
case class Number(num: Double) extends Expr
case class UnOp(operator: String, arg: Expr) extends Expr
case class BinOp(operator: String, left: Expr, right: Expr) extends Expr
```
These case classes define factory methods to avoid having to explicitly write `new`, fields for the data, simple `toString` implementations, and a `copy` method that is very similar to Haskell's record syntax. For example, the following creates a copy of the `op` `BinOp` with the `operator` field changed to `"-"`:
``` scala
op.copy(operator = "-")
```
This is equivalent to Haskell's record syntax:
``` haskell
op { operator = "-" }
```
The main use of case classes is pattern matching. For example, a mathematical expression expressed using the `Expr` DSL constructed above can be simplified as follows:
``` scala
def simplifyTop(expr: Expr): Expr =
expr match {
case UnOp("-", UnOp("-", e)) => e
case BinOp("+", e, Number(0)) => e
case BinOp("*", e, Number(1)) => e
case _ => expr
}
```
This is very similar to other pattern matching features available in Haskell for example. In cases, we can use _constructor patterns_, _constant patterns_, and _variable patterns_. Variable patterns are simply those that begin with lower case characters, but we can force a lowercase identifier to be treated as a constant by surrounding it with backticks.
It's also to use _sequence patterns_. In this case, the `_*` operator can be used to specify "the rest of the sequence," like so:
``` scala
expr match {
case List(0, _*) => println("found it")
case _ =>
}
```
It's also possible to use _type patterns_ to match the type of the expression:
``` scala
def generalSize(x: Any) = x match {
case s: String => s.length
case m: Map[_, _] => m.size
case _ => -1
}
```
_Type Erasure_ means that information about type arguments is not maintained at runtime. This is an artifact of the erasure model of generics that Java uses. As a result, it's not possible to pattern match on the type of `Map`, and the following code will return `true` for any `Map`:
``` scala
def isIntToIntMap(x: Any) = x match {
case m: Map[Int, Int] => true
case _ => false
}
```
It's possible to use `@` for _variable binding_ in pattern matching, just as in Haskell:
``` scala
expr match {
case UnOp("abs", e @ UnOp("abs", _)) => e
case _ =>
}
```
_Pattern guards_ are additional conditions placed on pattern matches. For example, to simplify an addition expression with identical operands to a multiplication by two, we can use:
``` scala
def simplifyAdd(e: Expr) = e match {
case BinOp("+", x, y) if x == y =>
BinOp("*", x, Number(2))
case _ => e
}
```
A _sealed class_ is one that restricts subclassing of that class to the file it's defined in. This way the compiler can guarantee and enforce exhaustive pattern matching, which it couldn't do otherwise because the class could be extended in another file:
``` scala
sealed abstract class Expr
```
If there's a function that is known to only handle a specific subset of subtypes, a catchall case can be added, or the `@unchecked` annotation can be used:
``` scala
def describe(e: Expr): String = (e: @unchecked) match {
case Number(_) => "a number"
case Var(_) => "a variable"
}
```
The `Option` type is equivalent to Haskell's `Maybe` type. It can take on either a parameterized `Some` value or `None`.
A _case sequence_ is a function literal specific defined as a pattern match where each case is an entry point to the function:
``` scala
val withDefault: Option[Int] => Int = {
case Some(x) => x
case None => 0
}
```
This is why it's possible to pass a pattern match directly to the `react` function in the actors library:
``` scala
react {
case (name: String, actor: Actor) => {
actor ! getip(name)
act()
}
case msg => {
println("Unhandled message: " + msg)
act()
}
}
```
If a case sequence doesn't provide exhaustive patterns, it is considered a _partial function_ since it doesn't provide an output for every input. If it's applied to a value that it can't match, it throws a run-time exception.
The parameterized type `PartialFunction` can represent partial functions:
``` scala
val second: PartialFunction[List[Int],Int] = {
case x :: y :: _ => y
}
```
The function `isDefinedAt` can then be used to determine if the function is defined for a particular value:
```
scala> second.isDefinedAt(List(5, 6, 7))
res0: Boolean = true
scala> second.isDefinedAt(List())
res0: Boolean = false
```
The partial function above gets translated to:
``` scala
new PartialFunction[List[Int],Int] {
def apply(xs: List[Int]) = xs match {
case x :: y :: _ => y
}
def isDefinedAt(xs: List[Int]) = xs match {
case x :: y :: _ => true
case _ => false
}
}
```
The actors library's `react` function for example uses partial a partial argument function, since it's defined only for the messages teh caller wants to handle.
# Tuples
Tuples are the same as in Haskell, C++11, Python etc. They are indexed using `._n` where `n` it the nth tuple element. A difference from something like Python is that the following:
``` scala
val word, index = longest
```
Where `longest` is a tuple, ends up assigning the tuple to both `word` and `index`, contrary to what happens in Python. To assign the correct elements as in C++11's `std::tie`, use parentheses to pattern match as in Haskell.
# Stateful Objects
Scala has support for something similar to C# properties. When a non-private `var` is defined in a class, a pair of getter and setters is automatically generated for that variable. For example, given the following declaration:
``` scala
class Time {
var hour = 12
var minute = 0
}
```
Changes the variable to be `private[this]` so that it's only accessible from the object itself, and the getters and setters take on the visibility of the original variable, this in effect restricts all external access to its generated getters and setters. The setter takes the form `x_=(arg)` [^ruby_setter].
[^ruby_setter]: This is of course very similar to setters in Ruby, which take the form `attr=(arg)`.
``` scala
class Time {
private[this] var h = 12
private[this] var m = 0
def hour: Int = h
def hour_=(x: Int) { h = x }
def minute: Int = m
def minute_=(x: Int) { m = x }
}
```
These getters and setters can be defined explicitly in order to encode things such as input validation:
``` scala
class Time {
private[this] var h = 12
private[this] var m = 0
def hour: Int = h
def hour_=(x: Int) {
require(0 <= x && x < 24)
h = x
}
def minute: Int = m
def minute_=(x: Int) {
require(0 <= x && x < 60)
m = x
}
}
```
It's also possible to define getters and setters that aren't backed by a variable, which is particularly useful for converter functions, for example. Note that in the following example, `_` is used to give the `celsius` variable a default value (0 for numeric types) [^monoid_default_value].
[^monoid_default_value]: This reminds me of a [Monoid]'s identity, `mempty` in the Haskell typeclass, but I doubt that `_` is backed by a Monoid typeclass because a Monoid would also require an associative binary operation. Perhaps it's simply more like the [default] package's default typeclass.
[Monoid]: http://hackage.haskell.org/package/base-4.6.0.1/docs/Data-Monoid.html
[default]: http://hackage.haskell.org/package/data-default
``` scala
class Thermometer {
var celsius: Float = _
def fahrenheit = celsius * 9 / 5 + 32
def fahrenheit_=(f: Float) {
celsius = (f - 32) * 5 / 9
}
override def toString = fahrenheight + "F/" + celsius + "C"
}
```
# Type Parameterization
It's possible to hide the primary constructor by making it private. This makes its type usable but not its constructor, which can only be used from the class itself, such as through an auxiliary constructor, or its companion object.
``` scala
class Queue[T] private (
private val leading: List[T],
private val trailing: List[T]
)
```
For example, a factory method can be created in the companion object:
``` scala
object Queue {
def apply[T](xs: T*) = new Queue[T](xs.toList, Nil)
}
```
Another way is to define a _generic trait_, i.e. one that is parameterized, and hide the implementation inside the companion object, along with a factory method in the companion object:
``` scala
trait Queue[T] {
def head: T
def tail: Queue[T]
def enqueue(x: T): Queue[T]
}
object Queue {
def apply[T](xs: T*): Queue[T] = new QueueImpl[T](xs.toList, Nil)
private class QueueImpl[T](
private val leading: List[T],
private val trailing: List[T]
) extends Queue[T] {
def mirror = ...
def tail = ...
def enqueue = ...
}
}
```
## Variance
_Variance_ refers to inheritance relationships of parameterized types, such as whether `Set[String]` is a subtype of `Set[AnyRef]`.
For example, if `S` is a subtype of `T` and `Queue[S]` is considered a subtype of `Queue[T]`, then `Queue` is _covariant_ in its type parameter `T`. This would mean for example that we could pass `Queue[String]` to a method that accepted types `Queue[AnyRef]`.
However, generic types are _nonvariant_ by default, meaning that there would be no such subtype relationship. It's possible to annotate the type parameter as being _covariant_ by prepending the type parameter with `+`:
``` scala
trait Queue[+T] { ... }
```
_Contravariance_ would mean that if `T` is a subtype of `S`, then `Queue[S]` is a subtype of `Queue[T]`. It's also possible to annotate the type parameter to be _contravariant_ by using the `-` annotation:
``` scala
trait Queue[-T] { ... }
```
In order to make `Queue` covariant, it's necessary to specify a lower bound on the `enqueue` method and make it polymorphic as well. The lower bound enforces the requirement that `U` be a supertype of `T`.
``` scala
class Queue[+T] (
private val leading: List[T],
private val trailing: List[T]
) {
def enqueue[U :> T](x: U) = new Queue[U](leading, x :: trailing)
}
```
This means that given supertype `Fruit` and subtypes `Apple` and `Orange`, an `Orange` can be appended to a `Queue[Apple]`, yielding a `Queue[Fruit]` result.
There are also upper bounds which enforce the requirement that a type be a subtype of another, in the example below, it means that `T` must be a subtype of `Ordered[T]`:
``` scala
def orderedMergeSort[T <: Ordered[T]](xs: List[T]): List[T] = ...
```
# Abstract Members
A member of a class or trait is _abstract_ if it doesn't have a complete definition within the class. The implementations are meant to be defined in subclasses. Unlike other object-oriented languages, it's also possible to declare abstract fields and even abstract types. It's possible to declare abstract types, methods, `val`s and `var`s:
``` scala
trait Abstract {
type T
def transform(x: T): T
val initial: T
var current: T
}
```
This can then be implemented in a subclass:
``` scala
class Concrete extends Abstract {
type T = String
def transform(x: String) = x + x
val initial = "hi"
var current = initial
}
```
It's possible to override abstract methods with a concrete `val`, since the `val` will yield the same value every time, whereas the reverse can't be guaranteed.
Abstract `var`s provide implicit getters and setters for certain values.
Class parameter arguments are evaluated _before_ being passed to the class constructor, but concrete `val` definitions are evaluated _after_ the superclass is initialized. The following yields an error for failing to satisfy the requirement, since at the time the requirement is checked, the value of `denomArg` is still 0, since it's defined in the subclass.
``` scala
trait RationalTrait {
val numerArg: Int
val denomArg: Int
require(denomArg != 0)
}
val x = 2
// defines an anonymous class that mixes in RationalTrait
// with its definition enclosed
new RationalTrait {
val numerArg = 1 * x
val denomArg = 2 * x
}
// yields java.lang.IllegalArgumentException: requirement failed
```
Pre-initialized fields are one way to solve this problem by defining certain subclass fields before the superclass is called:
``` scala
// anonymous class
new {
val numerArg = 1 * x
val denomArg = 2 * x
} with RationalTrait
// object definition
object twoThirds extends {
val numerArg = 2
val denomArg = 3
} with RationalTrait
```
The other way to solve this problem is using lazy `val`s, which defers the evaluation of the `val`'s expression until the first time it the `val` is used, much like in Haskell.
``` scala
trait LazyRationalTrait {
val numerArg: Int
val denomArg: Int
lazy val numer = numerArg / g
lazy val denom = denomArg / g
private lazy val g = {
require(denomArg != 0)
gcd(numerArg, denomArg)
}
}
```
Abstract types are useful when there are abstract methods which take on parameters of types specific to the subclass. For example, it wouldn't make sense to have the following because it would mean that we could pass any type of `Food` to any `Animal` subclass.
``` scala
class Food
abstract class Animal {
def eat(food: Food)
}
```
Instead of defining the `eat` method as taking a `Food` parameter, we could define it to take some abstract type with an upper bound enforcing that it is a subclass of `Food`. This way, a subclass of `Animal` could explicitly specify the type of `Food` it eats.
``` scala
class Food
abstract class Animal {
type SuitableFood <: Food
def eat(food: SuitableFood)
}
```
A _path-dependent type_ is one that depends on its path, i.e. `some.object.Type`. Such a type can be instantiated using that syntax, since it implies a reference to `Type`'s outer object, particularly `object`. The syntax `Outer#Inner` can't be used to instantiate `Inner` since it doesn't refer to any instance of `Outer`.
## Structural Subtyping
_Structural subtyping_ is when two types get a subtyping relationship because they have the same members [^go_interfaces]. This is contrasted to the more traditional _nominal subtyping_, where each type has a name they have an explicitly declared subtyping relationship. Structural subtyping is achieved in Scala using _refinement types_.
[^go_interfaces]: This reminds me of [Go's interfaces].
[Go's interfaces]: /notes/go/#interfaces
For example, to create a `Pasture` class full of `Animal`s that eat `Grass`, we can define:
``` scala
Animal { type SuitableFood = Grass }
class Pasture {
var animals: List[Animal { type SuitableFood = Grass }] = Nil
}
```
It can also be used to generalize a `using` method, such as Python's `with` syntax, so that it works on any object that has a `close` method. The first attempt at generalization wouldn't work because `T` could be any type, even one that _doesn't_ have a `close` method. This can be fixed by specifying an upper bound consisting of a refinement type. Note that if no base type is specified, like `Animal` preceding the braces above, then Scala uses `AnyRef`
``` scala
def using[T, S](obj: T)(operation: T => S) = {
val result = operation(obj)
obj.close() // type error
result
}
def using[T, <: { def close(): Unit }, S](obj: T)(operation: T => S) = {
val result = operation(obj)
obj.close()
result
}
```
## Enumerations
Enumerations in Scala aren't defined at the language-level. Instead there is a class `scala.Enumeration` that can be used to define enumerations, which works due to path-dependent types. This means that `Color.Value` would be different from `Direction.Value` becuase their parts differ:
``` scala
object Color extends Enumeration {
val Red, Green, Blue = Value
}
object Direction extends Enumeration {
val North = Value("North")
val East = Value("East")
val South = Value("South")
val West = Value("West")
}
for (d <- Direction.values) print (d + " ")
Direction.East.id == 1
Direction(1) == Direction.East
```
# Implicit Conversions and Parameters
Implicit conversions [^implicit_conversions_opinion] can be used to make two independent libraries interoperate in a simple manner. For example, using Swing in Scala would look something like this:
[^implicit_conversions_opinion]: Now that I've learned what implicit conversions are in Scala, I have formed an opinion about them. Implicit conversions are one of the parts that make C++ [very complex]. Scala's implicit conversions don't seem as complex as C++'s, here they're implicit at the site of use, but explicit at the site of definition. In C++ it feels that they're implicit on both ends, since you can have overloaded conversion operators and conversion constructors on either type, which is further made ambiguous with arithmetic type conversions.
[very complex]: /notes/cpp/#conversion-ambiguity
``` scala
val button = new JButton
button.addActionListener(
new ActionListener {
def actionPerformed(event: ActionEvent) {
println("pressed!")
}
}
)
```
However, code written this way reflects Java's limitations and is therefore not idiomatic Scala. Using implicit conversions it can be possible to write it something like this:
``` scala
button.addActionListener(
(_: ActionEvent) => println("pressed!")
)
```
This is done using an implicit conversion function like this one. What happens is that Scala compiles it normally and encounters a type error in the above code, so it checks if there's an implicit conversion function of the correct type, `ActionEvent => Unit`, and if it works then it continues compilation:
``` scala
implicit def function2ActionListener(f: ActionEvent => Unit) =
new ActionListener {
def actionPerformed(event: ActionEvent) = f(event)
}
```
There are a variety of rules concerning implicit definitions.
* Only functions marked as `implicit` are tried.
* The implicit conversion must be in the scope as a single identifier, i.e. not `some.convert`. This is why some libraries include a `Preamble` object which often contains useful implicit conversions which can be imported with `import.Preamble._`
* The _exception_ to this rule is that the compiler also looks inside the companion object of the source or target types of the conversion.
* The compiler only attempts one implicit conversion, i.e. it won't attempt converting `x + y` into `convert1(convert2(x)) + y`.
Implicit conversions are also used on the receiver of a selection. For example, `"abc".exists` is converted to `stringWrapper("abc").exists`.
Implicit conversions are often used for simulating new syntax, such as the `->` in a `Map`, which is defined as:
``` scala
package scala
object Predef {
class ArrowAssoc[A](x: A) {
def -> [B](y: B): Tuple2[A, B] = Tuple2(x, y)
}
implicit def any2ArrowAssoc[A](x: A): ArrowAssoc[A] =
new ArrowAssoc(x)
}
```
## Implicit Parameters
Implicit parameters are those that can optionally be provided by the compiler. Note that the `implicit` applies to the entire last parameter list. Also, we didn't use a direct `String` since the compiler selects implicit parameters based on their types, so this should lower the chances that another type is used to fulfill the implicit parameter. Finally, implicit parameters must be available as single identifiers, which is why they are usually declared in an object which is imported.
``` scala
class PreferredPrompt(val preference: String)
class PreferredDrink(val preference: String)
object Greeter {
def greet(name: String)(implicit prompt: PreferredPrompt, drink: PreferredDrink) {
println("Welcome " + name + ", have some " + drink.preference)
println(prompt.preference)
}
}
object Preferences {
implicit val prompt = new PreferredPrompt("$ ")
implicit val drink = new PreferredDrink("tea")
}
import Preferences._
Greeter.greet("Bob")(prompt, drink)
Greeter.greet("Bob") // or implicitly
```
Implicit parameters are often used to provide information about a type in a preceding, explicit parameter list. For example, it can be used to pass a compare function to a function that yields the largest element in a list. Scala actually provides many `orderer` functions in the standard library, which makes this function usable with many standard types without explicitly providing an `orderer`. Note that again, types are made as specific as possible to reduce ambiguity to the developer and to restrict the options available to the compiler:
``` scala
def maxList[T](elements: List[T])(implicit orderer: T => Ordered[T]): T =
elements match {
case List() => throw new IllegalArgumentException("empty")
case List(x) => x
case x :: rest =>
val maxRest = maxList(rest) // (ordered) is implicit
if (x > maxRest) x // ordered(x) is implicit
else maxRest
}
```
Also note that the implicit parameter can be used as an implicit parameter and conversion in the body, as a result, `ordered` doesn't appear anywhere in the function body. This is a very common thing to do, and since the name of the implicit parameter isn't used anywhere, it's possible to use a _view bound_.
For example, the following code essentially enforces the requirement that `T` can be _treated_ as an `Ordered[T]`, where _treated_ would mean that there is an implicit conversion available. If `T` is already an `Ordered[T]`, then an identity function is used as the implicit conversion:
``` scala
def maxList[T <% Ordered[T]](elements: List[T]): T = ...
```
If multiple conversions apply for an implicit conversion, Scala generally refuses to insert a conversion. However, since Scala 2.8, it now does something similar to C++ where it'll choose the most specific conversion available, where being _more specific_ entails:
* the argument type is a subtype of another conversion's argument type
* both conversions are methods and the enclosing class extends the other conversion's enclosing class
# Lists
The left fold is possible with `/:` and `foldLeft` and the right fold with `:\` and `foldRight`. The `/:` and `:\` names represent the way the fold tree leans.
``` scala
def sum(xs: List[Int]): Int = (0 /: xs) (_ + _)
def flattenRight[T](xss: List[List[T]]) =
(xss :\ List[T]()) (_ ::: _)
```
An efficient way to append lists in constant time is to use `ListBuffers` [^difference_lists]:
[^difference_lists]: I wonder if these are similar to Haskell's [difference lists].
[difference lists]: http://hackage.haskell.org/package/dlist/docs/Data-DList.html
``` scala
import scala.collection.mutable.ListBuffer
val buf = new ListBuffer[Int]
buf += 1
3 +=: buf
buf.toList
```
An `ArrayBuffer` is similar to an `std::vector` in that it automatically resizes itself to fit its contents.
## Implementation {#list-implementation}
Lists are implemented as a covariant, abstract class `List` for which there are subclasses `::` and `Nil`. The covariant property allows a `List[Int]` to be assigned to a `List[Any]`:
``` scala
package scala
abstract class List[+T]
val xs: List[Any] = List(1, 2, 3)
```
The `Nil` object inherits from `List[Nothing]` so that `Nil` can be assigned to any `List`. The methods `isEmpty`, `head`, and `tail` are implemented for `Nil`, where the first returns `true` and the latter two throw an exception, as in Haskell.
``` scala
case object Nil extends List[Nothing] {
override def isEmpty = true
def head: Nothing = throw ...
def tail: List[Nothing] = throw ...
}
```
The `::`, "cons" class is defined so that `x :: xs` is treated as `::(x, xs)` where `::` is a case class defined as:
``` scala
final case class ::[T](head: T, tail: List[T]) extends List[T] {
override def isEmpty: Boolean = false
}
```
List operations are defined so that the result type is "widened" to accommodate the types of all list elements. This is done by placing a lower bound
``` scala
def ::[U >: T](x: U): List[U] = new scala.::(x, this)
abstract class Fruit
class Apple extends Fruit
class Orange extends Fruit
val apples = new Apple :: Nil
val fruits = new Orange :: apples
```
It turns out that the cons class is actually defined such that the tail is a mutable var, but only accessible within the `scala` package:
``` scala
final case class ::[U](hd: U, private[scala] var tl: List[U]) extends List[U] {
def head = hd
def tail = tl
override def isEmpty: Boolean = false
}
```
A `ListBuffer` then works by directly modifying the tail of the last cons cell in the list. This means that the tails in the following two variables are shared to avoid copying [^mutability]:
[^mutability]: It seems like Scala uses mutability to leave optimization up to the developer. Contrast this with Haskell, where everything is immutable at the language level and the optimizations are done underneath at compile time or at the runtime level. The cons `:` in Haskell for example would also reuse the tail, creating a [persistent linked list], but the developer doesn't have to worry about how to best implement something like this for efficiency. Scala of course affords one more flexibility in how they implement something, but it expects that every developer be mindful of how best to implement things in the unusual functional and imperative environment.
[persistent linked list]: http://en.wikipedia.org/wiki/Persistent_data_structure#Linked_lists
``` scala
val ys = 1 :: xs
val zs = 2 :: xs
```
# For Expressions
All `for` expressions that `yield` are translated by the compiler into combinations of `map`, `flatMap`, and `withFilter`, and those that don't `yield` into combinations of `withFilter` and `foreach`. This means that the following are equivalent:
``` scala
persons withFilter (p => !p.isMale) flatMap (p =>
(p.children map (c => (p.name, c.name))))
for (p <- persons; if !p.isMale; c <- p.children)
yield (p.name, c.name)
```
`for` expressions are of the following form, where _seq_ is a sequence of _generators_, _definitions_, and _filters_ delimited by semicolons:
``` scala
for (seq) yield expr
// for example: seq = generator; definition; filter
for (p <- persons; n = p.name; if (n startsWith "To"))
yield n
```
Generators are of the form `pat <- expr` where the pattern `pat` is matched for each element in the list. If the match succeeds, the variables are bound to the pattern components. If the match fails, the element is discarded from iteration.
## Translation {#for-expression-translation}
The following are examples of how `for` expressions are translated into combinations of `map`, `flatMap`, and `withFilter`.
``` scala
// case 1: one generator
for (x <- expr1) yield expr2
expr1.map(x => expr2)
// case 2: generator and filter
for (x <- expr1 if expr2) yield expr3
// transform into case 1
for (x <- expr1.withFilter(x => expr2)) yield expr3
expr1.withFilter(x => expr2).map(x => expr3)
// case 3: two generators
for (x <- expr1; y <- expr2; seq) yield expr3
expr1.flatMap(x => for (y <- expr2; seq) yield expr3)
// case 4: tuple pattern
for ((x1, ..., xn) <- expr1) yield expr2
expr1.map { case (x1, ..., xn) => expr2 }
// case 5: arbitrary pattern
// first filter by successful match, then map.
// guarantees no MatchError exception
for (pat <- expr1) yield expr2
expr1.withFilter {
case pat => true
case _ => false
}.map {
case pat => expr2
}
// case 6: contains definitions
// expr2 is evaluated each time x is generated
for (x <- expr1; y = expr2; seq) yield expr3
for ((x, y) <- for (x <- expr1) yield (x, expr2); seq)
yield expr3
// case 7: side-effect loop
for (x <- expr1) body
expr1.foreach(x => body)
for (x <- expr1; if expr2; y <- expr3) body
expr1.withFilter(x => expr2).foreach(x =>
expr3.foreach(y => body))
```
## Generalized For Expressions
It's possible to add support for `for` expressions to any data type by defining `map`, `flatMap`, `withFilter`, and `foreach`. Depending on which of these functions are implemented, the following features of `for` expressions become available:
* `map`: expressions with a single generator
* `flatMap` and `map`: expressions with multiple generators
* `foreach`: loops with single/multiple generators
* `withFilter`: filter expressions
Type checking is performed after translation occurs. Given a parameterized type `C` denoting a collection, the following type signatures are generally used for the required methods. A standard technique to optimize `withFilter` is to not return an entire new object but to return a wrapper object that remembers that elements need to be filtered when they're processed later:
``` scala
abstract class C[A] {
// monad methods
def map[B](f: A => B): C[B]
def flatMap[B](f: A => C[B]): C[B]
def withFilter(p: A => Boolean): C[A]
def foreach(b: A => Unit): Unit
}
```
# Collections API
The `Traversable` trait is at the top of the collections hierarchy and defines the following abstract operation, which is meant to traverse all elements of a collection and apply the operation `f` to each element:
``` scala
def foreach[U](f: Elem => U)
```
The `Iterable` trait defines an abstract `iterator` method, which `Iterable` uses to define `foreach` of `Traversable` it extends from:
``` scala
def foreach[U](f: Elem => U): Unit = {
val it = iterator
while (it.hasNext) f(it.next())
}
```
## Sequences
The sequence traits `Seq`, `IndexedSeq`, and `LinearSeq` represent iterables that have a length and whose elements have fixed indexed positions starting from 0.
Buffers are a sub-category of sequences that allow element insertions, removals, and appending operations. Common buffers are `ListBuffer` and `ArrayBuffer`.
## Sets
Sets are iterables with no duplicate elements. There are two subtraits `SortedSet` and `BitSet`. Ordering is preserved in `SortedSet`, and is backed by an ordered binary tree, with `immutable.TreeSet` being a [red-black tree] that keeps the tree balanced. `BitSet` uses an array of `Long` values to efficiently represent a set of packed bits, much like C++'s [bitset].
[red-black tree]: /notes/algorithms/#red-black-trees
[bitset]: http://en.cppreference.com/w/cpp/utility/bitset
## Maps
Maps' `get` method returns an `Option[Value]`, like `lookup` would return a `Maybe a` in Haskell. The `getOrElseUpdate` function facilitates the use of Maps as caches.
## Streams
Streams have elements that are computed lazily. The `#::` function is used to construct streams. Notice that only the head has been computed so far:
``` scala
val str = 1 #:: 2 #:: 3 #:: Stream.empty
// Stream(1, ?)
def fibonacciFrom(a: Int, b: Int): Stream[Int] =
a #:: fibonacciFrom(b, a + b)
val fibs = fibonacciFrom(1, 1).take(7).toList
// List(1, 1, 2, 3, 5, 8, 13)
```
## Vectors
Vectors are effectively constant time, random access sequences represented as broad, shallow trees where every tree node contains up to 32 elements of the vector or 32 other tree nodes. The `updated` method can be used to update particular elements and is also effectively constant time, since only the node that contains the element and every node that points to it must be copied. These are currently the default implementation of immutable indexed sequences, `collection.immutable.IndexedSeq`.
## Ranges
Ranges can be defined as follows:
``` scala
1 to 3 // => Range(1, 2, 3)
5 to 14 by 3 // => Range(5, 8, 11, 14)
1 until 3 // => Range(1, 2,)
```
## Arrays
Scala arrays correspond to Java arrays such that `Array[T]` in Scala is a `T[]` in Java. Scala arrays are compatible with sequences, and provide all operations that sequences provide. This is facilitated through implicit conversions to `scala.collection.mutable.WrappedArray`. There's also an implicit conversion to `ArrayOps` which supports various methods available to sequences, without actually turning the array into a sequence.
Java doesn't allow generic arrays `T[]`, but this is made possible in Scala by creating an array of `Objects`. Creating generic arrays in Scala through `Array[T]` requires a run-time hint, since the information about the type `T` gets erased at runtime. This is done with a _class manifest_ of type `scala.reflect.ClassManifest`, which is a type descriptor object that describes the top-level class of a type. The compiler can be instructed to generate code to construct and pass a class manifest, this is done via an implicit parameter. This way, the compiler looks for an implicit value of type `ClassManifest[T]` so that the correct type of array can be constructed at run-time:
``` scala
def someMethod[T](xs: Vector[T])(implicit m: ClassManifest[T]): Array[T] = ...
// or with a context bound
def someMethod[T: ClassManifest](xs: Vector[T]): Array[T] = ...
```
## Views
Transformer methods are ones such as `map` and `filter` and come in strict and non-strict varieties. A strict transformer constructs a new collection on the spot. A non-strict transformer construct a proxy for the result collection such that its elements are constructed on demand.
``` scala
def lazyMap[T, U](coll: Iterable[T], f: T => U) =
new Iterable[U] {
def iterator = coll.iterator map f
}
```
Most collections are strict by default in their transformers except for `Stream`. It's possible to turn a collection into a lazy one and vice versa through _collection views_, which represent a base collection with the difference that the transformers are lazy. The `view` method is used to make the transformers lazy, and `force` is used to go back to strict transformers.
``` scala
(v.view.map(_ + 1).map(_ * 2)).force
```
Views can provide a simple way to optimize otherwise costly operations [^lazy_haskell]:
[^lazy_haskell]: Something like this would be implicit in Haskell due to its non-strict nature.
``` scala
def isPalindrome(x: String) = x == x.reverse
def findPalindrome(s: Seq[String]) = s find isPalindrome
// creates 1000000 element sequence
findPalindrome(words take 1000000)
// creates no copy
findPalindrome(words.view take 1000000)
```
Views can also be used as subwindows into mutable sequences [^slices].
[^slices]: This is of course very much like [Go's slices] and [Rust's slices].
[Go's slices]: http://golang.org/doc/effective_go.html#slices
[Rust's slices]: http://static.rust-lang.org/doc/master/tutorial.html#vectors-and-strings
``` scala
val subarr = arr.view.slice(3, 6)
// can now perform operations on that slice only
// i.e. negate all elements in the slice
for (i <- 0 until subarr.length) subarr(i) = -subarr(i)
```
## Iterators
Iterators are affected by operations on them, such that for example a `map` called on an iterator leaves the iterator's position at the end of the sequence, so that an extra call to `next` will throw a `NoSuchElementException`. This is mitigated by duplicating the iterator with `duplicate` [^duplicate_fd].
[^duplicate_fd]: This reminds me of [open file descriptions] which record the file offset and status flags. Duplicate file descriptors [share this information]. To avoid this, it's necessary to perform a separate `open` call.
[open file descriptions]: http://man7.org/linux/man-pages/man2/open.2.html
[share this information]: http://man7.org/linux/man-pages/man2/dup.2.html
A `BufferedIterator` provides an extra method `head` that returns its first element without advancing the iterator.
## Java Interop {#java-interop-collections}
Scala provides implicit conversions for the major collection types in Java through the `JavaConversions` object.
## Architecture {#collections-architecture}
Most of the collection operations are implemented in terms of traversals and builders.
### Builders {#builders}
Builders are in charge of building new collections. The `result` method yields the collection that has been constructed thus far, and the builder can be reset to a clean slate to construct another collection with the `clear` method. The `mapResult` method can be used to return a result of a different type.
``` scala
package scala.collection.generic
class Builder[-Elem, +To] {
def +=(elem: Elem): this.type
def result(): To
def clear()
def mapResult(f: To => NewTo): Builder[Elem, NewTo] = ...
}
```
### Implementation Traits
The code is kept DRY by using _implementation traits_ which are named with a `Like` suffix, such as `TraversableLike`. These traits implement concrete methods and are parameterized using the collection's element type and its representation type, i.e. `Seq[I]` or `List[I]`. For example, `filter` is implemented here such that it creates a new builder for the representation type and appends elements to it if they satisfy the predicate, then the builder's result is returned.
*[DRY]: Don't Repeat Yourself
``` scala
package scala.collection
class TraversableLike[+Elem, +Repr] {
def newBuilder: Builder[Elem, Repr]
def foreach[U](f: Elem => U)
...
def filder(p: Elem => Boolean): Repr = {
val b = newBuilder
foreach { elem => if (p(elem)) b += elem }
b.result
}
}
```
A problem presents itself when we want to return a different type of sequence from the one that is being operator one, for example:
``` scala
Map("a" -> 1, "b" -> 2) map { case (x, y) => (y, x) }
// Map(1 -> "a", 2 -> "b")
Map("a" -> 1, "b" -> 2) map { case (x, y) => y }
// List(1, 2)
BitSet(1, 2, 3) map (_.toFloat)
// Set(1.0, 2.0, 3.0)
```
This is mitigated with an implicit parameter on the function so that a builder factory may produce the correct type of builder by passing it the implicit parameter.
``` scala
def map[B, That](p: Elem => B)
(implicit bf: CanBuildFrom[B, That, This]): That = {
val b = bf(this)
for (x <- this) b += f(x)
b.result
}
package scala.collection.generic
trait CanBuildFrom[-From, -Elem, +To] {
def apply(from: From): Builder[Elem, To]
}
```
For example, in the case of a `BitSet`, the companion object for `BitSet` would define a builder factory of type `CanBuildFrom[BitSet, Int, BitSet]`, with a more general fallback that converts to a regular `Set` with `CanBuildFrom[Set[_], A, Set[A]]`. Scala will then choose the more specific one when choosing the implicit instance.
Finally, collections are kept the same dynamic type. This is achieved through virtual dispatch by passing the source collection to the builder factory, which forwards the call to a `genericBuilder` method available on all generic, non-leaf classes, which itself calls the builder that belongs to the collection on which it is defined.
### Creating Collections
Creating a new collection type `T` can be done by using traits `IndexedSeq[Elem]` and `IndexedSeqLike[Elem, T]`. The latter requires the implementation of `newBuilder`. It's also necessary to implement appropriate `CanBuildFrom` implicits.
# Extractors
Extractors provide a way to define patterns that are decoupled from an object's representation. This is done using an `unapply` method that matches a value and deconstructs it:
``` scala
object EMail {
def unapply(str: String): Option[(String, String)] = {
val parts = str split "@"
if (parts.length == 2) Some(parts(0), parts(1)) else None
}
}
```
This is analogous to an `apply` method that constructs an object:
``` scala
object EMail {
def apply(user: String, domain: String) = user + "@" + domain
}
```
This can be made more explicit by inheriting from the function type:
``` scala
// equivalent to extends Function2[String, String, String]
object EMail extends ((String, String) => String) { ... }
```
Pattern matching in Scala checks for an `unapply` method to deconstruct the object. If `unapply` returns `None`, then the pattern doesn't match and it moves on to the next pattern, throwing a `MatchError` if there are no other patterns that match. Remember that a single-tuple parameter can omit the set of parentheses, i.e. `func((a, b)) -> func(a, b)`.
``` scala
selectorString match { case EMail(user, domain) => ... }
```
It's generally a good idea to ensure that the following property holds:
``` scala
Obj.unapply(Obj.apply(a, b)) == Some(a, b)
```
In the case of a single pattern variable, that single variable is wrapped in an `Option` value.
It's also possible for the pattern to not bind any variables, in which case a `Boolean` is returned to indicate whether the pattern matched:
``` scala
object UpperCase {
def unapply(s: String): Boolean = s.toUpperCase == s
}
```
It's also possible to define variable argument extractors, to be able to match on an arbitrary number of variables. This is done using the `unapplySeq` function:
``` scala
object Domain {
def unapplySeq(whole: String): Option[Seq[String]] =
Some(whole.split("\\.").reverse)
}
```
This allows matching like this:
``` scala
domain match {
case Domain("org", "acm") => println("acm.org")
case Domain("net", _*) => println("some .net domain")
}
```
## Compared to Case Classes
Modifying the case classes has an effect on client code, whereas extractors provide a layer of indirection between the data representation and the way it's viewed by clients. However, case classes to have advantages over extractors. They're easier and simpler to define. Case classes can also end up generating more efficient pattern matches because the compiler can optimize them since they're defined at the language level. Finally, case classes derived from a `sealed` base class can allow the compiler to check for pattern match exhaustiveness.
## Regular Expressions
Regular expressions can be constructed using the `Regex` constructor or with a `r` method on a string. It's possible to pattern match on a regular expression match using a predefined extractor. The matching is done by binding every matched group. If a group didn't match, it'll bind the variable to `null`:
``` scala
val Decimal = """(-)?(\d+)(\.\d*)?""".r
val Decimal(sign, integerpart, decimalpart) = "-1.23"
// sign = "-", integerpart = "1", decimalpart = ".23"
```
# Annotations
Annotations are like meaningful comments for the compiler. They can support generation of documentation (Scaladoc), for example. Annotations can be placed on any kind of declaration or definition, as well as expressions:
``` scala
@deprecated def oldMethod() = ...
(expr: @unchecked) match {
// non-exhaustive cases
}
```
The general form of annotations is as follows. The `@` prefixed to an annotation can read as `new` since underneath the compiler is simply instantiating a class named after the annotation. Passing an annotation as argument to another has the consequence that it must use `new` instead of `@`.
``` scala
@annotation(exp1, exp2, ...)
```
The `@deprecated` annotation can mark something as deprecated, which elicits a compiler warning if some code uses it. It accepts an optional message:
``` scala
@deprecated("use newOne() instead")
def oldOne() = ...
```
The `@volatile` annotation informs the compiler that the variable in question will be used by multiple threads. This makes reads/writes slower, but accesses from multiple threads behave more predictably.
The `@serializable` annotation marks a class as serializable. The `@SerialVersionUID(id)` annotation marks the current version of a class. The `@transient` annotation marks fields that should not be serialized.
The `@scala.reflect.BeanProperty` annotation generates automatic get and set methods, i.e. `age` generates `getAge` and `setAge`. This is useful for Java-centric frameworks. The generated methods are available only after compilation, i.e. not within one's own code.
The `@tailrec` annotation designates that a method should be tail-recursion optimized. If the optimization cannot be performed, a warning is emitted.
The `@unchecked` annotation tells the compiler not to check for exhaustive match cases.
The `@native` annotation means that the method's implementation is supplied by the runtime, via the Java Native Interface (JNI), rather than in Scala.
``` scala
@native
def nativeMethod() { /* empty body required */ }
```
# XML
Scala allows XML literals anywhere that an expression is valid. The type of an XML literal is `Elem`. Class `Node` is the abstract superclass of all XML node classes. Class `Text` is a node holding just text. Class `NodeSeq` corresponds to a sequence of nodes, in fact, `Node` extends from `NodeSeq`, so a `Node` can be thought of as a one-element `NodeSeq`.
It's possible to interpolate Scala code within XML literals using braces `{}`. The interpolated code can itself contain XML literals, effectively allowing arbitrary nesting of XML code. Two braces in a row are used to print literal braces.
``` scala
var res = {3 + 4}
// scala.xml.Elem = 7
var yearMade = 1955
var res2 = { if (yearMade < 2000) {yearMade}
else xml.NodeSeq.Empty}
// 1955
```
Scala's XML support can be leveraged to provide object serialization:
``` scala
abstract class SomeClass {
val description: String
val year: Int
def toXML =
{description}
{year}
}
```
There are various ways to deconstruct XML in Scala. The `text` method retrieves all of the text from the entire element tree, `flatMap` style. Scala also supports [XPath] through the methods `\` (for sub-elements) and `\\` (for recursive/deep search). Attributes can be extracted using the `@` sign before the attribute.
[XPath]: http://en.wikipedia.org/wiki/XPath
``` scala
var res = hello \ "b"
// hello
var res2 = hello \\ "c"
// hello
val employee =
val res3 = employee \ "@name"
// "Joe"
```
It's also possible to define a `fromXML` method to deserialize an object.
``` scala
object SomeClass {
def fromXML(node: scala.xml.Node): SomeClass =
new SomeClass {
val description = (node \ "description").text
val year = (node \ "year").text.toInt
}
}
```
It's possible to save a file from an XML node using, and conversely, load it:
``` scala
xml.XML.save("serialized.xml", node)
val loadednode = xml.XML.loadFile("serialized.xml")
```
It's possible to pattern match on XML. This is done by using XML literals where instead of interpolating expressions, variables are interpolated in order to bind to the matched data. In the context of XML, the `_*` is interpreted as matching any sequence of nodes down the XML tree.
``` scala
def proc(node: scala.xml.Node): String =
node match {
case {contents @ _*} => "It's an a: " + contents
case {contents @ _*} => "It's a b: " + contents
case _ => "Something else."
}
// proc(a b c) => "It's an a: ArrayBuffer(a, b, c)"
```
# Modular Programming
When a module grows too large for a single file, it may be useful to split the module up into separate traits defined in separate files, which can then be mixed into the original module.
A problem can arise when one such compartmentalized trait wants to refer to another trait that ultimately gets mixed into the same class. This can be circumvented by specifying the _self type_ which essentially defines the value of type of `this` for whenever it's referred to in the class. For example, assuming trait `Second` needs to refer to trait `First`'s method `count`, we can do:
``` scala
trait First {
def count = 5
}
trait Second {
this: First =>
def printCount { println(count()) }
}
```
It's also possible to select types at runtime:
``` scala
val db: Database =
if(args(0) == "student")
StudentDatabase
else
SimpleDatabase
```
Sometimes the compiler won't be able to determine that two types are the same, as in the following case:
``` scala
object Obj {
val db: Database = SomeDatabase
object browser extends Browser {
// compiler doesn't know `database` is
// same type as `db`
val database = db
}
// error: type mismatch
// found: db.SomeCategory
// required: browser.database.SomeCategory
for (category <- db.allCategories)
browser.displayCategory(category)
}
```
This can be resolved by using a singleton type through the `type` property, which is an extremely specific type that refers to the type of the specific object [^ruby_eigenclass]:
[^ruby_eigenclass]: This _definitely_ reminds me of Ruby's EigenClasses.
``` scala
object browser extends Browser {
val database: db.type = db
}
```
# Object Equality
To recap, Scala's equality operators are equivalent to Java's `equals` methods, that is, unlike Java's equality operators which refer to object identity for reference types. Object identity is available through the `eq` method though it's rarely used.
There are four main pitfalls when defining equals methods:
1. wrong signature
2. changing it without also changing `hashCode`
3. defining it in terms of mutable fields
4. failing to define it as an equivalent relation
## Wrong Signature
Given a simple type like the following, the accompanying `equals` method is too naive. What happens is that when it's added to a collection, it's static type becomes `Any`, which would trigger the `equals` method for `Any` since overloading in Scala and Java is based on the static type. It turns out that the `equals` method in `Any` is simply object identity.
``` scala
class Point(val x: Int, val y: Int) {
def equals(other: Point): Boolean =
this.x == other.x && this.y == other.y
}
```
A more robust `equals` operator would assume that the parameter is of static type `Any`, and then perform a pattern match for its dynamic type:
``` scala
override def equals(other: Any) = other match {
case that: Point => this.x == that.x && this.y == that.y
case _ => false
}
```
It's also a common mistake to want to define the `==` operator directly, which is impossible since it's defined as `final`:
``` scala
final def == (that: Any): Boolean =
if (null eq this) {null eq that} else {this equals that}
```
However, if this method is defined with a different parameter type, the compiler would regard it as an overloaded variant, and would allow the definition to occur, but the same problem would occur as above, where the parameter type isn't `Any`.
## Hash Code
When the `equals` method is redefined, it's also logically necessary to redefine the `hashCode` method, which by default is defined in `AnyRef` to be some transformation of the object's address. In fact, if two objects are determined to be equal as per the `equals` method, then they must return the same `hashCode`.
The `hashCode` method can be defined for `Point` so that it only references fields that were used in `equals` for equality determination.
``` scala
override def hashCode = 41 * (41 + x) + y
```
## Mutable Fields
Making the `hashCode` and as a result the `equals` method depend on mutable fields can have bad consequences. Adding an item to a collection and then changing that item's mutable field, which itself aids in equivalence determination, will mean that the collection will no longer find the object, since the collection (a `HashSet`) will keep looking in the wrong bucket, where it would expect to find the item based on its new data representation. Instead it may have been better to avoid redefining `hashCode` and create a separate comparison function such as `equalContents`.
## Equivalence Relation
The `equals` method should implement an equivalence relation on non-null objects, that is:
* it's _reflexive_: for any non-null value x
`x.equals(x) == true`
* it's _symmetric_: for non-null values x and y
`x.equals(y) == true == y.equals(x)`
* it's _transitive_: for non-null values x, y, and z
`x.equals(y) && y.equals(z) == true == x.equals(z)`
* it's _consistent_: for non-null values x and y, multiple invocations of `x.equals(y)` should consistently return the same value provided no information used in equality determination is modified
* `x.equals(null)` should return false
## Subtypes {#subtype-equality}
There's a mistake that can be made given a subtype that redefines its `equals` method. If a supertype is compared to a subtype and appears on the left hand side, it would invoke the supertype's `equals` method which would _only_ compare those properties common to both types, meaning that the equality would not be symmetric.
``` scala
class ColoredPoint(x: Int, y: Int, val color: Color.Value)
extends Point(x, y) {
override def equals(other: Any) = other match {
case that: ColoredPoint =>
this.color == that.color && super.equals(that)
case _ => false
}
}
val p = new Point(1, 2)
val cp = new ColoredPoint(1, 2, Color.Red)
p equals cp == true
cp equals p == false
```
A way to solve this is to define a `canEqual` method in classes that override `equals` and `hashCode`, which determines whether or not an object can be compared for equality with the given class. This allows subclasses of the class that redefined `canEqual` to continue to compare for equality with objects of the superclass. This is especially important for the anonymous class instantiation syntax:
``` scala
class Point(val x: Int, val y: Int) {
override def hashCode = 41 * (41 + x) + y
override def equals(other: Any) = other match {
case that: Point =>
(that canEqual this) &&
(this.x == that.x) && (this.y == that.y)
case _ =>
false
}
def canEqual(other: Any) = other.isInstanceOf[Point]
}
class ColoredPoint(x: Int, y: Int, val color: Color.value)
extends Point(x, y) {
override def hashCode = 41 * super.hashCode + color.hashCode
override def equals(other: Any) = other match {
case that: ColoredPoint =>
(that canEqual this) &&
super.equals(that) && this.color == that.color
case _ =>
false
}
override def canEqual(other: Any) =
other.isInstanceOf[ColoredPoint]
}
```
# Java Interop
The Scala compiler will attempt to use direct mappings to Java value types, such as an `Int`. However, sometimes this assumption can't be made, as in a collection, in which case it'll use wrapper classes like `java.lang.Integer`.
Singleton `objects` the compiler creates a Java class with suffix `$` which contains all of the methods and fields of the Scala singleton object as well as a `MODULE$` static field that holds the singleton instance created at runtime. If the singleton object has no companion class then a class without the `$` suffix is created which has a static forwarder method for each method in the singleton object.
Traits get compiled down to Java interfaces if they only contain abstract methods.
## Existential Types
Existential types in Scala are mainly used to facilitate Java's wildcard types and raw types. They take the form:
``` scala
type forSome { declarations }
```
For example, the following is equivalent to Java's `Iterator >` and reads as it being an `Iterator` of `T`'s for some type `T`. It's also possible to use _placeholder syntax_ which is similar to the one used in patterns. For each underscore present in the type a type parameter is added to the `forSome` clause:
``` scala
Iterator[T] forSome { type T }
// equivalently: placeholder syntax
Iterator[_]
```
Similarly, Java's `Iterator extends Component>` can be represented as follows in Scala, where it reads as being an `Iterator` of `T` for some type `T` that is a subtype of `Component`:
``` scala
Iterator[T] forSome { type T <: Component }
// equivalently: placeholder syntax
Iterator[_ <: Component]
```
Given a Java class with wildcards like this:
``` java
public class Wild {
Collection> contents() {
Collection stuff = new Vector();
stuff.add("a");
stuff.add("b");
stuff.add("see");
return stuff;
}
}
```
If we want to create a new `Set` from the contents of this collection, we'll have trouble naming the type:
``` scala
import scala.collection.mutable.Set
val iter = (new Wild).contents.iterator
val set = Set.empty[???] // what type?
while (iter.hasMore)
set += iter.next()
```
The solution to this is to create an abstract class with methods for each of the types in the `forSome` clause:
``` scala
abstract class SetAndType {
type Elem // required for defining `set`
val set: set[Elem]
}
def java2scala[T](jset: Collection[T]): SetAndType = {
val sset = Set.empty[T] // T can be used, inferred from parameter
val iter = jset.iterator
while (iter.hasNext)
sset += iter.next()
return new SetAndType {
type Elem = T
val set = sset
}
}
```
## Compiling Heterogenous Projects
Usually one would compile the Java code and add the output to the classpath when building the Scala code. However, this wouldn't work if the Java code references the Scala code, or vice versa. For this reason, Scala allows processing of Java source files as if they were Scala files. They won't be compiled, but they'll be scanned to determine what they contain. This is done with this sequence of commands:
``` bash
$ scalac -d bin FileAnalysis.scala FileItem.java File.java
$ javac -cp bin -d bin File.java FileItem.java FileManagement.java
$ scala -cp bin FileManagement
# program output
```
# Actors and Concurrency
Scala uses actors similar to Erlang's as its primary concurrency primitive. **Note** that this actors library is [deprecated]; [Akka] actors will be covered later and are the default actors library since Scala 2.10.
[deprecated]: http://docs.scala-lang.org/overviews/core/actors-migration-guide.html
[Akka]: http://akka.io/
``` scala
import scala.actors._
object PrintActor extends Actor {
def act() {
for (i <- 1 to 5) {
println("acting")
Thread.sleep(1000)
}
}
}
SillyActor.start() // prints "acting" five times
// equivalent: .start()s as soon as it's defined
val PrintActor2 = actor {
for (i <- 1 to 5) {
println("acting")
Thread.sleep(1000)
}
}
```
Actors can be sent messages using the `!` method. The following actor prints out the messages it receives:
``` scala
val echoActor = actor {
while (true) {
receive {
case msg => println("received: " + msg)
}
}
}
echoActor ! "testing"
// received: testing
```
The way this works is that the `receive` method accepts a partial function, which in this case is specified as a partial function literal [case sequence]. The actor will block until a message is received that matches a pattern in the partial function, and non-matching messages are silently ignored.
[case sequence]: #case-sequence
It's also possible to treat "native" threads as actors through `Actor.self`, which yields an actor representing the current thread. This is useful for debugging, by specifically setting a `receive` function:
``` scala
import scala.actors.Actor._
self ! "hello"
self.receive { case x => x}
// res1: Any = hello
// 1 second timeout to avoid blocking repl
self.receiveWithin(1000) { case x => x }
```
Every actor is given its own thread so that they can each perform the `act` method. This is resource heavy, so a `react` method can be used that is similar to `receive`. The `react` method doesn't return after it finds and processes a message, instead it only evaluates the message handler. This means that the implementation doesn't need to preserve the call stack of the current thread, allowing it to reuse the thread for the next actor that wakes up.
Since `react` doesn't return, it's usually common to return the information (if any) by sending the message to another actor, then recursing the message handler `act` to continue to process messages. This recursion pattern is common enough that there's a method `loop` that can wrap around the `react` method so that it loops infinitely.
``` scala
object Resolver extends Actor {
import java.net.{InetAddress, UnknownHostException}
def act() {
react {
case (name: String, receiver: Actor) =>
receiver ! getIp(name)
act()
case "EXIT" =>
println("exiting") // doesn't recurse again
case msg =>
println("unhandled msg: " + msg)
act()
}
}
def getIp(name: String): Option[InetAddress] = {
try {
Some(InetAddress.getByName(name))
} catch {
case _:UnknownException => None
}
}
}
Resolver.start()
Resolver ! ("www.scala-lang.org", self)
self.receiveWithin(0) { case x => x }
// Some(ww.scala-lang.org/)
```
Actor's shouldn't block, such as while processing a message, since this can even cause deadlocks while actors wait on each other. If an actor needs to perform an operation that blocks in order to process a message, a separate actor should be created that performs the blocking operation and then notifies the original actor when its work is complete. In the following example, the actor will continue to accept requests even while it continues to perform blocking operations, which it delegates to a sub-actor.
``` scala
val someActor = actor {
def emoteLater() {
val mainActor = self
actor {
Thread.sleep(1000)
mainActor ! "Emote"
}
}
var emoted = 0
emoteLater()
loop {
react {
case "Emote" =>
println("acting")
emoted += 1
if (emoted < 5)
emoteLater()
case msg =>
println("received: " + msg)
}
}
}
/*
acting
acting
someActor ! "hello"
Received: hello
acting
acting
*/
```
It's very important to communicate with actors only via messages. However, unlike Erlang's actors, Scala allows mixing the traditional shared data & locks model with actors. For example, if multiple actors were to share a common mutable map, there are two approaches that can be taken to achieve this. The first would entail creating an actor that owns the map, defining messages for every kind of map operation that would be required such as getting and setting values. An alternative approach, however, would be to pass a thread-safe map such as `ConcurrentHashMap`.
It's also a good idea to keep actors self-contained. A good idea is to send contextual information about the request (if not the request itself) along with the response, to the original actor, since some time may have passed since the actor performed the request. It's also more readable to create case classes when possible:
``` scala
case class LookupIP(name: String, respondTo: Actor)
case class LookupResult(name: String, address: Option[InetAddress])
object Resolver extends Actor {
def act() {
loop {
react {
case LookupIP(name, actor) =>
actor ! LookupResult(name, getIp(name))
}
}
}
def getIp = ...
}
```