Scala Collections – Lists, Sequences & Vectors

Below, I have provided some of the collections which are usually the most used. Starting from the regular ones like Sequences, Lists, Maps, Sets we go on to discuss some of the more interesting ones like Vectors, Streams.

Lists

List is probably the easiest to understand and hence most used data structure/collection. In scala a list is implemented as a class. I quote

A class for immutable linked lists representing ordered collections of elements of type A

By default, all lists are immutable. However the devil is in the detail. Check out the scala documentation here.

Despite being an immutable collection, the implementation uses mutable state internally during construction. These state changes are invisible in single-threaded code but can lead to race conditions in some multi-threaded scenarios

You can declare lists in various ways and they are documented below

//Creates a list of 5 integers with elements
val num1 = List(10,20,30,40,50)

//Creates a list of strings
val fruits = List[String]("Apple","Banana","Mango")

//Creates a list of 5 integers with elements 1,2,3,4,5
val num2 = List.range(1,5)

//Creates a list of 5 elements all set to 0
val num3 = List.fill(5)(0)

Lists are efficient when the programs which use them are only processing head of the list – i.e adding, retrieving, modifying. Use lists when the requirement is small, linear and immutable data type is required.

For a mutable List use scala.collection.mutable.ListBuffer package.

import scala.collection.mutable.ListBuffer
//Creates an empty list buffer
var cities = new ListBuffer[String]

//Add elements to the list.
cities +="London"
cities +="New York"
cities +="Paris"

//Prints London, New York, Paris
cities.foreach(println)

There are various operations which can be performed on List and other collections. Many of them are common. These will be discussed in the next blog entry.

Sequences

Sequences are full of surprises!

Sequences are very similar to Lists in scala and share various methods. Most common methods have the same syntax. Interestingly when you create a sequence what is returned is a List – surprise!!

Sequence is nothing but a sequence of indexed elements. Elements in a sequence have a defined order – order of insertion.  Sequences are implemented as a trait in Scala. Sequences by default are mutable – surprise!!. This is where it gets more interesting, if you are passing a sequence to a method and want it to be immutable make sure that you import the correct collections library else even though you may pass a an immutable sequence it will be passed to the method as an mutable sequence – surprise!!.

//Creates a Sequence but returns a List
var num = Seq(1,2,3,4,5)

//Prints List(1,2,3,4,5) - Surprise!
println(num)

//Append a new element to the seq
num = num :+ 6

//Prints List(1,2,3,4,5,6) - Surprise!
println(num)

//Empty Sequence
var str = Seq.empty[String]

If you are after performance then try the sub-traits of Sequences – LinearSeq and IndexedSeq. If you are using a lot of head and tail methods try using LinearSeq and if you are after updates try the IndexedSeq – more performance characteristics details can be found here.

Finally, how are lists and sequence different – well if you haven’t guessed it here are the two main differences

  • Lists are immutable by default. Sequence are not.
  • Lists are implemented as class. Sequences is a trait.

Vectors

Vector is a powerful data structure provides random access to elements stored in it. Vectors in Scala are by default immutable and implemented as a sub-class of IndexedSeq. They are created as trees and have a high branching factor. Each node can hold up to 32 other tree nodes or elements. However, Vectors are created in a similar way as Lists or Sequences.

//Creates a vector of with elements - 1,2,3,4,5
val num1 = Vector(1,2,3,4,5)

//Creates an empty vector
val num2 = Vector.empty[Int]

Similar to the implementation of List – Vectors have a similar issue in their implementation. I quote from here

Despite being an immutable collection, the implementation uses mutable state internally during construction. These state changes are invisible in single-threaded code but can lead to race conditions in some multi-threaded scenarios.

Vectors address the performance issue of random access of elements in Lists and Sequences and can access elements in near constant access time, however it is more than that of a List. Vector come into their own when working on larger data sets. Hence, it is advisable to use Vectors instead of Lists or Sequences when the data set is large and there is a requirement of random access to the data set.

Leave a Comment