Playing With Fire

Exploring the web one Elixir at a time

Strings in Elixir

Common features

As is common in programming, Elixir has two string formats: single-quoted and double-quoted. These two representations have many things in common:

  • UTF-8 character encoding
  • they can contain escape sequences
  • they allow string interpolation
  • they have a heredoc representation

Don’t let these similarities fool you: the two are represented very differently internally and behave differently. They are not interchangeable, and functions designed to work on single-quoted strings will not work on double-quoted strings, and vice versa.

The reason is as follows: double-quoted strings are closer to what you would expect a string to be if you program in most other languages. These are real strings. Single-quoted strings are lists of character codes, referred to as char lists. IEx will display a char list as a series of letters if every character code in it falls within the range of printable characters.
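A quick way to see the difference is to ask Elixir what each literal actually is. This is only a minimal sketch, and the variable names are made up for illustration:

```elixir
single = 'cat'   # char list: a list of integers
double = "cat"   # string: a UTF-8 encoded binary

IO.inspect(is_list(single))            # true
IO.inspect(is_binary(double))          # true
IO.inspect(is_binary(single))          # false
IO.inspect(single == [99, 97, 116])    # true
```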

String handling in Elixir might seem a little strange at first, but that is because of the underlying Erlang environment. You can find out more about Erlang strings at Erlang.org.

 

String Interpolation

Many languages have a form of string interpolation: the process of evaluating a string that contains one or more placeholders, where each placeholder is replaced by the result of evaluating it.

In Elixir this is written as #{expression}, where the expression is typically a variable or a function call. The result does not have to be a string already; it only has to be something Elixir can convert to one (an integer or an atom, for example).

This is best explored in IEx.

$ iex
Erlang/OTP 17 [erts-6.3] [source-f9282c6] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Interactive Elixir (1.0.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> myword = "world"
"world"
iex(2)> "hello #{myword}!"
"hello world!"
iex(3)> "hello #{String.upcase myword}!"
"hello WORLD!"
iex(4)> myfunc = fn -> "pretty green world" end
#Function<20.90072148/0 in :erl_eval.expr/5>
iex(5)> "hello #{myfunc.()}"
"hello pretty green world"
iex(6)> "hello #{String.capitalize myfunc.()}"
"hello Pretty green world"
iex(7)> "hello #{String.reverse myfunc.()}"   
"hello dlrow neerg ytterp"
iex(8)> 

You get the gist.
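Two details the session above does not show: the interpolated value does not need to be a string already, and interpolation also works inside single-quoted char lists. A small sketch, with made-up variable names:

```elixir
count = 3
name = "world"

# Non-string values are converted to strings automatically.
message = "#{count} cats say hello #{name}"
IO.puts(message)                         # 3 cats say hello world

# Any expression can go inside the placeholder.
IO.puts("total: #{count * 2 + 1}")       # total: 7

# Interpolation inside a char list produces a char list.
greeting = 'hello #{name}'
IO.inspect(is_list(greeting))            # true
IO.inspect(greeting == 'hello world')    # true
```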

 

Single-Quoted Strings - Char Lists

As noted above, single-quoted strings, or char lists, are internally represented as lists of integers. As they are lists, they behave like lists and can be treated as such. When IEx renders a char list back to you, it prints it as a set of characters enclosed in single quotes if the list consists purely of printable characters; otherwise it prints the integers.

This behaviour can be easily explored in IEx:

iex(1)> str = 'cat'
'cat'
iex(2)> is_list str
true
iex(3)> length str
3
iex(4)> Enum.reverse str
'tac'
iex(5)> str ++ [115]
'cats'
iex(6)> str ++ [0]
[99, 97, 116, 0]
iex(7)> str ++ 'nip'
'catnip'
iex(8)> [ head | tail ] = str
'cat'
iex(9)> head
99
iex(10)> tail
'at'
iex(11)> ?c   
99

Here you can see that you can treat it exactly like a list, including with pattern matching, because that is what it is. Notice that the head extracted from the list is shown as a number: each element always was an integer, and IEx only renders a list as quoted text when every element is a printable character. The tail is still such a list, so it is shown as 'at'.

What I’m also showing you here is the ? notation. Putting ? in front of a character returns the integer code point for that character (for ASCII characters, the ASCII code). This can be useful when parsing char lists.
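As an illustration of where the ? notation helps, here is a minimal sketch that converts a char list of digits into an integer. The module and function names (CharListDemo, to_integer) are made up for this example, and it assumes the input contains only the characters 0 to 9:

```elixir
defmodule CharListDemo do
  # Convert a char list of decimal digits, such as '1234', to an integer.
  def to_integer(chars), do: to_integer(chars, 0)

  # An empty list means every digit has been consumed.
  defp to_integer([], acc), do: acc

  # ?0 is the code point of the character 0 (48), so digit - ?0 is the
  # numeric value of the digit.
  defp to_integer([digit | rest], acc) when digit in ?0..?9 do
    to_integer(rest, acc * 10 + (digit - ?0))
  end
end

IO.inspect(CharListDemo.to_integer('1234'))   # 1234
```

Matching on [digit | rest] and comparing against ?0..?9 keeps the code readable; the alternative would be to write the raw numbers 48 and 57.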

 

Double-Quoted Strings - Binaries

Firstly a quick word on binaries. A binary is a sequence of bytes (the more general bitstring is a sequence of bits). Binaries have their own syntax in Elixir: to write a binary literal, enclose the terms in << >>.

iex(1)> test = << 1,2,3,4,5 >>
<<1, 2, 3, 4, 5>>
iex(2)> test = << "abc" >>
"abc"
...
iex(4)> test = << "abc" >>
"abc"
iex(5)> test = << "a", "b", "c" >>
"abc"

The last two examples hint at the link between binaries and double-quoted strings. As you might have determined, a double-quoted string is represented internally as a sequence of bytes in a binary, just as a single-quoted string is represented as the elements of a list.

This representation is an efficient way to store UTF-8 encoded characters, which may need more than one byte each. It also means that the size in bytes of a binary holding UTF-8 text (byte_size) may not equal its length in characters (String.length).
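The difference between size and length is easy to see with a character outside the ASCII range. A minimal sketch (the variable name is made up):

```elixir
price = "5€"   # the euro sign takes three bytes in UTF-8

IO.inspect(byte_size(price))            # 4
IO.inspect(String.length(price))        # 2

# A string is just a binary, so it equals the same bytes written out.
IO.inspect("cat" == <<99, 97, 116>>)    # true
```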

Manipulation of double-quoted strings is enabled using the String module.

Pattern matching on double-quoted strings is also slightly different from what you have seen with lists. Instead of the familiar [ head | tail ], the pattern looks something like << head::utf8, tail::binary >>, and instead of terminating on an empty list ([]), you terminate on an empty binary (<<>>).

If you need to do a lot of string processing, it pays to become very familiar with the String module and to read more on [binary processing](http://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html#6.2-binaries-%28and-bitstrings%29).
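To make the binary pattern concrete, here is a minimal sketch of a recursive function that walks a string one UTF-8 character at a time. The module and function names (BinaryDemo, codepoints) are made up for this example:

```elixir
defmodule BinaryDemo do
  # Collect the code points of a UTF-8 string into a list of integers.
  def codepoints(string), do: codepoints(string, [])

  # Terminate on the empty binary rather than the empty list.
  defp codepoints(<<>>, acc), do: Enum.reverse(acc)

  # head::utf8 matches one whole character, however many bytes it uses.
  defp codepoints(<<head::utf8, tail::binary>>, acc) do
    codepoints(tail, [head | acc])
  end
end

IO.inspect(BinaryDemo.codepoints("5€") == [53, 8364])   # true
```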

 

Special forms

 

Heredocs

As you may, or indeed may not, have figured out, strings and char lists can span several lines. Heredocs are written using triple delimiters (like in Python) and keep the newlines contained within the string, including the final one. What is important to remember is that leading whitespace is stripped from each line according to the indentation of the closing delimiter.

They are used mainly for documentation of code (functions and modules).
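A minimal sketch of both uses follows: heredocs as documentation, and a heredoc as an ordinary multi-line string. The Greeter module and its functions are made up for this example:

```elixir
defmodule Greeter do
  @moduledoc """
  A made-up module, used only to show heredocs as documentation.
  """

  @doc """
  Returns a greeting for the given name.
  """
  def hello(name), do: "hello #{name}"

  # The closing delimiter is indented by four spaces, so four spaces are
  # stripped from the start of every line. Extra indentation is kept.
  def banner do
    """
    first line
      second line
    """
  end
end

IO.inspect(Greeter.banner() == "first line\n  second line\n")   # true
```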

 

Sigils

This is a special syntax for representing different aspects of strings. You may already be somewhat familiar with the syntax if you know Ruby.

Sigils start with a tilde (~), followed by a letter (either upper or lower case) and then a pair of delimiters, usually {}. The delimiters can be any of the following complementary pairs: <>, {}, [], (), "", '', // and ||. Using triple quotes (either single-quote or double-quote characters) as the delimiter makes the sigil a heredoc.

The letter determines what type of sigil it is. You will normally use sigils for regular expressions (~r{}), strings (~s{}) or word lists (~w{}). A more comprehensive list can be found on the Sigils page of the Elixir documentation.
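A few of the common sigils in action, as a minimal sketch. It also uses ~c, the sigil for char lists:

```elixir
# ~w splits on whitespace and returns a list of strings.
IO.inspect(~w{cat dog bird} == ["cat", "dog", "bird"])   # true

# ~s builds a string, handy when the text contains double quotes.
IO.inspect(~s{say "hello"} == "say \"hello\"")           # true

# ~r builds a regular expression.
IO.inspect(Regex.match?(~r{^h.llo$}, "hello"))           # true

# ~c builds a char list.
IO.inspect(~c{cat} == 'cat')                             # true

# The delimiters are interchangeable.
IO.inspect(~w[cat dog] == ~w(cat dog))                   # true
```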