Connecting Elixir nodes in Azure

What is location transparency

"In computer networks, location transparency is the use of logical names to identify network resources, independent of both the user's location and the resource location." - Wikipedia

In distirbuted systems its is very beneficial if you can program across various nodes just like you program locally. This is what we get in Erlang/Elixir. The actor model makes this a reality. When you create an elixir process you don't care whether that process resides locally or across the globe in another server, elixir takes care of the detail of communication between them.

In this tutorial we will see how easy is it to create an Elixir cluster with two nodes in Azure and then run map reduce across that cluster.

Elixir Nodes in Azure:

Setting up VMs:

We are going to work on two seperate instances of Elixir running on Ubuntu 14.04 using Docker. For that we have to create two Azure VMs. Go to Azure Portal and create two new VM with Ubuntu 14.04. Once the VMs have been created we need to add some endpoints so the firewall allows communication through those ports. Go to your VM, All settings > Endpoints and then add the following endpoints

Elixir VM endpoints

The epmd is the Eralng Port Mapper Daemon, the ports after the 900x are for the Elixir nodes.

Now we have to install docker in the VM and pull the docker elixir image we are going to use

$ sudo apt-get update
$ sudo apt-get install docker.io
$ docker pull zabirauf/elixir

Do the same thing for the second VM. I will refer to the first VM as Batman and second VM as Superman.
Now we are ready for some Elixir fun.

Start Elixir Nodes:

Lets start docker with the pulled image zabirauf/elixir

$ docker run -i -t -p 4369:4369 -p 9001:9001 img-elixir /bin/bash

This will start the container with Elixir installed and forward the traffic comming to 4369 and 9001 to the container. Lets start the interactive elixir

$ iex --name [email protected] --cookie monster --erl "-kernel inet_dist_listen_min 9001 inet_dist_listen_max 9001"

This is where the magic happens. We start an Elixir node here. The inet_dist_listen_min & inet_dist_listen_max tells the erlang kernel that what is the minimum & maximum range of ports it can use. In our case we only want to start one node per VM so its just 9001 but this range should never be less than the number of nodes you want to start in one VM. The --name assigns a name to the distributed elixir node, it should contain the URL to the VM. The --cookie is how one node trusts the other as this cookie should be same across the nodes. It should be some random number but for simplicity I just changed it to "monster".
Do the same thing for the other VM but with the --name [email protected].

You should now have two elixir nodes started in two different VMs. Lets connect the Batman node with the Superman node

Node.connect :'[email protected]'
Node.list

The Node.list should show you the list of connected nodes. You are now ready to communicate with processes across the nodes. Thats how easy it is in Elixir to setup cluster (at least with two nodes).

Distributed map reduce

After setting up the Elixir cluster of two nodes what do we do?
Reduce

Lets do just that, lets write a very naive distributed map reduce in Elixir. We will go with the classic map reduce example of computing the count of words in documents. We will have two distinct files in each of the nodes and then call map on both the nodes, get the result and reduce it. The map step will count the occurence of words in the document and create a Map and then in reduce step we will merge the results from both map.

defmodule MapReduce do
  def map() do
    # Wait for a map message and then count the occurence of words
    receive do
      {:map, client, filename} ->
        file = File.read!(filename)
        word_count = file |> String.split |> create_map
        send client, {:result, word_count}
        map()
    end
  end

  def reduce(dict1, dict2) do
      Dict.merge(dict1, dict2, fn(k, v1, v2) -> v1+v2 end)
  end

  def create_map(words), do: create_map(words, %{})
  def create_map([], dict), do: dict
  def create_map([word | rest], dict) do
    create_map(rest, Dict.update(dict, word, 1, &(&1+1)) )
  end

end

As you can see that the above code does not have any knowledge about nodes. map() is just waiting for a message and after it receives a command to map, it reads the content of file, split into words and then counts the occurence of each word. The reduce is also simple where we just merge two maps.

Lets place the files in each of the node
On Batman node run

curl http://pastebin.com/raw.php?i=c4Dp7qLT > data.txt
curl http://pastebin.com/raw.php?i=i3XuPMb0 > mapreduce.ex

On Superman node run

curl http://pastebin.com/raw.php?i=9QR7yazZ > data.txt
curl http://pastebin.com/raw.php?i=i3XuPMb0 > mapreduce.ex

Start the iex as before on both nodes and then on Superman node run c("mapreduce.ex") to load the code of the MapReduce module. We will use Batman node to start the map reduce. Run the following on Batman node

c("mapreduce.ex")

#Connect with superman node
Node.connect :'[email protected]'

# Start the map on both nodes
n1 = Node.spawn(:'[email protected]', &MapReduce.map/0)
n2 = Node.spawn(:'[email protected]', &MapReduce.map/0)

# Send the map message to the started nodes
send n1, {:map, self, "data.txt"}
send n2, {:map, self, "data.txt"}

# Receive the results
receive do
  {:result, d1} -> word_count1 = d1
end

receive do
  {:result, d2} -> word_count2 = d2
end

# Reduce
result = MapReduce.reduce(d1, d2)

As you can see how easy it is in Elixir to build distributed system, something which is really complex in other languages+runtimes is really trivial in Elixir/Erlang world.

Reference

Connecting Elixir nodes on the same LAN