Connecting Elixir nodes in Azure
What is location transparency?
"In computer networks, location transparency is the use of logical names to identify network resources, independent of both the user's location and the resource location." - Wikipedia
In distributed systems it is very beneficial if you can program across nodes just as you program locally. That is what we get in Erlang/Elixir, and the actor model makes it a reality. When you create an Elixir process you don't care whether that process resides locally or on a server across the globe; Elixir takes care of the details of communication between them.
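To make the idea concrete, here is a minimal sketch that runs on a single machine. The module name Echo and the message shapes are made up for illustration. The point is that the code sending and receiving messages only deals with a pid and never asks where the process lives.

```elixir
defmodule Echo do
  # Replies to whoever sent the message; it never inspects where the caller lives.
  def loop do
    receive do
      {:ping, from} ->
        send(from, {:pong, node()})
        loop()
    end
  end
end

# Spawned locally here; a pid returned by Node.spawn/2 is used in exactly the same way.
pid = spawn(&Echo.loop/0)
send(pid, {:ping, self()})

reply =
  receive do
    {:pong, where} -> where
  after
    1_000 -> :timeout
  end

# Prints the name of the node on which Echo.loop/0 ran
IO.inspect(reply)
```

Swapping spawn/1 for Node.spawn/2 with a remote node name should be the only change needed to run the echo process on another machine.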
In this tutorial we will see how easy it is to create an Elixir cluster with two nodes in Azure and then run map reduce across that cluster.
Elixir Nodes in Azure:
Setting up VMs:
We are going to work on two separate instances of Elixir running on Ubuntu 14.04 using Docker. For that we have to create two Azure VMs. Go to the Azure Portal and create two new VMs with Ubuntu 14.04. Once the VMs have been created we need to add some endpoints so the firewall allows communication through those ports. Go to your VM, All settings > Endpoints, and add endpoints for the ports used in this tutorial: 4369 (epmd) and 9001 (the Elixir node).
epmd is the Erlang Port Mapper Daemon; the 900x ports are for the Elixir nodes.
Now we have to install Docker on the VM and pull the Docker Elixir image we are going to use:
$ sudo apt-get update
$ sudo apt-get install docker.io
$ docker pull zabirauf/elixir
Do the same thing for the second VM. I will refer to the first VM as Batman and second VM as Superman.
Now we are ready for some Elixir fun.
Start Elixir Nodes:
Let's start Docker with the pulled image zabirauf/elixir:
$ docker run -i -t -p 4369:4369 -p 9001:9001 zabirauf/elixir /bin/bash
This will start the container with Elixir installed and forward the traffic coming to ports 4369 and 9001 to the container. Let's start interactive Elixir:
$ iex --name batman@hostname.cloudapp.net --cookie monster --erl "-kernel inet_dist_listen_min 9001 inet_dist_listen_max 9001"
This is where the magic happens: we start an Elixir node. inet_dist_listen_min and inet_dist_listen_max tell the Erlang kernel the minimum and maximum ports it may use for distribution. In our case we only want to start one node per VM, so both are 9001, but this range should never be smaller than the number of nodes you want to start on one VM. --name assigns a name to the distributed Elixir node; the part after the @ should be the hostname of the VM. --cookie is how one node trusts another, so the cookie must be the same on all nodes. It should be a random, hard-to-guess value, but for simplicity I just set it to "monster".
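If you want to double check that the flags were picked up, something like the following sketch can be pasted into iex. NodeInfo and summary are names I made up for this example. It only reads settings, so it also runs on a node started without --name, where you would expect alive? to be false and the port settings to be nil.

```elixir
defmodule NodeInfo do
  # Collects the distribution settings of the current VM into a map.
  def summary do
    %{
      # The name given with --name
      name: Node.self(),
      # true only when the node was started as a distributed node
      alive?: Node.alive?(),
      # The value given with --cookie
      cookie: Node.get_cookie(),
      # The port range given through --erl "-kernel ..."; nil when not set
      listen_min: Application.get_env(:kernel, :inet_dist_listen_min),
      listen_max: Application.get_env(:kernel, :inet_dist_listen_max)
    }
  end
end

IO.inspect(NodeInfo.summary())
```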
Do the same thing on the other VM but with --name superman@hostname.cloudapp.net.
You should now have two Elixir nodes started in two different VMs. Let's connect the Batman node to the Superman node:
Node.connect :'superman@hostname.cloudapp.net'
Node.list
Node.list should show you the list of connected nodes. You are now ready to communicate with processes across the nodes. That's how easy it is to set up a cluster in Elixir (at least with two nodes).
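Node.connect returns true, false or :ignored, and it is easy to miss a failed connection in iex. As a rough sketch, a small wrapper can make the three cases explicit. The Cluster module and the error atoms are my own naming, not part of Elixir.

```elixir
defmodule Cluster do
  # Wraps Node.connect/1, which returns true, false, or :ignored.
  def connect(node_name) when is_atom(node_name) do
    case Node.connect(node_name) do
      # Connected: return the nodes we can now see
      true -> {:ok, Node.list()}
      # The remote node could not be reached (wrong name, cookie, or closed port)
      false -> {:error, :unreachable}
      # The local node was not started with --name or --sname
      :ignored -> {:error, :local_node_not_distributed}
    end
  end
end

IO.inspect(Cluster.connect(:"superman@hostname.cloudapp.net"))
```

If this returns {:error, :unreachable}, the usual suspects are the endpoints on the VM, the published Docker ports and a cookie mismatch.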
Distributed map reduce
Now that we have an Elixir cluster of two nodes, what do we do with it?
Let's write a very naive distributed map reduce in Elixir. We will go with the classic map reduce example of counting the words in documents. We will have a distinct file on each of the two nodes, call map on both nodes, get the results and reduce them. The map step will count the occurrences of words in the document and create a Map, and the reduce step will merge the results from both maps.
defmodule MapReduce do
  def map() do
    # Wait for a map message and then count the occurrences of words
    receive do
      {:map, client, filename} ->
        file = File.read!(filename)
        word_count = file |> String.split() |> create_map()
        send(client, {:result, word_count})
        map()
    end
  end

  # Merge two word-count maps, summing the counts of shared words
  def reduce(dict1, dict2) do
    Map.merge(dict1, dict2, fn _k, v1, v2 -> v1 + v2 end)
  end

  def create_map(words), do: create_map(words, %{})
  def create_map([], dict), do: dict

  def create_map([word | rest], dict) do
    create_map(rest, Map.update(dict, word, 1, &(&1 + 1)))
  end
end
As you can see, the above code has no knowledge of nodes. map() just waits for a message; when it receives a command to map, it reads the content of the file, splits it into words and counts the occurrences of each word. reduce is also simple: it merges two maps, adding up the counts of words that appear in both.
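Before involving any nodes, the two steps can be tried on plain strings. This is only a sketch: the WordCount module and its function names are invented for the example, and it takes text directly instead of reading a file.

```elixir
defmodule WordCount do
  # Map step: count how often each whitespace-separated word occurs.
  def count(text) do
    text
    |> String.split()
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
  end

  # Reduce step: merge two count maps, summing the counts of shared words.
  def merge(counts1, counts2) do
    Map.merge(counts1, counts2, fn _word, c1, c2 -> c1 + c2 end)
  end
end

a = WordCount.count("to be or not to be")
b = WordCount.count("to do or not")

# "to" is counted 3 times in total, "be", "or" and "not" twice, "do" once
IO.inspect(WordCount.merge(a, b))
```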
Let's place the files on each of the nodes.
On the Batman node run
curl http://pastebin.com/raw.php?i=c4Dp7qLT > data.txt
curl http://pastebin.com/raw.php?i=i3XuPMb0 > mapreduce.ex
On the Superman node run
curl http://pastebin.com/raw.php?i=9QR7yazZ > data.txt
curl http://pastebin.com/raw.php?i=i3XuPMb0 > mapreduce.ex
Start iex as before on both nodes, then on the Superman node run c("mapreduce.ex") to load the code of the MapReduce module. We will use the Batman node to start the map reduce. Run the following on the Batman node:
c("mapreduce.ex")
# Connect with the Superman node
Node.connect :'superman@hostname.cloudapp.net'
# Start the map on both nodes
n1 = Node.spawn(:'superman@hostname.cloudapp.net', &MapReduce.map/0)
n2 = Node.spawn(:'batman@hostname.cloudapp.net', &MapReduce.map/0)
# Send the map message to the started processes
send n1, {:map, self(), "data.txt"}
send n2, {:map, self(), "data.txt"}
# Receive the results; bind the value returned by receive, because
# variables bound inside a clause are not visible outside it
word_count1 = receive do
  {:result, d1} -> d1
end
word_count2 = receive do
  {:result, d2} -> d2
end
# Reduce
result = MapReduce.reduce(word_count1, word_count2)
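The session above blocks forever if one node never answers, and it cannot tell which result came from which node. One possible refinement, sketched here under my own assumptions, tags every request with a reference and adds a timeout. TaggedMapReduce, worker and await are hypothetical names, the worker takes text instead of a filename, and local spawn/1 stands in for Node.spawn/2 so that the sketch runs on a single node.

```elixir
defmodule TaggedMapReduce do
  # Worker: waits for one job, counts words, replies with the caller's reference.
  def worker do
    receive do
      {:map, client, ref, text} ->
        counts =
          text
          |> String.split()
          |> Enum.reduce(%{}, fn w, acc -> Map.update(acc, w, 1, &(&1 + 1)) end)

        send(client, {:result, ref, counts})
    end
  end

  # Waits for the reply carrying `ref`, giving up after `timeout` milliseconds.
  def await(ref, timeout \\ 5_000) do
    receive do
      {:result, ^ref, counts} -> {:ok, counts}
    after
      timeout -> {:error, :timeout}
    end
  end
end

# One worker per document; each request carries its own unique reference.
jobs =
  for text <- ["a b a", "b c"] do
    ref = make_ref()
    pid = spawn(&TaggedMapReduce.worker/0)
    send(pid, {:map, self(), ref, text})
    ref
  end

# Collect the replies in request order and merge them into one map.
# This sketch simply crashes on {:error, :timeout}; real code would handle it.
result =
  jobs
  |> Enum.map(&TaggedMapReduce.await/1)
  |> Enum.reduce(%{}, fn {:ok, counts}, acc ->
    Map.merge(acc, counts, fn _k, v1, v2 -> v1 + v2 end)
  end)

IO.inspect(result)
```

Because references and pids can be sent between nodes, the same pattern should carry over to the two-node cluster by spawning the workers with Node.spawn/2.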
As you can see, building a distributed system in Elixir is easy: something that is really complex in other languages and runtimes is almost trivial in the Elixir/Erlang world.