This post is mirrored on my blog.

In order to write applications that communicate via sockets effectively, I had to come to some realizations that none of the documentation I read stated explicitly.

If you have experience writing applications using sockets, all of this information should be obvious to you. It wasn't obvious to me as an absolute beginner, so I'm trying to make it more explicit in the hopes of shortening another beginner's time getting their feet wet with sockets.

TCP reliability vs. application reliability

TCP guarantees reliability with regard to the stream; it does not guarantee that every send() was recv()'d by the peer. This distinction is important, and it took me a while to realize it.

The core problem I was trying to solve is how to cleanly handle a network partition, which is when two machines A and B become completely disconnected from each other. TCP, of course, cannot ensure your messages are delivered if the other machine is off or disconnected from the network. TCP will keep the data in its send buffers for a while, then eventually time out and drop the data. I'm sure there's more to it than that, but from an application perspective that's all I need to know.

The implications of this are important. If I send() a message, I have no guarantee that the other machine will recv() it if it is suddenly disconnected from the network. Again, this may be obvious to an experienced network programmer, but to an absolute beginner like me it was not. When I read "TCP ensures reliable delivery," I mistakenly thought it meant that e.g. send() would block and return success only after the receiving end had successfully recv()'d the message.

Such a send() could be written, and it would then guarantee at the application level that the messages definitely reached the receiving application and were read by it. However, this would grind application interaction to a halt, because every call to this send() would make the application wait for confirmation that the other application received it.

Instead, we hope that the other application is still connected, and queue up one or many send() calls in a buffer that TCP manages for us. TCP then does its best to get that data to the other application, but in the event of a disconnect we effectively lose all of it.
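
To make that concrete, here is a sketch of a typical send-everything helper (my own illustration, not from any particular library); the point is what its "success" return actually promises:

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 0 once the kernel has accepted the whole buffer, -1 on error.
   Note what "success" means here: the bytes were copied into the
   kernel's send buffer. It does NOT mean the peer ever recv()'d them. */
static int send_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(sock, buf, len, 0);
        if (n < 0)
            return -1; /* the connection may already be gone */
        buf += n;
        len -= (size_t)n;
    }
    return 0; /* buffered by TCP, not confirmed delivered */
}
```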

Application reliability

Application developers need to decide how their application reacts to unexpected disconnects. For each piece of data you send, you need to decide how hard you'll try to confirm that it actually reached the receiving application.

The counter-intuitive thing about this is that it means implementing acknowledgement messages, labeling messages with IDs, creating a buffer and a system to re-send messages, and/or (depending on the application) possibly even timeouts associated with each message. That sounds a lot like TCP, doesn't it? The difference is that you are not dealing with the unreliability of the underlying packet layer, as TCP does. You are dealing with the unreliability of networked machines staying on and connected in general.
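
As a rough sketch of the bookkeeping involved (the message format, ID scheme, and buffer size here are all invented for illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical bookkeeping for application-level reliability.
   Each outgoing message gets an ID and stays buffered until the
   peer sends back an acknowledgement carrying the same ID. */
struct pending_message {
    uint32_t id;      /* echoed back in the peer's acknowledgement */
    time_t   sent_at; /* for deciding when to re-send or give up */
    size_t   length;
    char     payload[512];
};

/* On a timer tick: re-send anything unacknowledged for too long.
   On receiving an acknowledgement: drop the matching entry. */
```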

It may seem annoying that you may need to implement all of these things, but it does allow your application to gain some interesting abilities:

  • You can store packets of data on the hard drive, then if the entire application or machine crashes, you can still try to send that data when everything starts back up.
  • You can allow disconnects during long-running operations, then when the two machines eventually reconnect, the operation's results can be shared.
  • You can decide on a case-by-case basis how hard you want to try to confirm delivery. For example, I might try very hard to report a long operation is completed, but I might not care as much about dropping the data which reports the operation's progress over time. The former might make the user think they must re-run the potentially costly operation, but the latter might just make a progress bar move a little more erratically.

You may not need to care about application-level reliability. Many applications simply exit when a disconnect occurs at an unexpected time. In my case, I wanted my applications to gracefully continue by attempting to reestablish the connection every so often. This meant I needed a separate reconnect loop which would sleep for a bit, then attempt to reconnect and resume normal operation if successful.
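
A minimal version of that reconnect loop might look like this sketch (the five-second retry interval is an arbitrary choice, and setting up the address is elided):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Keep trying until a connection succeeds, sleeping between
   attempts. addr/addrlen describe the machine to reach. */
static int reconnect(const struct sockaddr *addr, socklen_t addrlen)
{
    for (;;) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock >= 0 && connect(sock, addr, addrlen) == 0)
            return sock; /* connected; resume normal operation */
        if (sock >= 0)
            close(sock);
        sleep(5); /* arbitrary retry interval */
    }
}
```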

I have not implemented the application-level reliability layer in my application yet because I am not too concerned if any of the data isn't eventually received. This is a decision that must be made on a case-by-case basis, however. If, for example, I run a build that takes two hours, but the "build success" message is dropped due to a disconnect, I might end up wasting another two hours re-running the build unnecessarily. If I had application-level reliability, I would know that the build succeeded. The trade-off to implementing this is added development time and system complexity, but it may be worth it.

recv() and SIGPIPE

I found it very confusing that I had to attempt to recv() from a socket and have it fail in order to even tell that the connection was no longer active. I expected that I would call e.g. isconnected() on a socket after select() tells me something happened to it. It makes sense to me now that it's better to have recv() fail and tell me about the disconnect. Otherwise, I might mistakenly assume that if I call isconnected() I am then guaranteed a good recv(). By keeping the disconnect tied to recv() failing, I know I need to handle potential disconnects at every recv() invocation. The same goes for send().
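
Concretely, the read side looks something like this sketch:

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Handle one readable socket. Returns 1 to keep the connection,
   0 if it should be treated as disconnected. */
static int handle_readable(int sock)
{
    char buf[4096];
    ssize_t n = recv(sock, buf, sizeof buf, 0);
    if (n > 0)
        return 1; /* got n bytes of application data */
    if (n == 0)
        return 0; /* the peer shut down the connection in an orderly way */
    return 0;     /* error: the connection was reset or lost */
}
```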

On Linux, I also needed to disable signaling so that I could handle connection errors inline rather than needing to register a signal handler: by default, send() on a connection the peer has closed raises SIGPIPE. I opted to pass MSG_NOSIGNAL to both send() and recv() (the flag only has an effect on sends) and handle potential disconnect errors at each call. This might not be as idiomatic on Linux, where a signal handler might be more common, but it gives me a bit more control as an application developer. It also ports better to Windows, which doesn't use signals to report disconnects.
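
For example, on the send side (a sketch; the error handling policy is my own choice):

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send with SIGPIPE disabled; returns 0 on success, -1 when the
   connection should be torn down and the reconnect loop entered. */
static int send_nosig(int sock, const char *buf, size_t len)
{
    ssize_t n = send(sock, buf, len, MSG_NOSIGNAL);
    if (n < 0 && (errno == EPIPE || errno == ECONNRESET))
        return -1; /* peer is gone: handle the disconnect inline */
    return n < 0 ? -1 : 0;
}
```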

Don't use Linux "everything is a file" APIs with sockets

Linux allows you to treat sockets as if they are file descriptors. This is neat because you can then make your application support streaming to/from a file or a socket with the same code.

However, Windows does not treat sockets the same as files. If you want to use native Windows APIs, you must use the functions dedicated to them: send(), recv(), closesocket(), etc.
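
To illustrate the difference, a small sketch (assuming sock is a connected socket and buf/len hold outgoing data):

```c
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

static void illustrate(int sock, const char *buf, size_t len)
{
    /* Linux only: a socket is a file descriptor, so write() works. */
    (void)write(sock, buf, len);

    /* The dedicated socket call does the same thing here, and is the
       only form Winsock accepts. */
    (void)send(sock, buf, len, 0);
}
```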

I would argue that the Linux abstraction should not be used, from a robustness standpoint: how you handle a file no longer existing and how you handle a socket disconnect are unlikely to be very similar. I'm sure I'll get counter-arguments saying that you should write your applications to treat these the same. I care about strong Windows support, so even if I'm wrong, my hands are tied anyway.

You could of course write your own abstraction layer over these, but again, the performance and reliability characteristics of files vs. sockets are quite different. It seems that if you can treat them differently, you should, if only for the awareness and control. I will also ask: how often are you writing applications that want to accept either files or sockets? In my experience that is a definite minority of cases. I usually know where my data is going, and I usually want to know so that I can make more educated decisions about performance.

The application's main select() loop

The application knows when it needs to write to a socket. It does not necessarily know when it needs to read from a socket. This means that I should only add sockets to the write list of select() when I have a message ready to send. I should always add all sockets to the read list of select() if I want the application to be flexible to receiving messages at any time.
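
A sketch of one iteration of such a loop (the conn struct and has_pending_output() helper are hypothetical names):

```c
#include <sys/select.h>

/* One pass of the main loop over all live connections. */
static void poll_connections(struct conn *conns, int nconns)
{
    fd_set readfds, writefds;
    FD_ZERO(&readfds);
    FD_ZERO(&writefds);

    int maxfd = -1;
    for (int i = 0; i < nconns; i++) {
        int fd = conns[i].sock;
        FD_SET(fd, &readfds);              /* always willing to receive */
        if (has_pending_output(&conns[i])) /* write only when data waits */
            FD_SET(fd, &writefds);
        if (fd > maxfd)
            maxfd = fd;
    }

    if (select(maxfd + 1, &readfds, &writefds, NULL, NULL) > 0) {
        /* recv() on each socket set in readfds,
           send() on each socket set in writefds */
    }
}
```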

If there are several rounds of back-and-forth that need to happen for a single operation, I could still code that in, but it becomes less flexible. It is easier to keep it to a single send, then handle the receive in the main select() loop. This might require storing state in the metadata associated with each connection, or adding IDs to messages to associate them with other state.

By keeping rounds of select() to only sends or only receives on each socket, you handle multiple connections better. For example, you can send an order to start a long operation on another machine, then receive messages from other connections while the long operation is running. Otherwise, you would have to put the long operation's send and receive code on another thread or something to allow for other connections to be handled.

It is less of a concern if you e.g. receive a request, then can quickly put together and send a response. In those cases, you might as well just receive and send in the same iteration of select() on that connection to keep things simple. If the receiving application is coded with a similar setup, they also can decide whether to receive right after they send or go back into their select() loop.

Sockets are still cool

It took a while for me to understand what I needed in order to write applications that use sockets effectively. Now that I have paid that price, it feels like I've gained a new superpower.

I felt something similar when I learned how to run sub-processes, and when I learned how to load code dynamically[^1]. These things break down barriers and open doors to new and exciting functionality.

While I have spent much longer than I expected building the project which required me to learn sockets, I am glad I did.

[^1]: If you haven't learned these, you really should. Here are the functions, to give you something to search for. For running sub-processes: CreateProcess on Windows; fork and exec on Linux.

For dynamic loading: LoadLibrary and GetProcAddress on Windows; dlopen and dlsym on Linux. If you want to load code without using dynamic linking, you'll want to learn about virtual memory and mmap() (Linux) or VirtualAlloc() (Windows).

By using both sub-process execution and dynamic loading, you can have applications e.g. invoke a compiler to build a dynamic library, then immediately load that library into the same application. This is one way you could allow your users to modify and extend your application while it stays running.
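
As a sketch of that cycle on Linux (the plugin.c source file and plugin_main entry point are made up):

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Invoke the compiler as a sub-process to produce a library.
       system() is the simplest stand-in for fork/exec here. */
    if (system("cc -shared -fPIC plugin.c -o plugin.so") != 0)
        return 1;

    /* Immediately load the freshly built library. */
    void *lib = dlopen("./plugin.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    /* Look up and call a hypothetical entry point. */
    void (*entry)(void) = (void (*)(void))dlsym(lib, "plugin_main");
    if (entry)
        entry();

    dlclose(lib);
    return 0;
}
```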