Danga::Socket
From FlimzyWiki
Overview
Danga::Socket is an "event loop and event-driven async socket base class" Perl module written by Brad Fitzpatrick. The documentation, however, is sparse. The official documentation says "For now, see servers using Danga::Socket for guidance. For example: perlbal, mogilefsd, or ddlockd."
So that's exactly what I've been doing. This page is the result of reading the source code of projects that use Danga::Socket and making notes for my own reference. I hope these notes will also be useful to other folks out there somewhere.
Why Async?
See The C10K problem for a somewhat outdated, but still mostly relevant, discussion of why asynchronous program design makes sense.
For server applications (on Unix, specifically), there are a number of common models, which I will mention as a backdrop for why async is better (at least for some applications).
- Fork -- Each new request causes the server to fork a new process, which handles that request and then dies. One example is the 'inetd' server.
- Prefork -- This is the model most commonly used by Apache, SpamAssassin, and many other common daemons. A single master process keeps a number of child processes running. Each child process handles a single request at a time, but may handle multiple requests over time. This is generally much more efficient than the Fork model, because startup overhead only happens once per child process, rather than once per network request.
- Async -- A single process handles and juggles all requests simultaneously, handing work off to worker threads or processes. This makes for a much more complicated program design (probably the main reason it's not very prevalent), but can be far more efficient. This model generally uses much less memory and has less overhead than even prefork, because while one connection is waiting for something to happen, the same process(es) can handle other connections.
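To make the Async model a little more concrete, here is a minimal sketch of one process juggling several connections. It is written in Python with the standard selectors module, not with Danga::Socket, and everything in it is an assumption made for illustration: the names run_event_loop and demo_async are hypothetical, socket pairs stand in for real network clients, and the "work" is just upper-casing the request.

```python
import selectors
import socket


def run_event_loop(sel, max_idle_polls=3):
    """Service every registered connection from a single process.

    Returns after max_idle_polls consecutive polls with no activity so the
    sketch terminates; a real server would normally loop forever.
    """
    idle = 0
    while idle < max_idle_polls and sel.get_map():
        events = sel.select(timeout=0.05)
        if not events:
            idle += 1
            continue
        idle = 0
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                # Hypothetical "work": reply with the request in upper case.
                conn.sendall(data.upper())
            else:
                # An empty read means the peer closed its end.
                sel.unregister(conn)
                conn.close()


def demo_async():
    sel = selectors.DefaultSelector()
    clients = []
    for _ in range(3):
        # socketpair() stands in for an accepted network connection.
        client_end, server_end = socket.socketpair()
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ)
        clients.append(client_end)

    # Clients 0 and 2 send requests; client 1 stays connected but silent.
    clients[0].sendall(b"hello")
    clients[2].sendall(b"world")
    run_event_loop(sel)
    replies = [clients[0].recv(4096), clients[2].recv(4096)]

    for c in clients:
        c.close()
    run_event_loop(sel)  # notices the closed peers and cleans up
    sel.close()
    return replies


if __name__ == "__main__":
    print(demo_async())
```

The silent connection costs the process nothing beyond one registered file descriptor, which is where the memory savings over Fork and Prefork come from.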
Process Flow
As someone who has never attempted async programming before, in any language, the first hurdle I had to overcome was simply understanding the theory of how async programming works.
This is by no means intended to be an all-inclusive discussion of async programming. Any claims of "how it's done" should be read only as "how I've seen it done" or "how I do it". If you know of another way that you think is better, please use it (and let me know about it if you feel like it).
The Parent Process
When the program starts, it binds to a TCP or Unix socket, possibly spawns some worker processes, then enters an event loop. In the event loop, each active connection is checked for activity, and when activity occurs, the process handles it, then goes on to check the next connection. Note that while it is possible to do all the work in threads, I chose to investigate how to do it in worker processes, because this is the model I intend to use, for various reasons.
Here is my (most likely poor) attempt at ASCII art demonstrating the concept:
 ----- Beginning of event loop
   |
   +-(1)-------- Listen to TCP or Unix socket
   |
   +-(2)-------- A network connection from somewhere
   |
   +-(3)-------- A worker process
   |
   +-(4)-------- A worker process
   |
   +-(5)-------- A network connection from somewhere else
   |
 ----- End of event loop
In the diagram, there are two worker processes (communicating over a pipe with the parent process) and two network connections. When a network request comes in to do something, the parent process assigns that work to an idle worker process, then enters the event loop again. When the worker process is done, the parent process takes the result and hands it to the network connection.
As an example, the network client connected to slot 2 makes a request for a directory listing of files it can download. The parent process, on its next iteration through the loop, takes the request at slot 2. It finds that worker process (4) is idle, so it asks worker process (4) to formulate a directory listing. It then continues through the loop; slots 3, 4, and 5 show no activity. It loops again a few dozen more times, then worker process (4) finishes its directory listing and returns it to the parent process. The parent process takes the output from the worker and passes it on to the client network connection in slot 2.
The advantage here is that if the client network connection in slot 5 had also requested some information at any time, the parent process could have handed it off to worker process 3, while worker process 4 was still formulating the directory listing for client network connection 2.
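The hand-off described above can be sketched roughly as follows. Again, this is Python with the standard selectors module and os.fork (so Unix only), not Danga::Socket, and it is only an illustration under several assumptions: spawn_worker, worker_main, parent_loop and demo_workers are hypothetical names, socket pairs stand in for both the worker pipes and the network clients, every message is one short newline-terminated line that arrives in a single read, and the "directory listing" is a made-up string.

```python
import os
import selectors
import socket


def worker_main(chan):
    """Worker process: read one job per line, send back one result per line."""
    rfile = chan.makefile("rb")
    for line in rfile:
        job = line.decode().strip()
        # Hypothetical stand-in for "formulate a directory listing".
        chan.sendall(f"listing for {job}\n".encode())


def spawn_worker():
    """Fork a worker; return (pid, parent's end of the channel)."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child process
        parent_end.close()
        try:
            worker_main(child_end)
        finally:
            os._exit(0)
    child_end.close()
    return pid, parent_end


def parent_loop(sel, idle_workers, expected_replies):
    busy = {}     # worker channel -> client waiting on that worker
    pending = []  # (client, request) pairs waiting for an idle worker
    delivered = 0
    while delivered < expected_replies:
        for key, _mask in sel.select():
            sock = key.fileobj
            if key.data == "client":
                request = sock.recv(4096)
                if not request:  # client went away
                    sel.unregister(sock)
                    sock.close()
                    continue
                pending.append((sock, request))
            else:  # a worker has finished its job
                result = sock.recv(4096)
                busy.pop(sock).sendall(result)
                idle_workers.append(sock)
                delivered += 1
        # Assign queued requests to idle workers, then go back to the loop.
        while pending and idle_workers:
            client, request = pending.pop(0)
            worker = idle_workers.pop()
            busy[worker] = client
            worker.sendall(request)


def demo_workers():
    workers = [spawn_worker() for _ in range(2)]
    sel = selectors.DefaultSelector()
    idle = []
    for _pid, chan in workers:
        sel.register(chan, selectors.EVENT_READ, "worker")
        idle.append(chan)
    clients = []
    for path in (b"/pub\n", b"/incoming\n"):
        client_end, server_end = socket.socketpair()
        sel.register(server_end, selectors.EVENT_READ, "client")
        client_end.sendall(path)
        clients.append((client_end, server_end))

    parent_loop(sel, idle, expected_replies=2)
    replies = [c.recv(4096).decode().strip() for c, _s in clients]

    sel.close()
    for pid, chan in workers:
        # shutdown() signals EOF to the worker even though a later-forked
        # worker may have inherited a copy of this descriptor.
        chan.shutdown(socket.SHUT_RDWR)
        chan.close()
        os.waitpid(pid, 0)
    for c, s in clients:
        c.close()
        s.close()
    return replies


if __name__ == "__main__":
    print(demo_workers())
```

Because the parent only records which client is waiting on which worker and then returns to the loop, a second request can go to the other worker while the first is still being processed.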

