webservice daemon: php vs node.js vs python/twisted

The same web service was developed in PHP, node.js and Twisted/Python. The winner is Twisted/Python, primarily because it provides a robust asynchronous programming model and can be comfortably debugged and deployed.

Warning: the code for all three implementations is available at http://wspool.dachary.org/ but the services are not running on the machine. Despite this, some parts of the JavaScript interface may appear to be working because they run in the user's browser. Please do not submit complaints because the web services are not active: they are not supposed to be 😉

Implementation and tests

The following table links to the code for each implementation and their associated tests:

hofosm

php based implementation with phpunit and xdebug based webservice tests
JQuery based implementation with qunit and jscoverage based user interface tests

seeks monitor / node.js

node.js based implementation with expresso based webservice tests
JQuery based implementation with qunit and jscoverage based user interface tests

wspool

Python/Twisted based implementation with Twisted Trial and python-coverage based webservice tests
JQuery based implementation with qunit and jscoverage based user interface tests

pros and cons

Each webservice implementation has pros and cons but the overall winner is wspool and its Twisted/Python implementation. Instead of trying to present a pseudo-impartial side-by-side comparison, the idea is to explain why it is preferred. Although node.js is not mature in a number of ways, it is still in its infancy and there is a good chance that it will mature quickly and resolve these issues. PHP is probably never going to be a challenger because it is neither designed nor used for standalone daemons.

Libraries

The only two libraries needed are SQLite and something similar to cURL. Regarding URL fetching, both PHP and Twisted provide libraries that can follow redirections and attach distinctive behavior when an error status is returned. By contrast, the node.js library requires about a hundred lines of code to implement the same level of service. In addition, the SQLite interface available for node.js at the time was unstable and a few attempts were necessary to figure out a working combination.
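
As a rough illustration of the level of service meant here, the following sketch uses only Python's standard library rather than the Twisted client used by wspool: urllib follows redirections on its own and raises HTTPError when an error status is returned. The fetch_status name and the shape of the returned dictionary are made up for the example.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Fetch url, following redirections, and report what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # geturl() is the final URL, after any redirection was followed.
            return {'alive': True, 'status': response.status, 'url': response.geturl()}
    except urllib.error.HTTPError as error:
        # An error status (4xx, 5xx) gets its own distinctive handling.
        return {'alive': False, 'status': error.code, 'url': url}
    except OSError as error:
        # URLError and timeouts are OSError subclasses: DNS failure,
        # refused connection, unreachable host, etc.
        return {'alive': False, 'status': None, 'url': url, 'reason': str(error)}
```

A periodic checker could then store the 'alive' and 'status' values next to each URL.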

Daemon

PHP relies on a server such as Apache: there is no widespread daemon implementation, so a custom server must be used and adapted. node.js provides a daemon but when seeks-monitor was run for more than 24 hours it crashed with an error while trying to load a URL. It seems that long-lived node.js daemons are still uncommon and that a majority of users tend to restart their server from time to time. In other words, running a node.js daemon for a long time is possible but is still a fairly rare expertise.
For deployment, Twisted handles logging, rotating logs, storing the PID etc. by default. These features must be explicitly supported for node.js and the custom PHP server. init scripts can be found and adapted for Twisted but must be hand crafted for node.js and PHP.
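
To give an idea of what "explicitly supported" means, here is a minimal sketch of two of these chores, a PID file and a rotating log, written in Python with the standard library only. It is not how Twisted implements them; the function names, the logger name and the size limits are arbitrary assumptions.

```python
import atexit
import logging
import logging.handlers
import os


def write_pid_file(path):
    """Record the current process id and remove the file at exit."""
    with open(path, 'w') as pid_file:
        pid_file.write('%d\n' % os.getpid())
    atexit.register(lambda: os.path.exists(path) and os.remove(path))


def rotating_logger(path, max_bytes=1024 * 1024, backups=5):
    """Return a logger that rotates its file once it grows past max_bytes."""
    logger = logging.getLogger('daemon')
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

An init script would typically read the PID file to stop or restart the daemon.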

Learning curve

For PHP or Twisted/Python, a language and its associated environment must be learned in addition to JavaScript, which is a significant effort. Because PHP has no support for asynchronous programming, there are no concepts to learn in that area. With both Twisted and node.js, understanding the programming style associated with deferreds and async.js is a significant effort.

Debugging

Although a debugger is available for both PHP and node.js, it is rarely used: debugging is most commonly done by adding log messages to the code. With Twisted, the pdb debugger is widely used, as well as manhole to examine the state of a running process.

Performance

When stressed with Apache benchmark against a database of 10,000 URLs to retrieve the first 500 entries, PHP behind nginx takes ~50ms while node.js and Twisted both take 15ms. The node.js daemon leaked because of the SQLite library adaptor.
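
For a sense of scale, the arithmetic behind these figures is spelled out below. It assumes the quoted times are per-request latencies and that requests are served strictly one after the other, which the measurements do not state; the dictionary keys are only labels.

```python
# Per-request latencies quoted in the text, in milliseconds.
latency_ms = {'php+nginx': 50, 'node.js': 15, 'twisted': 15}

# Requests per second if requests were served one at a time.
throughput = {name: 1000 / ms for name, ms in latency_ms.items()}

# How many times faster the asynchronous daemons answer compared to PHP.
speedup = latency_ms['php+nginx'] / latency_ms['twisted']

print(throughput['php+nginx'])          # 20.0
print(round(throughput['twisted'], 1))  # 66.7
print(round(speedup, 1))                # 3.3
```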

Packaging

To create a compliant Debian GNU/Linux package, all the dependencies for PHP and Twisted can be found in the latest stable distribution. Packaging node.js requires explicitly naming the versions that are known to work together, as shown at the end of the Makefile. It is then necessary to figure out a way to embed node.js itself and all its dependencies within the package source because the Debian policy forbids loading external resources while installing a package. The alternative would be to create independent packages for node.js, the associated packaging system and all the dependencies.

Event based asynchronous programming model

CPU, disk and RAM requirements are very low and the daemon should be able to handle thousands of simultaneous connections while keeping a low CPU usage and memory footprint. That rules out Apache-style prefork and even threads in favor of fibers or green threads.
PHP does not include a library to help in this regard, mainly because this is not how most people use it. Writing such a daemon in PHP therefore requires a fair amount of infrastructure code that is unrelated to the daemon's functionality.
node.js has developed a systematic approach where all functions are given a callback which is called when the function completes (with an error or with success). It encourages various programming styles which are supported by a number of libraries. The async.js library was picked and using it looks like this:

  self.handle = function(req, res, callback) {
    async.waterfall([
                     self.db_check,
                     function(exists, callback) {
                       if(req.param('submit') !== undefined) {
                         self.submit(req, callback);
                       } else {
                         self.get(req, callback);
                       }
                     }
                    ],
                    function(error, result) {
                      var buffer = JSON.stringify(error ? {'error': error} : result);
                      res.send(buffer);
                      callback(undefined, buffer);
                    });
  };

and the associated test looks like:

  test_submit: function(done) {
    var monitor = new monitor_class();
    monitor.path = ':memory:';
    var asserted = 0;
    var url = 'URL';
    var comment = 'COMMENT';

    async.waterfall([
                     function(callback) {
                       monitor.handle(
                                      { param: function(name) {
                                          if(name == 'submit') { return 1; }
                                          if(name == 'url') { return url; }
                                          if(name == 'comment') { return comment; }
                                        }
                                      },
                                      { send: function(buffer) {
                                          assert.equal(buffer, '{}');
                                          asserted++;
                                        } },
                                      callback);
                     },
                     function(buffer, callback) {
                       assert.deepEqual(buffer, '{}');
                       monitor.db.prepare("SELECT * FROM urls", callback);
                     },
                     function(statement, callback) {
                       statement.fetchAll(callback);
                     }
                    ],
                    function(error, rows) {
                      var row = rows[0];
                      assert.deepEqual(url, row.url);
                      assert.deepEqual(comment, row.comment);
                      done();
                    });
  },

Twisted uses deferreds, which are also distributed by default with jQuery since version 1.5. It leads to code that looks like:

    def get(self):
        d = self.db.runQuery("SELECT * FROM urls ORDER BY alive DESC, url ASC")
        d.addCallback(lambda result: {'rows': result})
        return d

Where d is a deferred object. The associated test looks like:

    @defer.inlineCallbacks
    def test02_get(self):
        url1 = 'URL1'
        yield self.service.submit({ 'url': [url1]})
        url2 = 'URL2'
        yield self.service.submit({ 'url': [url2]})
        # let the database driver quote the values instead of formatting them into the SQL
        yield self.service.db.runOperation("UPDATE urls SET alive = datetime('now') WHERE url = ?", (url1,))
        yield self.service.db.runOperation("UPDATE urls SET alive = datetime('now','+1 hour') WHERE url = ?", (url2,))
        rows = yield self.service.get()
        self.assertEqual(url2, rows['rows'][0][1])
        self.assertEqual(url1, rows['rows'][1][1])
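
To make the callback chaining behind get() more concrete, here is a toy stand-in for a deferred written in plain Python. It is deliberately simplified (no errbacks, no nested deferreds) and is not Twisted's implementation; the ToyDeferred class and the sample row are invented for the illustration.

```python
class ToyDeferred:
    """A result that may arrive later, with callbacks applied in order."""

    def __init__(self):
        self.callbacks = []
        self.fired = False
        self.result = None

    def addCallback(self, function):
        # If the result is already known, run the callback right away;
        # otherwise remember it until callback() is called.
        if self.fired:
            self.result = function(self.result)
        else:
            self.callbacks.append(function)
        return self

    def callback(self, result):
        # Deliver the result: each callback receives the value
        # returned by the previous one.
        self.fired = True
        self.result = result
        for function in self.callbacks:
            self.result = function(self.result)
        self.callbacks = []


def get(query_deferred):
    # Same shape as the get() method above: wrap the rows in a dictionary.
    query_deferred.addCallback(lambda result: {'rows': result})
    return query_deferred


d = get(ToyDeferred())
d.addCallback(print)       # nothing is printed yet
d.callback([(1, 'URL1')])  # prints {'rows': [(1, 'URL1')]}
```

The caller of get() never blocks: it only describes what should happen to the rows once the database query completes.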

Web service description

The hofosm webservice is designed to store URLs being submitted for review. The seeks monitor and wspool webservices periodically fetch a set of URLs and store their status.

API

The web service provides an interface to store URLs and to retrieve the list of all URLs together with meta information.

hofosm

Add url1 and url2

curl --form comment=comment1 --form url=url1 'http://localhost/pool/hofosm.php?submit=1'
curl --form comment=comment2 --form url=url2 'http://localhost/pool/hofosm.php?submit=1'

Vote for url2 assuming it has id=2

curl --form id=2 'http://localhost/pool/hofosm.php?rate=1'

Display database content

curl http://localhost/pool/hofosm.php

seeks monitor

Add url1

curl --form comment=comment1 --form url=url1 'http://localhost:8345/resource?submit=1'

Display database content:

curl http://localhost:8345/resource

wspool

Add url1

curl --form url=url1 'http://localhost:4923/resource?submit=1'

Display database content:

curl http://localhost:4923/resource

On error the webservice will return a JSON object such as:

{"error":"Missing comment parameter"}

On success the submission of a new URL will return

{}

and the retrieval of the database will return a list of rows. In the following example the list is empty because the database is empty:

{"rows": []}

Each row is a list containing the fields. For instance:

{"rows": [[1,"URL","COMMENT","2011-03-06"]]}

The actual content of the row depends on the table used to store the meta information which is slightly different between the three implementations. However, they all contain a unique numerical id and a url.
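
A client therefore has three cases to tell apart: an error object, an empty object and a list of rows. The sketch below shows one way to do so in Python; the parse_response helper and the WebServiceError exception are hypothetical names, and it assumes that the id and the url are the first two fields of each row, as in the example above.

```python
import json


class WebServiceError(Exception):
    """Raised when the webservice answers with an "error" object."""


def parse_response(text):
    """Return a list of (id, url) pairs, empty for a bare {} answer."""
    answer = json.loads(text)
    if 'error' in answer:
        raise WebServiceError(answer['error'])
    # A successful submission returns {} and therefore has no rows.
    # Assumption: the id and the url are the first two fields of each row.
    return [(row[0], row[1]) for row in answer.get('rows', [])]


print(parse_response('{"rows": [[1, "URL", "COMMENT", "2011-03-06"]]}'))
# prints [(1, 'URL')]
```

Fields after the first two are ignored on purpose, since they differ between the three implementations.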

User interface

The JavaScript user interface displays a jQuery-based sortable list of the database content and a form to submit a new entry. It is redundant with the webservice interface and meant to be embedded in a web page as follows:

        <div id='pool'></div>
        <script>
          $('#pool').hofosm();
        </script>