4.4. Configuration

To date, operational tests show that a Tornado Front End server offloads 30-40% of the reader work for a given NNTP connection from a Tornado Back End spool server. This work includes most of the seek-intensive jobs that substantially slow down a disk array, leaving only the high-throughput, low-seek tasks that RAID arrays are best at. Empirical evidence suggests that using Tornado should increase a disk array's useful reader load by as much as an order of magnitude.

A full production installation of a Tornado Front End requires only 100-180G of disk, yet can easily represent 100 times that amount in articles by communicating with Tornado Back Ends.

4.4.1. Tornado Back End Installation

A Tornado Back End is installed and configured as a normal Twister would be. In fact, an existing Twister installation can be used unmodified, which makes integrating Tornado into a network simple and straightforward, with no impact on existing users.

If the Tornado Back End is going to serve regular readers in addition to being the back-end spool server, then no special configuration needs to be made. On the other hand, if the Tornado Back End is only going to serve Tornado Front End servers (the strongly recommended architecture), then the overview caches are not needed and the disk space (50G or more) can be recovered. Using overview caches will offer no benefit and additionally cause a performance hit during a cache write.

4.4.2. Tornado Back End Configuration

4.4.3. Tornado Back End Operation

When a Tornado Back End server is built, two directives in tornado_be.conf define a dummy feed: DummyFrontendQueuePath <path> and DummyFrontendQueueSize <number>. A dummy feed in Tornado Back End is similar to a Feed to Nowhere in Cyclone. It is a queue of DummyFrontendQueueSize articles that Tornado Back End has stored up, ready to unload to any newly defined Tornado Front End. If the dummy feed were not used, Tornado Back End would need to comb its spools to backfill articles to the Tornado Front End servers, a costly proposition.

Several very important events happen when a Tornado feed object is defined with BackFillTornado enabled. First, if a dummy feed exists, the following process is all implicit and has already taken place.

If a dummy feed is new, all articles received by the Tornado Back End are queued up (based on subscription) to every defined Tornado Front End, regardless of whether that particular Tornado Front End is connected. Once this queue is full, the oldest articles are pushed off the back. Two files are maintained to store a downstream Tornado Front End's backlog: <name of server>.trak and <name of server>.spool.
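
The queue behavior described above (a fixed number of articles, with the oldest dropped once the queue is full) can be sketched in Python as a bounded queue. This is only an illustration under assumptions: the class DummyFeedQueue and its methods are hypothetical, and the real Tornado Back End keeps its backlog in the .trak and .spool files rather than in memory.

```python
from collections import deque


class DummyFeedQueue:
    """Hypothetical in-memory model of a dummy feed queue (not Tornado's code)."""

    def __init__(self, queue_size):
        # queue_size plays the role of DummyFrontendQueueSize.
        self._articles = deque(maxlen=queue_size)

    def enqueue(self, message_id):
        # When the deque is full, append() silently discards the oldest entry.
        self._articles.append(message_id)

    def drain(self):
        # Hand the stored backlog to a newly defined front end, oldest first.
        backlog = list(self._articles)
        self._articles.clear()
        return backlog


queue = DummyFeedQueue(queue_size=3)
for n in range(1, 6):
    queue.enqueue(f"<article{n}@example.invalid>")
print(queue.drain())  # only articles 3, 4 and 5 remain
```

A larger queue size lets a new front end start with more history at the cost of more disk on the back end.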

The trak file contains a position log of all spools marked for backfill (see below) and is kept current, even through multiple restarts. The spool file is the backlog file. Since the trak file is the only means the Tornado Back End has of knowing what it has sent, if the trak file is deleted, the entire contents of the spools will be re-sent.

During operational tests, a heavyweight server with very fast RAID, a 100baseT NIC and no other load was able to pull 2,000 headers/sec from 3 Back End servers (aggregate). A fast x86 box with a good fibre channel JBOD averages 300/sec. Tests continue in efforts to relieve any bottlenecks. Since a normal feed is 15-20 articles/sec, these rates are quite sufficient.

Note

If more than one full feed is presented to a Tornado Front End, that must be taken into account for header transfer. Theoretically, Tornado Front End supports up to 254 full feeds, but this would result in 5,000+ headers per second attempting to arrive; even a 4-processor high-end server with a dedicated enclosure full of RAID could only manage half of this. The practical limit is likely 20-30 Tornado Back Ends. Scalability for partial feeds increases proportionally.
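
The 5,000+ figure follows from the feed rate quoted earlier. The following Python sketch works through the arithmetic, assuming one header per article and the 15-20 articles/sec rate given for a normal full feed; the function name is illustrative only.

```python
FULL_FEED_ARTICLES_PER_SEC = (15, 20)  # range quoted in the text for a normal feed


def header_rate(full_feeds):
    """Return the (low, high) headers/sec arriving from full_feeds full feeds."""
    low, high = FULL_FEED_ARTICLES_PER_SEC
    return full_feeds * low, full_feeds * high


for feeds in (3, 20, 30, 254):
    low, high = header_rate(feeds)
    print(f"{feeds:3d} full feeds: {low}-{high} headers/sec")
```

At 254 full feeds the upper bound is 254 x 20 = 5,080 headers/sec, which is where the "5,000+" estimate comes from.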

The second event that happens when a new Tornado object is defined is the queuing of all existing articles. The Tornado Back End server traverses all of its spools, pulls out articles, and queues them up for all Tornado objects whose subscriptions match.

The backfill operation is usually desirable only for some of the Tornado Back End's spools, namely the text spools, and is not recommended for binary spools. Sending binary spool data is a very heavyweight operation with diminishing returns that will degrade performance over several days.

A spool is not sent to a downstream Tornado Front End unless it is marked as BackFillTornado in twister.conf/tornado_be.conf.

For example:

<spool>
.
.
BackFillTornado         Yes
.
.
</spool>

4.4.4. Tornado Front End Configuration

A Tornado Front End is set up just like Typhoon, with a few minor, but important differences.

BackEndInterface is a synonym for OutgoingInterface in tornado_be.conf. Setting the BackEndInterface directive lets you control which network interface is used to contact the Tornado Back End.

Tornado Front End has several new Feed directives:

<Feed>
BackendFeedName         backend_server      [text string]
BackendHostName         backend_server.com  [valid hostname]
PortNumber              219                 [1-65535 - default 119]
BackendHostNumber       10                  [1-254]
BackendPriority         5                   [1-254]
NumberOfStreams         15                  [optional but HIGHLY recommended]
CommitPercent           0                   [0-100]
DelayTCPFailover        False               [boolean]
RetentionHours_1        72                  [positive integer]
RetentionGroups_1       *                   [glob]
RetentionHours_2        72                  [positive integer]
RetentionGroups_2       *,!*binaries*       [glob]
TornadoSpoolMode        nfs|direct          [optional, do not use unless presented with I/O problems]
MessageIDDepth          50                  [positive integer]
MessageIDMultiPath      /data/spool01,\     [comma separated list of file system paths.  Maximum 10 paths]
                        /data/spool02,\
                        /data/spool03,\
                        /data/spool04,\
                        /data/spool05,\
                        /data/spool06
.
.
.
</Feed>

BackendFeedName and BackendHostName

Describe the name and location of the back end server and must be unique. Initial versions of Tornado Back End used UpstreamFeedName and UpstreamHostname. Use of these directives was confusing and interfered with the UpstreamHostNames currently used in tornado.conf/typhoon.conf, twister.conf/tornado_be.conf. As a result, the tokens were changed and Backend is now the recommended usage. Upstream will still work, but is deprecated and will eventually be removed. UpstreamFeedName, UpstreamHostname, UpstreamHostNumber, and UpstreamPriority all fall into this category.

BackendHostNumber

A numeric value used to describe/locate this upstream Tornado Back End's offerings. BackendHostNumber must never change once assigned. If a BackendHostNumber is changed or re-assigned to a different upstream server after it has been assigned, the behavior of Tornado Front End is undefined, but it will certainly result in massive retention and article loss.

BackendPriority

Describes the order in which upstream Tornado Back Ends are checked should a given article exist on more than one of them. The server defined by the above feed object has a BackendPriority of 5 and will be checked before servers with priorities 6-254, but after those with priorities 1-4, if feeds with those BackendPriority values exist. If servers are given the same priority and an article is found on all of them, they will be visited in random order before any lower priority Tornado Back End is checked, thus load-balancing servers of equal priority.
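
The ordering rule can be illustrated with a short Python sketch. It is not Tornado's implementation: the function visit_order and the server names are hypothetical, and the sketch only models the rule stated above (lowest BackendPriority number first, equal priorities visited in random order).

```python
import random
from itertools import groupby


def visit_order(backends, rng=random):
    """Order (name, priority) pairs: lowest priority number first, ties shuffled."""
    ordered = []
    by_priority = sorted(backends, key=lambda b: b[1])
    for _, group in groupby(by_priority, key=lambda b: b[1]):
        tier = list(group)
        rng.shuffle(tier)  # equal-priority servers are visited in random order
        ordered.extend(name for name, _ in tier)
    return ordered


# Hypothetical back ends that all hold the requested article.
holders = [("be-c", 6), ("be-a", 5), ("be-b", 5), ("be-d", 1)]
print(visit_order(holders))
```

Here be-d is always tried first and be-c last, while be-a and be-b swap places from one request to the next, which is what spreads load across servers of equal priority.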

NumberOfStreams

One of Tornado Front End's shining points. Chaining would require anywhere from 50 to 150 connections to be opened to the back end. However, because of the massively reduced transaction count and the load-balancing of high-speed sources, Tornado Front End requires far fewer. Preliminary tests show that ratios as high as 50 users to 1 stream are sufficient. If a connection is dropped for any reason, the Tornado Front End re-establishes it with the Tornado Back End.
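
As a rough sizing aid, the 50:1 ratio can be turned into a starting estimate for NumberOfStreams. The Python sketch below assumes that ratio holds for the workload in question; it is the best case reported by the preliminary tests, so a real installation may need more streams. The function name is illustrative only.

```python
import math

USERS_PER_STREAM = 50  # best ratio reported by the preliminary tests


def streams_needed(concurrent_users, users_per_stream=USERS_PER_STREAM):
    """Estimate a NumberOfStreams value for a given number of concurrent readers."""
    return max(1, math.ceil(concurrent_users / users_per_stream))


print(streams_needed(750))  # 15, the NumberOfStreams value used in the example feed
```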

CommitPercent

Dictates what percentage of articles fetched for reading will be committed to the local cache (if a local cache is defined with Spool objects in tornado.conf). Default is 100.
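
How Tornado Front End chooses which articles to commit is not documented here. Purely as an illustration of what a percentage setting means, the Python sketch below assumes each fetched article is selected independently at random; should_commit is a hypothetical helper, not part of Tornado.

```python
import random


def should_commit(commit_percent, rng=random):
    """Decide whether a fetched article is written to the local cache."""
    if not 0 <= commit_percent <= 100:
        raise ValueError("CommitPercent must be between 0 and 100")
    # rng.random() is in [0.0, 1.0), so 0 never commits and 100 always commits.
    return rng.random() * 100 < commit_percent


committed = sum(should_commit(25) for _ in range(10_000))
print(f"{committed} of 10000 articles would be cached at CommitPercent 25")
```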

DelayTCPFailover

The default behavior of the Tornado Front End (when using shared spools) is to attempt to find an article on the specified locally mounted filesystem. If there is a problem of any kind (for example, the filesystem is not mounted or the article is corrupt), it immediately requests the article directly from the Tornado Back End via the TCP link. If there are multiple local servers presenting shared spools, it might be desirable to have Tornado Front End NOT ask for the article over the TCP link until it has first checked all shared spools, and only then go back in a second pass and query the TCP links. The default is false (the check is not delayed), but it may be set to true on a per-feed basis. In typical use it will be set to true for either all feeds or none, but it may be desirable to set it to true for only some.
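
The difference between the two lookup orders can be sketched in Python. This is a simplified model under assumptions: the feed dictionaries and their read_shared/read_tcp callables are hypothetical, and the flag is applied to all feeds at once, whereas the real directive is set per feed.

```python
def fetch_article(message_id, feeds, delay_tcp_failover):
    """Return (source, article) for message_id, or None if nobody has it.

    Each feed is a dict with hypothetical "name", "read_shared" and "read_tcp"
    entries; the two callables return the article text or None on any failure.
    """
    if not delay_tcp_failover:
        # Default: try each feed's shared spool, then immediately its TCP link.
        for feed in feeds:
            for kind in ("read_shared", "read_tcp"):
                article = feed[kind](message_id)
                if article is not None:
                    return f"{feed['name']}:{kind}", article
        return None

    # Delayed failover: check every shared spool first, then every TCP link.
    for kind in ("read_shared", "read_tcp"):
        for feed in feeds:
            article = feed[kind](message_id)
            if article is not None:
                return f"{feed['name']}:{kind}", article
    return None


feeds = [
    {"name": "be1", "read_shared": lambda mid: None, "read_tcp": lambda mid: "via tcp"},
    {"name": "be2", "read_shared": lambda mid: "via shared spool", "read_tcp": lambda mid: None},
]
print(fetch_article("<x@example.invalid>", feeds, delay_tcp_failover=False))
print(fetch_article("<x@example.invalid>", feeds, delay_tcp_failover=True))
```

With the default, the article comes from be1 over TCP because be1's shared spool failed. With delayed failover, the shared spool on be2 is found first and no TCP request is made.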

RetentionHours/RetentionGroups

Allow fine-tuning of the retention of various groups (72 hours for binary, 72 for !binary as an example).
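
To show how a group list such as *,!*binaries* selects newsgroups, here is a minimal Python sketch. It assumes the list is comma separated, that a leading ! negates a pattern, and that the last matching pattern wins (the usual convention in news software); Tornado's actual matching rules may differ, and group_matches is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def group_matches(newsgroup, pattern_list):
    """Return True if newsgroup is selected by a comma-separated glob list."""
    matched = False
    for pattern in pattern_list.split(","):
        pattern = pattern.strip()
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(newsgroup, pattern):
            matched = not negate  # assumed: the last matching pattern wins
    return matched


print(group_matches("comp.lang.c", "*,!*binaries*"))            # True
print(group_matches("alt.binaries.pictures", "*,!*binaries*"))  # False
```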

TornadoSpoolMode

Like running Typhoon or Twister with -nfs or -direct. nfs bypasses mmap() calls and uses pread() instead. direct only works with some file systems (like UFS) and circumvents the OS file system cache.
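
The two access styles contrasted for nfs mode can be illustrated in Python on a Unix system. This is a generic sketch of mmap()-based versus pread()-based reads, not Tornado code; the function names and the temporary file are made up for the example.

```python
import mmap
import os
import tempfile


def read_with_mmap(path, offset, length):
    """Read bytes by mapping the file into memory."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def read_with_pread(path, offset, length):
    """Read bytes with pread(), which needs no memory mapping."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Message-ID: <demo@example.invalid>\r\n")
    path = tmp.name
print(read_with_mmap(path, 0, 10) == read_with_pread(path, 0, 10))
os.remove(path)
```

Both calls return the same bytes; the difference is only in how the operating system is asked for them.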

MessageIDDepth

Tornado Front End makes use of a custom, distributed history database for looking up messages by Message-ID. The depth is the number of Message-IDs, in millions, to keep for this Tornado Back End. The older -history option is not needed when using this feature. With the rising trend of news clients using Message-ID lookup lists, this directive is strongly recommended. For binary servers, set the depth to at least 16 (million); for text servers, 150 (million) will cover many months' worth of articles.

MessageIDMultiPath

List of file system paths to spread this history over. Delimit paths with commas.
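
How the history is distributed across the listed paths is not described here. The Python sketch below shows one plausible scheme, hashing each Message-ID to pick a path, purely as an assumption for illustration; parse_multipath and path_for are hypothetical helpers. Only the comma-separated syntax, the backslash line continuation, and the 10-path limit come from the directive listing above.

```python
import zlib

MAX_PATHS = 10  # limit stated for MessageIDMultiPath


def parse_multipath(value):
    """Split a MessageIDMultiPath value into a list of paths."""
    joined = value.replace("\\\n", "")  # undo backslash line continuations
    paths = [p.strip() for p in joined.split(",") if p.strip()]
    if len(paths) > MAX_PATHS:
        raise ValueError(f"at most {MAX_PATHS} paths are allowed")
    return paths


def path_for(message_id, paths):
    """Pick a path for a Message-ID; hashing is an assumption, not Tornado's scheme."""
    return paths[zlib.crc32(message_id.encode("utf-8")) % len(paths)]


paths = parse_multipath("/data/spool01,\\\n    /data/spool02,\\\n    /data/spool03")
print(paths)
print(path_for("<demo@example.invalid>", paths))
```

Placing the paths on separate disks would spread lookup I/O across them, which is presumably the purpose of listing more than one.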