One advantage of the Citadel system over other, less tightly integrated groupware packages is that it has the ability to defer potentially resource-intensive operations until off-hours, improving the interactive performance of the system during the hours that users are online and active. This is primarily used for performing “delete” operations in a batch mode. This article explains the technological underpinnings, and is mainly intended for developers.
In order to understand what's going on under the covers, there are several things you need to know about Citadel's data model:
- Any given message may exist in one or more rooms. The second and subsequent copies are not actual copies, but simply additional references to the same message number, similar to hard links in a POSIX filesystem.
- We keep a reference count of how many rooms are holding any particular message. When the message is originally saved to a room, the count is set to 1. When the message is copied to an additional room, the count is incremented. When the message is deleted from a room, the count is decremented.
- Rooms which exist only in the namespace of a single user (in other words, a private mailbox) actually have the account's user number prepended to the name. For example, if an account has user number 12345, its inbox is actually named “0000012345.Mail”. This prefix is hidden from the client.
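The private-room naming rule in the last point can be illustrated with a minimal sketch. The helper names `mailbox_room_name()` and `client_visible_name()` are hypothetical and are not taken from the Citadel source; the only thing assumed from the text is the ten-digit, zero-padded user number followed by a dot.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the internal name of a private room by
 * prepending the owner's user number, zero-padded to ten digits.
 * Returns 0 on success, -1 if the buffer is too small. */
int mailbox_room_name(char *buf, size_t buflen, long usernum, const char *roomname)
{
    int n = snprintf(buf, buflen, "%010ld.%s", usernum, roomname);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Hypothetical helper: hide the prefix before a name is shown to a client.
 * A name counts as private only if it starts with exactly ten digits
 * followed by a dot; any other name is returned unchanged. */
const char *client_visible_name(const char *internal_name)
{
    if (strspn(internal_name, "0123456789") == 10 && internal_name[10] == '.') {
        return internal_name + 11;
    }
    return internal_name;
}
```

With these helpers, user number 12345 and room name “Mail” produce the internal name “0000012345.Mail”, and stripping the prefix gives back “Mail”.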
Here are some activities which are performed synchronously – in other words, the user must wait while they are completed.
- Saving a new message, deleting a message, copying a message: we have to adjust the reference count. This is performed using the AdjRefCount() function, which accepts a message number and a delta (increment or decrement). In order to make this operation take as little time as possible, all we do is write these values to the end of a file called refcount_adjustments.dat.
- Deleting a room: in the past, this was a potentially time consuming operation, because we had to wait around for the system to delete every message in the room. So instead, we rename the room, prepending a bogus namespace (9999999999) for a user who does not exist. We also insert a timestamp and sequence number into the name to ensure that we don't accidentally create a name which already exists. Since the room now only exists in the namespace of a user who does not exist, it appears to have been deleted, and we only consumed a few milliseconds of server time. The real work will happen later…
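Both synchronous operations come down to a small, constant amount of work. The following is a rough sketch of the idea, not the actual Citadel code: the record layout, the helper names `queue_refcount_adjustment()` and `deleted_room_name()`, and the exact format of the timestamp and sequence number in the renamed room are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Assumed record layout; the real on-disk format may differ. */
struct refcount_adjustment {
    long msgnum;   /* message whose count should change */
    int  delta;    /* +1 for a new reference, -1 for a removed one */
};

/* Hypothetical helper: append one adjustment to the end of the queue file
 * and return right away. No counts are touched at this point. */
int queue_refcount_adjustment(const char *queue_path, long msgnum, int delta)
{
    struct refcount_adjustment rec;
    FILE *fp;
    size_t written;

    memset(&rec, 0, sizeof rec);       /* keep padding bytes deterministic */
    rec.msgnum = msgnum;
    rec.delta = delta;

    fp = fopen(queue_path, "ab");      /* append mode: always write at the end */
    if (fp == NULL) return -1;
    written = fwrite(&rec, sizeof rec, 1, fp);
    if (fclose(fp) != 0) return -1;
    return (written == 1) ? 0 : -1;
}

/* Hypothetical helper: build the new name for a "deleted" room. The bogus
 * namespace 9999999999 comes from the text; the way the timestamp and
 * sequence number are embedded here is an assumed format. */
int deleted_room_name(char *buf, size_t buflen, const char *old_name,
                      long timestamp, int sequence)
{
    int n = snprintf(buf, buflen, "9999999999.%s %08lx-%04x",
                     old_name, (unsigned long)timestamp, (unsigned)sequence);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Neither helper depends on how many messages a room holds or how many rooms reference a message, which is why the user only waits a few milliseconds.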
Here's where the magic happens. We run a nightly batch job, affectionately known as The Dreaded Auto-Purger, which is responsible for cleaning everything up. It does a lot of work, in a very specific order to ensure that it doesn't have to run twice to get everything. The code can all be found in modules/expire/serv_expire.c. Here's how it works.
- Purge users. If the system is configured to automatically delete inactive accounts, the user file is scanned, and the date of last login is calculated. Accounts which have not been accessed in the configured amount of time are deleted. If the system is using an external source of authentication (such as a PAM database), we instead delete accounts which no longer exist on the host system. Either way, you will note that we only delete the account itself – we are not yet deleting rooms or messages which belong to the account.
- Purge messages. For rooms which are configured to automatically expire messages older than a certain age, and for rooms which are configured to keep no more than a specific maximum number of messages online, we go into those rooms and delete the old messages. This works the same way as an interactive delete: the message pointer is removed and its reference count is decremented.
- Purge rooms. The system may be configured to automatically expire rooms which have not been accessed in a certain amount of time; if so, these rooms are deleted now. We also delete any rooms which exist in a namespace belonging to a user who does not exist. The latter condition conveniently removes rooms which were deleted, or which belonged to a user who was deleted. Before deleting a room, we of course delete every message in the room (again with the same operation: remove the pointer, decrement the reference count).
- Purge visits. The “visits” table contains records which describe the relationship between one user and one room. It handles things like access control, seen/unseen message flags, and other flags. At this time we delete any record which refers to a user or room which no longer exists.
- Purge Use Table. The “use table” keeps track of the message IDs of messages which recently arrived over a network, whether from a Citadel network, RSS aggregation, or POP3 aggregation. In the latter two cases, these records are refreshed every time a message re-appears. We keep this data around in order to keep the same message from being imported multiple times. At this time, we delete any records which are older than a certain age.
- Purge EUID Index Table. This table is simply an index of messages by EUID, for rooms which require it. We delete records which are no longer in use.
- Purge stale OpenID associations. The OpenID Associations table maps OpenID identifiers to user numbers. At this time we delete any records which point to a user who no longer exists.
- Process the reference count adjustment queue. By this time we have a lot of data in the reference count adjustment queue (which, you will remember, is in a flat file called refcount_adjustments.dat). Now it is time to process this data. We rename the file to a temporary name, so that a new file can be created and written by other users that are still on the system.
- Reference count adjustments in the temporary file are then processed one at a time. The reference count for each message is kept in the message's metadata record, and we adjust it by whatever value each record specifies.
- When a message's reference count reaches zero, we know that there are no longer any references to the message anywhere on the system.
- Before deleting the message from disk, however, we first must remove it from the full-text index. That operation is performed at this time.
- After the message is de-indexed, it is finally deleted from the message database. Remember, however, that you will not see an immediate reduction of disk utilization on the host system, because Berkeley DB does not shrink its files when records are deleted. This space will be marked as unused, and new messages can potentially be stored there. Therefore on a well-managed system with a fairly consistent traffic rate and a sensible expire policy, disk utilization will initially grow until it reaches an equilibrium of new messages vs. expiring messages, and then it will stay there. On the other hand, if you have no expire policy and your users never empty their trash folders, you may expect disk utilization to grow indefinitely.
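The final stage, processing the reference count adjustment queue, can be sketched as follows. This is an illustration under stated assumptions rather than the code in serv_expire.c: the record layout is assumed, the in-memory `refcount` array is a toy stand-in for the per-message metadata records, and `process_refcount_queue()` is a hypothetical name. Instead of de-indexing and deleting messages, the sketch only collects the numbers of messages whose count reached zero.

```c
#include <stdio.h>

#define MAX_MSGS 1024   /* size of the toy metadata table (assumption) */

/* Assumed record layout; the real on-disk format may differ. */
struct refcount_adjustment {
    long msgnum;
    int  delta;
};

/* Toy stand-in for the reference count stored in each metadata record. */
static int refcount[MAX_MSGS];

/* Hypothetical sketch: move the queue aside, apply every adjustment in
 * order, and report which messages dropped to zero references. Returns the
 * number of such messages (at most max_purged are stored), or -1 on error. */
int process_refcount_queue(const char *queue_path, const char *temp_path,
                           long *purged, int max_purged)
{
    struct refcount_adjustment rec;
    FILE *fp;
    int n_purged = 0;

    /* Rename first, so sessions that are still active start a fresh queue. */
    if (rename(queue_path, temp_path) != 0) return -1;

    fp = fopen(temp_path, "rb");
    if (fp == NULL) return -1;

    /* Apply the adjustments one record at a time. */
    while (fread(&rec, sizeof rec, 1, fp) == 1) {
        if (rec.msgnum < 0 || rec.msgnum >= MAX_MSGS) continue;
        refcount[rec.msgnum] += rec.delta;
        if (refcount[rec.msgnum] <= 0) {
            refcount[rec.msgnum] = 0;
            /* A real server would remove the message from the full-text
             * index here, and only then delete it from the database. */
            if (n_purged < max_purged) purged[n_purged] = rec.msgnum;
            n_purged++;
        }
    }

    fclose(fp);
    remove(temp_path);
    return n_purged;
}
```

Renaming the queue before reading it is what allows the batch job to run while users are still online: any adjustment queued during the run lands in the new file and is picked up on the next run.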