Migrating from mbox to maildir

Armijn Hemel, May 28, 2010

Two popular formats for storing mail on Unix machines are mbox and maildir. Both have their advantages and disadvantages. Converting from mbox to maildir is easy to do using some Python.

In the past we had a system that used both mbox and maildir. The main reason for this was feature creep. When we started, mbox files for POP3 were OK. When people (like ourselves) wanted to use IMAP, maildir was a much better solution. So we ended up with a hybrid system: mbox for the POP3 accounts and maildir for IMAP.

When we migrated everything to a new server we decided to standardize on maildir. We still had quite a few accounts that we had to convert from mbox to maildir. With some Python code this was done easily.

First we import all the necessary modules and suck in the mbox file:

import email
import email.Errors
import mailbox
import os
import sys
import time

def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''

# first argument: the virtual domain, second argument: the mbox file
domain = sys.argv[1]
inbox = sys.argv[2]
fp = open(inbox, 'rb')
mbox = mailbox.UnixMailbox(fp, msgfactory)
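
The listing above targets Python 2: email.Errors and mailbox.UnixMailbox no longer exist in Python 3. As a rough sketch of the same step on Python 3, assuming the standard library's mailbox.mbox class is an acceptable replacement, an mbox file can be opened and iterated as follows. The function name read_subjects, the sample messages and the demo helper are made up for illustration and are not part of our original script.

```python
import mailbox
import os
import tempfile


def read_subjects(mbox_path):
    """Return the Subject header of every message in an mbox file."""
    # create=False: fail instead of silently creating an empty mbox
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()


# Two made-up messages in mbox format, separated by "From " lines
SAMPLE = (
    "From alice@example.org Fri May 28 10:00:00 2010\n"
    "From: alice@example.org\n"
    "Subject: first\n"
    "\n"
    "Hello\n"
    "\n"
    "From bob@example.org Fri May 28 11:00:00 2010\n"
    "From: bob@example.org\n"
    "Subject: second\n"
    "\n"
    "World\n"
)


def demo():
    """Write the sample to a temporary mbox file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "user")
        with open(path, "w") as handle:
            handle.write(SAMPLE)
        return read_subjects(path)
```

With mailbox.mbox no custom message factory is needed: iterating over the mailbox yields one message object per mail.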

We specified a parameter domain since the system hosts mail for quite a few virtual domains. For each domain we have a directory per user. In our case the mbox names equaled the user names without the domain part, so it is easy to create a directory to store the mails in.

dirname = domain + "/" + inbox
# create the top level directory and then the user directory,
# pass if either already exists
for path in (domain, dirname):
        try:
                os.mkdir(path, 0750)
        except OSError:
                pass
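
On Python 3 the octal mode is written 0o750, and os.makedirs with exist_ok=True covers the "pass if it already exists" case for both directory levels in one call. The following is only a sketch under that assumption; the function name ensure_user_dir and the base parameter are hypothetical and were added so the example can run inside a temporary directory.

```python
import os
import tempfile


def ensure_user_dir(base, domain, user, mode=0o750):
    """Create base/domain/user if it does not exist yet and return its path."""
    dirname = os.path.join(base, domain, user)
    # exist_ok=True: an already existing directory is not an error
    os.makedirs(dirname, mode=mode, exist_ok=True)
    return dirname


def demo():
    """Calling the helper twice must succeed and give the same directory."""
    with tempfile.TemporaryDirectory() as base:
        first = ensure_user_dir(base, "example.org", "alice")
        second = ensure_user_dir(base, "example.org", "alice")
        return first == second and os.path.isdir(first)
```

The permissions that end up on disk are still subject to the umask of the process.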

In each maildir directory you will find a few directories, including new and cur, which store new mails and read mails respectively. These directories are easily created as well. Note that in our case we knew these directories did not exist, so we simply created them, without any safeguards.

os.mkdir(dirname + "/new", 0750)
os.mkdir(dirname + "/cur", 0750)
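
A complete maildir normally also contains a tmp directory, which delivery programs use while a message is still being written. As a sketch, Python 3's mailbox.Maildir class can create all three subdirectories in one go. Two caveats apply, as far as we can tell from current CPython versions: it only creates them when the maildir directory itself does not exist yet, and it picks its own permission bits instead of 0750. The function name create_maildir is made up for this example.

```python
import mailbox
import os
import tempfile


def create_maildir(dirname):
    """Create a new maildir at dirname and return its subdirectory names."""
    # dirname itself must not exist yet, otherwise nothing is created
    maildir = mailbox.Maildir(dirname, factory=None, create=True)
    maildir.close()
    return sorted(os.listdir(dirname))


def demo():
    """Create a maildir below a temporary directory and list its layout."""
    with tempfile.TemporaryDirectory() as tmp:
        return create_maildir(os.path.join(tmp, "user"))
```

If specific permissions matter, creating tmp, new and cur by hand, as in the listing above, keeps that under your own control.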

After everything had been set up it was simply a matter of reading messages from the mbox file and writing them back into the maildir directory.

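mailmsg = mbox.next()
count = 0
hostname = "imap.example.org"
while mailmsg is not None:
        # msgfactory returns '' for mails that could not be parsed, skip those
        if mailmsg != '':
                hammertime = time.time()
                filename = dirname + "/cur/%s%d.%s:2,S" % (hammertime, count, hostname)
                # write the mail to its own file in the cur directory
                mail = open(filename, 'w+')
                mail.write(mailmsg.as_string())
                mail.close()
                count = count + 1
        mailmsg = mbox.next()
print "mails converted:", count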
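
For readers on Python 3, the whole conversion can be sketched with the standard mailbox module, which generates the unique file names itself. This is not the script we ran at the time, just an approximation of it under the assumption that mailbox.mbox and mailbox.Maildir fit your setup; the names convert, SAMPLE and demo are illustrative. As in our script, every mail ends up in cur with the S flag.

```python
import mailbox
import os
import tempfile


def convert(mbox_path, maildir_path):
    """Copy all mails from an mbox file into maildir_path/cur, marked as seen."""
    source = mailbox.mbox(mbox_path, create=False)
    # maildir_path must not exist yet so that tmp, new and cur get created
    target = mailbox.Maildir(maildir_path, factory=None, create=True)
    count = 0
    try:
        for message in source:
            converted = mailbox.MaildirMessage(message)
            converted.set_subdir("cur")  # file goes into cur, not new
            converted.add_flag("S")      # file name ends in ":2,S"
            target.add(converted)
            count += 1
    finally:
        target.close()
        source.close()
    return count


# Two made-up messages in mbox format
SAMPLE = (
    "From alice@example.org Fri May 28 10:00:00 2010\n"
    "Subject: first\n"
    "\n"
    "Hello\n"
    "\n"
    "From bob@example.org Fri May 28 11:00:00 2010\n"
    "Subject: second\n"
    "\n"
    "World\n"
)


def demo():
    """Convert a temporary sample mbox and report what landed in cur."""
    with tempfile.TemporaryDirectory() as tmp:
        mbox_path = os.path.join(tmp, "user.mbox")
        with open(mbox_path, "w") as handle:
            handle.write(SAMPLE)
        maildir_path = os.path.join(tmp, "user")
        count = convert(mbox_path, maildir_path)
        names = os.listdir(os.path.join(maildir_path, "cur"))
        # keep only the part after the colon, i.e. the maildir "info" field
        return count, sorted(name.rsplit(":", 1)[1] for name in names)
```

The colon in the file names means this sketch is meant for Unix file systems, just like the original script.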

One thing you will notice is that we marked the mails as 'seen' (the S in the mail filename implies this). This is not always correct (new mail may have arrived that the user had not read yet), but we made this choice deliberately: the users who had a lot of mail in their POP account almost always had the 'leave mail on server' option enabled in their mail client. Marking messages as 'new' would have confused them, since they had already read those mails. For a few customers some new mails would indeed be marked as read, but we warned them in advance that this might happen and it turned out to be no problem at all.
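
If the mbox files had carried reliable Status headers, the read state could have been taken from them instead of marking everything as seen. The sketch below shows how Python 3's mailbox module maps mbox flags to maildir state when an mboxMessage is turned into a MaildirMessage. Whether a POP3 setup records Status headers at all is an assumption; without them every mail would come out as new, which is exactly what we wanted to avoid. The function name maildir_state is made up for this example.

```python
import mailbox


def maildir_state(mbox_flags):
    """Return (subdirectory, maildir flags) for a mail with the given mbox flags."""
    source = mailbox.mboxMessage()
    source.set_flags(mbox_flags)  # stored in the Status / X-Status headers
    # The conversion maps mbox flag R (read) to maildir flag S (seen)
    # and mbox flag O (old) to the cur subdirectory.
    converted = mailbox.MaildirMessage(source)
    return converted.get_subdir(), converted.get_flags()
```

A mail with mbox flags "RO" maps to cur with the S flag, while a mail without any flags maps to new without flags.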
