Migrating from mbox to maildir
May 28, 2010
Two popular formats for storing mail on Unix machines are mbox and maildir. Both have their advantages and disadvantages. Converting from mbox to maildir is easy to do using some Python.
In the past we ran a system that used both mbox and maildir. The main reason for this was feature creep: when we started, mbox files were fine for POP3, but when people (ourselves included) wanted to use IMAP, maildir was a much better solution. So we ended up with a hybrid system, with mbox for the POP3 accounts and maildir for IMAP.
When we migrated everything to a new server we decided to standardize on maildir. We still had quite a few accounts that we had to convert from mbox to maildir. With some Python code this was done easily.
First we import all the necessary modules and suck in the mbox file:
    #!/usr/bin/python
    import email
    import email.Errors
    import mailbox
    import os
    import sys
    import time

    def msgfactory(fp):
        try:
            return email.message_from_file(fp)
        except email.Errors.MessageParseError:
            # Don't return None since that will
            # stop the mailbox iterator
            return ''

    # usage: script <domain> <mbox file>
    domain = sys.argv[1]
    inbox = sys.argv[2]
    fp = open(inbox, 'rb')
    mbox = mailbox.UnixMailbox(fp, msgfactory)
The script takes the domain as a parameter, since the system hosts mail for quite a few virtual domains. For each domain we have a directory per user. In our case the mbox names equaled the user names without the domain part, so it was easy to create a directory to store the mails in.
    dirname = domain + "/" + inbox

    # create top level directory, pass if it already exists
    try:
        os.mkdir(domain, 0750)
    except OSError:
        pass

    # create user directory, pass if it already exists
    try:
        os.mkdir(dirname, 0750)
    except OSError:
        pass
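On Python 3 the "create it, ignore it if it exists" pattern can be written without a try/except. The following is only a minimal sketch: the helper name ensure_user_dir and its root argument are made up for illustration and are not part of the original script.

```python
import os


def ensure_user_dir(root, domain, user):
    """Create root/domain/user if needed and return the resulting path.

    Hypothetical helper: the name and the 'root' argument are assumptions.
    """
    path = os.path.join(root, domain, user)
    # exist_ok=True turns the call into a no-op when the directory already
    # exists; other failures (for example permission denied) still raise.
    # The mode is subject to the process umask.
    os.makedirs(path, mode=0o750, exist_ok=True)
    return path
```

Calling ensure_user_dir("/var/mail/vhosts", "example.org", "alice") twice would return the same path both times and leave an existing directory untouched. The base path here is just an example.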
In each maildir directory you will find a few directories, including new and cur, which store new mails and read mails respectively. These directories are easily created as well. Note that in our case we knew these directories did not exist, so we simply created them, without any safeguards.
    os.mkdir(dirname + "/new", 0750)
    os.mkdir(dirname + "/cur", 0750)
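Besides new and cur, a maildir normally also has a tmp directory, which holds messages while they are being delivered. As a sketch under the assumption that you are on Python 3, the standard library's mailbox.Maildir class can create all three in one go. The function name create_maildir is hypothetical.

```python
import mailbox
import os


def create_maildir(path):
    """Create an empty maildir (cur, new, tmp) at 'path' and return the path.

    Hypothetical helper. Note that mailbox.Maildir only creates the
    subdirectories when 'path' itself does not exist yet, so this sketch
    creates the parent directory only.
    """
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)
    # create=True builds path/, path/tmp, path/new and path/cur.
    mailbox.Maildir(path, create=True).close()
    return path
```

If the user directory already exists without the subdirectories, creating them explicitly with os.mkdir, as in the script above, is the simpler route.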
After everything had been set up it was simply a matter of reading messages from the mbox file and writing them back into the maildir directory.
    mailmsg = mbox.next()
    count = 0
    hostname = "imap.example.org"
    while mailmsg is not None:
        # msgfactory returns '' for messages it could not parse; skip those
        if mailmsg != '':
            count += 1
            hammertime = time.time()
            filename = dirname + "/cur/%s%d.%s:2,S" % (hammertime, count, hostname)
            mail = open(filename, 'w')
            mail.write(mailmsg.as_string())
            mail.close()
        mailmsg = mbox.next()

    print "mails converted:", count
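The script in this post targets Python 2: mailbox.UnixMailbox and email.Errors no longer exist in Python 3. For readers on Python 3, here is a rough sketch of the same conversion using the standard library's mailbox.mbox and mailbox.Maildir classes. It is not the code we ran at the time. The function name and its arguments are assumptions, and the library picks its own unique filenames instead of the time/count/hostname scheme used above.

```python
import mailbox


def convert_mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a maildir, marked as seen.

    Hypothetical helper; returns the number of messages copied.
    """
    # create=False: fail instead of silently creating an empty mbox.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True builds the maildir if maildir_path does not exist yet.
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            converted = mailbox.MaildirMessage(message)
            converted.set_subdir("cur")  # store under cur/ rather than new/
            converted.add_flag("S")      # S = seen
            target.add(converted)
            count += 1
    finally:
        target.close()
        source.close()
    return count
```

A call such as convert_mbox_to_maildir("alice", "example.org/alice") mirrors the domain/user layout described earlier. The parent directory (example.org here) has to exist before the call.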
You may notice that we marked the mails as 'seen' (that is what the S in the mail filename means). This is not always correct, since new mail may have arrived in the mbox that the user had not yet read, but we made this choice deliberately: the users with a lot of mail in their POP account almost always had the 'leave mail on server' option enabled in their mail client. Marking those messages as 'new' would have confused them, since they had already read them. For a few customers some new mails were indeed marked as read, but we warned them in advance that this might happen, and it turned out to be no problem at all.
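To make the filename convention concrete: in the maildir format the part after ":2," is a list of single-letter flags, of which S (seen) is the one used here. The snippet below is a small illustrative sketch in Python 3 for checking which flags a converted file carries. The function name and the MAILDIR_FLAGS table are made up for this example.

```python
# Standard single-letter maildir flags and their meaning.
MAILDIR_FLAGS = {
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
    "D": "draft",
    "F": "flagged",
}


def parse_maildir_flags(filename):
    """Return the set of flag names encoded in a maildir filename.

    Hypothetical helper; unknown letters are ignored.
    """
    _, separator, info = filename.rpartition(":2,")
    if not separator:
        # No info suffix at all, as is usual for files delivered to new/.
        return set()
    return {MAILDIR_FLAGS[letter] for letter in info if letter in MAILDIR_FLAGS}
```

For a file written by the conversion above, the result contains only "seen". A mail client that later replies to the message would typically rename the file so that the suffix becomes ":2,RS".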