Migrating from Plone

Important things to have in mind

The migration script works by reading Plone’s web site; it does not read the Zope database. I don’t expect this change; people comfortable with the Plone/Zope API will probably not be interested in becoming Zadig developers, whereas Zadig developers are unlikely to learn the Plone/Zope API.

The fact that the migration script reads Plone’s web site has several side-effects:

  • Migration is slow.
  • Parsing of the Plone pages could be unreliable and may depend on the skin Plone is using. For example, the main content of a page is assumed to be the content of a div with id=’parent-fieldname-text’, but this might not always be the case. (My Plone sites were using the classic skin.)
  • Much of the information the migration script needs to dig out is in the Plone’s “Edit” page. Therefore, the script (which must log on to Plone as an administrator) visits the Edit page; if the page is locked, it unlocks it; and since clicking the “Cancel” button afterwards is not trivial for the script and was not very important for my sites, the script currently leaves all pages locked. Since migration may take you days or weeks of tests, your users may be viewing locked pages all the time, and, worse, their locks may be unlocked. A workaround, besides making a copy of the entire Plone site, is to do tests at a small subdirectory each time; or, maybe better, modify the migration script to skip locked pages, and to unlock those it locked after it’s done.

Another problem is that the migration is a one-off thing; once you do it, you don’t care much about it again. I’ve migrated two of my sites, there are two remaining (and they are easier), and once I’m done, I don’t think I will ever care about the migration script again; the same applies more or less to everyone, which means that the migration script will be poorly maintained. In addition, when I wrote the migration code, I did do careful work, but I did not put into it the care that I put in Zadig core (which I don’t know how many times I’ve rewritten and reorganized).

That said, I did migrate my sites with the migration script, and it worked wonderfully. Consider it, therefore, a good start, which you can use further develop until it does what you want. Further below I document its internals in order to make it easier for you. Please consider submitting your patches.

Using the migration script

The migration script is in bin/plone2zadig. It converts a Plone site to a Zadig site. It does so by visiting the Plone site on the web. The script must log on the Plone site as a user who has edit permissions on all pages (the administrator for example). Use it like this:

bin/plone2zadig config_file

The config_file a collection of lines of the form OPTION=value. It is actually a Python program that defines several variables. The options are:

The options are:

PLONE_URL

The topmost Plone url to start converting. That page and all its subpages (except those marked for exclusion) will be read and converted.

PLONE_EXCLUDE

The URIs of this tuple will be excluded from conversion. The URIs can be full URLs or relative paths whose base is PLONE_URL.

PLONE_AUTH

The value the Plone __ac cookie should have. It must be a user capable of editing all the objects from PLONE_URL downwards.

TARGET_PATH

Will migrate PLONE_URL to this Zadig path, and subobjects to subpaths.

REMOVE_TARGET

Specifies what happens if TARGET_PATH already exists. If True, the TARGET_PATH and its subentries are deleted before migration; if False (the default), its content is overwritten, as is the content of any already existing subentry. When such overwriting occurs, the existing object and the overwriting object must be of the same type, or an exception is raised.

OWNER

The objects created in Zadig will belong to this user.

DEFAULT_LANGUAGE

When a Plone page is “language neutral”, it will be considered to be in this language.

LOG_LEVEL

The logging level; one of DEBUG, INFO, WARNING, ERROR, and CRITICAL. Default is WARNING. Don’t use CRITICAL, it will miss the errors.

All the above except LOG_LEVEL and PLONE_EXCLUDE are compulsory.

Migration script internal API

In case you haven’t noticed, the migration script is quite small—530 lines at the time of this writing. This is because practically all the work is done by the Zadig core API and BeautifulSoup. You need to be comfortable with both.

The most important item in the migration script is the PloneObject class, which represents, unsurprisingly, a Plone object.

class PloneObject(url, soup=None)

PloneObject is always subclassed, as a PloneFolder, a PlonePage, and so on. However, in order to create a PloneObject instance, you merely supply the object’s URL, without needing to know what kind of object it is; the constructor will retrieve the data from the URL, find out what kind of object it is, and return an instance not of PloneObject, but of the appropriate subclass. It will visit and parse both the URL and the equivalent “edit” page. It will unconditionally unlock the page if it is locked, and when it’s finished it will leave it locked (this is a bug).

The soup argument must not be supplied, or it must be None. It is used internally, because the constructor reads the URL, decides what type of object it is, and calls the constructor of its appropriate subclass.

PloneObject instances have the following attributes:

soup

The BeautifulSoup object that represents the parsed HTML retrieved from the URL.

editsoup

The BeautifulSoup object that represents the parsed HTML of the Plone’s “edit” view for the object.

title
short_title
description

These three attributes store the title and description of the Plone object. The short_title is always empty, as Plone does not have this feature; however, sometimes it is set by subclasses; in particular, when we have a Plone folder with a default view, we consider the default view’s title to be the title and the folder’s title to be the short title.

state

The state of the Plone object as a string, such as “Private” or “Published”.

PloneObject instances of subclasses have the following method:

migrate(request, path)

This method does the migration. First it calls the same method of the superclass (which essentially creates and saves the entry), then it creates and saves the vobject, and the metatags, and finally it calls the post_migrate() method of the superclass, which does some common endwork (sets alternative language).

Now all the above is only the basic idea. It’s not perfectly documented, and the code is a bit messy, but the above should be more than enough to get you find your way in the code.