mercurial: implement import hook for handling C/Python modules
authorGregory Szorc <gregory.szorc@gmail.com>
Thu, 03 Dec 2015 21:37:01 -0800
changeset 27220 4374d819ccd5
parent 27219 beb60a898dd0
child 27221 ab776610fc6d
mercurial: implement import hook for handling C/Python modules There are a handful of modules that have both pure Python and C extension implementations. Currently, setup.py copies files from mercurial/pure/*.py to mercurial/ during the install process if C extensions are not available. This way, "import mercurial.X" will work whether C extensions are available or not. This approach has a few drawbacks. First, there aren't run-time checks verifying the C extensions are loaded when they should be. This could lead to accidental use of the slower pure Python modules. Second, the C extensions aren't compatible with PyPy and running Mercurial with PyPy requires installing Mercurial - you can't run ./hg from a source checkout. This makes developing while running PyPy somewhat difficult. This patch implements a PEP-302 import hook for finding and loading the modules with both C and Python implementations. When a module with dual implementations is requested for import, its import is handled by our import hook. The importer has a mechanism that controls what types of modules we allow to load. We call this loading behavior the "module load policy." There are 3 settings: * Only load C extensions * Only load pure Python * Try to load C and fall back to Python An environment variable allows overriding this policy at run time. This is mainly useful for developers and for performing actions against the source checkout (such as installing), which require overriding the default (strict) policy about requiring C extensions. The default mode for now is to allow both. This isn't proper and is technically backwards incompatible. However, it is necessary to implement a sane patch series that doesn't break the world during future bisections. The behavior will be corrected in future patch. We choose the main mercurial/__init__.py module for this code out of necessity: in a future world, if the custom module importer isn't registered, we'll fail to find/import certain modules when running from a pure installation. Without the magical import-time side-effects, *any* importer of mercurial.* modules would be required to call a function to register our importer. I'm not a fan of import time side effects and I initially attempted to do this. However, I was foiled by our own test harness, which has numerous `python` invoked scripts that "import mercurial" and fail because the importer isn't registered. Realizing this problem is probably present in random Python scripts that have been written over the years, I decided that sacrificing purity for backwards compatibility is necessary. Plus, if you are programming Python, "import" should probably "just work." It's worth noting that now that we have a custom module loader, it would be possible to hook up demand module proxies at this level instead of replacing __import__. We leave this work for another time, if it's even desired. This patch breaks importing in environments where Mercurial modules are loaded from a zip file (such as py2exe distributions). This will be addressed in a subsequent patch.
mercurial/__init__.py
setup.py
--- a/mercurial/__init__.py	Thu Dec 03 10:56:05 2015 -0500
+++ b/mercurial/__init__.py	Thu Dec 03 21:37:01 2015 -0800
@@ -0,0 +1,109 @@
+# __init__.py - Startup and module loading logic for Mercurial.
+#
+# Copyright 2015 Gregory Szorc <gregory.szorc@gmail.com>
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+from __future__ import absolute_import
+
+import imp
+import os
+import sys
+
+__all__ = []
+
+# Rules for how modules can be loaded. Values are:
+#
+#    c - require C extensions
+#    allow - allow pure Python implementation when C loading fails
+#    py - only load pure Python modules
+modulepolicy = '@MODULELOADPOLICY@'
+
+# By default, require the C extensions for performance reasons.
+if modulepolicy == '@' 'MODULELOADPOLICY' '@':
+    # TODO change to 'c' once installer is changed.
+    modulepolicy = 'allow'
+
+# Environment variable can always force settings.
+modulepolicy = os.environ.get('HGMODULEPOLICY', modulepolicy)
+
+# Modules that have both Python and C implementations. See also the
+# set of .py files under mercurial/pure/.
+_dualmodules = set([
+    'mercurial.base85',
+    'mercurial.bdiff',
+    'mercurial.diffhelpers',
+    'mercurial.mpatch',
+    'mercurial.osutil',
+    'mercurial.parsers',
+])
+
+class hgimporter(object):
+    """Object that conforms to import hook interface defined in PEP-302."""
+    def find_module(self, name, path=None):
+        # We only care about modules that have both C and pure implementations.
+        if name in _dualmodules:
+            return self
+        return None
+
+    def load_module(self, name):
+        mod = sys.modules.get(name, None)
+        if mod:
+            return mod
+
+        mercurial = sys.modules['mercurial']
+
+        # Unlike the default importer which searches special locations and
+        # sys.path, we only look in the directory where "mercurial" was
+        # imported from.
+
+        # imp.find_module doesn't support submodules (modules with ".").
+        # Instead you have to pass the parent package's __path__ attribute
+        # as the path argument.
+        stem = name.split('.')[-1]
+
+        try:
+            if modulepolicy == 'py':
+                raise ImportError()
+
+            modinfo = imp.find_module(stem, mercurial.__path__)
+
+            # The Mercurial installer used to copy files from
+            # mercurial/pure/*.py to mercurial/*.py. Therefore, it's possible
+            # for some installations to have .py files under mercurial/*.
+            # Loading Python modules when we expected C versions could result
+            # in a) poor performance b) loading a version from a previous
+            # Mercurial version, potentially leading to incompatibility. Either
+            # scenario is bad. So we verify that modules loaded from
+            # mercurial/* are C extensions. If the current policy allows the
+            # loading of .py modules, the module will be re-imported from
+            # mercurial/pure/* below.
+            # TODO uncomment once setup.py is updated to actually install
+            # into mercurial/pure.
+            #if modinfo[2][2] != imp.C_EXTENSION:
+            #    raise ImportError('.py version of %s found where C '
+            #                      'version should exist' % name)
+
+        except ImportError:
+            if modulepolicy == 'c':
+                raise
+
+            # Could not load the C extension and pure Python is allowed. So
+            # try to load them.
+            from . import pure
+            modinfo = imp.find_module(stem, pure.__path__)
+            if not modinfo:
+                raise ImportError('could not find mercurial module %s' %
+                                  name)
+
+        mod = imp.load_module(name, *modinfo)
+        sys.modules[name] = mod
+        return mod
+
+# We automagically register our custom importer as a side-effect of loading.
+# This is necessary to ensure that any entry points are able to import
+# mercurial.* modules without having to perform this registration themselves.
+if not any(isinstance(x, hgimporter) for x in sys.meta_path):
+    # meta_path is used before any implicit finders and before sys.path.
+    sys.meta_path.insert(0, hgimporter())
--- a/setup.py	Thu Dec 03 10:56:05 2015 -0500
+++ b/setup.py	Thu Dec 03 21:37:01 2015 -0800
@@ -177,11 +177,9 @@
 
 version = ''
 
-# Execute hg out of this directory with a custom environment which
-# includes the pure Python modules in mercurial/pure. We also take
-# care to not use any hgrc files and do no localization.
-pypath = ['mercurial', os.path.join('mercurial', 'pure')]
-env = {'PYTHONPATH': os.pathsep.join(pypath),
+# Execute hg out of this directory with a custom environment which takes care
+# to not use any hgrc files and do no localization.
+env = {'HGMODULEPOLICY': 'py',
        'HGRCPATH': '',
        'LANGUAGE': 'C'}
 if 'LD_LIBRARY_PATH' in os.environ: