Using Python 3 import hooks to fix imported module code on the fly
My current company generates Scala code from Thrift files. The problem is that, since the main application code was going to be written in Scala, nobody bothered to name the Thrift attributes so that they avoid collisions with keywords of programming languages other than Scala. That prevents me and others from writing clients for our services in Python, for example. Here’s an example:
enum ServiceErrorCode {
    None = 0,
    Exception = 1,
    InvalidInput = 3,
    NotFound = 4,
    Duplicate = 5,
    AuthenticationFailed = 10
}
And here’s another example:
enum TrueFalse {
    False = 0,
    True = 1,
    Unknown = 2147483647
}
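To see why these names are a problem on the Python side, here is a minimal sketch. The class body is a hand-written approximation of what a Thrift compiler might emit for the enum above (the exact generated output is an assumption), and try_compile is a hypothetical helper. The point is that the file fails at compile time, before any client code gets a chance to run.

```python
# Hand-written approximation of generated Python for `enum TrueFalse`;
# the real generated file may look different.
generated_source = """
class TrueFalse(object):
    False = 0
    True = 1
    Unknown = 2147483647
"""


def try_compile(source):
    """Return None if the source compiles, else the SyntaxError raised."""
    try:
        compile(source, "<ttypes>", "exec")
    except SyntaxError as err:
        return err
    return None


error = try_compile(generated_source)
print(type(error).__name__)  # prints: SyntaxError
```

Because the failure is a SyntaxError, it cannot be worked around after the import; the source text itself has to change before it is compiled.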
Having worked on another project that used generated code, I knew that editing the generated code by hand and checking in the edited copy would never work: it takes too much effort to maintain the modified repo. This time I was lucky, because the target language is Python. Python 3 supports import hooks, which let you add custom logic to the import process. This is how Python 3 can support loading modules from zip files, from binary code, and so on.
The idea is to write an importer object which has 2 parts:
- The module finder: an object with a method find_module(self, fullname, path) whose job is, given a module name and module path, to return a module loader (which could be itself) to say that it can process this module. If it returns None, it means that this module finder cannot process this module.
- The module loader: an object with a method load_module(self, name) whose job is, given a module name, to load and return a module object. Once this step is reached, the importer object has to do the importing (and put the module object into sys.modules), or else the module is ignored: it will not be processed by the other importers in the importer list. This is where most of the magic happens.
Some existing applications of import hooks are loading modules from a database, from binary code, from zip files, over the network, and so on.
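The finder half of this protocol can be observed in isolation with a small sketch. It uses find_spec, which is the name current Python versions use for the finder step, and it always declines by returning None. RecordingFinder and the module name hypothetical_missing_module are made up for illustration.

```python
import importlib
import sys


class RecordingFinder:
    """A finder that never claims a module; it only records what was asked."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        # Declining lets the next finder in sys.meta_path try.
        return None


finder = RecordingFinder()
sys.meta_path.insert(0, finder)
try:
    try:
        # A made-up name that is not expected to exist anywhere.
        importlib.import_module("hypothetical_missing_module")
    except ImportError:
        pass
finally:
    # Always unhook the finder so later imports are not affected.
    sys.meta_path.remove(finder)

print("hypothetical_missing_module" in finder.seen)
```

Every import that is not already in sys.modules passes through each finder in order, which is what makes it possible to intercept only the modules we care about.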
Concretely, the code is below:
"""
This file is used to fix syntax error caused by generated thrift code
on the fly when importing
"""
import importlib.util
import sys


class ThriftImporter(object):
    # NOTE: find_module/load_module are the legacy import hook protocol.
    # Python 3.12 and later no longer call find_module on meta path finders.
    def __init__(self, *args):
        self.module_names = args
        self.path = None

    def find_module(self, fullname, path=None):
        """
        Module finder. If we are to handle the module, return ourself so that we
        can continue to load the module. Returning None means we don't handle
        this module
        :param fullname: dotted name of the module being imported
        :param path: __path__ of the parent package (None for top-level modules)
        :return: self if we handle the module, otherwise None
        """
        # 'ttypes' is the module name where generated code will live in
        if fullname in self.module_names or 'ttypes' in fullname:
            # save the path so that it could be used later by `load_module`
            self.path = path
            return self
        return None

    def repair_code(self, source):
        """
        Fix source code error. This code prefixes 'Service' into each
        python keyword found in the source code
        TODO: use re instead of code replacing
        :param source: source text of the generated module
        :return: the repaired source text
        """
        source = source.replace("None = 0", "ServiceNone = 0")
        source = source.replace("False = 0", "ServiceFalse = 0")
        source = source.replace("True = 1", "ServiceTrue = 1")
        return source

    def load_module(self, name):
        """
        Load the module source code, fix the source and import manually
        :param name: dotted name of the module being imported
        :return: the loaded module object
        """
        if name in sys.modules:
            return sys.modules[name]
        # TODO: use os.path instead of manual path manipulation
        # NOTE: this assumes the module lives inside a package, so that
        # self.path is a list of directories and not None
        module_name = name.split('.')[-1]
        module_path = self.path[0] + '/' + module_name + '.py'
        # create the module spec object
        spec = importlib.util.spec_from_file_location(name, module_path)
        # read the source code and modify on-the-fly
        with open(module_path) as source_file:
            source = source_file.read()
        new_source = self.repair_code(source)
        # create the module object based off the module spec
        module = importlib.util.module_from_spec(spec)
        # compile the source code into a code object where it
        # could be imported with `exec` call.
        codeobj = compile(new_source, module.__spec__.origin, 'exec')
        # module.__dict__ is required for referencing variables in the module
        exec(codeobj, module.__dict__)
        # put the loaded module into sys.modules so that if the module is imported
        # again it could be found.
        sys.modules[name] = module
        # return the module itself so that it could be used
        return module


# sys.meta_path is where we hook up our thrift importer:
# >>> print(sys.meta_path) => [<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
# we need to put our importer just before the last importer so that we don't have
# to handle importing built-in modules
sys.meta_path.insert(-1, ThriftImporter('ttypes'))
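The TODO in repair_code mentions using re instead of plain string replacement. A minimal sketch of that idea follows, assuming the offending names only appear as assignment targets at the start of a line (after indentation). The ASSIGNMENT pattern and the prefix parameter are hypothetical, and the sketch does not rename any other references to those names elsewhere in the generated file.

```python
import keyword
import re

# An assignment whose target is a bare identifier at the start of a line,
# e.g. "    None = 0". The lookahead keeps "==" comparisons from matching.
ASSIGNMENT = re.compile(r"^([ \t]*)([A-Za-z_]\w*)([ \t]*=(?!=))", re.MULTILINE)


def repair_code(source, prefix="Service"):
    """Prefix every assignment target that is a Python keyword."""

    def rename(match):
        indent, name, rest = match.groups()
        if keyword.iskeyword(name):
            name = prefix + name
        return indent + name + rest

    return ASSIGNMENT.sub(rename, source)


broken = "class TrueFalse(object):\n    False = 0\n    True = 1\n    Unknown = 2\n"
print(repair_code(broken))
```

Using keyword.iskeyword means new enum members that collide with other keywords are handled without listing each name by hand, and names that merely end in a keyword (for example IsNone) are left alone.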
What surprised me was that there was no existing class or module to do this already: as you can see, ThriftImporter does not extend anything besides plain old object.
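While there is no ready-made importer for rewriting source, importlib does provide abstract base classes for the two roles, and the stock PathFinder can be reused to locate the file. The sketch below is one possible arrangement under those assumptions, not the only way to do it. It uses find_spec and exec_module, the names current Python versions use for the finder and loader steps (Python 3.12 and later no longer call find_module). RepairingFinder, RepairingLoader and repair are hypothetical names, and repair is only a stand-in for the real fix-up logic.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import re
import sys


def repair(source):
    """Stand-in fix-up: prefix None/False/True when used as assignment targets."""
    pattern = r"^([ \t]*)(None|False|True)([ \t]*=)"
    return re.sub(pattern, r"\1Service\2\3", source, flags=re.MULTILINE)


class RepairingLoader(importlib.abc.Loader):
    """Loader step: read the source, repair it, then execute it."""

    def __init__(self, filename, repair_func):
        self.filename = filename
        self.repair_func = repair_func

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        with open(self.filename, encoding="utf-8") as handle:
            source = handle.read()
        code = compile(self.repair_func(source), self.filename, "exec")
        exec(code, module.__dict__)


class RepairingFinder(importlib.abc.MetaPathFinder):
    """Finder step: claim modules whose last name component matches."""

    def __init__(self, module_name, repair_func):
        self.module_name = module_name
        self.repair_func = repair_func

    def find_spec(self, fullname, path=None, target=None):
        if fullname.rpartition(".")[2] != self.module_name:
            return None
        # Let the standard path-based finder locate the file for us.
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or spec.origin is None or not spec.origin.endswith(".py"):
            return None
        # Keep the spec but swap in the loader that repairs the source.
        spec.loader = RepairingLoader(spec.origin, self.repair_func)
        return spec


# Hook it in ahead of the default finders.
sys.meta_path.insert(0, RepairingFinder("ttypes", repair))
```

With this arrangement the import machinery takes care of creating the module and registering it in sys.modules, so the loader only has to repair and execute the source.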