Splitting a Large Class and Multiple Inheritance in Python

When I started refactoring EFB Telegram Master Channel (ETM) for the 2.0 update, I was investigating ways to organize code into different files in a decent manner. In this article I’d like to talk about the strategy I used, compared with another codebase I was reading back then, itchat.

In ETM version 1, most of the code was written in a heavy and ugly 1675-line-long __init__.py. As more features were planned for ETM, it became really hard for me to navigate through the code, which brought up the need to refactor this huge thing.

Back then (which, surprisingly, was over 2 years ago), the main reference I had for a large enough project was itchat. Its code structure hasn’t changed much since then. itchat did have a reasonably large code repository, but the way it splits its functions is far from ideal.

itchat’s approach was to have all functions defined at the root level of each file, and to have a loader function that “loads” these functions as methods onto an object called core, which contains some configuration data. To the Python interpreter, this indeed works, thanks to Python’s dynamic nature, which allows attributes to be assigned at runtime. But it looks really bad when you try to work with the code, as an IDE usually can’t give any hints for objects defined in this way. The same happens when you try to work on the library itself, even though every function starts with self in its arguments.

Then I went looking for other common practices for breaking down a large class. Some suggested importing functions inside a function, others suggested using multiple inheritance. ^[Ref.] The former is not much different from what itchat was doing, and the latter looked promising at the beginning. I did some experiments with multiple inheritance, and found that it does provide better autocomplete in the IDE, but only in the main class. I can’t see one component class from another in the IDE. That is still reasonable: all those component classes only come together in the main class, and they are not aware of each other.
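To make the multiple-inheritance experiment more concrete, here is a minimal single-file sketch. All class and method names (ContactComponent, MessageComponent, find_contact, send_message) are made up for illustration and are not taken from ETM or itchat. It shows that the combined class sees every method, while each component relies on attributes it cannot see from its own definition.

```python
class ContactComponent:
    """Hypothetical component holding contact-related methods."""

    def find_contact(self, name):
        # self.contacts is only defined on the combined class, so a reader
        # (or an IDE) looking at this class alone cannot tell where it is from.
        return self.contacts.get(name)


class MessageComponent:
    """Hypothetical component holding messaging methods."""

    def send_message(self, name, text):
        # find_contact lives in ContactComponent; it only resolves at
        # runtime because Core inherits from both components.
        contact = self.find_contact(name)
        return f"to {contact}: {text}"


class Core(ContactComponent, MessageComponent):
    """The main class: the only place where all methods come together."""

    def __init__(self):
        self.contacts = {"alice": "alice@example.com"}


core = Core()
print(core.send_message("alice", "hello"))  # to alice@example.com: hello
```

Autocomplete on core works for both methods, but inside MessageComponent nothing tells the IDE that find_contact exists.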

A simplified version of the itchat-style loader layout, split across four files: core.py, components/__init__.py, components/component_1.py, and components/component_2.py.

# core.py
from .components import load_components


class Core:
    def method_1(self, param_1, param_2, param_3):
        """Doc string goes here."""
        raise NotImplementedError()

    def method_2(self):
        """Doc string goes here."""
        raise NotImplementedError()

    def method_3(self, param_1):
        """Doc string goes here."""
        raise NotImplementedError()


# Swap the placeholder methods above for the real implementations.
load_components(Core)

# components/__init__.py
from .component_1 import load_component_1
from .component_2 import load_component_2


def load_components(core):
    load_component_1(core)
    load_component_2(core)

# components/component_1.py
def load_component_1(core):
    # Attach the module-level functions to the class as methods.
    core.method_1 = method_1
    core.method_2 = method_2


def method_1(self, param_1, param_2, param_3):
    # Actual implementation
    ...


def method_2(self):
    # Actual implementation
    ...

# components/component_2.py
def load_component_2(core):
    core.method_3 = method_3


def method_3(self, param_1):
    # Actual implementation
    ...

I thought to myself, why can’t I just make some more classes and let them reference each other? It turns out that worked pretty well for me. I split my functions into several different “manager” classes, each of which is initialized with a reference to the main class. These classes are instantiated in topological order, so that classes referred to by others are created earlier. In ETM, the classes that are referred to are usually the data providers and utilities, namely ExperimentalFlagsManager, DatabaseManager, and TelegramBotManager.

The layout below is split across four files: __init__.py, flags.py, db.py, and chat_binding.py.

# __init__.py
from .flags import ExperimentalFlagsManager
from .db import DatabaseManager
from .chat_binding import ChatBindingManager

class TelegramChannel:
    def __init__(self):
        # Managers are created in dependency order: each one may only
        # use the managers that were created before it.
        self.flags: ExperimentalFlagsManager = ExperimentalFlagsManager(self)
        self.db: DatabaseManager = DatabaseManager(self)
        self.chat_binding: ChatBindingManager = ChatBindingManager(self)

# flags.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Avoid cycle import for type checking
    from . import TelegramChannel


class ExperimentalFlagsManager:
    def __init__(self, channel: 'TelegramChannel'):
        self.channel: 'TelegramChannel' = channel
        ...

# db.py
from typing import TYPE_CHECKING
from .flags import ExperimentalFlagsManager

if TYPE_CHECKING:
    # Avoid cycle import for type checking
    from . import TelegramChannel


class DatabaseManager:
    def __init__(self, channel: 'TelegramChannel'):
        self.channel: 'TelegramChannel' = channel
        self.flags: ExperimentalFlagsManager = channel.flags
        ...

# chat_binding.py
from typing import TYPE_CHECKING
from .flags import ExperimentalFlagsManager
from .db import DatabaseManager

if TYPE_CHECKING:
    # Avoid cycle import for type checking
    from . import TelegramChannel


class ChatBindingManager:
    def __init__(self, channel: 'TelegramChannel'):
        self.channel: 'TelegramChannel' = channel
        self.flags: ExperimentalFlagsManager = channel.flags
        self.db: DatabaseManager = channel.db
        ...
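The instantiation order matters because a manager reads its dependencies from the main class inside its own __init__. The following single-file sketch illustrates this with simplified stand-ins (Channel, FlagsManager, DbManager, and MisorderedChannel are hypothetical names, and the flag storage is invented for the example). It is an illustration of the idea, not ETM's actual code.

```python
class FlagsManager:
    """Stand-in for a flags manager; depends on no other manager."""

    def __init__(self, channel: "Channel"):
        self.channel = channel
        self.values = {"verbose": True}  # hypothetical flag storage


class DbManager:
    """Stand-in for a database manager; needs the flags manager."""

    def __init__(self, channel: "Channel"):
        self.channel = channel
        # Fails with AttributeError if channel.flags has not been set yet.
        self.flags = channel.flags
        self.verbose = self.flags.values["verbose"]


class Channel:
    """Stand-in main class; creates managers in dependency order."""

    def __init__(self):
        self.flags = FlagsManager(self)  # no dependencies, so it goes first
        self.db = DbManager(self)        # reads self.flags, so it goes second


class MisorderedChannel:
    """Same managers, created in the wrong order."""

    def __init__(self):
        self.db = DbManager(self)        # self.flags does not exist yet
        self.flags = FlagsManager(self)


channel = Channel()
print(channel.db.flags is channel.flags)  # True

try:
    MisorderedChannel()
except AttributeError as exc:
    print("wrong order:", exc)
```

Because every manager holds a typed reference to the ones it needs, the IDE can follow channel.db.flags all the way through.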

While refactoring ETM, I learnt that multiple inheritance in Python is also used in another way: mixins. Mixins are classes that are useful when you want to add a set of features to many other classes. This came in handy when I was dealing with having to add references to the gettext translator in every manager class.

I added a mixin called LocaleMixin that takes the translator functions (gettext and ngettext) from the main class reference (assuming they are guaranteed to be there), and exposes them as local properties.

class LocaleMixin:
    channel: 'TelegramChannel'

    @property
    def _(self):
        return self.channel.gettext

    @property
    def ngettext(self):
        return self.channel.ngettext

When the mixin class is added to the list of inherited classes, the IDE can properly recognise these helper properties, and their definitions are consolidated in the same place. I find it more organised than the previous style.
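As a rough illustration of how a manager might pick up the mixin, here is a self-contained sketch. GreetingManager and this cut-down Channel are hypothetical, and gettext.NullTranslations from the standard library stands in for a real translation catalogue (it returns messages unchanged), so treat it as a sketch under those assumptions.

```python
import gettext


class LocaleMixin:
    """Forwards translator lookups to the channel, as described above."""

    channel: "Channel"

    @property
    def _(self):
        return self.channel.gettext

    @property
    def ngettext(self):
        return self.channel.ngettext


class Channel:
    """Stand-in main class exposing gettext and ngettext."""

    def __init__(self):
        # NullTranslations returns messages untranslated; a real channel
        # would load an actual catalogue here.
        translator = gettext.NullTranslations()
        self.gettext = translator.gettext
        self.ngettext = translator.ngettext
        self.greeter = GreetingManager(self)


class GreetingManager(LocaleMixin):
    """Hypothetical manager that needs translated strings."""

    def __init__(self, channel: "Channel"):
        self.channel = channel

    def greet(self, count: int) -> str:
        header = self._("Hello")
        body = self.ngettext("{n} new message", "{n} new messages", count)
        return f"{header}: {body.format(n=count)}"


channel = Channel()
print(channel.greeter.greet(1))  # Hello: 1 new message
print(channel.greeter.greet(3))  # Hello: 3 new messages
```

The manager only has to set self.channel; the translator helpers come along with the mixin.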

In the end, I find that simply creating a class for each component of my code turns out to be the most organised and IDE-friendly way to break down a large class, and mixins are helpful for making references or helper functions available to multiple classes.
