When I started refactoring EFB Telegram Master Channel (ETM) for its 2.0 update, I looked into ways to organize the code into separate files in a decent manner. In this article I’d like to talk about the strategy I used, compared with another codebase I was reading back then, itchat.
In ETM version 1, most of the code was written in one heavy and ugly 1675-line-long __init__.py. As more features were planned for ETM, it became really hard for me to navigate through the code, which is what brought up the need to refactor this huge thing.
Back then (which, surprisingly, was over 2 years ago), the main reference I had for a large enough project was itchat. Its code structure hasn’t changed much since then. itchat did have a reasonably large code repository, but the way it splits its functions is far from ideal.
What itchat does is define all functions at the root level of each file, and have a loader function that “loads” these functions as methods onto an object called core, which contains some configuration data. To the Python interpreter, this method indeed works, thanks to Python’s dynamic nature. But it looks really bad when you try to work with the code, as an IDE usually can’t give any hints for objects defined in this way. The same happens when you work on the library itself, even though every function takes self as its first argument.
Then I went on looking for other common practices for breaking down a large class. Some suggested importing functions inside a function, others suggested using multiple inheritance. [Ref.] The former is not much different from what itchat was doing, and the latter looked promising at the beginning. I went on to experiment with multiple inheritance, and found that it does provide better autocomplete in the IDE, but only in the main class. I can’t see one base class from another in the IDE. That is still reasonable: all those base classes only come together in the main class, so they are not aware of each other.
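The limitation of the multiple inheritance experiment can be seen in a minimal sketch like the one below. All class, method, and attribute names here are hypothetical and only serve as an illustration; they are not taken from ETM or itchat.

```python
class ContactPart:
    """One slice of the big class; in practice it would live in its own file."""

    def find_contact(self, name):
        # self.contacts is provided by the combined class, not by this one,
        # so an IDE looking at this class alone cannot resolve it.
        return self.contacts.get(name)


class MessagePart:
    """Another slice; it relies on a method defined in a sibling class."""

    def send(self, name, text):
        # find_contact is only resolved at runtime, through the main class
        contact = self.find_contact(name)
        return f"to {contact}: {text}"


class Core(ContactPart, MessagePart):
    """The main class is the only place where all the parts come together."""

    def __init__(self):
        self.contacts = {"alice": "alice@example.com"}


core = Core()
result = core.send("alice", "hello")
```

Autocomplete works on core because Core inherits everything, but inside MessagePart the IDE has no way to know that find_contact will exist.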
# Loader pattern, main module: Core only declares placeholder methods
from .components import load_components


class Core:
    def method_1(self, param_1, param_2, param_3):
        """Doc string goes here."""
        raise NotImplementedError()

    def method_2(self):
        """Doc string goes here."""
        raise NotImplementedError()

    def method_3(self, param_1):
        """Doc string goes here."""
        raise NotImplementedError()


# Replace the placeholders with the real implementations
load_components(Core)

# components: collects the loader of every component
from .component_1 import load_component_1
from .component_2 import load_component_2


def load_components(core):
    load_component_1(core)
    load_component_2(core)

# component_1
def load_component_1(core):
    core.method_1 = method_1
    core.method_2 = method_2


def method_1(self, param_1, param_2, param_3):
    # Actual implementation
    ...


def method_2(self):
    # Actual implementation
    ...

# component_2
def load_component_2(core):
    core.method_3 = method_3


def method_3(self, param_1):
    # Actual implementation
    ...
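To see why the loader layout works at runtime even though an IDE cannot follow it, here is a single-file sketch of the same idea. The greet method and the prefix attribute are made up for this illustration.

```python
class Core:
    def greet(self, name):
        """Doc string goes here."""
        raise NotImplementedError()


def greet(self, name):
    # Actual implementation, defined at module level
    return f"{self.prefix}, {name}"


def load_component(core):
    # Overwrite the placeholder on the class with the real function.
    # A function stored on a class becomes a regular method.
    core.greet = greet


load_component(Core)

core = Core()
core.prefix = "Hello"
greeting = core.greet("world")
```

The call goes to the module-level greet, with self bound to the Core instance. Static analysis only sees the placeholder that raises NotImplementedError, which is why the hints in the IDE are of little use.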
I thought to myself, why can’t I just make some more classes and let them reference each other? It turns out that worked pretty well for me. I split my functions into several different “manager” classes, each of which is initialized with a reference to the main class. These classes are instantiated in topological order, such that classes being referred to by others are created earlier. In ETM, the classes that are being referred to are usually the data provider utilities, namely ExperimentalFlagsManager, DatabaseManager, and TelegramBotManager.
# __init__.py
from .flags import ExperimentalFlagsManager
from .db import DatabaseManager
from .chat_binding import ChatBindingManager


class TelegramChannel:
    def __init__(self):
        # Managers are created in dependency order
        self.flags: ExperimentalFlagsManager = ExperimentalFlagsManager(self)
        self.db: DatabaseManager = DatabaseManager(self)
        self.chat_binding: ChatBindingManager = ChatBindingManager(self)

# flags.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Avoid cycle import for type checking
    from . import TelegramChannel


class ExperimentalFlagsManager:
    def __init__(self, channel: 'TelegramChannel'):
        self.channel = channel
        ...

# db.py
from typing import TYPE_CHECKING

from .flags import ExperimentalFlagsManager

if TYPE_CHECKING:
    # Avoid cycle import for type checking
    from . import TelegramChannel


class DatabaseManager:
    def __init__(self, channel: 'TelegramChannel'):
        self.channel: 'TelegramChannel' = channel
        self.flags: ExperimentalFlagsManager = channel.flags
        ...

# chat_binding.py
from typing import TYPE_CHECKING

from .flags import ExperimentalFlagsManager
from .db import DatabaseManager

if TYPE_CHECKING:
    # Avoid cycle import for type checking
    from . import TelegramChannel


class ChatBindingManager:
    def __init__(self, channel: 'TelegramChannel'):
        self.channel: 'TelegramChannel' = channel
        self.flags: ExperimentalFlagsManager = channel.flags
        self.db: DatabaseManager = channel.db
        ...
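The importance of the instantiation order can be checked with a single-file sketch. The class names are shortened stand-ins for the ETM ones, and the values dictionary and the verbose attribute are invented for the example.

```python
class FlagsManager:
    def __init__(self, channel: "Channel"):
        self.channel = channel
        self.values = {"debug": True}

    def get(self, key: str) -> bool:
        return self.values[key]


class DatabaseManager:
    def __init__(self, channel: "Channel"):
        self.channel = channel
        # Safe only if the channel has created its flags manager already
        self.flags: FlagsManager = channel.flags
        self.verbose = self.flags.get("debug")


class Channel:
    def __init__(self):
        # Providers first, consumers later
        self.flags: FlagsManager = FlagsManager(self)
        self.db: DatabaseManager = DatabaseManager(self)


class MisorderedChannel:
    def __init__(self):
        # db is created before flags exists, so DatabaseManager fails
        self.db = DatabaseManager(self)
        self.flags = FlagsManager(self)


channel = Channel()

try:
    MisorderedChannel()
    order_error = None
except AttributeError as error:
    order_error = error
```

With the correct order both managers share the same flags object. With the wrong order the constructor raises AttributeError right away, so a mistake in the ordering shows up at startup rather than later.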
While refactoring ETM, I learnt that multiple inheritance in Python is also used in another way: mixins. Mixins are classes that are useful when you want to add a set of features to many other classes. This gave me an idea for dealing with the references to the gettext translator that I kept adding to every manager class.
I added a mixin called LocaleMixin that extracts the translator functions (gettext and ngettext) from the main class reference (assuming they are guaranteed to be there), and exposes them as local properties.
class LocaleMixin:
    # Set by the class that uses this mixin
    channel: 'TelegramChannel'

    @property
    def _(self):
        return self.channel.gettext

    @property
    def ngettext(self):
        return self.channel.ngettext
When the mixin class is added to the list of inherited classes, the IDE can properly recognise these helper properties, and their definitions are consolidated in one place. I find it more organised than the previous style.
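As a rough, self-contained illustration of how a manager could use such a mixin, the sketch below uses gettext.NullTranslations from the standard library as a stand-in translator that returns messages unchanged. The Channel and ChatManager classes and the message strings are hypothetical, not the actual ETM code.

```python
import gettext


class Channel:
    def __init__(self):
        # Stand-in translator: returns the messages untranslated
        translator = gettext.NullTranslations()
        self.gettext = translator.gettext
        self.ngettext = translator.ngettext


class LocaleMixin:
    channel: "Channel"

    @property
    def _(self):
        return self.channel.gettext

    @property
    def ngettext(self):
        return self.channel.ngettext


class ChatManager(LocaleMixin):
    def __init__(self, channel: "Channel"):
        self.channel = channel

    def title(self) -> str:
        return self._("Chat list")

    def summary(self, count: int) -> str:
        # Picks the singular or plural form based on count
        return self.ngettext("%d chat linked", "%d chats linked", count) % count


manager = ChatManager(Channel())
```

The manager only has to set self.channel; both translation helpers then come from the mixin, and every other manager can reuse them in the same way.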
In the end, I find that simply creating a class for each component of my code turns out to be the most organised and IDE-friendly way to break down a large class, and mixins are helpful for making references or helper functions available to multiple classes.