When I started refactoring EFB Telegram Master Channel (ETM) for its 2.0 update, I looked into ways to organize the code into separate files in a decent manner. In this article I’d like to talk about the strategy I used, compared with another codebase I was reading back then.
In ETM version 1, most of the code was written in a heavy and ugly 1675-line-long __init__.py. As more features were planned for ETM, it became really hard for me to navigate the code, which is what prompted me to refactor this huge thing.
Back then (which, surprisingly, was over 2 years ago), the main reference I had for a large enough project was itchat. Its code structure hasn’t changed much since then.
itchat did have a reasonably large code repository, but the way it splits its functions is far from ideal.
What itchat does is define all functions at the root level of each file, and use a loader function that “loads” these functions as methods onto an object called core, which contains some configuration data. To the Python interpreter, this indeed works, thanks to its dynamic nature. But it looks really bad when you try to work with the code, as an IDE usually can’t give any hints for objects defined this way. The same happens when you work on the library itself, even though every function takes self as its first argument.
Then I went looking for other common practices for breaking down a large class. Some suggested importing functions inside a function, others suggested using multiple inheritance. [Ref.] The former is not much different from what itchat was doing, while the latter looked promising at the beginning. I went on to experiment with multiple inheritance, and found that it does provide better autocomplete in the IDE, but only in the main class. I can’t see the members of one component class from another in the IDE. That is still reasonable: all those component classes only come together in the main class, and they are not aware of each other.
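The multiple-inheritance option can be sketched roughly as follows. This is only a minimal illustration with hypothetical names (ContactComponent, MessageComponent, and their methods are made up for this example, not taken from itchat or ETM). It shows why the main class sees everything while the components cannot see each other.

```python
class ContactComponent:
    """Hypothetical component holding contact-related methods."""

    def find_contact(self, name):
        # self.contacts is defined by the main class, not by this component,
        # so an IDE cannot resolve it from here.
        return self.contacts.get(name)


class MessageComponent:
    """Hypothetical component holding message-related methods."""

    def send_message(self, name, text):
        # This works at runtime, but find_contact is invisible to static
        # analysis here: MessageComponent does not inherit from
        # ContactComponent.
        contact = self.find_contact(name)
        if contact is None:
            raise KeyError(name)
        return f"to {contact}: {text}"


class Core(ContactComponent, MessageComponent):
    """Main class: the only place where all components come together."""

    def __init__(self):
        self.contacts = {"alice": "alice@example.com"}


core = Core()
result = core.send_message("alice", "hello")
```

Autocomplete on core lists both find_contact and send_message, but inside MessageComponent the IDE has no way of knowing that self will also be a ContactComponent.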
The itchat-style loader pattern described earlier looks roughly like this:

    # Main module that defines Core
    from .components import load_components


    class Core:
        # Placeholder methods: they exist only to carry signatures and docs.
        def method_1(self, param_1, param_2, param_3):
            """Doc string goes here."""
            raise NotImplementedError()

        def method_2(self):
            """Doc string goes here."""
            raise NotImplementedError()

        def method_3(self, param_1):
            """Doc string goes here."""
            raise NotImplementedError()


    # Replace the placeholders above with the real implementations.
    load_components(Core)

    # components module: collects the individual loaders
    from .component_1 import load_component_1
    from .component_2 import load_component_2


    def load_components(core):
        load_component_1(core)
        load_component_2(core)

    # component_1 module
    def load_component_1(core):
        core.method_1 = method_1
        core.method_2 = method_2


    def method_1(self, param_1, param_2, param_3):
        # Actual implementation
        ...


    def method_2(self):
        # Actual implementation
        ...

    # component_2 module
    def load_component_2(core):
        core.method_3 = method_3


    def method_3(self, param_1):
        # Actual implementation
        ...
I thought to myself, why can’t I just make some more classes and let them reference each other? It turned out that this worked pretty well for me. I split my functions into several different “manager” classes, each of which is initialized with a reference to the main class. These classes are instantiated in topological order, so that classes referred to by others are created earlier. In ETM, the classes that are referred to are usually data providers and utilities, such as the flags and database managers in the code below.
    # Package __init__ (the managers import it as `from . import TelegramChannel`)
    from .flags import ExperimentalFlagsManager
    from .db import DatabaseManager
    from .chat_binding import ChatBindingManager


    class TelegramChannel:
        def __init__(self):
            # Created in dependency order: each manager may only use the
            # managers instantiated before it.
            self.flags: ExperimentalFlagsManager = ExperimentalFlagsManager(self)
            self.db: DatabaseManager = DatabaseManager(self)
            self.chat_binding: ChatBindingManager = ChatBindingManager(self)

    # flags module
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        # Imported for type checking only, to avoid a circular import
        from . import TelegramChannel


    class ExperimentalFlagsManager:
        def __init__(self, channel: 'TelegramChannel'):
            self.channel = channel
            ...

    # db module
    from typing import TYPE_CHECKING

    from .flags import ExperimentalFlagsManager

    if TYPE_CHECKING:
        # Imported for type checking only, to avoid a circular import
        from . import TelegramChannel


    class DatabaseManager:
        def __init__(self, channel: 'TelegramChannel'):
            self.channel: 'TelegramChannel' = channel
            self.flags: ExperimentalFlagsManager = channel.flags
            ...

    # chat_binding module
    from typing import TYPE_CHECKING

    from .flags import ExperimentalFlagsManager
    from .db import DatabaseManager

    if TYPE_CHECKING:
        # Imported for type checking only, to avoid a circular import
        from . import TelegramChannel


    class ChatBindingManager:
        def __init__(self, channel: 'TelegramChannel'):
            self.channel: 'TelegramChannel' = channel
            self.flags: ExperimentalFlagsManager = channel.flags
            self.db: DatabaseManager = channel.db
            ...
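To see why the instantiation order matters, here is a small single-file sketch of the same idea. The class names (SettingsManager, StorageManager, Channel, MisorderedChannel) are hypothetical stand-ins, not ETM’s real classes. A manager that reads another manager from the main class in its constructor only works if that other manager was created first.

```python
class SettingsManager:
    """Hypothetical data provider that other managers depend on."""

    def __init__(self, channel: "Channel"):
        self.channel = channel
        self.values = {"verbose": True}

    def get(self, key: str) -> bool:
        return self.values[key]


class StorageManager:
    """Hypothetical manager that depends on SettingsManager."""

    def __init__(self, channel: "Channel"):
        self.channel = channel
        # Requires channel.settings to exist already.
        self.settings: SettingsManager = channel.settings
        self.verbose: bool = self.settings.get("verbose")


class Channel:
    def __init__(self):
        # Topological order: the dependency is created before its dependant.
        self.settings = SettingsManager(self)
        self.storage = StorageManager(self)


class MisorderedChannel:
    def __init__(self):
        # Wrong order: StorageManager looks up self.settings too early.
        self.storage = StorageManager(self)
        self.settings = SettingsManager(self)


channel = Channel()

try:
    MisorderedChannel()
    misordered_error = None
except AttributeError as exc:
    misordered_error = exc
```

With the correct order, channel.storage.settings is the very same object as channel.settings. With the wrong order, construction fails with an AttributeError because the attribute has not been assigned yet.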
While refactoring ETM, I learnt that multiple inheritance in Python is also used in another way: mixins. Mixins are classes that are useful when you want to add a set of features to many other classes. This came in handy when I was dealing with having to repeatedly add references to the gettext translator in all manager classes.
I added a mixin called LocaleMixin that takes the translator functions (gettext and ngettext) from the main class reference (assuming they are guaranteed to be there), and exposes them as local properties.
    class LocaleMixin:
        # Set by the class that uses this mixin
        channel: 'TelegramChannel'

        @property
        def _(self):
            return self.channel.gettext

        @property
        def ngettext(self):
            return self.channel.ngettext
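As a rough usage sketch, a manager class could pick up the mixin like this. Channel and GreetingManager are hypothetical names for illustration, and gettext.NullTranslations from the standard library stands in for a real translation catalogue, so messages come back untranslated.

```python
import gettext


class Channel:
    """Hypothetical main class that owns the translator."""

    def __init__(self):
        # NullTranslations returns messages unchanged; a real project would
        # load a catalogue with gettext.translation(...) instead.
        translator = gettext.NullTranslations()
        self.gettext = translator.gettext
        self.ngettext = translator.ngettext
        self.greeter = GreetingManager(self)


class LocaleMixin:
    channel: "Channel"

    @property
    def _(self):
        return self.channel.gettext

    @property
    def ngettext(self):
        return self.channel.ngettext


class GreetingManager(LocaleMixin):
    """Hypothetical manager that gets _ and ngettext from the mixin."""

    def __init__(self, channel: "Channel"):
        self.channel = channel

    def greet(self, name: str) -> str:
        return self._("Hello, {name}!").format(name=name)

    def count_chats(self, n: int) -> str:
        return self.ngettext("{n} chat", "{n} chats", n).format(n=n)


channel = Channel()
greeting = channel.greeter.greet("Alice")
one = channel.greeter.count_chats(1)
many = channel.greeter.count_chats(3)
```

The manager never stores the translator itself. It only needs a channel attribute, and the mixin supplies the rest.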
When the mixin class is added to a class’s list of base classes, the IDE can properly recognise these helper properties, and their definitions are consolidated in one place. I find this more organised than the previous style.
In the end, I found that simply creating a class for each component of my code turned out to be the most organised and IDE-friendly way to break down a large class, and that mixins are helpful for making references or helper functions available to multiple classes.