Category: Tech

  • A Simple Way to Tell Chinese and Japanese Text Apart without NLP (Japanese version)

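    Like the other articles on this blog, this one comes from something I discovered while building my own open-source project. When I wanted to add phonetic readings to the titles, lyrics and other data in my music library, I needed a way to tell Chinese text from Japanese text. My music library contains only Chinese, Japanese and languages written in alphabets. The alphabetic languages need no special handling and are easy to sort, but Chinese and Japanese are not that simple, especially because the two handle Chinese characters (kanji) differently.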

  • A Simple, Brute-Force Way to Tell Chinese and Japanese Text Apart without NLP (Chinese version)

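    Like most other articles on this blog, this one comes from something I found while developing a personal project. When adding phonetic annotations to my music library, lyrics and other data, I needed a simple way to tell Chinese text from Japanese text, because my library basically contains only Chinese, Japanese and languages written in the Latin alphabet. The Latin-script languages sort naturally without much processing, but Chinese and Japanese are not so simple, especially since the two languages handle Chinese characters in completely different ways.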

  • Read and Write Tags of Music Files with FFmpeg

    In both my previous and recent projects, I have been working with tags (metadata) of music files. One reason is that I am rather particular about having a nicely organised library with all tag data aligned to the same format. Until recently, while I was looking for a solution to read and write tags […]

  • Translate Text in Sphinx Templates and Configurations

    A few weeks ago, while I was playing around with the docs of EFB and the Crowdin translation widget, I realized that Alabaster, the default theme for Sphinx, isn’t really doing well in terms of translation. It seems the author isn’t really confident about that (or simply hasn’t cared for the past 4 years). As the […]

  • How to Write Integration Tests for a Telegram Bot

    This is my 6th article on Telegram, my IM platform of choice. In this article I’m going to explain how I wrote the integration tests for my EFB Telegram Master channel, a Telegram interface for EFB, using a userbot-like strategy. To get started, you need to have a bot ready to be […]

  • Awesome Command Line Tools

  • An alternative way to prevent spammers entering Telegram Groups

    Telegram is a growing instant messaging platform that has gained great popularity in the past few years. With its openness and superior user-friendliness, it has attracted a lot of users, and spammers along with them.

  • Message delivery issues in EFB Telegram Master channel (comparing to generic IM services)

    Unlike how an IM usually works, EFB Telegram Master channel (ETM) relies heavily on the Telegram Bot platform. This makes it more difficult for ETM to deal with messages that fail to deliver. This article was first published on the ETM Wiki on 20 April 2019.

  • Custom sort order in music libraries: macOS and Android

    Custom sort order in music libraries is a rather rare need. Most major languages use phonograms in their scripts, where the natural sort order is more or less identical to what is seen in Unicode (probably after some normalizations). On the other hand, languages using logograms (logosyllabic scripts, mainly Chinese characters in our context) do […]

  • Custom sort order in music libraries: macOS and Android (Chinese version)

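    Custom sort order for song titles, artists and albums is often considered a very rare need. Most major languages use phonograms, whose natural order is usually about the same as their order in Unicode (some scripts may need normalization). In languages that use logograms (mainly Chinese characters), however, the natural order (usually the order of pronunciation) differs considerably from the Unicode code point order. As a result, text in these languages looks odd when sorted by Unicode code point, and entries are hard to find. So when a music library contains one or more such languages, custom sort order is a very useful feature.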