Reverse engineering an IL2CPP NSO binary: Case study of Mojipittan Encore

This is yet another random side project I was working on recently, and my first attempt to reverse engineer a real-world application compiled to a binary. In this article, I want to walk through, step by step, how I reverse engineered a Unity IL2CPP binary compiled to NSO.

Foreword

Kotoba no Puzzle: Mojipittan is a Japanese word puzzle game series in which the player makes words from letter pieces on a board. It is sort of like Scrabble, but not exactly the same. I always wanted to give it a try, but it was quite an old game, available only on GBA, PSP, DS, and Wii. That was until they made an Encore version for Nintendo Switch. This game is also one of the reasons I bought a Switch.

Ever since I bought the game, I have wondered whether there is a way to solve it optimally. The first step is to get the game's dictionary. Although the game has now been released on multiple platforms, there is no resource online for the word list it uses. I thus decided to extract it myself.

Preparation

To follow this article, you need an exploitable Nintendo Switch with a purchased copy of Mojipittan Encore, and a PC running Windows. Although there are other ways to do it with only a PC, I do not recommend them, for reasons. You might also be able to use other OSes, but many of the tools ship precompiled binaries for Windows, which takes less effort.

To get the game data, you need to dump it from the console. Yuzu provides a detailed guide on how to dump games and the corresponding keys. I will use an XCI dump here as an example.

Extract contents

To extract the dumped XCI file, I used hactool, specifically this wrapped version, Unpackv2. Once the tool is downloaded and unzipped, follow these steps:

  1. Copy the prod.keys file extracted from your device to the folder where Unpack.cmd is found, and rename it to keys.txt.
  2. Drop the dumped XCI file onto Unpack.cmd to start the script.
  3. When prompted…
    If your patch was inside XCI, press "1" and ENTER If you don't have a patch, just only press ENTER
    …press Enter. This dump did not come with a patch.
  4. Now, in the Unpackv2 folder, there will be a new ExtractedXCI folder containing 4 NCA files of various sizes.
    When prompted…
    Drop here correct NCA patch file (probably the biggest one) from ExtractedXCI folder in
    …drop the 774MB a0547397496b93fcb08f438bcaad2731.nca onto the terminal window.
  5. The terminal now prints a list of the extracted files, ending with the prompt…
    Press ENTER to delete all temporary files
    Press Enter twice to finish the export.

After finishing the export process, there will be a new folder created with the extracted content, with two subfolders: exefs and romfs. We will make use of these contents in the next step.

Unpack resource files

When you open the romfs/Data folder, you will be welcomed by some familiar file names, like resource.assets, sharedassets0.assets, level0, and level1. If you have ever made a Unity game or opened the directory of one, you will surely recognize them. Unity organizes its asset files in a pretty recognizable pattern, which has been well studied by the community and has multiple tools built around it.

At this point, you are free to extract all the static asset files found in the game, like text files and textures. The tool I used is Unity Assets Bundle Extractor (UABE). To extract assets, open the .assets files with the tool, and export them with the built-in plugins to easily processable formats.

Besides the files mentioned above, there is a subfolder StreamingData/Switch/datas, which contains assets that are loaded after the game has initialized. Here, the files we are interested in are the dictionary files under romfs/Data/StreamingData/Switch/datas/dictionary. Opening it with UABE, we can see three text assets: worddata.aid, worddata.cot, and worddata.dic. Extracting them with the txt export plugin gives us three binary files with some sort of pattern.

$ xxd worddata.aid | head -n 20
00000000: 5744 5000 0002 1ca1 0000 0001 0001 c1e6 WDP.............
00000010: 0000 0002 0001 ffee 0001 5eb2 0001 5eb3 ..........^...^.
00000020: 0000 0003 0001 ffef 0001 fff0 0000 0004 ................
00000030: 0000 0005 0000 0006 0001 5d48 0000 0007 ..........]H....
00000040: 0000 0008 0000 0009 0000 000a 0000 000b ................
00000050: 0001 92b4 0001 92b7 0001 92b3 0001 92b6 ................
00000060: 0001 92b5 0000 000c 0001 5a5f 0001 fff1 ..........Z_....
00000070: 0000 000d 0000 000e 0000 000f 0000 0010 ................
00000080: 0000 0011 0001 9019 0000 0012 0001 fff2 ................
00000090: 0000 0013 0001 fff3 0000 0014 0000 0015 ................
000000a0: 0000 0016 0000 0017 0000 0018 0000 0019 ................
000000b0: 0000 001a 0000 001b 0000 001c 0000 001d ................
000000c0: 0000 001e 0000 001f 0000 0020 0000 0021 ........... ...!
000000d0: 0000 0022 0000 0023 0000 0024 0000 0025 ..."...#...$...%
000000e0: 0000 0026 0000 0027 0001 fff4 0000 0028 ...&...'.......(
000000f0: 0000 0029 0000 002a 0000 002b 0000 002c ...)...*...+...,
00000100: 0000 002d 0000 002e 0002 1191 0002 1838 ...-...........8
00000110: 0001 5f43 0000 0030 0001 fff6 0000 0031 .._C...0.......1
00000120: 0000 0032 0000 0033 0000 0034 0000 0035 ...2...3...4...5
00000130: 0001 6782 0000 0036 0001 6783 0001 6785 ..g....6..g...g.

The files start with a WDP\0 header (which I could not find mentioned anywhere else on the internet), and 0000 shows up in most of the odd columns of the dump, but we can’t really interpret the meaning of this data by just staring at the files. We definitely need the help of the game's code logic.
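Before turning to the code, the dump can at least be read as a stream of fixed-width integers instead of raw hex. The sketch below is only a guess at the layout, based on the xxd output above: it assumes a 4-byte WDP\0 magic followed by big-endian 32-bit unsigned values. parse_wdp is a hypothetical helper name, and nothing here says what the values mean.

```python
import struct

def parse_wdp(data: bytes):
    """Split a WDP blob into (magic, list of big-endian uint32 values).

    Assumption: everything after the 4-byte magic is a flat array of
    big-endian 32-bit unsigned integers. This is inferred from the hex
    dump only, not from the game code.
    """
    magic = data[:4]
    if magic != b"WDP\x00":
        raise ValueError("not a WDP file: %r" % magic)
    body = data[4:]
    count = len(body) // 4  # ignore any trailing partial word
    words = list(struct.unpack(">%dI" % count, body[:count * 4]))
    return magic, words

# First 32 bytes of worddata.aid, copied from the xxd dump above.
sample = bytes.fromhex(
    "57445000" "00021ca1" "00000001" "0001c1e6"
    "00000002" "0001ffee" "00015eb2" "00015eb3"
)
magic, words = parse_wdp(sample)
print([hex(w) for w in words])
```

In this excerpt the first word after the magic, 0x21ca1, is larger than every other value shown, so it might be an entry count, but that is speculation until the game code confirms it.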

Preparing for decompilation

As is commonly known, most of the logic of Unity games is written in C♯. When compiled to DLL files, C♯ code is rather easy to decompile. However, in environments where a .NET runtime is hard to provide, or where performance is critical, Unity offers an option called Intermediate Language to C++ (IL2CPP), which converts Microsoft Intermediate Language (MSIL) into C++ and then compiles that into native code. This technique is commonly seen in Unity games running on mobile platforms. Nintendo Switch is no exception.

Nintendo Switch executables use a special binary format called NSO, which is a custom variant of an AArch64 ELF binary. To save space, a lot of NSO files are compressed by default. We first need to decompress the main executable with hactool.

hactool -t nso0 --uncompressed=exefs/main_unc exefs/main
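To check whether a given NSO is flagged as compressed, the header can be inspected directly. This is a minimal sketch assuming the NSO header layout as documented by the Switch homebrew community: the magic NSO0 at offset 0x0 and a little-endian 32-bit flags field at offset 0xC, whose lowest three bits mark the text, rodata, and data segments as compressed. The function names are made up for illustration.

```python
import struct

def nso_compression_flags(header: bytes):
    """Report which NSO segments are flagged as compressed.

    Assumed layout: magic "NSO0" at 0x0, little-endian u32 flags at 0xC,
    bit 0 = text, bit 1 = rodata, bit 2 = data compressed.
    """
    if len(header) < 0x10:
        raise ValueError("need at least 16 bytes of header")
    if header[:4] != b"NSO0":
        raise ValueError("not an NSO file")
    (flags,) = struct.unpack_from("<I", header, 0x0C)
    return {
        "text": bool(flags & 1),
        "rodata": bool(flags & 2),
        "data": bool(flags & 4),
    }

def check_file(path):
    """Read the first 16 bytes of a file and decode the flags."""
    with open(path, "rb") as f:
        return nso_compression_flags(f.read(0x10))

# Synthetic header for demonstration: only the text segment is flagged.
demo = b"NSO0" + struct.pack("<III", 0, 0, 0b001)
print(nso_compression_flags(demo))
```

Calling check_file("exefs/main") on the dumped executable shows which segments the header marks as compressed. Whether hactool clears these bits in the file it writes is not verified here.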

With the uncompressed binary, we can then use Il2CppDumper to extract the offset and signature of each function in the binary.

Il2CppDumper exefs/main romfs/Data/Managed/Metadata/global-metadata.dat il2cppdump

In the new il2cppdump folder, you can find the JSON file script.json and a C++ header file il2cpp.h with all the metadata, which we will use later to locate the function code during the actual decompilation.

Browsing the script.json file, we can find some interesting methods that might help us to decode the dictionary file:

  • void CDictionary__GetDictionaryData (CDictionary_o* __this, System_String_o** strReading, System_String_o** strNotation, System_String_o** strMeaning, int32_t nWordId, const MethodInfo* method);
  • int32_t CDictionary___ConvertKey2String (CDictionary_o* __this, System_String_o** strOut, System_UInt32_array* apKey, uint32_t nLongFlag, const MethodInfo* method);

Fortunately, not only the class and method names but even the parameter names are kept, which will help us a lot in figuring out the code logic.
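Since script.json is large, it is easier to search it with a few lines of code than to scroll through it. The sketch below assumes the file has a top-level ScriptMethod array whose entries carry Address, Name, and Signature keys; check that against your own dump, as the layout may differ between Il2CppDumper versions. find_methods and load_script are hypothetical helper names, and the addresses in the stand-in data are made up.

```python
import json

def find_methods(script, needle):
    """Yield (address, name, signature) for methods whose name contains needle.

    Assumption: `script` is the parsed script.json with a "ScriptMethod" list.
    """
    for m in script.get("ScriptMethod", []):
        if needle in m.get("Name", ""):
            yield m.get("Address"), m.get("Name"), m.get("Signature")

def load_script(path):
    """Parse script.json from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Tiny stand-in for script.json; the addresses are invented.
example = {"ScriptMethod": [
    {"Address": 0x1000, "Name": "CDictionary$$GetDictionaryData",
     "Signature": "void CDictionary__GetDictionaryData (...);"},
    {"Address": 0x2000, "Name": "Foo$$Bar",
     "Signature": "void Foo__Bar (...);"},
]}
for addr, name, sig in find_methods(example, "CDictionary"):
    print(hex(addr), name)
```

On the real file, the same search would be find_methods(load_script("il2cppdump/script.json"), "CDictionary").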

To actually decompile the file, we will use Ghidra, an open-source reverse engineering tool that works on multiple platforms. However, Ghidra does not support NSO and IL2CPP binaries out of the box, so we need to install a few extras, namely:

  • Ghidra Switch Loader, which can be installed by going to File -> Install Extensions… in Ghidra and clicking the + button in the corner.
  • ghidra.py from Il2CppDumper, which can be installed by copying the file to the %USERPROFILE%/ghidra_scripts folder.

Decompile the binary

Finally, we can proceed to decompile the binary. To make full use of the metadata we extracted earlier, there are a few steps we need to do before starting to read the decompiled code.

When the main_unc binary is first loaded into a Ghidra project, Ghidra will prompt you to start an automatic analysis. Since the binary contains about 47MB worth of data, the analysis might take a considerable amount of time, and we are only interested in a small portion of the code. I thus chose to skip the analysis.

The first step is to import the data types defined in the header file into Ghidra. Since the generated il2cpp.h uses some data types that Ghidra does not recognize natively, we need to prepend these lines to it.


typedef unsigned __int8 uint8_t;
typedef unsigned __int16 uint16_t;
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
typedef __int8 int8_t;
typedef __int16 int16_t;
typedef __int32 int32_t;
typedef __int64 int64_t;
typedef __int64 size_t;
typedef size_t intptr_t;
typedef size_t uintptr_t;
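If you regenerate the dump often, prepending these lines by hand gets tedious. Here is a small sketch that does it in code; the typedef text is copied from the lines above, while prepend_typedefs, patch_file, and the output file name in the usage note are hypothetical.

```python
# Typedefs copied from the article; Ghidra's C parser needs them
# before it can parse the generated il2cpp.h.
TYPEDEFS = """\
typedef unsigned __int8 uint8_t;
typedef unsigned __int16 uint16_t;
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
typedef __int8 int8_t;
typedef __int16 int16_t;
typedef __int32 int32_t;
typedef __int64 int64_t;
typedef __int64 size_t;
typedef size_t intptr_t;
typedef size_t uintptr_t;
"""

def prepend_typedefs(header_text: str) -> str:
    """Return header_text with the typedef block in front (idempotent)."""
    if header_text.startswith(TYPEDEFS):
        return header_text
    return TYPEDEFS + "\n" + header_text

def patch_file(src: str, dst: str) -> None:
    """Read src, prepend the typedefs, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(prepend_typedefs(text))

patched = prepend_typedefs("struct Example_o { int32_t value; };\n")
print(patched.splitlines()[0])
```

For example, patch_file("il2cppdump/il2cpp.h", "il2cppdump/il2cpp_ghidra.h") writes a patched copy and leaves the original untouched.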

With the modified header file, we can then return to Ghidra and open File -> Parse C Source… to import it. When the Parse C Source dialog opens, clear everything in the Source files to parse and Parse options sections, then add the header file we prepared. Finally, click Parse to Program to start.

Next, we need to label the functions at their respective offsets. Open the Script Manager from Window -> Script Manager, search for ghidra.py, then click the green play (▶) button to run the script. When prompted for files, select the script.json file exported from Il2CppDumper.

Once the script has finished, all the functions will show up in the Symbol Tree panel in the sidebar. In the Filter box at the bottom of that panel, we can enter CDictionary to find all the dictionary-related methods.

