This is what it’s all been building towards. I now have a script that reads off the contents of a PAR file and dumps all the data to a series of CSV files. You can find it over on my git repo if you’re interested, as I know a few people have been. Special thanks to Alexander O., CYDI-TST, Guardian, __jmp32, Noctis, and Skeltek, variously from the InsideEARTH and United Lunar Dynasty discords, for their help in figuring out the last few details. Extra thanks to __jmp32 and Skeltek for showing me where to find the source file so I could get the field names.

The next steps after this are writing another script to reverse the process and finishing the documentation I started in part 3, but today I’m going to talk about how the script works.

The essence of it is much the same as my early, hacky script. It reads the 16-byte file header, extracts the number of entity groups, then reads each group in turn using the structure I talked about in part 1. After all those are done, it reads the number of research definitions, then the definitions themselves. I still don’t know what the entire file header does or what those two trailing integers are for, but it turns out you don’t need them to read the file. Probably going to come back to bite me when I recompile the file, but that’s a problem for future Terrana.

If I were doing this properly, I’d probably stream this stuff from file to file, but since the PAR file is less than a megabyte, I’m just loading the whole thing into memory. Modern computers have gigabytes of the stuff.
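To illustrate the load-it-all-at-once approach, here is a minimal sketch rather than the actual script. The ByteReader class and its method names are made up for this example, and the little-endian 32-bit integers are an assumption, not something confirmed about the PAR format.

```python
import struct


class ByteReader:
    """Cursor over a bytes object that was loaded into memory in one go."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_bytes(self, count: int) -> bytes:
        # Fail loudly instead of silently returning a short chunk.
        end = self.pos + count
        if end > len(self.data):
            raise EOFError(f"wanted {count} bytes at offset {self.pos}")
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

    def read_u32(self) -> int:
        # Assumption: integers are unsigned, 32-bit, little-endian.
        return struct.unpack("<I", self.read_bytes(4))[0]


def load_reader(path: str) -> ByteReader:
    # The whole file is read at once; fine for something under a megabyte.
    with open(path, "rb") as handle:
        return ByteReader(handle.read())


# Demo on synthetic bytes, so no real PAR file is needed.
demo = ByteReader(struct.pack("<II", 3, 7) + b"ab")
first_value = demo.read_u32()
second_value = demo.read_u32()
tail = demo.read_bytes(2)
```

With the file held as a single bytes object, every read is just a slice and a position bump, which keeps the parsing code simple at the cost of holding the file in memory.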

What happens next is where I apply all that I’ve learned about PAR files over the last couple of weeks. The entities are categorized based on their type number and their class ID (if they have one), then burped out into an appropriately named CSV file, which has been helpfully pre-populated with the field names in the header. Then it finishes. Whole process takes less than a second.
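The categorization step could look something like the sketch below. Everything in it is hypothetical: the field names, the file naming scheme, and the (type number, class ID, values) tuple shape are placeholders for illustration, and it writes the header row at output time rather than pre-populating the files.

```python
import csv
import os
from collections import defaultdict

# Hypothetical field names, keyed by (type number, class ID).
# A class ID of None stands for a type that has no class ID.
FIELD_NAMES = {
    (1, 5): ["name", "cost", "hp"],
    (8, None): ["name", "sound_file"],
}


def bucket_name(type_number, class_id):
    # Made-up naming scheme; the real script uses friendlier names.
    if class_id is None:
        return f"type{type_number}.csv"
    return f"type{type_number}_class{class_id}.csv"


def write_buckets(entities, out_dir):
    """Group entities by (type, class ID) and write one CSV per group."""
    buckets = defaultdict(list)
    for type_number, class_id, values in entities:
        buckets[(type_number, class_id)].append(values)

    written = []
    for key, rows in buckets.items():
        path = os.path.join(out_dir, bucket_name(*key))
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(FIELD_NAMES[key])  # header row of field names
            writer.writerows(rows)
        written.append(path)
    return written
```

Calling write_buckets with a list of entity tuples and an output directory returns the paths of the files it created, one per category.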

A couple of types presented a problem for categorization. Type 8, sound packs, has three different structures depending on what kind of sound pack it is, but doesn’t have a class ID. In the end, I split it up based on the entity names: groups where the first entity starts with TALK_ are unit speech packs, ones with PLAYERTALK_ are interface speech files, everything else is a standard sound pack. It’s a hack, but I haven’t worked out how to do it properly yet.
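The name-based split can be sketched in a few lines. The function name, the returned labels, and the example entity names are invented for this illustration, and it assumes both prefixes are matched at the start of the first entity's name.

```python
def classify_sound_pack(entity_names):
    """Guess the kind of a type 8 group from its first entity's name."""
    first = entity_names[0] if entity_names else ""
    if first.startswith("PLAYERTALK_"):
        return "interface_speech"
    if first.startswith("TALK_"):
        return "unit_speech"
    # Anything else, including an empty group, is treated as a plain pack.
    return "sound_pack"


# Hypothetical entity names, purely for demonstration.
examples = {
    "unit": classify_sound_pack(["TALK_EXAMPLE_UNIT", "OTHER"]),
    "interface": classify_sound_pack(["PLAYERTALK_EXAMPLE"]),
    "plain": classify_sound_pack(["EXPLOSION_EXAMPLE"]),
}
```

Only the first entity is inspected, so a group that mixes naming styles would be classified by whatever happens to come first.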

Type 10 is even worse. It’s a jumbled mishmash of parameters of various kinds with a different structure for every. Single. Group. I honestly didn’t even try to deal with those, I just dumped them out to parameters.csv and hoped they’d go away.

Once I’ve got the compiler working and the documentation done, then I’ll start work on making these actually good pieces of software. Might start over in another language that I can actually compile or just harden these a bit so they handle error conditions.

Posts in this series: