2024-08 | pkmn

It’s somewhat nice to have been procrastinating writing this month’s post not because of a dearth of updates as has previously been the case but instead because there’s been too much progress to properly cover and too many more interesting things I’d rather be working on.

Last June I got burnt out working on the -Dchance and -Dcalc features of the engine, specifically in the process of attempting to complete what I have been calling the transitions function. basket helped originally motivate this work, but after I understood the use case I immediately recognized it could potentially unlock the “holy grail” I’d been chasing for a while – being able to use one codebase to solve multiple different problems. My work on damage calculators helped shape my thesis that correct and complete Pokémon tooling necessarily requires a full blown engine with a complete mechanics implementation, but different use cases (simiulators, AI, damage calculators) need to make different tradeoffs which makes it difficult (or perhaps undesirable) to attempt to optimally support them with a one-size-fits-all solution.

My original attempts at adding the features to the engine to support this functionality were fraught with problems – I didn’t properly understand the full complexity and nuances and kept trying to bolt-on edge cases or make small adaptations to my original designs which didn’t pan out but which involved a lot of churn and frustration. Furthermore, I was trying to solve the whole problem at once without making sure the abstractions I was attempting to build on were solid and correct, which meant that it was often unclear if my entire approach was wrong or simply half-baked and poorly realized. In the end, I abandoned the work, thoroughly discouraged from pretty much all coding (I spend the latter half of the year making some progress on Generation II support in the engine and then building this site and pkmn.ai instead).

Having already been burned, I was very afraid of returning to the problem – continually throwing away thousands of lines of code after every attempt in the process of developing an understanding for the problem space is fairly demoralizing and is also why @pkmn/logs and @pkmn/img have been similarly placed on the back burner (though in those cases I am fairly confident that I do know how to solve the problem at this point, it’s just a matter of writing the code). However, in the case of transitions and friends (“durations”, “pending”, etc), I was also confident that to figure out a solution I would need to be willing to let the problem fully consume me as there is too much state and complexity to be able to work on it in a casual manner over the course of several months. This was rather unappealing to me as “no-lifing” Pokémon development is a rather at odds with many other life goals.

Fortunately(?), developments in my personal life made it so that distracting myself for a month by focusing all my thoughts and energy on a hard problem actually proved to be desirable, and as a result I returned to this work and happily made a ton of progress:

After updating the codebase so that it could build with the latest Zig compiler again, I started with seemingly unrelated work: implementing simple diff and patch functions to the engine as an example of external unmake move support. It’s unclear whether these will be practical – 128 bytes is still quite a lot to need to store to be able to reverse an update, and coupled with the 64 bytes required by Actions to minimally encode the forward pass you’re looking at only 50% savings compared to simply storing the entire Battle. However, they were easy to implement (modulo hairy Zig casting…), having more tools in the toolbox never hurts, and its an open problem whether it’s actually possible to do better without cutting corners on correctness/completeness in Generation I.
In order to be able to test this new functionality, all of the tests were modified to go through a new update function that not only exercises and validates diff and patch but also wires in assertions around -Dchance and -Dcalc. This ended up being invaluable as every unit test and fuzz test now could verify invariants, meaning things like the total probability of all transitions summing to 1 or Actions overrides always being able to produce the same state get enforced by pretty much every test run on the engine.
While originally working on transitions, I experienced a lot of difficulty debugging the implementation due to the fact that the tests iterate over tens of thousands of similar states (which also renders printf or using a debugger pretty ineffective as they would get tripped thousands of times before anything of interest occurred). To combat this, I invested heavily in improving the logging output to enable actually pinpointing problems at a high level and then created a harness for easily digging into specific problematic cases one state at a time. While this was a lot of work, in general it has proven to be the case that time spent improving system debuggability has always paid off, and the rest of the month’s work would have proven impossible without these enhancements.
Armed with fuzz test coverage and the ability to properly debug failures, I then set to work resolving the known issues in the “pending” chance update optimization. The “pending” design was motivated by the fact that the cartridge performs critical hit and damage rolls before determining if a move hits, meaning the correct probability of a specific update being observed can only be computed after the fact. Calling the “pending” architecture an optimization is somewhat incorrect, as it ends up being essential to avoid misleading results – the chance of an update should be the same with and without -Dshowdown enabled, unless Pokémon Showdown behavior deviates in an observable way. The pkmn engine saves certain rolls as pending and applies them when it knows whether they’re relevant or not, but not commiting the pending results at the correct times could produce inaccurate results and scenarios where the total probability of all transitions doesn’t sum to 1.
I then implemented damage roll coalition, which in the end was signficantly easier than attempts last year would have suggested – the debuggability improvements (and building on a sound “pending” design) made a big difference here. Coupled with “pending”, it may seem silly to work on optimizations before having a fully baked solution, however, the viability of the higher-level design critically depends on these optimizations being possible in the first place – I am uninterested in spending the effort to get a correct solution working if it ultimately proves infeasible to optimize it enough to be practically useful.
…however, I do still like to work on completely pointless premature optimizations as well, so I spent a bunch of time optimizing the GCD algorithm that is at the heart of the engine’s Rational implementation. In this case the optimizations are more for fun than for actual performance – the engine’s Rational implementation avoids needing to reduce as much as possible meaning that hopefully the GCD algorithm rarely gets called. Note that this work mostly involved implementing the clever ideas had by others, but its still fun to see that there are still improvements being found for basic mathematical functions!

With the transitions function finished for any scenario not involving durations, I started to get excited about the potential applications. I spent some time thinking about how it will be used to implement a robust damage calculator which led me to flesh out my draft of a forthcoming article on “Damage Calculators from First Principles”. The current assumptions made by damage calculators are rather specious, though clearly damage calculators are thought to provide value in spite of this. I feel that in a lot of cases people are trying to use a damage calculator as an incredibly abstracted solver… which is why I then spent a lot of time exploring using transitions to build a solver for Generation I perfect-info 1v1 endgames.

basket has done a ton of work here, though unfortunately I need to redo a lot of it myself to be able to understand and appreciate it. I spent about a week taking a deep dive into linear programming and Nash equilibrium solvers, especially getting my hands dirty in the guts of lrslib. I think I ultimately will need to write something custom here, partly due to licensing issues but also because I feel there are potential gains to be had by optimizing for the specific domain – the Pokémon use case involves solving zero sum games on exclusively small dimension matrices and the AI use case can often afford to trade accuracy for speed (an ε-Nash equilibirum is sufficient for small enough ε).

I was eventually motivated enough to return to finish transitions completely, and resumed working on the final hurdle – “durations”:

The core challenge of durations (e.g., confusion or sleep turns) is that their actual duration is fixed from the start but hidden to the players. In most cases, each duration is equally likely from the perspective of the engine, but from the player’s perspectives the probability of having observed an additional turn is what’s interesting. However, sometimes these observations depend on perspective – a player’s opponent is unaware of their Pokémon’s Thrash ending or that their binding move will always last 5 turns due to Grip Claw. In the same way the engine side steps perspective entirely in the choices or -Dlog functionality, I found the only feasible option here is to also eschew tracking perspective and force the client to back out any revealed information from the omniscient perspective results here as well (though of course, it took several days worth of frustrating attempts to come to this conclusion).
I was able to get duration overrides working (the ability to extend or end a duration on the fly as opposed to based on the canonical counter within the Battle state), though had to spend a lot of time thinking about what invariants should hold. The Actions struct in the pkmn engine must be possible to efficiently use as both a unique identifier of an update and to reproduce the same observed outcome when applied to the original state via -Dcalc support – threading the needle to ensure both of these requirements held proved quite challenging.
Zig doesn’t support arrays in packed structs, so I had to get creative and implement my own array abstraction on top of an integer as a workaround to efficiently track sleep durations for all party members. This abstraction will prove even more crucial in Generation II where each hit of multi-hit moves features their own critical hit and damage roll.

Ultimately, I wasn’t able to fully finish implementing durations in August (the transitions function still needs to be able to iterate over all possible duration outcomes), as I pivoted to working on usage statistics instead after seeing activity in the smogon/usage-stats repository. pkmn/stats has been sitting at effective feature parity with Smogon’s Python scripts for years but I never made a push to get it used upstream as I was hoping to wait to be able to deliver the 100x gains @pkmn/logs can theoretically unlock. However, the features, maintainability gains, and modest speed improvements that even naive usage of the existing pkmn stats code can provide are valuable to realize independent of any further improvements, so I will be working to onboard Smogon as soon as possible. Tbolt has made some great contributions here, and hopefully the move to a modern codebase will encourage more future contributors to help push the community’s stats processing forward after a long period of relative stagnation.

Finally, I attended a local Zig meetup which both motivated me to work towards eventually being able to give a talk about the pkmn engine and its usage of Zig (which requires at least shipping something first!), and to add back support for Zig v0.11.0. I was concerned about the new features I’ve been adding to the engine impacting performance, but thankfully that’s not the case – the Zig compiler has just gotten a lot worse at optimizing it. It’s going to be a challenge to maintain support for newer versions of Zig given its volatility and the potential for breaking language changes, but ultimately a key reason the engine is written in Zig is the speed of the compiled code and taking a 20% hit is a non-starter.

— pre