Posted On: 2020-09-28
Recently, I've had the opportunity to be a (so-called) fly on the wall during a disagreement about how to adjust the balance of a particular game. At the heart of the disagreement was something of a false dichotomy: each side advocated for using their own approach by pointing out flaws in other approaches. As an outsider to the discourse, this seemed quite odd*, as the different approaches were not mutually exclusive. In fact, much of what I've read from the best designers has involved a mix of multiple different approaches, using each one to shore up the weaknesses of the others. Thus, I thought today would be a good time to write about three different approaches, explaining their strengths, weaknesses, and how they can be mixed together to improve design beyond what each individually has to offer.
For the purposes of this post, I am going to deviate slightly from how game balance is popularly conceived. Ordinarily, a game being "balanced" presupposes competition between players: the metaphor of being "balanced" conjures images of placing one player (or team) on one side of a scale and the opposing player (or team) on the other. Yet game balance is applicable far beyond competitive multiplayer: the designers of the single-player, deck-building roguelike Slay the Spire, for example, needed to carefully balance cards, not to keep one player from getting an advantage over another, but rather to keep the strategic choice between cards interesting. Thus, to give a much more broadly applicable definition of game balance: a game is balanced* when it creates the kind of experience the designers intend for the players to have**. This is, of course, the same definition I use for a game that is "well designed" - which is no real surprise - as the three methods for achieving game balance are ones that are used for all kinds of game design tasks.
Simple to explain, but difficult to execute, the first method for balancing a game is to meticulously balance it by hand. In this approach, a designer makes choices about balance by reasoning about various game systems and their interactions with each other. Importantly, this is not merely designing by gut, but rather it typically involves using mathematical models and patterns to understand how the game works. A simple example of this is "costing" creatures in the collectible card game, Magic: the Gathering: when designing a new creature, the designers have a baseline for how much mana (the primary in-game resource) that creature will cost because they have developed a mathematical model for what creatures should cost, based on their properties (strength, toughness, rarity, color, etc.) MtG designers may then nudge that new creature's cost (or other numerical properties) higher or lower slightly, as they also have a mathematical model describing what the distribution of cards across an entire set should look like*.
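To make the idea of "costing" by mathematical model concrete, here is a minimal sketch of what such a baseline function might look like. The properties, weights, and keyword values below are illustrative assumptions for this post, not Wizards of the Coast's actual internal model:

```python
# A toy costing model in the spirit of the creature-costing example above.
# All weights here are hypothetical assumptions, chosen only to illustrate
# how a designer might derive a baseline cost before hand-tuning it.

def baseline_cost(power: int, toughness: int, keywords: list[str]) -> int:
    """Estimate a creature's mana cost from its combat stats and abilities."""
    KEYWORD_WEIGHTS = {"flying": 1.0, "trample": 0.5, "haste": 0.5}
    cost = (power + toughness) / 2  # raw stats drive the baseline
    # each ability adds to the cost; unknown keywords get a default weight
    cost += sum(KEYWORD_WEIGHTS.get(k, 1.0) for k in keywords)
    return max(1, round(cost))  # designers then nudge from this starting point

print(baseline_cost(2, 2, ["flying"]))  # a 2/2 flyer baselines at 3
```

The point of such a model is not that it produces the final cost, but that it gives every new design a defensible starting point that the designer can then adjust against the rest of the set.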
The primary benefit of balancing by hand is that any balance changes are made in the context of a deep understanding of the systems involved. Designers are as close as possible to the design, so the capabilities and effectiveness of this approach are bound solely by the designers' abilities - which also makes clear where the downsides lie.
The two primary downsides to designing by hand are how labor intensive it is and that it is prone to holes. The former can perhaps be best explained by example: when balancing a card game, one has to consider not just how strong any two cards are in comparison to one another, but how that comparison changes in a variety of different game states (such as early game vs late, or different combinations of cards currently in play.) Software tools can help alleviate some of this burden, but, generally, the deeper and more complex a game is, the more labor intensive even small balance changes become.
Being prone to holes is present in just about any manual design activity: no matter how much a designer thinks through their design, players will inevitably surprise them. Whether it's finding a way to jump out of bounds in a platformer or discovering a multi-card interaction that is ambiguous/undefined according to the game's written rules, with enough players on a game, it's inevitable that someone will find something that the designers never thought of. Which brings us nicely to the second approach to designing for balance: player feedback.
Having others play and provide feedback provides a world of value - not only in discovering the holes that were missed by designers, but also generally getting a fresh perspective on how to think about the game's systems and interactions. Feedback of this kind can come from internal playtesters (such as fellow developers/studio members) or more generally, but there are some significant downsides to be aware of, especially when gathering balance feedback from non-designers.
The primary benefit of player feedback is the ability to draw on a variety of perspectives and actual play experience. Players' firsthand experiences, and any observations or thoughts resulting from that, are an invaluable wealth of knowledge about how the systems actually work. Additionally, they are valuable windows into the perception of the game's systems - which design elements are perceived as balance issues versus which are accepted as merely part of the scenery.
The two primary downsides of using feedback to make balance changes are bias and miscommunication. Bias can come in a number of forms, such as relying on a particular play experience or having a predisposition to a certain playstyle/way of thinking. Regardless of its source, it is fundamentally grounded in the simple idea that, as a player, one will see the experiences and information from that perspective. When players are aware of their bias (or even the possibility of such), they can often help to mitigate its influence, such as by self-reporting which playstyles they prefer or how skilled/experienced they think they are with the game. Designers can then use said self-reported biases to help be better informed about which concerns affect which players*.
Miscommunication is virtually inevitable in any collaborative effort, but when players/playtesters are providing feedback, it can be amplified significantly. When providing feedback, players will filter what happened through their own perspective and report on issues that they identified or observed. Players that are less skillful at this may provide over-broad or unhelpful feedback (such as complaining that one character is OP*). Yet, even when players are eloquent and clear, they may still misdiagnose the source of the issue, thereby only communicating what the perceived issue was, rather than all the necessary underlying information that is required to understand the actual cause.
A third approach to balancing a game is to use data: specifically, aggregated analytics data from a large pool of players. When appropriately filtered, sorted, and analyzed, player data can reveal an enormous amount of information, with detail far exceeding what can be achieved from feedback. Since it is aggregated from all players*, designers can gain design insights not just from their vocal and active players, but from everyone. Finally, individual differences between players (such as skill level, playstyle, etc.) can either be aggregated away or filtered on, depending on the needs of the particular effort.
At the risk of repeating myself, the benefit of using analytics data is, specifically, the vast pool of data that is available. This benefit, of course, is dependent upon the quality and detail of the analytics one has coded into the game, as well as the size of the player base, but, assuming both are solid, no other approach can provide anywhere near as much information.
When using analytics data, there are two primary downsides: dirty data and misattribution. The concept of data being "dirty" is nothing new - it's something software developers and analysts from all domains have to deal with. Essentially, the idea of data as "dirty" is the idea that you cannot be confident that the data you are looking at is accurate or complete. There exists a whole host of both technical and non-technical reasons data might be dirty*, but regardless of the cause, the more dirty data is in the set, the less confident one should be about the accuracy of any particular interpretation of the data. Additionally, while one can somewhat mitigate against this by filtering out/omitting known dirty data, it is generally impossible to reliably omit all dirty data from a data set**.
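As a small illustration of filtering out known dirty data, here is a sketch in Python. The match records and field names are hypothetical; the point is that each filter encodes one known source of dirt, while unknown sources inevitably slip through:

```python
# Hypothetical match records; the field names and thresholds below are
# assumptions for illustration, not from any real game's analytics schema.
matches = [
    {"duration_s": 1240, "winner": "A",  "client_version": "1.4.2"},
    {"duration_s": 3,    "winner": "B",  "client_version": "1.4.2"},  # likely a disconnect
    {"duration_s": 1502, "winner": None, "client_version": "1.3.9"},  # incomplete record
    {"duration_s": 980,  "winner": "A",  "client_version": "1.4.2"},
]

def is_clean(m: dict) -> bool:
    """Drop records we know are unreliable; unrecognized dirt still gets through."""
    return (
        m["winner"] is not None             # drop incomplete records
        and m["duration_s"] >= 60           # drop instant quits/disconnects
        and m["client_version"] == "1.4.2"  # drop data from outdated clients
    )

clean = [m for m in matches if is_clean(m)]
print(len(clean))  # 2 of the 4 records survive the filters
```

Each filter here mitigates one known cause of dirty data, which is exactly why the mitigation is incomplete: the filters can only catch the causes you have already identified.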
The other downside, misattribution, has more to do with deriving value from data. Fundamentally, data is inert - it cannot provide insights or guidance on its own. It is only when people interpret the data that it becomes valuable - but in so doing we open ourselves to incorrect interpretations that provide inaccurate or even harmful guidance. Guidance derived from data is no more impartial or unbiased than guidance derived from a designer's own thoughts - but the presence of a monumental amount of data, just waiting to become evidence, can make it easier to shore up even the shakiest of conclusions.
So, each approach has strengths, but also serious flaws. How does one approach balancing a game, then? The answer is to use them all, relying on each one's strengths to cover the others' weaknesses. Designers can do their best and hand it off to playtesters, who will catch things the designers missed and provide valuable feedback. That feedback can, when paired with the designers' own insights, be turned into a hypothesis, which can be confirmed or refuted by looking at data - either in aggregate, or by digging into the data for individual play sessions. Once designers have a plausible hypothesis, they can compare it with their own models, adjusting, tweaking, and playtesting until they've filled in holes and fixed inaccuracies in those models.
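As a sketch of the "check a hypothesis against aggregate data" step, suppose playtester feedback produced the hypothesis "character X wins too often." The game results below are invented for illustration, but the aggregation itself is the kind of quick check a designer might run:

```python
from collections import Counter

# Hypothetical (character, won?) results standing in for real analytics data.
games = [("X", True), ("X", True), ("X", False),
         ("Y", False), ("Y", True), ("Y", False)]

wins, totals = Counter(), Counter()
for char, won in games:
    totals[char] += 1
    wins[char] += won  # True counts as 1, False as 0

for char in sorted(totals):
    rate = wins[char] / totals[char]
    print(f"{char}: {rate:.0%} win rate over {totals[char]} games")
```

A lopsided win rate in the aggregate would support the playtesters' hypothesis; a roughly even one would suggest the perceived imbalance lies elsewhere, sending the designer back to their models.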
This has been a whirlwind tour of three different options for balancing games, as well as how using them together can shore up their respective weaknesses. While one could go much deeper on just about every topic in here - I hope this surface exploration has, nonetheless, been interesting and valuable for you.