Posted On: 2021-03-22
While reading a bit about machine learning best practices*, I was struck by the similarities to best practices in tutorial design. This should, perhaps, be unsurprising: both are designed to optimize the learning of new (if not outright alien) interactions, and both liberally repurpose lessons drawn from studying how humans learn. Nonetheless, I found it fascinating - and thought it might be interesting enough to warrant sharing.
Reducing the number of possible actions can significantly improve machine learning speed. By limiting/preventing irrelevant actions, one cuts down on the possibility space for the agent to explore, thereby improving the likelihood of stumbling upon rewards.
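To make that a little more concrete, here's a rough Python sketch of masking out irrelevant actions; the action names and the `tutorial_stage` flag are mine, purely for illustration, not from any real game or library:

```python
import numpy as np

# A rough sketch of action masking, entirely for illustration: the action
# names and the "tutorial_stage" flag are made up, not from any real game.
ALL_ACTIONS = ["up", "down", "left", "right", "jump", "shoot", "open_menu", "craft"]

def valid_action_mask(tutorial_stage):
    """Return a boolean mask of which actions the agent may take right now."""
    mask = np.zeros(len(ALL_ACTIONS), dtype=bool)
    mask[:4] = True          # movement is always available
    if tutorial_stage >= 1:
        mask[4] = True       # jumping unlocked after the first stage
    if tutorial_stage >= 2:
        mask[5:] = True      # everything else unlocked later
    return mask

def choose_action(action_scores, mask):
    """Pick the best-scoring action among those the mask allows."""
    masked_scores = np.where(mask, action_scores, -np.inf)
    return int(np.argmax(masked_scores))

scores = np.random.rand(len(ALL_ACTIONS))   # stand-in for a policy's output
action = choose_action(scores, valid_action_mask(tutorial_stage=0))
print(ALL_ACTIONS[action])                  # always one of the four movement actions
```

With fewer actions to try at any moment, the agent wastes far less time exploring options that can't possibly pay off yet.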
In tutorials, this is often accomplished by omitting/disabling actions and interface elements that have not yet been taught. This helps to keep the training focused on just what the user needs to learn, and avoids unnecessary distractions.
If the machine learning agent can't "see" changes to its environment, it won't be able to learn the impact of its actions. Importantly, not just any observations will do; the AI agent must be fed sufficient data to actually solve the problem - otherwise, the best it can ever achieve is a poor approximation of a solution.
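Here's a small, entirely made-up sketch of what "sufficient data" might look like in practice; if the goal position were left out of this vector, the agent could never do better than a rough approximation:

```python
import numpy as np

# A made-up example of building an observation vector. The key point: if a
# piece of state the task depends on (here, the goal position) were omitted,
# the agent could never learn more than a rough approximation of a solution.
def build_observation(agent_pos, agent_velocity, goal_pos):
    """Pack the state the agent needs into a single flat vector."""
    return np.concatenate([
        agent_pos,              # where the agent is
        agent_velocity,         # how it is currently moving
        goal_pos - agent_pos,   # where the goal is, relative to the agent
    ])

obs = build_observation(
    agent_pos=np.array([2.0, 1.0]),
    agent_velocity=np.array([0.0, 0.5]),
    goal_pos=np.array([5.0, 5.0]),
)
print(obs.shape)  # (6,)
```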
In the case of tutorials, users often need feedback about the environment and character states. Features like animations and audio effects expose this state information to the user, giving them a foundation from which to build the rest of their understanding.
While it's often easiest to initialize a model with random values, such a model typically takes much longer to train than one initialized using values from a different (but similar) model. (Of course, the availability of such a model is often a limitation.)
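As a rough sketch (using PyTorch, with a tiny two-layer network I invented purely for illustration), warm-starting just means copying an existing model's weights instead of keeping the random ones:

```python
import torch.nn as nn

# A sketch of warm-starting. Instead of keeping random initial weights, copy
# the weights of a previously trained, similar model into the new one.
def make_policy():
    return nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 8))

pretrained = make_policy()
# ...imagine `pretrained` has already been trained on a similar task...

new_model = make_policy()                            # random initialization
new_model.load_state_dict(pretrained.state_dict())   # warm start instead
```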
In tutorials, this is often accomplished by using affordances or leaning into genre-savviness. For example, environmental hazards could be depicted as anything, but depicting them as spikes taps into players' existing intuitions about touching spikes (i.e., don't do it).
As an extension of the previous point, a model that has trained on a simpler version of a task will learn much faster than a brand-new model. In particularly complex tasks, new models may never learn. To accommodate this, it's best to start by training the AI using simpler versions of a problem before attempting the full-complexity one (a.k.a. curriculum learning).
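A bare-bones sketch of the idea, assuming hypothetical `make_env` and `train_one_stage` helpers that stand in for whatever training loop is actually being used:

```python
# A bare-bones sketch of curriculum learning. `make_env` and `train_one_stage`
# are hypothetical helpers standing in for a real training loop; the point is
# that the *same* agent carries over between difficulty levels.
def curriculum_training(agent, make_env, train_one_stage,
                        difficulties=(0.25, 0.5, 1.0)):
    """Train one agent on progressively harder versions of the task."""
    for difficulty in difficulties:
        env = make_env(difficulty)     # e.g. fewer hazards, slower enemies
        train_one_stage(agent, env)    # reuse the agent trained so far
    return agent
```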
Every effective tutorial introduces simpler concepts before introducing harder ones. As the idiom goes, one has to learn to walk before learning to run (or jump, or fight off flying robots, etc.).
This one took me a bit by surprise: I had assumed that using a reward space between -1 and 0 would be indistinguishable from using a space between 0 and 1, but, apparently, that's not always true*. I expect this is a consequence of the particular ML approach used, but as I'm still relatively new to the topic, I haven't yet ascertained exactly why this is.
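For clarity's sake, here's what the two reward schemes might look like written out as code (the "reward per step until done" framing is my own illustration, not drawn from the article I was reading):

```python
# The two reward ranges described above, written out for clarity.
def reward_in_negative_range(done):
    """Rewards in [-1, 0]: every step without finishing costs the agent."""
    return 0.0 if done else -1.0

def reward_in_positive_range(done):
    """Rewards in [0, 1]: the agent is only ever rewarded, never punished."""
    return 1.0 if done else 0.0
```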
In tutorials, this often manifests as players receiving positive reinforcement (i.e., victory screens) with few (if any) punishments (i.e., death or lost progress). There are a lot of possible explanations for this, but the one I personally point to is that many tutorials are themselves considered negative experiences*, and therefore have to manage punishments carefully to avoid users leaving altogether.
As you can see, best practices for training AI can serve as an excellent reminder of effective strategies for training humans as well. There are, of course, many best practices that are irrelevant to humans*, but I found it a bit surprising how many of the simpler concepts transferred successfully. Personally, I find it interesting to muse about (for example, would good tutorials be good AI training environments?). Hopefully, you have found it interesting as well.