You just need to enumerate for all possible states, the possible transitions and evaluate the probabilities for them. How many could there possibly be? Store in a small database and look em up on the fly. Easy peasy.
I know you’re joking, but it wouldn’t really be that hard in terms of a lookup for played positions which is what OP asked for - you could have Bitboards for the relevant info so finding a state would be a refined search starting with the busy/occupied Bitboard. Since you’re only storing games that have been played rather than all positions, the size of the index shouldn’t be outrageously large compared with other things already being stored.
I don't know any good, fast heuristics for predicting this (likelihood of a human choosing a specific chess move). Do you? Chess engine evaluation is computationally quite heavy.
You don’t need to make perfect predictions. There could actually be a lookup for a few common board states (say a couple million, to cover the first couple moves). Beyond that just use a simple scoring function, for example the value of your vs opponent pieces, whether any of your pieces (notably king) are in danger, whether u won, etc. This means u get better scores for the more likely winning moves, for capturing valuable pieces and for avoiding the loss of your own.
U could also play two or so turns of minimax, or perhaps use a neural network to evaluate the various reachable board states.
So for a given state, enumerate possible transitions, score the resulting states, and map to some sort of probability distribution, then use some prefix code (think Huffman tree[+]) based on the distribution of probabilities to encode the transition.
It’s perhaps not super fast, and not super accurate but if you can weed out the 50% of dumb moves, that already saves a bit.
[+] an easier and better approach is to map the probabilities into an interval between 0 and 1, and keep using fractional bits to „zoom in“ until one of the subintervals is uniquely defined, then recurse on that. Some of the common compression algorithms uses that (but I don’t remember the specifics, those intro courses were a long time ago).