Using the Infinite Bookspace to Reason About Language Models

Let's start with a game. I'll go into your local library and pick a random book off the shelf. Without looking, you have to guess what the first letter of the first word of the book will be. What is your guess?

A 3d rendering of some books

T and A and O seem like good guesses to me, because lots of common words start with those letters. But there is actually a way to figure out what the best possible guess is. Open up every book in your local library, count up all of the first letters of the first words, and see which letter appears most often.

This approach gives us a distribution of letters.

The distribution here will be similar no matter what library you go to in the USA. It reflects something about the English language itself.
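
As a rough sketch, the counting procedure might look like this in Python, assuming the library is just a folder of plain-text files (the `books/` directory name is a stand-in, not anything real):

```python
from collections import Counter
from pathlib import Path

def first_letter_distribution(library_dir="books"):
    """Count the first letter of the first word across every book in the folder."""
    counts = Counter()
    for path in Path(library_dir).glob("*.txt"):
        text = path.read_text(errors="ignore").lstrip()
        if text and text[0].isalpha():
            counts[text[0].upper()] += 1
    total = sum(counts.values())
    # Turn raw counts into a probability for each starting letter.
    return {letter: count / total for letter, count in counts.most_common()}

print(first_letter_distribution())
```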

We can play the game for the second letter. If I tell you that the first letter of my book is R, what is your guess for the second letter? You can find the best guess by opening all of the books, ignoring any books that don't start with our target letter of R, and counting the second letters that appear.

Counting this way shows that E is the most common letter after a starting R. But more broadly, a starting R is expected to be followed by a vowel.

For any sequence of text, we can look at the books in our library to learn what likely comes next.
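
The same idea generalizes to a prefix of any length. Here is a sketch of the lookup under the same plain-text-folder assumption; the "second letter after a starting R" question is just the one-character prefix "r":

```python
from collections import Counter
from pathlib import Path

def next_letter_distribution(prefix, library_dir="books"):
    """Among books whose text starts with `prefix`, count the very next character."""
    counts = Counter()
    for path in Path(library_dir).glob("*.txt"):
        text = path.read_text(errors="ignore").lstrip().lower()
        if text.startswith(prefix.lower()) and len(text) > len(prefix):
            counts[text[len(prefix)]] += 1
    if not counts:
        return {}  # we ran out of books that match this prefix
    total = sum(counts.values())
    return {ch: count / total for ch, count in counts.most_common()}

print(next_letter_distribution("r"))
```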

This method breaks down as the text we want to search with gets longer. We run out of books and the distributions get sparse. I downloaded 35,127 books into my library, and I get rich distributions for sequences of about 4 letters. If I had a million books, I could probably go about one or two letters deeper. To get rich distributions for all sequences of 10 letters, I'd need 600 trillion books. The world would quickly run out of material to print the books on.
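
The blow-up is plain exponential growth. Assuming roughly 30 possible characters per position (26 letters plus a space and a little punctuation, which is an assumption rather than an exact count), the numbers look like this:

```python
# Number of distinct sequences at each length, assuming ~30 possible characters
# (26 letters plus space and a little punctuation -- an assumption, not an exact count).
ALPHABET_SIZE = 30

for length in range(1, 11):
    print(f"{length:2d} characters: {ALPHABET_SIZE ** length:,} possible sequences")

# At 10 characters that is about 590 trillion sequences, so you would need
# on the order of hundreds of trillions of books just to see each prefix once.
```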

What if it were somehow possible to get access to an infinite number of books, allowing us to get rich distributions for any sequence of letters? What could we learn from the infinite bookspace?

We could ask the bookspace questions about our world. "The country with the most medals in the 1956 Olympics was ____". By phrasing our question as an unfinished sentence, we can search for all of the books that start with it. Because we have an infinite variety of books, some infinite subset of them begins with this exact sentence, and we can follow the letter distributions that come after it to spell out the answer.

We could also have the bookspace do math calculations for us. "500 * 32 + 6 = ____". An infinite number of books in the bookspace will start with this phrase, and we can use the probability distribution to spell out the answer. The answer can be trusted because a math textbook that starts this way is far more likely to continue with the correct answer than with any particular wrong number.
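
Spelling out an answer is just repeatedly following the most likely next letter. Here is a sketch that reuses `next_letter_distribution` from above; with a real, finite library it runs out of matching books almost immediately, which is exactly the limitation described earlier:

```python
def spell_out_answer(prompt, length=30, library_dir="books"):
    """Follow the most likely next character, one step at a time."""
    text = prompt
    for _ in range(length):
        dist = next_letter_distribution(text, library_dir)
        if not dist:
            break  # a finite library runs out of matching books very quickly
        text += max(dist, key=dist.get)
    return text

print(spell_out_answer("The country with the most medals in the 1956 Olympics was "))
```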

OK, so having an infinite bookspace would be nice, but it's obviously impossible to have an infinite number and variety of books. But what if some charlatan claimed to have access to the bookspace through some magic portal? Could we test their claim?

If we write or find some books that the charlatan couldn't have seen before, and ask this person to guess them word by word, true access to the infinite bookspace should let them guess correctly a lot of the time. This gives us a score of how well the charlatan is matching the bookspace. Importantly, we can obtain this score without having the real bookspace ourselves!
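
In code, the test is just an average score over held-out text. A rough, self-contained sketch, where `guess_distribution` stands in for whatever the charlatan's portal hands back:

```python
import math

def bookspace_score(words, guess_distribution):
    """Average log-probability the guesser assigns to each actual next word.

    `guess_distribution(prefix_words)` should return a dict mapping candidate
    next words to probabilities. Scores closer to zero mean the guesser
    matches the held-out book better.
    """
    total = 0.0
    for i in range(1, len(words)):
        prefix, actual = words[:i], words[i]
        probs = guess_distribution(prefix)
        # A tiny floor so a completely missed word doesn't blow up the score.
        total += math.log(probs.get(actual, 1e-12))
    return total / (len(words) - 1)
```

This is essentially the quantity that gets reported as perplexity when real models are evaluated: guess unseen text word by word and measure how close you get.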

This is how the current breed of language models is created. We shake up a big neural network and constantly ask it to guess what word comes next in the training data. Eventually the neural network starts to guess the next word correctly pretty often, and we have a language model that estimates the infinite bookspace. This works really well for lots of tasks, but it does fail sometimes.
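
A toy sketch of that training loop in PyTorch, assuming the whole library has been concatenated into one text file (a character-level model, far simpler than how production models are actually built, but the objective of guessing what comes next is the same):

```python
import torch
import torch.nn as nn

text = open("books/combined.txt").read()  # assumed path to the training text
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

CONTEXT = 8  # how many previous characters the model gets to see

class NextCharModel(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        hidden, _ = self.rnn(self.embed(x))
        return self.head(hidden[:, -1])  # scores for the character after the window

model = NextCharModel(len(chars))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    # "Constantly ask it to guess what comes next": sample random windows of
    # text and train the network to predict the character that follows each one.
    idx = torch.randint(0, len(data) - CONTEXT - 1, (32,))
    x = torch.stack([data[i : i + CONTEXT] for i in idx.tolist()])
    y = data[idx + CONTEXT]
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, sampling from the model's next-character distribution
# plays the role of the bookspace lookups above.
```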


How do language models fail?

Data Failure

If you query a language model about events from the future or secret details of your own life, it has no way of knowing and will spit out a plausible guess. Public but rare information, like the opening hours of a local business in a small town, might similarly be missing from the training data. The language model would be forced to estimate the hours based on its knowledge of a business like that in a town like that. But if it had been trained on the actual hours of that business, it would be able to give you the correct answer with a lot more confidence.

Bookspace Failure

There are a lot of writings by people who think the moon landing was faked. If you query the bookspace in a leading way that matches those writings more than the mainstream opinion, you will get text that buys into the conspiracy. "The strange shadows seen in the alleged moon landing footage suggest that ____". Current chatbot implementations of language models mitigate this with a large system prompt and RLHF, but language models are still highly suggestible.

Compute Failure

The way that current large language models are architected, they always take the same amount of time to come up with the next word regardless of the words that came before. If coming up with the next word would require a long computation, they cannot do it in time and will be forced to guess.

One example of this is cryptography. "My public key is ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDLHa7Gej2N2YuLvGyG9Tt2kOQtwKuKggdSw2lSbFWFVOzc3ysdAIPWxsHxlbjmtQGMas01lyx+zCdkoQva9r1X8AwTshXDi8MPPi8mvRCgl1dWjiRXPLB0mp2zEfjUI+uvq4wupmtQzWKViIgNlrzr5of/yohA5HUoL7/G4Vbg2Q== and my private key is ____". The most likely continuation in the bookspace would be the matching private key. But the model does not have enough time to derive it, and so it must just guess.

Another example of compute failure is riddles. Coming up with a riddle takes a lot of time that the language model doesn't have. "The following is an original, difficult, and clever riddle about soap: ____". The bookspace would be full of clever riddles, whose authors may have spent all day thinking them up. But the language model has no time, and so it can only make up bad riddles or plagiarize.


Thinking about the infinite bookspace has been a useful tool for me to predict how language models will behave at a given task. Whenever a model gives me a bad result, I picture all of those books and use the failure types to reason about what went wrong.

Language models have a hard job, though. If you want to gain some empathy for language models, check out my other blog post: Are you smarter than a Language Model?