Wordle, the simple word-guessing game, has become insanely popular. Your job is to guess a common five-letter English word and see which of the letters in your guess are in the unknown word, either in the correct or incorrect location. You can only play once a day, you have six tries, and everyone is guessing the same word.
What is the right word to start with? Christopher Penn suggests ADIEU, which allows you to see the presence of common vowels. But which guess will be most likely to get you the most hits?
As a word guy, a mathematical thinker, and a novice Python coder, I decided to apply my skills to analyzing this question.
The analysis here depends on which list of possible words you use. I used the Scrabble NWL2020 word list, which includes all legal Scrabble words, including some you’ve probably never heard of. If you use a different list, your results will be different. (The originator of the game used a list of 13,000 five-letter words, but narrowed it down to 2,500 solution words, and his list is not public.)
The Scrabble list includes 191,476 words, of which 9,349 are five-letter words.
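Pulling the five-letter words out of a larger list takes only a few lines of Python. This is a minimal sketch, not the code used for this analysis. It assumes a plain-text list with one word per line, and the function name `five_letter_words` is made up for illustration.

```python
from typing import Iterable, List


def five_letter_words(lines: Iterable[str]) -> List[str]:
    """Return the unique five-letter alphabetic words, uppercased and sorted."""
    words = set()
    for line in lines:
        word = line.strip().upper()
        # Keep only purely alphabetic entries that are exactly five letters long.
        if len(word) == 5 and word.isalpha():
            words.add(word)
    return sorted(words)


# Tiny stand-in for a real word-list file (assumed: one word per line).
sample = ["arose\n", "cares\n", "xylyl\n", "cat\n", "letters\n", "AROSE\n"]
print(five_letter_words(sample))  # ['AROSE', 'CARES', 'XYLYL']
```

With a real list you would pass an open file object instead of `sample`, since iterating over a file yields its lines.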
Update: It turns out that Wordle uses a list very different from the Scrabble list. That turns out to generate very different answers to the question of what word to start with. I’ve posted the updated analysis based on Wordle’s actual word list here.
The most common letters are not what you might think
What are the most common letters in the English language?
Everyone knows that E is the most common letter. If you look at a large collection of text, the most common letters, in order, are ETAOIN (which are also the first set of letters on a Linotype keyboard, for obvious reasons — in the old days, you might see this “word” in print, indicating that the Linotype operator marked a line with an error for deletion, but it slipped through).
But the frequency of these letters is based on their use in actual text, where words like “the” and “be” are far more common. In Wordle, any word in the lexicon is just as common as any other word. So the right question is, what are the most common letters in the lexicon of five-letter words, counting each word once?
Surprisingly, E is not the most common letter. In order, the most common letters in the lexicon are: SEAROI. This is true regardless of whether you count repeated letters once or twice. Here’s a chart of the letters of the alphabet and how many legal five-letter words they appear in.
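Both ways of counting are easy to express with Python's `collections.Counter`. The sketch below is illustrative only: the four-word list is a toy stand-in for the full lexicon, and the function names are my own.

```python
from collections import Counter


def letter_word_counts(words):
    """For each letter, count how many words contain it at least once."""
    counts = Counter()
    for word in words:
        # set() collapses repeats, so SALES contributes one S, not two.
        counts.update(set(word))
    return counts


def letter_occurrence_counts(words):
    """Count every occurrence of each letter, repeats included."""
    return Counter("".join(words))


# Toy lexicon for illustration; the real analysis would use all five-letter words.
words = ["AROSE", "CARES", "SALES", "FUZZY"]
print(letter_word_counts(words)["S"])        # 3 words contain an S
print(letter_occurrence_counts(words)["S"])  # 4 S's in total
```

Calling `.most_common()` on either counter gives the letters in descending order of frequency, which is the ranking the chart is built from.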
Looking at this chart, you might want your first Wordle guess to be a five-letter word made from the five most common letters SEARO. The word “AROSE” is the only common word made from those five letters.
But the letters’ position matters, too
Perhaps AROSE is not the best choice. More words have A in the middle than at the start. And many words end in S. What are the most common letters in each of the five possible positions in the word?
Based on this, if you want to get the right letters in the right places, you’d guess SAAES. Unfortunately, that’s not an English word, and it has another problem: the duplicated letters mean you give up chances to test more letters in exchange for a better shot at getting letters in the right places.
But looking at this chart, you would definitely want to get the S at the end (as you may imagine, there are a lot of plural words that end in S). You’d also like to get the A in the second position and the E in the fourth position. Using second choices where the first choices are already taken, a good word to guess first is CARES. While this misses the fifth most common letter overall, O, it would seem to maximize the chances of getting hits on the right letters in the right places. Other good first guesses would include BARES, PARES and especially TARES, since that includes the eighth most common letter, T, along with the four most common letters SEAR.
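The per-position tallies in this section can be sketched by keeping one counter for each of the five slots. The code below is a hypothetical illustration on a five-word toy list, not the original analysis code, and it assumes every word is exactly five letters long.

```python
from collections import Counter


def positional_counts(words, length=5):
    """Return one Counter per position, tallying the letters seen there."""
    counters = [Counter() for _ in range(length)]
    for word in words:
        for i, letter in enumerate(word):
            counters[i][letter] += 1
    return counters


def top_letter_per_position(words, length=5):
    """Join the single most common letter from each position into a string."""
    return "".join(c.most_common(1)[0][0] for c in positional_counts(words, length))


# Toy lexicon for illustration only.
words = ["CARES", "TARES", "SALES", "SONES", "AROSE"]
print(top_letter_per_position(words))  # SARES
```

On the toy list the result is SARES, which is no more of a word than SAAES. That is why the next step is to fall back on second choices for some positions.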
Why not just try every guess and see which one is best?
Of course, with computers at our disposal, we can just try a brute force approach: test every possible guess against every possible solution and see which one is best. I wrote Python code to do exactly that, determining the number of exact matches and matches with the right letters in the wrong place for every possible guess and every possible solution.
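The heart of a brute-force run like this is a function that compares one guess with one solution. Here is one way it could be written. It is a sketch, and it makes an assumption about repeated letters: each letter in the solution can be matched at most once, which is how Wordle itself colors its tiles. The name `score_guess` is made up for illustration.

```python
from collections import Counter


def score_guess(guess, target):
    """Return (exact, misplaced) for a guess against a target word.

    exact     = letters in the right place
    misplaced = letters present in the target but in the wrong place
    """
    exact = sum(g == t for g, t in zip(guess, target))
    # Multiset intersection: letters shared by both words, ignoring position
    # and never matching the same target letter twice.
    shared = sum((Counter(guess) & Counter(target)).values())
    return exact, shared - exact


print(score_guess("TARES", "CARES"))  # (4, 0)
print(score_guess("AROSE", "CARES"))  # (0, 4)
print(score_guess("XYLYL", "CARES"))  # (0, 0)
```

Running this for every guess against every solution gives a table of results that each of the measures below can be computed from.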
This generates an interesting set of possible ways to measure the success of a guess.
If your goal is to maximize the chance that you’ll have at least one letter in the target word, the best guesses are these, each of which will give you a 95% chance of at least one correct letter. (Some of these are obscure words, but they’re all legit Scrabble words and this is your chance to learn some new vocabulary!)
STOAE (an ancient Greek colonnade)
TOEAS (New Guinea currency)
ALOES (desert plants)
AEONS (a really long time)
AROSE (got up)
REAIS (Brazilian currency, plural form)
RAISE (lift up)
ARISE (get up)
SERAI (an inn where caravans rest)
AISLE (path down the middle of seats)
PASEO (a leisurely walk)
PSOAE (muscles in the pelvis)
ANISE (spice that smells like licorice)
By the way, if you want the worst possible guess — the word most likely to generate no hits at all — you might try XYLYL (a family of chemicals used in histology), which generates matches with only 38% of the other words in the lexicon. Other very bad guesses include FUZZY, YUKKY, and IMMIX (to mix in).
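The "at least one hit" measure behind both the best and the worst guesses can be sketched as the share of other words that have any letter in common with the guess. The function name and the toy lexicon below are assumptions for illustration, not the original code.

```python
def hit_rate(guess, lexicon):
    """Fraction of the other words that share at least one letter with guess."""
    letters = set(guess)
    others = [word for word in lexicon if word != guess]
    hits = sum(1 for word in others if letters & set(word))
    return hits / len(others)


# Toy lexicon for illustration only.
lexicon = ["AROSE", "CARES", "FUZZY", "XYLYL", "IMMIX"]
print(hit_rate("AROSE", lexicon))  # 0.25
print(hit_rate("XYLYL", lexicon))  # 0.5
```

The toy numbers mean nothing in themselves. On the full lexicon, the best guesses listed above come out around 0.95 and XYLYL around 0.38.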
If you want to maximize the chances of getting a direct hit (right letter in the right place), your best choices are:
SANES (non-crazy people)
SONES (a unit of noise)
SERES (a succession of plant communities)
SALES (discounted retail offerings)
SOLES (bottoms of one’s feet)
SENES (Samoan currency)
SAGES (smart people)
SIRES (becomes a parent)
It’s no coincidence that these all end in S — 31% of legal five-letter words end in S. But by including S at the start and the end, you’re maximizing the chances of a direct hit over the chance to try a wider variety of letters.
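The direct-hit measure can be sketched the same way, this time checking position by position. As before, this is a hypothetical illustration on a toy list, with a function name of my own choosing.

```python
def direct_hit_rate(guess, lexicon):
    """Fraction of the other words with at least one letter in the same position."""
    others = [word for word in lexicon if word != guess]
    hits = sum(
        1 for word in others
        if any(g == t for g, t in zip(guess, word))
    )
    return hits / len(others)


# Toy lexicon for illustration only.
lexicon = ["SANES", "CARES", "AROSE", "STOAE", "FUZZY"]
print(direct_hit_rate("SANES", lexicon))  # 0.5
print(direct_hit_rate("AROSE", lexicon))  # 0.25
```

In this small example SANES beats AROSE because its S at both ends and its A and E in the second and fourth positions line up with more of the other words.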
If you want to maximize the number of hits in the word, not just the chances of getting at least one hit, the top choices are:
LARES (Roman deities)
RALES (sounds made by unhealthy lungs)
ARLES (earnest money)
EARLS (British nobles)
REALS (real numbers)
Perhaps you want to combine the chances of getting a direct hit with a likelihood of getting some letter matches. I created a score that counts letter matches in the wrong spot as one point and direct hits as two. The best guesses on that score are:
TARES (weights of containers)
RATES (gives a rating to)
CARES (gives a damn)
DARES (has audacity)
MARES (female animals)
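The combined score can be sketched as follows. Two things here are assumptions: repeated letters are matched at most once each, and guesses are ranked by their total score across all other words. The function names are made up for illustration.

```python
from collections import Counter


def combined_score(guess, target):
    """One point per misplaced letter, two points per direct hit."""
    exact = sum(g == t for g, t in zip(guess, target))
    shared = sum((Counter(guess) & Counter(target)).values())
    misplaced = shared - exact
    return 2 * exact + misplaced


def rank_by_combined_score(lexicon):
    """Rank every word by its total combined score against all other words."""
    totals = {
        guess: sum(combined_score(guess, t) for t in lexicon if t != guess)
        for guess in lexicon
    }
    # Highest total first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Toy lexicon for illustration only.
lexicon = ["TARES", "CARES", "AROSE", "FUZZY"]
print(combined_score("TARES", "CARES"))  # 8
print(rank_by_combined_score(lexicon))
# [('CARES', 12), ('TARES', 12), ('AROSE', 8), ('FUZZY', 0)]
```

On the full list of 9,349 words this loop makes roughly 87 million comparisons, so a plain Python version will take a while to finish.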
Finally, you might be interested in maximizing your chance of “getting lucky,” which I defined as scoring 7 or higher on my combined score. To reach 7, the target must give you at least five hits with two in the right place, or four hits with three in the right place. The best guess for getting lucky is TARES, which matches up closely with this long list of possible targets (but still gives you at most a one in a hundred chance of getting lucky):
ACRES APRES AURES BARES BATES CARES CARET CARTS CATES DARES DARTS DATES EARLS EARNS FARES FATES GATES HARES HARTS HATES KARTS LARES MARES MARTS MATES NARES NATES PARES PARTS PATES RACES RAGES RAJES RAKES RALES RAPES RARES RASES RATES RAVES RAXES RAZES SAREE SATES TABER TABES TACES TAELS TAHRS TAJES TAKER TAKES TALER TALES TAMER TAMES TAPER TAPES TARED TARGE TARNS TAROS TARPS TARRE TARSI TARTS TASED TASES TATER TATES TAWER TAXER TAXES TEARS TERES TERMS TERNS TIRES TORAS TORES TREES TRIES TRUES TWAES TYRES WARES WARTS
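A list like this can be generated by filtering the lexicon for targets that reach the threshold. The sketch below repeats the combined score so that it runs on its own. The function names, the handling of repeated letters, and the toy lexicon are all assumptions for illustration.

```python
from collections import Counter


def combined_score(guess, target):
    """One point per misplaced letter, two points per direct hit."""
    exact = sum(g == t for g, t in zip(guess, target))
    shared = sum((Counter(guess) & Counter(target)).values())
    return 2 * exact + (shared - exact)


def lucky_targets(guess, lexicon, threshold=7):
    """Targets (other than the guess itself) scoring at least the threshold."""
    return [
        target for target in lexicon
        if target != guess and combined_score(guess, target) >= threshold
    ]


# Toy lexicon for illustration only.
lexicon = ["TARES", "CARES", "RATES", "TEARS", "AROSE", "FUZZY"]
print(lucky_targets("TARES", lexicon))  # ['CARES', 'RATES', 'TEARS']
```

TEARS only just qualifies: all five letters match, but only the T and the final S are in place, for a score of exactly 7. AROSE shares four letters with TARES, none of them in the right place, so it scores 4 and misses the cut.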
So which is the best first word?
Looking at all of this, here are some observations. It’s a really good idea to start with a word that ends in S, because so many other words end in S — you’ll have a 31% chance of a direct hit on the S, and a 46% chance of matching the S somewhere in the word. It’s great to include the most common vowels, A and E, and to include them in the second and fourth positions respectively, since that is where they’re most likely to be in the final word. Having made those choices, your best first-guess words are CARES, TARES, LARES, NARES, and RATES.
If you want to include more vowels to get more intelligence about where they land, I’d pick TOEAS (if you can stand the obscure word), ALOES, or AEONS. All of them end in the high-likelihood S, and all will allow you to probe for the three most common vowels, E, A, and O. If you’re fond of R’s, try AROSE.
These are all strategies that will maximize hits, which I think is a good start.
My analysis depends on the word list I used. If you know a widely available lexicon of more common words, it’s easy enough for me to put that into the code and see what comes out. (This 3000-word lexicon is not the best choice, because it leaves out too many common five-letter words to be interesting, such as PROXY and PANIC.)
Everyone seems to have their own way of probing Wordle. I’d love to hear about yours. Did this analysis change your perspective at all?
UPDATE: Others have now pointed me to both the allowable guess list and the list of answers for Wordle. I will be updating this analysis shortly.