Wordle Best Starting Words is AROSE not ADIEU - Episode 2463

larryrobx · Jan 26, 2022

YouTube is driving me nuts, Bill, repeatedly deleting my reply to your comment there, seconds after posting it, with a "Returned error" message. And, no further edits I apply are able fix it. So, here's my reply, which will hopefully make it through to you through this alternative path:
_____

Larry Robinson

I always learn so much from your problem-solving Excel formulas in these videos, Bill. But, I was wondering, isn't there a simpler scoring algorithm that yields the same scores?

Why not rank all ~9000 Scrabble dictionary words across the entire alphabet? In other words, fix the "Based on Top" variable to 26 on your sheet? The resultant rank order of your top words stays the same, since all your top scores would still be ranked identically in this more comprehensive scoring.

However, this would also properly calibrate all the lesser value words as well. Thus, once and done, with simpler formulas. Plus, the resultant word scores would then be truly scalar across this particular vocabulary set, so their relative strengths could be directly compared. Just curious. If my logic is off here, Bill or any other commenter, please reply. Thanks!

BTW, this game reminds me a lot of MasterMind, all the rage 1/2 century ago. In it, you try to guess 4 colored pegs on a board instead of 5 letters in a word as in Wordle, with color and position feedback similarly provided after each guess. But, there's a big difference between guessing 6 color combinations versus 26 letter possibilities in these respective games.

To commenter Jeff Davis' optimal "guess #2" question, Bill responds with a smart recursion of his same algorithm. An add'l solving tactic might be borrowable from those MasterMind solvers from so long ago. They had a play called the "impossible negative." Even though it's a guaranteed wrong answer, it maximally narrows the field of remaining possibilities. Those MasterMind gurus proved its efficacy with formal logic and, visually, with a set of interlocking truth tables illustrating the mechanism at work, play by play. Because of that game's inherent symmetry, they were able to demonstrate there's a fixed strategy which guarantees a solution in 5 guesses or less. Of course, the structure of the English language affords no such solution symmetries to Wordle here ...

Anyway, a delightful puzzle. And, such a smart solver/helper routine illustrated here!
_____

Highlighted reply
MrExcel.com

I used to play MasterMind all the time and I forgot about the impossible negative strategy! I was convinced that my iterative strategy would allow me to solve in 3 guesses because I had a three-day run of three guesses. But then today, my third guess was wrong and I had to use a 4th guess (although only one word was still "in play" by then). I am going to consider how to apply the MasterMind to this.
_____

Larry Robinson

Thanks for the reply from a fellow old-time MasterMind aficionado, Bill. You mention you are currently further refining this same tool. So, in that spirit, here are a few add'l ideas you might wish to consider.

Recursively applying your initial algo as you propose above would certainly work. However, after the initial guess, I'd argue that the objective changes. As in MasterMind, the goal of successive guesses is to maximally narrow the field, until the odds of actually guessing the word surpass the benefits of further winnowing down the possibilities list.

Your existing algo could be expanded a bit further to enable a standard chess app strategy of building out a decision tree of possible next moves and then value-scoring them. Now, such a tree in Wordle gets broad fast, of course. However, Excel's calc engine is clearly up to the task. And, unlike chess, where grandmasters routinely need to see 10-15 moves ahead, the advantage of a 2 move Bayesian look-ahead in Wordle would beat any human challenger, due to that very same tree breadth difference between these two games.

Finally, if you want to apply a game theory twist to this resultant decision tree, consider whether your opponent (the word creator) is active or passive. If words are selected at random, then no further adjustment is necessary. However, if your opponent is judged to be either friendly or hostile, a further weighting layer applied to the objective function merits consideration.

We can speculate that the Wordle creator is not hostile, since deliberately picking words which only Scrabble nerds would know would probably needlessly constrain the propagation of this game across its target web audience. On the other hand, if the creator is biased toward selecting only common 5 letter words, then it would make sense to further weight your Scrabble dictionary words here with their relative frequency in common English usage.

A quick scan of the Internet surfaces such a digital data set here:

NEW: COCA 2020 data
This site contains what is probably the most accurate word frequency data for English. The data is based on the one billion word Corpus of Contemporary American English (COCA) -- the only corpus of English that is large, up-to-date, and balanced between many genres.
Source: Word frequency: based on one billion word COCA corpus

Now, the Excel skill required to implement such an algorithm is way beyond my pay grade. But, perhaps, not yours. Plus, you'd have the honor of building the Wordle solver equivalent of "Deep Blue" vs Gary Kasparov in chess. <smile>

You recalled MasterMind from the '80's. So, how about Instant Insanity from the '70's? It was a similar sensation back then: 4 cubes, 6 colors. The goal was to display different colors on each of the four outward-facing sides. So long ago, all I had available to build my solver was Fortran on a mainframe. Primitive, yes -- but, it did the job.
_____

MrExcel · Jan 28, 2022

Hi Larry

I had not heard of Instant Insanity. I had previously built a solver for the 1978 board game Black Box using a TI-99/4A. (There were multiple games called Black Box... the one I am referring to is this.)

Since publishing my videos. I found a couple of interesting word lists.
First... there are only 2315 words that will ever be the Wordle solution.
Second, there are 10657 words that you are allowed to guess but that will never be the solution.

If I look at the letter frequency of the 2315 words, then there is a 3-way anagram tie for using alert, alter, or later as the first guess. irate is one point less.

We are thinking of a tool to analyze all 2315 words.
Loop through all 2315 words assuming the Word(i) is the correct word.
Loop through the top 10 first words. Guess each word and see the response.
Then test the top 20 words from either the list of 2315 or the list of (2315+10657). Get the response for each of those 20 words.
My goal is to always win by Step 3. For the third guess, my theory is that I can only analyze the 2315 words since I don't want to guess something that can not be a winner.
I am pretty sure after Guess 2 or after Guess 3, there will only be one solution.

This all seems fairly do-able in Excel. 2315 words x 10 x 10 is only 231,500 iterations.

I am honestly torn for guess 2. Using the Mastermind Negative theory, it is fine to guess something that could not be a winner if it will narrow down guess 3 for the win.

All of this sounds great, but my friend and co-author Oz from ExcelOnFire has finished his pages for our new book and I need to finish mine. So - the development is going to get tucked away into a spare hour some day.

Bill

larryrobx · Jan 28, 2022

"Black Box - The Game of Hide and Seek." Now, that's a game I've never heard of, Bill. But, from your Pinterest photo, it looks intriguing. And, writing a solver for it at age 17 on a "Trash-99?" ****. Here's a snapshot of Instant Insanity, that '70's game I mentioned, in case you're curious: https://www.jaapsch.net/puzzles/images/insanity.jpg

Back to Wordle, your further intelligence gathering to reduce the target word set by 75% makes a world of difference in sol'n design. You also took the 2-deep decision tree idea and added a crucial, practical element to it. Namely, you prune the tree as you go along, winnowing it down to only the most likely candidates. Your proposed heuristic reduces an otherwise exhaustive search down to a doable quarter million iterations on a personal computer.

Regarding your ambivalence to the Mastermind "impossible negative" tactic, the simple test you describe is a pretty straightforward expected value calc at the tip of each decision tree branch: which choice yields the better payoff for each endgame situation, pass or punt? <wink>

"It is fine to guess something that could not be a winner if it will narrow down guess 3 for the win." (Bill J)

I know you've got lots of other real work to do down there in Florida, so I don't expect a response. (The luxury of retirement allows me to toy with these kinds of puzzles to my heart's content.) At some point, though, I would love to see the endgame Guess #3 histogram of residual word candidates over multiple runs, whenever you eventually implement your algorithm:

"I am pretty sure after Guess 2 or after Guess 3, there will only be one solution." (Bill J)

Then, if you subsequently elect to run this solver engine against every single word in the Wordle dictionary, that would constitute pretty decisive proof of your algo's efficacy.

In closing, young Bill and Larry may have been justly proud of their teenage computer prowess. However, here's a tidbit to mull over. A few years back, I contacted a young college kid who'd written a Minesweeper solver, complete with web player interface. (This and Solitaire were Microsoft's free PC games back then, I'm sure you remember.) He shared both his code (C++, running on a real computer, not a PC) and his algorithm, which was just stunning in its sophistication. And, unlike yours and my rote sol'ns from way back, his was an exhaustive, analytic solver, guaranteeing the best possible probabilistic play given any 10x10 game board. Admirable. And, humbling. I assume Google or Wall Street has long since snapped him up ...

I remember studying Arthur Samuel's checkers player, written in assembler for an IBM 701 in 1952, followed a couple decades later by Mikhail Botvinnik's chess-playing algorithm. He combined his ~20 year world chess champion skills with his "day job," back in the USSR, as an electrical engineer and computer scientist, to develop one of the 1st workable chess-playing programs. His algo took an entirely different approach to the game from my measly coder perspective. Very illuminating. And, just plain cool stuff!

Anyway, hope to hear your Wordle routine results, whenever time allows you to get back to it. Then, all your fans out here across the Internet will look forward to awarding you the "Big Blue" crown for Wordle. Cheers!

larryrobx · Jan 29, 2022

Hands down, your proposed further algo evolution yields the best bang for the buck, Bill -- a straightforward, logical extrapolation of your work to-date on it. But, I just can't resist expanding on those chess-like decision tree build and prune thoughts from my prior post. Now, you've got better things to do with your life, I realize. I just want to lay it out here, for you and your Excel brain trust's consideration, in case it could spark some hybrid ideas there in your own shop. So, here goes ...

The MasterMind analogy is instructive, at least for its proof of the utility of selectively invoking "impossible negative" guesses. However, unlike MasterMind, Wordle clue results are neither symmetric nor transitive. And, thus, such a pure logic sol'n which is derivable for MasterMind is not also possible for Wordle. Yet, Wordle is, like MasterMind, still a deterministic game. (As are checkers and chess, btw -- at least theoretically.) And, unlike chess, Wordle's complete possible outcome set is actually calculable.

Altho' the number of potential Wordle games is very large, it is still reasonably finite. It's 2500 for all possible 1 guess games, + 2500x2500 for all 2 guess games, + 2500x2500x2500 for 3 guess games, etc. Now, the vast majority of such games would be idiotic, of course, and thus obviously not worth exploring. Yet, this means that a unique "best guess" sequence can be determined for each and every one of those 2500 possible game words. The key question is, how much calculation is worth cranking through for how much incremental sol'n improvement? Here's where a prudent, iterative decision tree building and pruning algorithm proves essential. And, this is my proposal for such an algorithm:

Guess #1: Select each of the 2500 possible solution words, one at a time.
Clue #1: For each of these possible solution words, generate its 3^5 = 243 possible clue results, with each clue yielding its own unique, reduced candidate word array. (Utilizing your magnificent winnowing engine, Bill, already built!)
Tally these individual clue candidate word counts across all 243 arrays into a single aggregate word count score. This resultant count ultimately becomes the comparative rating system for the relative "resolving power" across all 2500 possible guess #1 words. So, low score wins, since the objective of each guess is to minimize the remaining list of eligible solution words.
However, that's just round 1, yielding only a provisional solution. It isn't sufficient/definitive, because the various sequences of guess #1 + guess #2 + guess #3 + guess #n turn out to be highly inter-dependent. Hence, we really need chained results (Bayesian what/ifs) across each complete game in order to answer what is truly the "best guess" combo for any particular solution word. (Plus, such a determination can be made with confidence only in retrospect!)
Guess #2: So, as an heuristic, select now, say, the 10 lowest guess #1 scoring words, one by one.
Clue #2: Expand out exactly the same sort of 243 clue results tree as done for clue #1, above, and similarly score their respective "resolving powers."
Guess #n and Clue #n: Apply this same algorithm, recursively, 'til all 2500 solution words are "present and accounted for," however long each of their individual "best guess" word chains turns out to be. Thus, of all the zillions of possible combos enumerated in paragraph 2, above, only this tiny optimal subset remains.

Is this a perfect solution? No! Because we've somewhat arbitrarily pruned the decision tree down to its 10 provisionally best candidates after each guess, based on their discriminant value. An analogy would be to a chess-playing algorithm which sees ahead 10 half-moves (which is magnificent, btw). Yet, this same algorithm is blind to a potential checkmate opportunity at move #11, which a human grandmaster would see in a heartbeat.

Fortunately, unlike chess games, which are typically 40, 50, 60, or more moves long, it appears that Wordle games are usually over after just 3 or 4 expert guesses. Plus, the winnowing process in Wordle appears to be much more uniform and orderly than trying to apply a similar scheme to the multitude of potential moves available from any given chess position.

So, how aggressively you prune your decision tree after each move affects how confident you can be that you've truly reached the "best possible" solution. Now, some further sensitivity testing, experimenting with varying pruning depths, could probably resolve such a question to any level of certainty desired.

Finally, you ask, should you include all 10,000 "impossible negative" words, from that list you've recently uncovered, in your early guesses? Almost certainly the answer is yes, since at least some of them will likely prove to have greater "resolving power" scores than the much smaller 2500 word set of actual possible solutions. Only in later guesses, once the solution candidate list has been greatly reduced, will you need to run that expected value tradeoff calc, as described in my previous post, to decide whether to "fish or cut bait." Modifying my proposed algorithm above to explore all 12,500 words (2500 possible + 10,000 impossible) would be, while not trivial, merely a linear expansion of the computational effort required, not a geometric or exponential one. Thus, it's probably worth such an expansion for its likely improved sol'n yield.

Whew. Long post. Sorry. I look forward to seeing where you eventually land on all this, Bill, once you find the time to build and publish your planned further refinements to this Wordle solver (and, accept appropriate kudos), out there on YouTube.

MrExcel · Jan 30, 2022

The Insanity puzzle now looks vaguely familiar.
A Minesweeper Solver is something that I would buy for myself for Christmas.

Do we really need to study all 3^5 243 possibilities?
After guess 1, I am usually down to 80 possible words. After guess 2, I am down to 4-6 words.

As I solve each day's puzzle, I am realizing that I am in an interesting quandary when I get to four words left.
(Each of the last three days, I have been down to four words after two guesses).

Obviously, I have a 25% chance of winning on this guess.
But let's say I don't win on this guess. I want to make sure that I minimize the words left after guess 3 to one word. Having 2 words or (horrors!) 3 words left after guess 3 is not desired.

I've been manually drawing out this 4x4 matrix each day on a piece of paper. What if I guess X and the answer is Y?

Then I look at the four Results that could happen in each case to see if I have four unique responses in that column.

In the image above, guessing the word DRUNK is a bad choice, because if I am wrong, I have learned nothing about the remaining three words.
For Friday and Saturday, there was one definitive guess that generated four unique answers.

Today, though, none of the four words would guarantee a penultimate guess. If I would guess CRUMB, there is a 25% chance of winning in three, 25% chance of winning for sure in 4, and a 50% chance of [50% winning in 4, 50% of winning in 5].

Does the logic have to change when I get to 4, 5, or 6 words left? Or is this all logic that I should be using before Guess 1? If the matrix is 2,322 x 2,322, I could generate a list of guesses that would maximize the number of unique values and/or minimize the How Many Times Did This Answer happen?

I need to built out some functions here. Like you say, I can prove these theories by running the algo. I am working on the some of the algo functions now.

Wordle Best Starting Words is AROSE not ADIEU - Episode 2463

larryrobx

New Member

MrExcel

.

larryrobx

New Member

larryrobx

New Member

MrExcel

.

Forum statistics

Share this page

Wordle Best Starting Words is AROSE not ADIEU - Episode 2463

larryrobx

New Member

MrExcel

.

larryrobx

New Member

larryrobx

New Member

MrExcel

.

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock