Thursday, January 18, 2018

Prof. Marks gets lucky at Cracker Barrel

Introduction to Evolutionary Informatics, by Robert J. Marks II, the “Charles Darwin of Intelligent Design”; William A. Dembski, the “Isaac Newton of Information Theory”; and Winston Ewert, the “Charles Ingram of Active Information.” World Scientific, 332 pages.
Classification: Engineering mathematics. Engineering analysis. (TA347)
Subjects: Evolutionary computation. Information technology–Mathematics.
Yesterday, I looked again through "Introduction to Evolutionary Informatics", when I spotted the Cracker Barrel puzzle in section 5.4.1.2, "Endogenous information of the Cracker Barrel puzzle" (p. 128). The rules of this variant of triangular peg solitaire are described in the text (or can be found in Wikipedia's article on the subject). The humble authors then describe a simulation of the game to calculate how probable it is to solve the puzzle using moves at random:
A search typically requires initialization. For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and, at random, a single peg is removed. This starts the game. Using random initialization and random moves, simulation of four million games using a computer program resulted in an estimated win probability p = 0.0070 and an endogenous information of $$I_\Omega = -\log_2 p = 7.15 bits.$$ Winning the puzzle using random moves with a randomly chosen initialization (the choice of the empty hole at the start of the game) is thus a bit more difficult than flipping a coin seven times and getting seven heads in a row
Naturally, I created such a simulation in R for myself: I encoded all thirty-six moves that could occur in a matrix cb.moves, each row indicating the jumping peg, the peg which is jumped over, and the place on which the peg lands. And here is the little function which simulates a single random game:
cb.simul <- function(pos){
  # pos: boolean vector of length 15 indicating position of pegs
  # a move is allowed if there is a peg at the start position & on the field which is
  # jumped over, but not at the final position
  allowed.moves <- pos[cb.moves[,1]] & pos[cb.moves[,2]] & (!pos[cb.moves[,3]])
  # if no move is allowed, return number of pegs left
  if(sum(allowed.moves)==0) return(sum(pos))
  # otherwise, choose an allowed move at random
  number.of.move <- ((1:36)[allowed.moves])[sample(1:sum(allowed.moves),1)]
  pos[cb.moves[number.of.move,]] <- c(FALSE,FALSE,TRUE)
  return(cb.simul(pos))
}
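The matrix cb.moves itself is not shown in the post. A minimal sketch of how such a matrix might be built follows. It assumes the holes are numbered row by row from the tip (1; 2 3; 4 5 6; 7 to 10; 11 to 15), which is consistent with positions 1 and 11 both being corners. The numbering and the helper names hole, on.board and dirs are assumptions for illustration, not taken from the original program.

```r
# Assumed numbering: hole (row i, position j) with 1 <= j <= i <= 5,
# counted row by row from the tip: 1; 2 3; 4 5 6; 7 8 9 10; 11 ... 15
hole <- function(i, j) i * (i - 1) / 2 + j
on.board <- function(i, j) i >= 1 && i <= 5 && j >= 1 && j <= i

# the six jump directions on a triangular grid, as (row step, position step)
dirs <- rbind(c(0, 1), c(0, -1), c(1, 0), c(-1, 0), c(1, 1), c(-1, -1))

moves <- list()
for (i in 1:5) for (j in 1:i) for (k in 1:nrow(dirs)) {
  di <- dirs[k, 1]; dj <- dirs[k, 2]
  # a jump is geometrically possible if the landing hole is on the board
  if (on.board(i + 2 * di, j + 2 * dj)) {
    moves[[length(moves) + 1]] <- c(hole(i, j),
                                    hole(i + di, j + dj),
                                    hole(i + 2 * di, j + 2 * dj))
  }
}
# one row per move: jumping peg, peg jumped over, landing hole
cb.moves <- do.call(rbind, moves)

# example: with only hole 1 empty, just the jumps landing on hole 1 are allowed
pos <- rep(TRUE, 15); pos[1] <- FALSE
allowed.moves <- pos[cb.moves[, 1]] & pos[cb.moves[, 2]] & (!pos[cb.moves[, 3]])
```

Under this numbering the construction yields thirty-six rows, matching the count given in the text. A random game could then be started by emptying one hole chosen with sample(15, 1) and passing the resulting vector to cb.simul.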
I ran the simulation 4,000,000 times, changing the start position at random. But as a result, my estimated win probability was $p_e=0.0045$ - only two thirds of the number in the text. How can this be? Why were Prof. Marks et al. so much luckier than I? I reran the simulation, checked the code, washed, rinsed, repeated: no fundamental change. So, I decided to take a look at all possible games and at the probability with which they occur. The result was this little routine:
cb.eval <- function(pos, prob){
  # pos: boolean vector of length 15 indicating position of pegs
  # prob: the probability with which this state occurs
  # a move is allowed if there is a peg at the start position & on the field which is
  # jumped over, but not at the final position
  allowed.moves <- pos[cb.moves[,1]] & pos[cb.moves[,2]] & (!pos[cb.moves[,3]])
  if(sum(allowed.moves)==0){
    # end of a game: prob now holds the probability that this game is played
    # number of remaining pegs
    nr.of.pecks <- sum(pos)
    # the number of remaining pegs is stored in a global variable
    cb.number[nr.of.pecks] <<- cb.number[nr.of.pecks]+1
    # the probability of this game is added to the appropriate place of the global variable
    cb.prob[nr.of.pecks] <<- cb.prob[nr.of.pecks] + prob
    return()
  }
  for(k in 1:sum(allowed.moves)){
    # moves are still possible, for each move the next stage will be calculated
    d <- pos
    number.of.move <- ((1:36)[allowed.moves])[k]
    d[cb.moves[number.of.move,]] <- c(FALSE,FALSE,TRUE)
    cb.eval(d,prob/sum(allowed.moves))
  }
}
I now calculated the probabilities of solving the puzzle for each of the fifteen possible starting positions. The result was $$p_s=0.0045.$$ This fits my simulation, but not that of our esteemed and humble authors! What had happened?
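To see what the two probabilities mean in the book's own currency, the endogenous information $I_\Omega = -\log_2 p$ can be evaluated for both. This is only a small sketch; the function name endogenous.information is made up for illustration.

```r
# endogenous information in bits for a given win probability p
endogenous.information <- function(p) -log2(p)

info.book <- endogenous.information(0.0070)  # probability reported in the book
info.blog <- endogenous.information(0.0045)  # probability found here

print(round(c(book = info.book, blog = info.blog), 2))
```

This gives roughly 7.16 bits for the book's probability (the book states 7.15) and roughly 7.80 bits for 0.0045, so the discrepancy amounts to a little more than half a bit.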

An educated guess

I found it odd that the authors ran 4,000,000 simulations - 1,000,000 or 10,000,000 seem to be more commonly used numbers. But when you look at the puzzle, you see that it was not necessary for me to look at all fifteen possible starting positions - whether the first peg is missing in position 1 or position 11 does not change the quality of the game: you could rotate the board and perform the same moves. Using symmetries, you find that there are only four essentially different starting positions: the black, red, and blue groups with three positions each, and the green group with six positions. For each group, you get a different probability of success:
group                          black     green     red       blue
prob. of choosing this group   0.2       0.4       0.2       0.2
prob. of success               0.00686   0.00343   0.00709   0.001726
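The four groups can be recovered by letting the six symmetries of the triangle act on the holes. The sketch below assumes a row-by-row numbering from the tip (1; 2 3; 4 5 6; 7 to 10; 11 to 15) and describes each hole by three non-negative coordinates summing to 4, so that every symmetry of the board is a permutation of these coordinates. The names coords, to.hole, perms and orbit are assumptions for illustration, and the code does not tell which orbit carries which color.

```r
# coordinates (a, b, c) of each hole, with a + b + c = 4:
# a = rows below the hole, b = holes to its left, c = holes to its right
coords <- matrix(0, nrow = 15, ncol = 3)
for (i in 1:5) for (j in 1:i) {
  coords[i * (i - 1) / 2 + j, ] <- c(5 - i, j - 1, i - j)
}

# inverse mapping: from coordinates back to the hole number
to.hole <- function(v) {
  i <- 5 - v[1]
  i * (i - 1) / 2 + v[2] + 1
}

# the six symmetries of the triangle are the six permutations of (a, b, c)
perms <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# all holes that a given hole can be mapped to by a symmetry
orbit <- function(h) {
  sort(unique(apply(perms, 1, function(p) to.hole(coords[h, p]))))
}

orbits <- unique(lapply(1:15, orbit))
print(orbits)
```

Under these assumptions there are four orbits: the corners, the middles of the edges, the three inner holes, and a group of six holes next to the corners. Three groups of three and one group of six is the same split as in the table.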
One quite obvious explanation for the authors' result is that they did not run a single simulation with a random starting position 4,000,000 times, but instead simulated the game 1,000,000 times for each of the four groups. Unfortunately, they then either did not combine their results and reported only the result of the black or the red group (or of both), or they only thought they had switched starting positions from one batch of simulations to the next, but in fact always used the black or the red one.
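As a quick plausibility check of this guess, the per-group figures from the table can be combined in both ways. This is a sketch using only the numbers given above; the variable names are chosen for illustration.

```r
# group sizes and per-group success probabilities as given in the table
group.size    <- c(black = 3, green = 6, red = 3, blue = 3)
group.success <- c(black = 0.00686, green = 0.00343, red = 0.00709, blue = 0.001726)

# a uniformly random starting hole falls into a group with probability size/15
group.weight <- group.size / sum(group.size)

# overall win probability for a random starting position
p.overall <- sum(group.weight * group.success)

# what one would get from the black and red groups alone
p.black.red <- mean(group.success[c("black", "red")])

print(round(c(overall = p.overall, black.red = p.black.red), 4))
```

The weighted average over all four groups comes out at about 0.0045, while the black and red groups on their own give about 0.0070, which is the figure printed in the book.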

Is it a big deal?

It is easily corrected: instead of "For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and, at random, a single peg is removed." they could write "For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and one peg at the tip of the triangle is removed." If the book were actually used as a textbook, the simulation of the Cracker Barrel puzzle would be an obvious exercise. I doubt that it is used that way anywhere, so no pupils were annoyed. It is somewhat surprising that such an error occurs: it seems that the program was written by a single contributor and not checked. That seems to have been the case in previous publications, too. Perhaps the authors thought that the program was too simple to be worthy of their full attention - and the more complicated stuff is properly vetted. OTOH, it could be a pattern.... Well, it will certainly be changed in the next edition.

Monday, January 8, 2018

UD in 2017

Just a few pics:

Monday, July 17, 2017

A letter to Winston Ewert

Winston Ewert, William Dembski, and Robert Marks have written a new book, "Introduction to Evolutionary Informatics". Fair to say, I do not like it very much - so I wrote a letter to Winston Ewert, the most accessible of the "humble authors"...
Dear Winston,
congratulations on publishing your first book! It took me some time to get to read it (though I'm always interested in the output of the Evo Lab). Over the last couple of weeks I've discussed your oeuvre on various blogs. I assume that some of you are aware of the arguments at UncommonDescent and TheSkepticalZone, but as those are not peer reviewed papers, the debates may have been ignored. Fair to say, I'm not a great fan of your new book. I'd like to highlight my problems by looking into two paragraphs which irked me during the first reading: In your section about "Loaded Die and Proportional Betting", you write on page 77:
The performance of proportional betting is akin to that of a search algorithm. For proportional betting, you want to extract the maximum amount of money from the game in a single bet. In search, you wish to extract the maximum amount of information in a single query. The mathematics is identical.
This is at odds with the previous paragraphs: proportional betting doesn't optimize a single bet, but a sequence of bets - as you have clearly stated before. I'm well aware of Cover and Thomas's "Elements of Information Theory", but I fail to see how their chapter on "Gambling and Data Compression" is applicable to your idea of a search. I tried to come up with an example, but if I have to search two equally sized subsets $\Omega_1$ and $\Omega_2$, and the target is to be found in $\Omega_1$ with a probability bigger than to be found in $\Omega_2$, proportional betting isn't the optimal way to go! Does proportional betting really extract the maximum of information in a single guess?

Then there is this following paragraph on page 173:

One’s first inclination is to use an S4S search space populated by different search algorithms such as particle swarm, conjugate gradient descent or Levenberg-Marquardt search. Every search algorithm, in turn, has parameters. Search would not only need to be performed among the algorithms, but within the algorithms over a range of different parameters and initializations. Performing an S4S using this approach looks to be intractable. We note, however, the choice of an algorithm along with its parameters and initialization imposes a probability distribution over the search space. Searching among these probability distributions is tractable and is the model we will use. Our S4S search space is therefore populated by a large number of probability distributions imposed on the search space.
Identifying/representing/translating/imposing a search and a probability distribution is central to your theory. It's quite disappointing that you are glossing over it in your new book! While you generally give a quite extensive bibliography, it is surprising that you do not cite any mechanism which translates the algorithm into a probability distribution.

Therefore I do not know whether you are thinking about the mechanism as described in "Conservation of Information in Search: Measuring the Cost of Success": this one results in every exhaustive search finding its target. Or are you talking about the "representation" in "A General Theory of Information Cost Incurred by Successful Search": here, all exhaustive searches will do on average at best as well as a single guess (and yes, I think that this is counter-intuitive). As you are talking about $\Omega$ and not any augmented space, I suppose you have the latter in mind...

But if two of your own "representations" result in such a difference between probabilities ($1$ versus $1/|\Omega|$), how can you be comfortable with making such a wide-reaching claim like "each search algorithm imposes a probability distribution over the search space" without further corroboration? Could you - for example - translate the damping parameters of the Levenberg-Marquardt search into such a probability distribution? I suppose that any attempt to do so would show a fundamental flaw in your model: the separation between the optimum of the function and the target....

I'd appreciate it if you could address my concerns - at UD, TSZ, or my blog.

Thanks,
Yours Di$\dots$ Eb$\dots$

P.S.: I have to add that I find the bibliographies quite annoying: why can't you add the number of the page if you are citing a book? Sometimes the terms which are accompanied by a footnote cannot be found at all in the given source! It is hard to imagine what the "humble authors" were thinking when they sent their interested readers on such a futile search!

Tuesday, February 2, 2016

Some Pies for "The Skeptical Zone"

In 2015, some 45,000 comments were made at The Skeptical Zone. Here are the top ten commentators (just a quantitative, not a qualitative judgement). I'll stick to the color scheme for all of the figures in this post... "The Skeptical Zone" has a handy "reply to" feature, which allows you to address a previous comment (with or without inline quotation). It is used to varying degrees - and though some don't use it at all, nearly 50% of all comments were replies.

Wednesday, January 27, 2016

"Uncommon Descent" and "The Skeptical Zone" in 2015

Since 2005, Uncommon Descent (UD) - founded by William Dembski - has been the place to discuss intelligent design. Unfortunately, the moderation policy has always been one-sided (and quite arbitrary at the same time!). Since 2011, the statement "You don't have to participate in UD" is no longer answered with gritted teeth only, but with a real alternative: Elizabeth Liddle's The Skeptical Zone (TSZ). So, how were these two sites doing in 2015?

Number of Comments 2005 - 2015

year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
UD  8,400 23,000 22,400 23,100 41,100 24,800 41,400 28,400 42,500 53,700 53,100
TSZ - - - - - -  2,200 15,100 16,900 20,400 45,200
In 2015, there were still 17% more comments at UD than at TSZ.

Tuesday, January 26, 2016

The "Discovery Institute" trembles before the mighty powers of DiEbLog!

Just kidding. It isn't. But they published some of the pages the absence of which I had criticized in my previous post: John G. West wrote an article on Dennis Prager Was Right: Atheists Are More Open-Minded on ID than Some United Methodist Officials, in which he included further pages from the poll which the Discovery Institute (DI) had ordered on the subject of being snubbed by the United Methodist Church.

I assume that this little blog mainly flies under the radar of the DI, but they most probably follow astutely the very amusing Sensuous Curmudgeon, where I raised the problem earlier.

So, as I had guessed, there was a question Q9 regarding the religious beliefs of the participants of the study. Why did the DI need an extra day to put a spin on the answers to this question? Did they think it to be especially juicy, so that they were able to get yet another article from it? Or were they annoyed that one third of the participants of the poll identified themselves as agnostics or atheists?

Let's wait and see for Q8 - the question for the degree of education. Perhaps some scientists named Steve were involved, that result could be unpleasant...

OMG - The Discovery Institute is Committing Censorship!!!11!!1!

Does the Discovery Institute (DI) want to keep its much coveted Censor of the Year Award for itself this year?

If you are interested in this kind of thing, you will have noticed the tantrum John G. West and his friends are collectively throwing over at Evolution News & Views (EN&V) because they were somewhat rebuffed by the United Methodist Church (UMC). Here is some background as it presents itself to me (EN&V's viewpoint may differ): The UMC holds its General Conference once every four years. In May 2016, it will take place at the Oregon Convention Center. Sponsors and exhibitors may rent booths at the center to present themselves to the estimated 6,500 participants of the event. The DI was willing to pay the 900 to 1,200 dollar fee to become an exhibitor, but its application was turned down. There may have been various problems, but unfortunately for the DI, it did not seem to meet the fourth criterion for eligibility:

Proven Business Record: Purchasers must have a proven business record with their products/services/resources. Exhibits are not to provide a platform to survey or test ideas; rather, to provide products/services/resources which are credible and proven.
It is fair to say that the DI has not recovered from this blow yet - over the last eight days, at least fourteen articles have been published on this matter at EN&V. One of the highlights was this: New Poll: Most Americans Turn Thumbs Down on United Methodist Ban on Intelligent Design. The DI spent the money it had saved on the booth to have a survey performed by SurveyMonkey. It asked:
The United Methodist Church recently banned a group from renting an information table at the Church’s upcoming general conference because the group supports intelligent design—the idea that nature is the product of purposeful design rather than an unguided process. Some have criticized the ban as contrary to the United Methodist Church’s stated commitment to encourage “open hearts, open minds, open doors.” Rate your level of agreement or disagreement with the following statements:
1. The United Methodist Church should not have banned an intelligent design group from renting an information table at its conference.
2. The United Methodist Church’s ban on the intelligent design group seems inconsistent with the Church’s stated commitment to encourage “open hearts, open minds, open doors.”
What surprised me: though the question was obviously leading, still 30% didn't agree with the first statement and 22% didn't agree with the second one! Or, as the DI describes it:
More than 70% of the 1,946 respondents to the nationwide survey agreed that “the United Methodist Church should not have banned an intelligent design group from renting an information table at its conference.” More than 78% of respondents agreed that “the United Methodist Church’s ban on the intelligent design group seems inconsistent with the Church’s stated commitment to encourage ‘open hearts, open minds, open doors.’”
But here is the catch: Though EN&V announced that the "full report" can be downloaded from here, it is obvious from the pagination that at least two pages are missing!

Enter panic mode: OMG! The Discovery Institute is censoring its report! What are they covering up? Are they beating puppies? Like Darwin! They should get their own Censorship Award!!!!11!!1

The truth is a little bit less sinister: Survey Monkey asks you about your age (Q11), your gender (Q12), your income (Q13), your party affiliation (Q10) and the region you are living in (Q14). What is surprisingly missing are questions about your religious orientation and your education. These two characteristics are of obvious interest for a poll like this one - so, I am guessing that the questions Q8 and Q9 were about these matters. Maybe the results did not please the DI and thus, were omitted from the final report.

Edit: Instead of trying to claim that it was meant to be ironic, I just corrected an embarrassing spelling mistake in the headline...