Response to Jockers, Criddle, et al., Now Available

_MCB
_Emeritus
Posts: 4078
Joined: Sat Aug 29, 2009 3:14 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _MCB »

It is very naïve to assume the method will work equally well in both cases.
That is why Dale, elsewhere, is suggesting that John Leacock's works, both in KJE and regular English, be tested with NSC to see how they compare. They were published about 1775, so Rigdon, Spalding, or Smith could not possibly have written them.
Huckelberry said:
I see the order and harmony to be the very image of God which smiles upon us each morning as we awake.

http://www.vatican.va/archive/ccc_css/a ... cc_toc.htm
_GlennThigpen
_Emeritus
Posts: 583
Joined: Wed Apr 09, 2008 5:53 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _GlennThigpen »

CaliforniaKid wrote:Glenn,


In principle, Matt is right. Given Matt's assumptions, Bruce's modification shouldn't have been expected to produce substantially different results. However, Bruce's results are very different. I don't see how Matt can ignore that. Something is wrong with Matt's assumptions. Either the author is not in his candidate set, or NSC is not returning accurate results when applied to the Book of Mormon.


Peace,

-Chris


Chris,
The assumptions of Matt and Bruce are really irrelevant to the problem. The original Jockers methodology would have the same problem, as Bruce demonstrated, with any set of texts if the real author is not included in the candidate set.
I agree, and I think that Bruce would also, that more work needs to be done to see if genre does indeed influence the NSC results.

Glenn
In order to give character to their lies, they dress them up with a great deal of piety; for a pious lie, you know, has a good deal more influence with an ignorant people than a profane one. Hence their lies came signed by the pious wife of a pious deceased priest. Sidney Rigdon QW J8-39
_CaliforniaKid
_Emeritus
Posts: 4247
Joined: Wed Jan 10, 2007 8:47 am

Re: Response to Jockers, Criddle, et al., Now Available

Post by _CaliforniaKid »

GlennThigpen wrote:The assumptions of Matt and Bruce are really irrelevant to the problem.

All I was saying is that their assumptions explain why they approach the problem so differently.

Matt assumed it was OK to use a "closed set" because the probability of someone outside his candidate set having been the author seemed vanishingly small. We shouldn't be too hard on him for this assumption, because at the time it seemed like a reasonable enough assumption to make.

Bruce's religious beliefs led him to expect that the true author is not in the candidate set, which is why he developed his "open set" approach. If Matt's assumptions were correct, then Bruce's results shouldn't have been substantively different from Matt's. However, they are different. Therefore, Matt's assumptions need to be re-evaluated.

In short, Matt is right to defend himself in the sense that his methodology was based on assumptions that seemed sound at the time. However, he is wrong to defend himself in the sense that Bruce's study does appear to seriously problematize those assumptions.

MY contribution to the discussion is to observe that Bruce's anomalous result may not mean the author isn't in the candidate set (which is how he interprets it in the paper). Rather, it may mean that the NSC methodology was inadequate to the task from the get-go. Bruce is right to question Matt's assumptions, but he is focusing on the wrong one.

The original Jockers methodology would have [a] problem, as Bruce demonstrated, with any set of texts if the real author is not included in the candidate set.

Yes, it would.

Peace,

-Chris
_bschaalje
_Emeritus
Posts: 31
Joined: Fri Jul 16, 2010 8:03 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _bschaalje »

Hi Matt (through aussieguy).

Though the authors would like readers to believe that our use of NSC was "naïve" (as they write in the article), it was not in fact naïve. (It warrants mention that the statistician, Prof. Witten, who did the statistics on our project, is Tibshirani's former grad student. Tibshirani is the one who invented NSC.) Our use of the algorithm was perfectly legit for what we applied it to. Schaalje misrepresents our objectives in order to create a straw man argument related to the business of a closed set of candidate authors.

Being someone’s graduate student doesn’t guarantee that the person will not naïvely misapply their mentor’s work. Tibshirani invented the method for a very different application, tumor type identification from gene-microarray data, in which the closed-set assumption is usually met and there is nothing like text size to worry about. Go talk to Tibshirani himself.
Your use of NSC was *not* perfectly legit. For one thing, the goodness-of-fit test showed that predictions based on your fitted model did not match your data. Assumptions that are contradicted by the data are bad assumptions. Inferences based on bad assumptions are bad inferences. It’s like saying that we added two numbers when we should have multiplied them. But we felt like adding for this problem, so it’s magically legit.
Burrows, Juola and Koppel (leaders in the authorship attribution field) all worry about mistaking open-set problems for closed-set problems. It’s not a straw man to worry about this issue in this of all cases. Matt, for interest’s sake, how do you justify ignoring the false-positive cutoff for your use of Burrows’s Delta?

Schaalje's primary complaint is that we use a closed set of candidate authors. That is a legitimate complaint and it is a point that we acknowledge quite clearly in the paper.

You simply pointed out that you chose to use closed-set classification, as if the choice to do so has no consequences for your inferences. If you add two numbers when you should have multiplied them, you get the wrong answer.

That said, all authorship problems are ultimately closed set problems. You cannot test for every single person in the entire history of the world. Instead, you have some candidates and wish to figure out which among them is the most likely culprit. . .

. . . or, as Burrows said, determine if it is wiser to look further afield. In other words, you can’t just assume away the possibility that none of them is likely. The open-set technology allows you to honestly allow this possibility. So you’re wrong. Not all authorship problems are closed set problems. I would say the vast majority are open.

Our objective was to test *existing* theories of authorship of the Book of Mormon. And that's exactly what we did. We took the list of suspects and tried to rank them in terms of their likelihood.

Read your own paper. You acted as if your authorship probabilities were absolute probabilities rather than probability rankings. When the absolute probabilities are so small, it is highly misleading to build a case on probability rankings.
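A minimal sketch of why rankings inside a closed set can mislead. This is not code from either paper: the squared distances, the candidate labels, and the single "unknown" alternative are all made-up assumptions, and the real open-set NSC procedure is more involved than this. The point is only that normalizing over the listed candidates forces the probabilities to sum to 1, so the least-bad candidate looks near-certain even when every candidate fits poorly.

```python
import math

def closed_set_probs(sq_dists):
    """Normalize exp(-d/2) over the listed candidates only, so values sum to 1."""
    weights = {name: math.exp(-0.5 * d) for name, d in sq_dists.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def open_set_probs(sq_dists, unknown_sq_dist):
    """Same normalization, plus one extra 'unknown author' alternative.

    unknown_sq_dist is a hypothetical stand-in for how well some unlisted
    author could fit the text; it is an assumption, not an estimate.
    """
    extended = dict(sq_dists)
    extended["unknown"] = unknown_sq_dist
    return closed_set_probs(extended)

# Hypothetical squared distances from a test text to each candidate's centroid.
# Every candidate is far from the text, but A is the least far.
sq_dists = {"A": 60.0, "B": 75.0, "C": 90.0}

closed = closed_set_probs(sq_dists)
opened = open_set_probs(sq_dists, unknown_sq_dist=20.0)

for name in sq_dists:
    print(f"{name}: closed={closed[name]:.6f}  open={opened[name]:.3e}")
print(f"unknown: open={opened['unknown']:.6f}")
```

In the closed-set column candidate A takes almost all of the probability; once any better-fitting alternative is allowed, A's share collapses, although the ranking among A, B, and C does not change.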

We point out in the paper that it is possible that the *real* author is not in the closed set. The real author could, for example, be Moroni or Napoleon. The point of our work was to "reassess" prior theories of authorship using machine classification. Our result is only compelling if you first accept the historical evidence and then accept that the candidates we tested represent a good set of candidates.

Even if you tentatively accept the historical evidence, the results are not compelling if the closed-set assumption is empirically untenable. All of your candidates (other than Isaiah) used your marker words very differently than they were used in Book of Mormon texts. You can see that in the principal component plots and the goodness-of-fit tests.

The Schaalje paper creates a straw man argument and then plays fast and loose with the facts of our paper, cherry picking little bits here and there to make it look like we did something other than what we did, or that we had a goal other than what we had. In my opinion the fundamental conclusion of our work is that it lends additional support to the Spalding-Rigdon theory of authorship.

Please back these statements up with more than bald assertions. What is the straw man? Exactly where did we misrepresent your paper?

Schaalje makes a silly point of testing for Rigdon in the Federalist papers. No one ever suspected Rigdon of writing the Federalist papers. And then to make sure that his results pack the right punch to match his bias, he uses an entirely different feature set from ours and then, voila, they say that Rigdon shows up as most likely author of some of the Federalist papers. This is just plain hogwash.

This example dramatically shows how silly attribution results can be when closed-set methods are used inappropriately. But open-set technology rescued the attributions from being absurd even in this silly example.
The use of a different feature set makes no difference. What bias is introduced by using a different feature set? Do you think we chose features that would make Rigdon look similar to Hamilton under closed-set methods and different from Hamilton under open-set methods? Look at the principal components plot. If anything our feature set seems to bias things against the point we are making. I’m completely baffled as to why you can’t see the warning in this example. Most readers of this board immediately see the relevance of this example. Why do you choose not to?

Schaalje tries to suggest that text length is a factor. We checked that there was indeed no correlation between text length in our study and the results we derived. I can't remember if that was included in the final version of our paper.

How did you check this? Exactly what did you correlate with what? Look at figure 6 of our paper. The variances of the features--the major source of uncertainty in the authorship attributions--depend greatly on text size. But your probability calculations have no adjustment for text size at all. You pretend that you are calculating absolute probabilities, and these most definitely depend on text size.
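A rough illustration of the text-size point, assuming a simple binomial model for how often a marker word turns up (this is an assumption for illustration, not necessarily the model behind figure 6). Under that model the standard deviation of an observed relative frequency is sqrt(p(1-p)/n), so it shrinks as the text gets longer. The 1% rate and the two text lengths are hypothetical.

```python
import math

def relative_frequency_sd(p, n_words):
    """Standard deviation of an observed relative frequency under a binomial model."""
    return math.sqrt(p * (1.0 - p) / n_words)

# Hypothetical marker word that makes up 1% of an author's running text.
p = 0.01
sd_short = relative_frequency_sd(p, 500)    # a short chapter
sd_long = relative_frequency_sd(p, 5000)    # a long chapter

print(f"500 words:  sd = {sd_short:.5f}")
print(f"5000 words: sd = {sd_long:.5f}")
print(f"ratio: {sd_short / sd_long:.2f}")
```

A tenfold difference in length changes the noise in each feature by a factor of sqrt(10), which is why a probability calculation that ignores text size treats short and long chapters as equally informative when they are not.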

For what it is worth, Schaalje does offer some interesting ideas but ultimately they are constructed out of an entirely different data set and used in an entirely different approach.

I think this gets at the fundamental problem. We adamantly do not believe that historical theories give you license to ignore empirical characteristics of the data.

And all of this they did in the context of statistics and formulas that only a highly trained statistician would understand. In my opinion, they should have first sought to have the statistical methodology reviewed and published in a journal of statistics. They suggest a modification of NSC and NSC is an approach that has been peer reviewed by statisticians who have the necessary expertise to evaluate it.

We did. The reference (Communications in Statistics – Theory and Methods) is in the bibliography. (By the way, we similarly believe that Matt and Craig should have first sought to have their historical theories reviewed in a history journal. Since we did go to a statistics journal, why not do it now?)

In my opinion, Schaalje played fast and loose with everything in our own work in order to present a critique that would be pointless to try and rebut. Their paper is full of baffling formulas and mathematical sleights of hand. In my opinion, it says virtually nothing.

If you don’t understand the formulas, have your coauthor (Witten) explain them to you before giving an opinion that means exactly nothing. Where are the sleights of hand? Please give just one example.

Unfortunately, it will achieve exactly what the authors hope, namely, it will provide them with a peer reviewed paper that they can hold up and claim to be a rebuttal of our paper. Anyone who can read and understand it will see that it is nothing of the sort, but few in Schaalje's target audience will, I suspect, actually try to read it.

It’s clear that *you* didn’t try to read it with understanding before forming your opinion. It’s also interesting that you admit to not being able to understand the math, but have no problem firmly concluding that ‘it is nothing of the sort.’ Aside from believing that we have made a real contribution to statistical authorship attribution methodology, our real hope is that future research on statistical authorship attribution will not be as shoddy as the Jockers et al. paper.
_harmony
_Emeritus
Posts: 18195
Joined: Fri Oct 27, 2006 1:35 am

Re: Response to Jockers, Criddle, et al., Now Available

Post by _harmony »

bschaalje wrote:It’s clear that *you* didn’t try to read it with understanding before forming your opinion.


You know, it's entirely possible to read something, understand it, and disagree.
(Nevo, Jan 23) And the Melchizedek Priesthood may not have been restored until the summer of 1830, several months after the organization of the Church.
_bschaalje
_Emeritus
Posts: 31
Joined: Fri Jul 16, 2010 8:03 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _bschaalje »

You know, it's entirely possible to read something, understand it, and disagree.

I agree. I was too strong in what I wrote last night.
However, I think the situation is analogous to fitting a straight line to observations that clearly follow a curved pattern. You can mechanically find the best fitting straight line, and you can say that you have theory that suggests that the relationship should be a straight line. But the empirical evidence is that the relationship is not linear.
Jockers seems to be saying something like “The fitted straight line is fine to use for predictions because my theory says the relationship should be a straight line. As long as I admit that these predictions assume that the straight-line theory is correct, I can proceed without fitting a more complex model.”
I’m contending that empirical evidence trumps theory, that the data itself shows that the straight line theory is not correct. My contention is that you should therefore fit a curved relationship to the observations and use this for prediction. Why even gather data if you’re going to ignore what it tells you about the relationship?
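The analogy can be made concrete with a toy example. This is a sketch on made-up data (y = x squared on seven points), not anything from either paper: the best-fitting straight line is computed mechanically, and the residuals then show the lack of fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up observations that clearly follow a curve.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"best line: y = {slope:.1f} * x + {intercept:.1f}")
print("residuals:", [round(r, 1) for r in residuals])
```

The residuals are positive at both ends and negative in the middle. That systematic pattern is the empirical evidence against the straight line; a line that really fit would leave residuals scattered around zero with no shape.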
_mjockers
_Emeritus
Posts: 1
Joined: Tue Jan 25, 2011 3:08 am

Re: Response to Jockers, Criddle, et al., Now Available

Post by _mjockers »

Something of an Apology and some Clarification from the source. . .

Professor Schaalje and forum members,

I write, in part, to apologize for comments I made in a personal correspondence with a Mr. Hausler of Australia.  I have discovered today that those remarks were posted to this forum.  Though I stand by the general message of those remarks, I certainly would have framed them quite differently, and offered background, evidence, and clarification, if I had believed they were going to be entered under my name into a debate on the subject.

For the record, I was impressed by many aspects of Professor Schaalje's paper and certainly would have registered those impressions had I thought I was speaking to anyone beyond Mr. Hausler. I'm especially impressed by Schaalje's innovative approach to dealing with the closed set problem. As my private remarks to Mr. Hausler suggest, I'm less impressed with Professor Schaalje's paper as a critique of our paper, but more on that in a moment. . .

Some years ago, Craig Criddle approached me and asked if I would help test several existing theories of authorship in the Book of Mormon.  He provided a list of candidates (the closed set), and we proceeded to test the likelihood of the one candidate versus the others; this approach is fairly standard in the authorship literature.  I do not believe, as Professor Schaalje does, that our work misrepresents the results derived from our closed set analysis.  Nor do I agree with Professor Schaalje that we passed off those resulting likelihoods as anything other than what they were, i.e. probabilities of one candidate versus the others within the closed set. I do understand that there are some who have misunderstood or misinterpreted our results and what they might "mean" in terms of larger questions beyond the scope of our paper. I do not believe that our paper is ambiguous about what it investigates; Professor Schaalje does think so. He and I clearly have a difference of opinion on this point. Nor do I agree with Professor Schaalje that our choice of method was made naïvely. It was, instead, aptly suited to the problem we set out to explore. Professor Schaalje's work addresses a different, if related, problem, and it may be the case that his methods are more appropriate to the problem as he conceived it.

I agree with Professor Schaalje that we must always worry about the possibility that the real author is not in the closed set, and we made that point in the conclusion of our paper. Professor Schaalje insists that even though we stated the point in clear language, somehow we did not in fact state it, or perhaps we simply did not emphasize the point to his satisfaction. Again, I'd say this is a difference of opinion. From my perspective the nature of our experiment was perfectly clear.

Professor Schaalje's work in trying to deal with the closed vs. open set problem is innovative and will likely inform future work in the field.  As a new approach to authorship problems, I think he and his team have likely made a valuable contribution. I do not, however, see Professor Schaalje's paper as a refutation of our work, and perhaps this is because I read the results of our work differently than he does.

The paper we wrote was designed to answer the question of who among the suspect candidates was the most likely. That's it. In my opinion, Professor Schaalje's paper takes aim at a fictionally constructed argument that we did not in fact make: hence my reference to the "straw man," "playing fast and loose with our conclusions," and "sleight of hand." I understand that Professor Schaalje and other readers may believe that our paper was about "proving" who wrote (or most likely wrote) the Book of Mormon. That is most certainly not what it was about for me, and this is why I do not believe that Professor Schaalje's paper stands as a rebuttal of our work. Our work was designed to rank a closed set of candidates who had been suggested by other researchers as possible authors. From my point of view our results showed simply that one candidate in the set was more likely than another (for any given chapter).

I do hope that Professor Schaalje will believe me when I say that I'm pleased to see that his work has indeed been reviewed and published in a journal of statistics (mea culpa--had I known that my comments were going to be posted to this forum, I certainly would have given more time to studying his bibliography)!

As for the "baffling statistics" remark, I wrote that (and the first line of my reply) in the context of the email I received from Mr. Hausler, who noted in his email to me that he was a layman in terms of statistics and could not parse the entirety of Schaalje's argument. Such formulas and statistics, presented as they are in Professor Schaalje's paper, would most certainly baffle the layman. I believe that our paper, unlike Professor Schaalje's, worked very hard to disambiguate the math and thus make it accessible to the largely humanist audience, which is served by the journal. Given the journal's audience, we attempted to strike a balance by avoiding the complex formulas and providing a textual description of the mathematical procedures in the body of the text and in the footnotes. I simply meant to convey to Mr. Hausler that for the layman, such nomenclature would be very difficult to understand and has the effect of numbing one with numbers. As a point of argument in a debate over the merits of the research Professor Schaalje conducted, my comment makes no sense at all and was not intended to be used for such a purpose. My remarks to Mr. Hausler are completely out of context when placed in a discussion over the merits of the research.

And Professor Schaalje makes any number of good points in response to my informal remarks to Mr. Hausler. He notes, for example, that "being someone's graduate student doesn't guarantee that the person will not naïvely misapply their mentor’s work". Yes, obviously that is true. My remark was made parenthetically, as an aside, and as a way for Mr. Hausler to contextualize Professor Schaalje's unnecessary use of the ad hominem "naïve" to characterize our research team. I won't bother contextualizing or defending other things I wrote while thinking I was writing a personal message to a person I have corresponded with from time to time. I meant the message for the recipient, and it is certainly not how I would have addressed a colleague or a public forum.

I am truly sorry that my hastily written note to Mr. Hausler has become a fire point in what should be a scholarly debate argued in the journals.  And I say this not simply because it is embarrassing for me to have my personal correspondence published online, but also because I think it detracts from the true merits of the research: both Professor Schaalje's and ours. I have maintained since becoming involved in this work that I have no dog in the fight, and I still don't. My interests are academic and methodological, and it is on this point that I find value in Professor Schaalje's work, even while it attacks ours in a way I feel to be a bit disingenuous.

I continue to stand by our work, and I will defend it for what it is. I cannot, however, defend what others may construe it to be. Frankly, I have no interest in the ongoing discussion of just what can or cannot be said--based on our work or Professor Schaalje's--about who, ultimately, wrote the Book of Mormon. Given the present evidence, this is not a question that I (or probably anyone else) am in a position to answer, even with advanced statistics and computation! All that anyone can do is to further probe the evidence and hopefully reveal new information that will change the game.

And this, of course, is how science proceeds. Our paper was one in a long line of papers exploring the authorship of the Book of Mormon. Some of these past works have been statistical, some theological, some genetic, and some historical. Ours was by no means unique in approaching the Book of Mormon problem (or authorship problems more generally) in terms of a closed set of candidate authors; ours was unique in the application of a specific type of machine classification algorithm. I'm sure that more research will continue along these lines and that is a good thing.

I have received many emails from folks on both sides of the larger Book of Mormon question, and the one constant is that folks in both camps seem to find much more in our paper than what I believe it to actually contain. This came as a bit of a shock to me, and if the charge of naïveté should be placed anywhere it is here. I was saddened by the number of people who wrote to me having only read the abstract to our paper: on one hand folks thanking me and on the other folks vilifying me. I learned quickly that the paper had touched a sensitive nerve, and I have been and remain disappointed that so many readers of the essay (or, more likely, readers of the forum postings that purport to summarize it) seem to find in it a smoking gun, a gun that I frankly don't think is there.

Many, both in person and via email, have asked that I comment on the paper in these online forums. Given my position as an outsider to the sort of questions being discussed here, it is not my place to engage in this discussion. I believe that the work we completed stands on its own. Perform the experiments as we did and the results will be the same. But certainly other experiments can and should be performed, and I welcome them as I welcome all new research, including Professor Schaalje's. Ultimately, though, I must admit to having no interest in the larger debates taking place here and elsewhere, and I'll not return to this or other forums to discuss the matter further. Before leaving, though, I do wish to apologize once again to Professor Schaalje and his colleagues for having to suffer through my hasty and untactful reply to Mr. Hausler. Despite my objections to how Professor Schaalje characterized our work and our integrity ("naïve"), I have far more respect for his research than what was conveyed in the tone of my off-the-cuff remarks to Mr. Hausler.

Matthew Jockers
_GlennThigpen
_Emeritus
Posts: 583
Joined: Wed Apr 09, 2008 5:53 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _GlennThigpen »

I, for one, appreciate Matt's response. I had assumed that aussieguy received permission to post that email, as he had said he was going to do.
I don't know if Matt is aware of the information Craig Criddle has been promulgating about the results of the study. He surely gives the impression that the results are absolute.
With Matt's explicit acknowledgement that Bruce's work does have merit, hopefully we can all have a more measured discussion about what it means.

There is a question that has come to my mind during all of this, though. Is there a method that can check the authorship homogeneity of a text purporting to be by a single author? The original Jockers study assigned some chapters to one author, interspersed with chapters from other authors. I would have to get a copy of the original published paper to really find out just how many chapters by Rigdon are contiguous and how many had other author assignments. That seemed strange to me, and it got me thinking about whether there is a way to internally check, say, First Nephi against itself to see if the authorship word prints are consistent throughout the text.
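One naive way to start on that question, offered only as a sketch and not as a validated method from the authorship literature: cut the text into equal chunks, compute the relative frequency of a few marker words in each chunk, and look at how far each chunk is from the one before it. The function names, the marker words, the chunk size, and the toy text are all hypothetical.

```python
from collections import Counter

def profile(words, markers):
    """Relative frequency of each marker word in a list of words."""
    counts = Counter(words)
    return [counts[m] / len(words) for m in markers]

def adjacent_chunk_distances(words, markers, chunk_size):
    """Sum of absolute frequency differences between consecutive chunks."""
    chunks = [
        profile(words[start:start + chunk_size], markers)
        for start in range(0, len(words) - chunk_size + 1, chunk_size)
    ]
    return [
        sum(abs(a - b) for a, b in zip(prev, cur))
        for prev, cur in zip(chunks, chunks[1:])
    ]

# Toy text: the first half leans on "and", the second half on "but".
toy = ("and it came to pass " * 20 + "but behold it was so " * 20).split()
distances = adjacent_chunk_distances(toy, ["and", "but"], chunk_size=50)
print([round(d, 2) for d in distances])
```

A jump in the distances marks a place where word usage shifts. Whether such a shift reflects a change of author rather than a change of topic or genre is exactly the kind of thing that would still need to be tested, and short chunks make the frequencies noisy.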

Glenn
In order to give character to their lies, they dress them up with a great deal of piety; for a pious lie, you know, has a good deal more influence with an ignorant people than a profane one. Hence their lies came signed by the pious wife of a pious deceased priest. Sidney Rigdon QW J8-39
_MCB
_Emeritus
Posts: 4078
Joined: Sat Aug 29, 2009 3:14 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _MCB »

WOW!! Glenn, I am working on an analysis of the text, including possible sources for it which were available in 1816 and/or 1829, and comparing it with Jockers et al's chapter-by-chapter results. It is apparent that richness of sources can be seen in sections attributed to Spalding, while other sections, attributed to Smith & Co. do not have such complex literary allusions. You will just have to wait. I just found a real gem, maybe you can look for it, posted elsewhere.

I will have to review my semantics, to make sure that I have made the point clear that in a chapter-by-chapter review there is a larger margin of error. Matt has previously warned me to be careful about that. It appears by his post that he does not intend to re-analyze with more possibilities, such as Lucy Smith.

More work needs to be done, and I look forward to that.

I would have to get a copy of the original published paper to really find out just how many chapters by Rigdon are contiguous and how many had other author assignments.


That information was not included in the original paper. You can find charts on Dale's website which illustrate the results, if you can bear to also look at how well Jockers et al lines up with Dale's results. I have run a simple statistic that shows that this could not be due to chance.
_Fifth Columnist
_Emeritus
Posts: 396
Joined: Fri Nov 26, 2010 7:08 pm

Re: Response to Jockers, Criddle, et al., Now Available

Post by _Fifth Columnist »

I would like to know if Jockers's or Schaalje's methods can be used to determine whether the Book of Mormon, Book of Moses, Doctrine and Covenants, and the Book of Abraham share the same author. For example, can the Book of Moses be used as the word print for author X and then compared to the other texts to see if author X is the most likely author of each? The same thing could be repeated using the other texts as the source of the word print for author X.