Dale writes:
With your automated computer program, could you not develop the data for matching the 1830 Book of Mormon text directly with the Spalding text, in a matter of a few hours? We could then fill in the Spalding chart, to see if it had any noticeable distribution patterns of shared language with the Book of Mormon (or with Warren, etc.)
Yes. It's not entirely automated. Perhaps the most time-consuming part is breaking the text into appropriately sized pieces.
In the approach that you take, sample size really isn't much of an issue - the results just tend to level out as the samples get larger (this is as expected, since they will tend toward the overall average).
I thought I would address one other point. We know of texts that are related. Warren's history is one of three histories that borrow quite liberally from two other historians. One of these is Ramsay (he wrote a history of the Revolutionary War and a life of George Washington). When we compare Warren to Ramsay, we get something very, very different from the kinds of results we see for the Book of Mormon and Spalding text comparisons.
There are roughly 2,200 unique three word phrases in common between the Book of Mormon and Spalding's manuscript, and a 40% vocabulary overlap for the entire text (this includes all unique proper nouns) - this roughly 40% goes in both directions despite the fact that the Book of Mormon is much, much longer than Spalding's manuscript. What Dale has done is to point out that certain parts of these books have a higher correlation (although this has been entirely one-directional - what he hasn't done yet, and probably ought to do, is go the other way to see whether the Book of Mormon is more like certain parts of Spalding than other parts). This 2,200 figure amounts to roughly 6.4% of all unique 3 word phrases in Spalding and about 1.5% of all the unique 3 word phrases in the Book of Mormon. Part of the reason for this discrepancy is the difference in size between the manuscripts.
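For anyone who wants to see what a directional count like this involves, here is a minimal sketch. It is not the program I actually used; the tokenizer, the function names, and the two toy sentences are all made up for illustration. It collects the unique 3 word phrases in each text, intersects them, and reports the shared count as a share of each side, which is why the same number of shared phrases gives a larger percentage for the shorter text.

```python
import re

def tokens(text):
    # Assumed normalization: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def unique_ngrams(words, n):
    # Set of distinct n-word phrases, so repeats are counted once.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def directional_overlap(text_a, text_b, n=3):
    a = unique_ngrams(tokens(text_a), n)
    b = unique_ngrams(tokens(text_b), n)
    shared = a & b
    share_a = len(shared) / len(a) if a else 0.0
    share_b = len(shared) / len(b) if b else 0.0
    return len(shared), share_a, share_b

# Toy sentences only, not real data from either book.
short_text = "he went to the land of his fathers"
long_text = ("and they came to the land of nephi and they went "
             "to the land of zarahemla and dwelt there")

shared, share_of_short, share_of_long = directional_overlap(short_text, long_text)
print(shared, share_of_short, share_of_long)
```

In the toy case the same shared phrases make up half of the short text's unique phrases but only a fifth of the long text's, which is the size effect described above.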
There are only about 450 unique shared 4 word phrases between Spalding and the Book of Mormon. This accounts for 1.2% and 0.2% of the unique 4 word phrases in each text respectively. Again, the gap is due in part to the size of the texts. These are unique phrases. Dale's numbering doesn't consider this aspect. For him, all instances are counted (this isn't a bad thing though in the way that he looks at the page). For example, the 4 word phrase "at the head of" occurs 12 times in both texts, while "by the hand of" occurs only once in Spalding and 47 times in the Book of Mormon. "To the land of" occurs once in Spalding but 91 times in the Book of Mormon. Spalding's text is not as repetitive and is not that long, so this is also perhaps somewhat expected.
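The difference between counting unique phrases and counting every instance is easy to show in a few lines. This is only a sketch under my own assumptions (the helper names and the toy word lists are hypothetical): each shared 4 word phrase is listed once, alongside how many times it occurs in each text.

```python
from collections import Counter

def ngram_counts(words, n=4):
    # How many times each n-word phrase occurs in the word list.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def shared_phrase_report(words_a, words_b, n=4):
    counts_a = ngram_counts(words_a, n)
    counts_b = ngram_counts(words_b, n)
    shared = counts_a.keys() & counts_b.keys()
    # phrase -> (occurrences in A, occurrences in B)
    return {" ".join(p): (counts_a[p], counts_b[p]) for p in shared}

# Toy word lists only, not real data.
text_a = "by the hand of the king".split()
text_b = "by the hand of god and by the hand of his servant".split()

report = shared_phrase_report(text_a, text_b)
unique_shared = len(report)                          # unique-phrase count
occurrences_a = sum(a for a, _ in report.values())   # every instance in A
occurrences_b = sum(b for _, b in report.values())   # every instance in B
print(report, unique_shared, occurrences_a, occurrences_b)
```

A repetitive text pushes the instance totals well above the unique count, while a less repetitive one stays close to it.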
Having said that, we can compare Warren with Ramsay's Life of George Washington, which I can do quickly because I already have the normalized data. And just looking at the overall numbers, the differences are rather amazing.
Warren is about as long as the Book of Mormon. Ramsay is about two to three times the length of Spalding (but nowhere near the length of either Warren or the Book of Mormon). The results:
Vocabulary overlap: Warren has just under 50% of the vocabulary of Ramsay. In the other direction, the number jumps to an astonishing 80% (note that this indicates borrowing in the direction of Warren - Warren borrows from Ramsay and so most of Ramsay's vocabulary is in Warren).
3 Word locutions: 11,400+ (this is far more than the 2,100+ between the Book of Mormon and Spalding).
By percentage, 5.1% of the unique 3 word phrases in Warren's text are found in Ramsay, with an amazing 15.2% of unique three word phrases from Ramsay in Warren. This compares to the 6.4% and 1.5% for the Book of Mormon and Spalding comparison. By way of another comparison, Warren and Spalding's comparative numbers are 10.2% and 1.5% respectively. And the Book of Mormon compared with Jules Verne's book comes in at 4.8% and 1.6% respectively.
And for 4 word phrases? Between Warren and Ramsay, there are 3,700+ common unique 4 word phrases (compared to the 450 between the Book of Mormon and Spalding) - that's 1.3% and 4.4% of the totals (remember for the Book of Mormon and Spalding these numbers were 1.2% and 0.2% respectively).
Now these numbers need to be adjusted a bit to put them into the same kind of perspective that Dale gives the issue here. Those 3,700 unique phrases in common represent (including duplicates) roughly 5,400 phrases in Ramsay and 9,500 phrases in Warren. So, using my 15 pages of Spalding as a guide (using 162 words as an average word count per page), this would make a document roughly 540 pages long. Put into that the 5,426 tabulated shared 4 word phrases, and we get an average of roughly 10 such strings per page. Reducing that to the ratio that Dale used (10:162) we get 6.17%. Now, add this to the overall vocabulary overlap. The figure I gave above of 80% represents all unique vocabulary. Since the words not shared will be a small minority of the running text, we can figure the overlap in terms of total words. In doing this we note that while there are 1,483 unique vocabulary words in Ramsay that are not in Warren, the total number of words represented by these 1,483 terms is only 2,655 words (most of them appear only once). Divide this into the total number of words in the text (87,433), and we get an overall percentage of 97% overlap in words.
Add these two numbers together - 96.96% and 6.17% - and we get roughly 103.1%. That will be the overall AVERAGE expected value on Dale's scale for each and every page of Ramsay's text when compared to Warren's text. Assuming any kind of variance, it wouldn't be surprising to get single pages as high as 110-115% or perhaps even more.
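The arithmetic is simple enough to recheck from the figures quoted above. The sketch below uses only those figures; the variable names are mine, and rounding the per-page count to a whole number follows the "roughly 10" step in the text. Run this way, the phrase share comes out to about 6.17%, the word overlap to about 96.96%, and the combined average to about 103.1%.

```python
# Figures quoted in the post; variable names are illustrative only.
total_words = 87_433            # total words in the text
words_per_page = 162            # average words per Spalding page
shared_4word_instances = 5_426  # tabulated shared 4 word phrases
unshared_word_total = 2_655     # running words covered by the 1,483 unshared terms

pages = total_words / words_per_page                        # about 540 pages
strings_per_page = round(shared_4word_instances / pages)    # roughly 10 per page
phrase_pct = 100 * strings_per_page / words_per_page        # the 10:162 ratio
vocab_pct = 100 * (1 - unshared_word_total / total_words)   # overlap in running words
combined_pct = phrase_pct + vocab_pct                       # average on Dale's scale

print(round(pages), strings_per_page,
      round(phrase_pct, 2), round(vocab_pct, 2), round(combined_pct, 1))
```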
My point here (and this is also for Roger) - Dale's figures are not that significantly high. They don't show the kinds of numbers I would expect to see with deliberate borrowing. Now there are, I am sure, plenty of ways for Spalding theorists to deal with this particular perception, but my point remains that there will always be an apparently significant overlap between texts, and that this isn't unexpected or particularly noteworthy. Part of what makes it look significant is that most people who look at this kind of data don't have any expectation of results from pulling lots of texts together and looking at them.
It is more troublesome in this case (Dale's argument) that the criterion for the connection is the statistical model itself. In other words, in any particular comparison between two texts, some pages will be more like a plausible source than other pages - purely by coincidence. In the above Ramsay/Warren example, I am sure that there are going to be significant variations between individual pages of Ramsay (should I take the time to parse it into 162 word sections, we could even determine this). But the overall averages will be what I calculated. If you want to take the data describing the relationship between Spalding and the Book of Mormon and use it to break the Book of Mormon up into sections that are the most like Spalding, you create something of a circular argument. Only in assuming that the most Spalding-like chapters represent an underlying Spalding text can you create the data that naturally agrees with this assumption.
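Parsing a text into fixed-size sections and scoring each one is straightforward to sketch. What follows is a rough illustration under my own assumptions, not the procedure either of us has actually run: the tokenizer, the function names, and the toy sentences are hypothetical, and the page size is shrunk to 4 words so the example is readable (162 would be the value discussed above). It shows how individual pages can score well above or below the overall figure.

```python
import re

def tokens(text):
    # Assumed normalization: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def split_pages(words, size=162):
    # Consecutive fixed-size sections; the last one may be shorter.
    return [words[i:i + size] for i in range(0, len(words), size)]

def per_page_overlap(source_text, reference_text, size=162):
    reference_vocab = set(tokens(reference_text))
    scores = []
    for page in split_pages(tokens(source_text), size):
        hits = sum(word in reference_vocab for word in page)
        scores.append(hits / len(page))  # share of running words found in reference
    return scores

# Toy sentences only, not real data.
source = "frogs and owls screaming came to the land of the king"
reference = "the king came to the land and the people"

scores = per_page_overlap(source, reference, size=4)
overall = sum(w in set(tokens(reference)) for w in tokens(source)) / len(tokens(source))
print(scores, overall)
```

Picking out only the highest-scoring pages after the fact and treating them as the borrowed ones is exactly the circular step described above.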
My own look at this kind of statistical analysis suggests that the difference between finding Spalding as a source for the Book of Mormon and finding the English translation of Jules Verne's book as the source for the Book of Mormon is virtually non-existent. The only way to bring the Spalding material into more relevance is to do what Dale has done, and isolate those sections of the Book of Mormon that are the most Spalding-like. There is a downside to this. When the most Spalding-like material is removed, the rest of the Book of Mormon develops an almost anti-Spalding look and feel.
And there is one more issue. Dale looks at the Book of Mormon in small sections compared to the Spalding manuscript. But taking it the other way (Spalding by page to the Book of Mormon) reveals an entirely different kind of look. Dale asked about that one page way on the bottom with only 80% overlapping vocabulary (in my numbers). Here are the words on that Spalding page (12) not in Warren:
antic, barking, brains, capers, croaking, dancing, devils, distortions, frogs, furies, gestures, jumping, medley, owls, screaming (x2), screeching, shouting, topsy, tumbling, turvy, uproar (x2), whooping (x2), whoops, wolves
Now, how many of these words are in the Book of Mormon? It's an interesting question. The second lowest frequency page in Spalding used these words not in Warren:
anchored, billows, buxom, dames, devotions, droll, firma, foaming, furled, hark, huge, Jesus, keeled, longing, mariner, mate, neptune, rosy, shipmates, sturgeon, terra, tom, tossed, trojanus, ye
(I think I missed some proper nouns there ...) Hopefully you get my point here. The Book of Mormon has a tiny vocabulary for a book of its length. When looked at with this kind of analysis, it has a fairly high degree of overlap. But, we can all look at your pages and see that the majority of that overlap is in fairly common words. But, the Book of Mormon also doesn't use a majority of the language which Spalding uses. And outside of this kind of statistical analysis which tends to ignore the Spalding material not found in the Book of Mormon, we have a Book of Mormon which really doesn't look or read like Spalding. And once we arrive at the conclusion that the statistical numbers Dale is producing aren't as good as they look, well, my point is that there isn't much of a reason from this kind of a study to actually conclude borrowing at all.
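Lists like the two above come from a plain set difference: take the words on a page, drop everything found in the comparison text, and look at what is left. Here is a small sketch of that step, with a follow-up check against a second text. The function name and both reference vocabularies are invented toy values for illustration only; they say nothing about what is actually in Warren or the Book of Mormon.

```python
def missing_from(page_words, reference_vocab):
    # Distinct words on the page that never occur in the reference text.
    return sorted(set(page_words) - set(reference_vocab))

# Toy page and toy vocabularies; placeholders, not real data.
page = "the frogs and owls were croaking and screeching in the uproar".split()
reference_a = {"the", "and", "were", "in", "uproar"}
reference_b = {"the", "and", "were", "in", "frogs"}

not_in_a = missing_from(page, reference_a)
also_in_b = [word for word in not_in_a if word in reference_b]
print(not_in_a, also_in_b)
```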
All texts have some correlation, Roger. Not some, all of them do. If you want, we can produce some tests completely at random. We will find that content and subject matter make a difference, but perhaps not as much as you think. A common milieu will have a far greater impact. The hundred most common words for a time and place will still be the hundred most common words - and you will tend to find them in all authors covering all subject matter. This is one of the reasons why it is these kinds of common words that are used in word print studies (including Criddle's) and not the kinds of words that we think are much more significant.
Ben M.