Benjamin McGuire wrote:
Dale wrote:
>>Well, if Ben McGuire or Nevo can produce the same results out of the writings of Ethan Smith, or Josiah Priest, or some such early 19th-century worthy, then perhaps I'll settle down a bit. But they won't. The best they can do is to show how much of the Book of Mormon is like a KJV Bible (and that doesn't help their cause).
Dale, I am a bit unclear exactly how you came up with one of your figures (your comments notwithstanding). If you could clarify this for me, I would be happy to oblige you.
You seem to be using the following statistical figures:
1) Word count (easy enough)
Easy perhaps -- but also subject to error. On the first go-around I
simply eyeball-counted the number of words on each 1830 Book of Mormon
page, allowing a two-word count for hyphenated words. But then
I spotted a couple of miscounts and became concerned that I
might have made some additional mistakes -- so I re-counted
each page automatically using a word-processing program. After
adjusting my results, I discovered that the computer
program had been counting some combinations of long dashes, equal
signs and parentheses as actual words. However, by then I had
already constructed all my charts. Any particular page's word
count may thus be off by one or two words.
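If you want to replicate that filtered count, here is a rough
Python sketch of the idea -- the letter-test and the sample line
are my own after-the-fact illustration, not the actual logic of
the word processor I used:

import re

def count_words(page_text):
    # Count only tokens containing at least one letter, so stray
    # dashes, equal signs, and parentheses are not counted as words.
    return sum(1 for t in page_text.split() if re.search(r"[A-Za-z]", t))

def count_words_hyphens_as_two(page_text):
    # The original eyeball count treated hyphenated words as two words.
    return count_words(page_text.replace("-", " "))

page = 'And it came to pass -- ( = ) that the sea-shore was quiet.'
print(count_words(page))                 # 10: punctuation tokens ignored
print(count_words_hyphens_as_two(page))  # 11: "sea-shore" counted as two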
2) Shared vocabulary (as a percentage) - am I right in understanding that you first remove all unique proper nouns?
Yes, because two identical pages, by the same author, with the
names altered on one of the pages, could otherwise be scored as
a page not attributable to the author we know wrote it. The best
solution I could come up with was to remove such words entirely
from the computations, unless they were present in both texts.
Thus, Jesus Christ remains, to be counted, but Moses and Jews
are eliminated (since they occur in only one of the texts). Words
like "neas" and "mamoon" remain for the initial count, but are
eliminated in the second stage.
I think that this probably skews the data a bit, but I am unsure of how much until I run the numbers. Would it be acceptable if I presented a couple of alternative counts that a) remove all proper nouns, and b) include all proper nouns?
Perhaps you could present the alternatives in spreadsheet tables,
so that they can easily be compared with my method.
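For what it is worth, here is a rough Python sketch of how the
three counts might be run side by side. The proper-noun heuristic
and the percentage formula (shared words over combined vocabulary)
are assumptions made for the sketch, not necessarily how my
published figures were computed:

import re

def words_of(text):
    return re.findall(r"[A-Za-z']+", text)

def proper_nouns(text):
    # Crude heuristic (an assumption, not my hand method): treat a
    # word as a proper noun if it never appears lower-cased in the text.
    ws = words_of(text)
    lowered = {w for w in ws if w[0].islower()}
    return {w.lower() for w in ws if w[0].isupper() and w.lower() not in lowered}

def shared_vocab_pct(a, b, policy="dale"):
    va = {w.lower() for w in words_of(a)}
    vb = {w.lower() for w in words_of(b)}
    pa, pb = proper_nouns(a), proper_nouns(b)
    if policy == "remove_all":        # alternative (a): drop every proper noun
        drop = pa | pb
    elif policy == "include_all":     # alternative (b): keep every proper noun
        drop = set()
    else:                             # my stated rule: drop proper nouns
        drop = (pa | pb) - (pa & pb)  # occurring in only one of the texts
    va, vb = va - drop, vb - drop
    return 100.0 * len(va & vb) / len(va | vb)

a = "and Moses spake unto the Jews concerning Jesus Christ"
b = "for Jesus Christ hath spoken unto all people"
for p in ("dale", "remove_all", "include_all"):
    print(p, round(shared_vocab_pct(a, b, p), 1))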
3) Word Strings - this is the one that has me really quite confused. What exactly do you mean by a word string?
Here is an example using two sentences:
1. Dale and Ben never played poker together with Susan.
2. Sam or Tom never played strip poker together with Susan.
Any two words in sequence is a "string." Thus "Dale and"
is a string, as is "Ben never."
"played... poker together with" is a four-word string, but
it is a "broken string," and thus perhaps less statistically
weighted for computational purposes than a normal string
of contiguous sequential words.
"Ben and Dale" (if it occurred elsewhere), would be a string,
but would not match either sentence.
"Sam or Tom" would not match sentence #1, even though
that string has the same function as a subject in string #2.
The "significant string" shared by the two sentences is
"poker together with Susan." because it has four words in
sequence, shared by both sentences. The term "significant"
is objective if we rely purely upon the four-word qualification,
but my picking the number four is subjective. Because of
the likely confusion entering into the study by my using such
a term, I have replaced "significant" with "tabulated," and
presented a tabulation of the strings cross-compared.
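A minimal sketch of the matching just described, run on the two
example sentences. Scanning from the longest possible string down
is simply one convenient way to surface the shared four-word
string; broken strings like "played... poker together with" would
need a gap-tolerant comparison and are not handled here:

def strings_of(words, n):
    # Every contiguous n-word string in a list of words.
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

s1 = "Dale and Ben never played poker together with Susan".split()
s2 = "Sam or Tom never played strip poker together with Susan".split()

# Scan from the longest possible string down to two words and stop
# at the first length with any shared string.
for n in range(min(len(s1), len(s2)), 1, -1):
    shared = strings_of(s1, n) & strings_of(s2, n)
    if shared:
        print(n, shared)   # -> 4 {'poker together with Susan'}
        break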
Are these taken from your significant word string tabulations? If so, would you explain what you believe to be the criteria to place a word string into that list (since, obviously, I would need to construct a similar list for a competing study).
Here are some rough criteria for "matching" or "shared" strings:
1. include all shared strings of four or more contiguous, sequential words
2. include any broken string of four or more words (if only a small break)
3. include a few dozen randomly chosen three-word strings with information
4. include a handful of two-word strings with practically unique information
5. ignore plural noun forms (treat singular and plural forms as identical)
6. do not include homonyms
7. count overlapping word-strings individually
You can determine for yourself what "information" might be -- here is
my example: "a profound sleep" = information; "and it was" = no information
You can determine for yourself what "practically unique information"
might be -- my example: "raging deep" = information; "but then" = no information.
You can construct your own criteria, so long as what you come up with
roughly parallels what I came up with.
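To make the criteria concrete, here is a small Python sketch that
applies criteria 1, 5, and 7 literally. The plural-folding rule is
a crude stand-in (an assumption) for the hand judgment the actual
tabulation used, and criteria 2-4 and 6 still require a human eye:

def normalize(word):
    # Criterion 5, approximated: fold simple plurals into the singular.
    w = word.lower().strip('.,;:!?"\'')
    if w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w

def four_grams(text):
    words = [normalize(w) for w in text.split()]
    return [" ".join(words[i:i + 4]) for i in range(len(words) - 3)]

def tabulate_shared(source, target_page):
    # Criterion 1: every shared contiguous four-word string.
    # Criterion 7: overlapping matches are counted individually, so a
    # shared five-word run yields TWO four-word strings, not one.
    source_grams = set(four_grams(source))
    return [g for g in four_grams(target_page) if g in source_grams]

src = "they walked down to the raging deep"
page = "and they walked down to the shore"
print(tabulate_shared(src, page))
# -> ['they walked down to', 'walked down to the']  (two overlapping strings)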
The goal is to map a selection of Spalding's known phraseology across
the entire Book of Mormon, and determine whether the resulting
distribution is uniform or clustered -- whether there is a wide range
between lowest count per page and highest count per page, etc.
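Given the per-page tallies, a summary along these lines shows the
spread. The variance-to-mean ratio is a suggested clustering
indicator, not part of my original charts, and the sample counts
below are hypothetical:

from statistics import mean, pvariance

def distribution_summary(counts_per_page):
    # A wide min-to-max range and many zero pages point toward
    # clustering rather than a uniform distribution.
    m = mean(counts_per_page)
    return {
        "pages":      len(counts_per_page),
        "min":        min(counts_per_page),
        "max":        max(counts_per_page),
        "zero_pages": sum(1 for c in counts_per_page if c == 0),
        "mean":       round(m, 2),
        # Near 1 for a random (Poisson-like) scatter; well above 1
        # when matches pile up on certain pages.
        "var_to_mean": round(pvariance(counts_per_page) / m, 2) if m else None,
    }

# Hypothetical per-page tallies, not actual Book of Mormon data:
print(distribution_summary([0, 0, 3, 1, 0, 7, 0, 2, 0, 0]))
# -> {'pages': 10, 'min': 0, 'max': 7, 'zero_pages': 6, 'mean': 1.3, 'var_to_mean': 3.55}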
If you only map out the unbroken strings of four or more words,
then many, many of the Book of Mormon pages tabulated will compute at "zero."
If you include every single three-word string, many pages will be
so covered with overlapping shared phraseology as to be almost
uncountable.
I count two overlapping four-word strings (which, say, happen to
form a single five-word string on a page) as TWO strings and not as ONE.
And, as I said, charting out ALL of the possible three-word strings
creates a mess.
Go ahead and see what your results might be. So long as you apply
your selection of language consistently and uniformly across the
entire Book of Mormon, the results will be informative -- even if
your selection of tabulated strings differs from my own.
By the way, what did you think of my essay in the latest FARMS Journal?
Will let you know when I finish my reading for this month --
UD