I've changed my mind. A couple of days ago I was convinced that it was very bad to multiply likelihood ratios together, but boy, was I wrong. I now think not only is it okay to multiply them together, it's probably a really good idea to do so, and specifically for the reasons I had issues with before: in order for more generic pieces of evidence to overpower a "smoking gun". My impulse was that something had to be wrong with assumptions that allow for victory by parallel-mania: here is a smoking gun in the case against the Book of Mormon, but then lo, here are six weak parallels that overcome it. Several great points were made against that thinking and I got very close to conversion, but two additional points clicked into place when I went back to read about LRs yesterday.
Point 1: The wording of the articles I'd read weeks ago using "sensitivity / specificity" language, and then the Kass paper being as abstract as it is, clouded the obvious point that all these factors are literal "odds" calculations, so why wouldn't you multiply them? When I did a fresh Google yesterday, I got lucky: the article that came up put it about as simply as it can be put. A Bayes factor of 2.5 is just odds of 2.5 to 1, which (starting from even prior odds) is a probability of 2.5/(1+2.5) = .71. So yeah, I feel kind of dumb about that, but better to just come clean and start over. If I can do it, Dales, so can you.
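To make that concrete, here's a minimal sketch in Python (my own toy code, not anything from the article; the 1:1 prior odds and the 2.5 figure are just carried over from the example above) of the "they're literal odds, so multiply them" idea:

```python
def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis into a probability."""
    return odds / (1 + odds)

def combine_likelihood_ratios(prior_odds, likelihood_ratios):
    """Multiply independent likelihood ratios onto the prior odds."""
    posterior_odds = prior_odds
    for lr in likelihood_ratios:
        posterior_odds *= lr
    return posterior_odds

print(odds_to_probability(2.5))            # 0.714..., the .71 from the article
print(odds_to_probability(combine_likelihood_ratios(1, [2.5, 2.5, 2.5])))
# three independent 2.5s on even prior odds -> about 0.94
```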
Point 2: I targeted this one because Physics Guy brought up errors. Many of the introductory articles on medical-test LRs either don't mention statistical significance or gloss over it, so I hunted down articles that pair "likelihood ratio" with "confidence interval". Now it gets a little more interesting. Here's a good one:
https://pdfs.semanticscholar.org/8ae8/b ... b5203c.pdf

Words of caution about LRs:
Second, although test results at the extremes of the distribution provide the greatest diagnostic information, because they have the most extreme likelihood ratios, the estimates of likelihood ratios at these extreme values are very imprecise (ie, they have wide confidence intervals) due to the sparse data at the extremes.12,13 Although the point estimate for the likelihood ratio for >40,000 WBCs is 25.0, the 95% confidence interval is 2.4 to 257.2 (Table 4).
A good calculator for those interested in LRs with sample sizes considered:
http://www.sample-size.net/sample-size- ... ood-ratio/

That was a paradigm shift in my thinking. Instead of asking how much weak evidence it would take to beat the strong evidence, the real constraint is how deep you can go into the tail before your LR is statistically meaningless. Given the sample sizes that I suppose are the norm in medical testing, it is what it is. Maybe sometimes you'll get lucky and have a 50 that's usable, but a couple of 10s with good confidence (if not a single 10) are generally going to be better.
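For the curious, here's roughly what that kind of calculator does under the hood: the standard log-method confidence interval for a likelihood ratio built from a 2x2 table. The counts below are made up by me (the excerpt doesn't give the paper's raw numbers); I just picked a sparse extreme bin that lands near a 25 so you can see how the interval blows up:

```python
import math

def lr_with_ci(a, b, c, d, z=1.96):
    """
    Positive likelihood ratio and its 95% CI from a 2x2 table, via the
    usual log method.
      a = diseased with the finding,     b = non-diseased with the finding
      c = diseased without the finding,  d = non-diseased without the finding
    """
    lr = (a / (a + c)) / (b / (b + d))
    se_log = math.sqrt(1/a - 1/(a + c) + 1/b - 1/(b + d))
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper

# Hypothetical counts: 5 of 50 diseased vs 1 of 250 non-diseased land in the
# extreme ">40,000 WBC"-style bin.
print(lr_with_ci(a=5, b=1, c=45, d=249))   # about (25.0, 3.0, 209)
```

The point estimate is a shiny 25, but with only a handful of patients out in that bin, the interval runs from about 3 to about 200, the same shape as the 2.4-to-257.2 interval in the quote.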
So, thinking about the Dales' own exemplar use case, medical testing, what does their project actually have in common with it? I'm trying to envision a medical example that works according to the Dales' numbers. In real medical testing there seems to be an array of factors behind a diagnosis, with precious few at the extremes. On the face of it, the Dales have this weird case where there are several "smoking guns" on either side of the equation, blasting away at each other like in a Yosemite Sam cartoon, and the guy who had one more bullet wins. Each piece of evidence tilts the balance to the 98th percentile, two together bring it to three nines, and it's only getting started.
Well, first off, sure, nobody knows anything about the model and the Dales can say whatever they want. Perhaps it just so happens that this diagnosis does have several 50s. But then, given questionable confidence intervals, where does that leave their analysis? What happens if you multiply a mere 25 that could be anywhere between 2.5 and 250 (see above) by another 25 with the same problem, and then divide by counter-evidence made up of a couple more ambiguous 25s?
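To get a feel for it, here's a quick-and-dirty simulation (entirely my own toy numbers, not the Dales'): treat each "25" as a log-normal whose 95% interval runs from roughly 2.5 to 250, multiply two of them in favor, divide by two against, and see what you're left with:

```python
import math
import random

random.seed(0)

# Assumed spread: ln(LR) ~ Normal(ln 25, sigma), with 1.96 * sigma = ln(10),
# so the 95% interval of each LR is about 2.5 to 250.
sigma = math.log(10) / 1.96

def fuzzy_lr():
    return math.exp(random.gauss(math.log(25), sigma))

# Two fuzzy 25s in favor divided by two fuzzy 25s against, many times over.
combined = sorted(fuzzy_lr() * fuzzy_lr() / (fuzzy_lr() * fuzzy_lr())
                  for _ in range(100_000))
print(combined[2_500], combined[50_000], combined[97_500])
# roughly 0.01, 1, 100
```

The point estimates cancel out to a wash, but the 95% range on the combined factor runs from about 100-to-1 against to 100-to-1 in favor, which is to say it tells you close to nothing.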
A possibility is that the Book of Mormon is a "deep-tail" book, unlike the cases in medical testing, due to an apologist's ability to imagine the model is whatever helps win an argument with a critic in that moment. Well, three nines in medical testing might be way out there, but in circuit-board failures or something like that, perhaps you can go deep into a tail. But supposing you can, expectations would be raised, and you'd certainly raise the standard for strong evidence. If you can get a 150 with confidence, then maybe a 10 is no longer very strong at all. And so the best-case scenario for the Dales is that what they mean by "strong evidence" is really moderate evidence, such that it still has a tight CI and we can feel good about multiplying it all together.
Obviously, that means there is stronger evidence artificially blocked off, and it does matter, as maybe there is a 74.8 out there with a pretty good CI. But I think there are a couple of other points. First of all, Billy Shears helpfully worked out some LRs; here is just the denominator for the city of Laman:
Let’s assume that this strong tendency is 10%. In other words, there is a 10% probability that the consonants of cities from Book of Mormon times would survive the way the city Laman did. If that is the case, what is the probability that only one Lehite city (Laman) exhibited this “strong tendency”? If there are 100 named Book of Mormon cities and the probability of a name sticking is 10%, then we would expect that 10 Mayan cities would have names that could be traced back to their true Book of Mormon historical roots. The probability that only 1 does is about 0.13% (this was calculated by approximating the binomial distribution with a normal distribution).
So, dividing the probability of the evidence assuming the hypothesis is true by the probability of the evidence assuming the hypothesis is false gives the likelihood ratio, which for this point of evidence is .22/.0013 ≈ 170.
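Just to check my own reading of that, here's the arithmetic in Python (his numbers, my code; the .22 numerator is taken straight from his post):

```python
from scipy.stats import norm

# Billy Shears' denominator: 100 named cities, each with a 10% chance of the
# name surviving, so the expected count is 10 with a standard deviation of
# sqrt(100 * 0.1 * 0.9) = 3. He approximates the binomial with a normal.
n, p = 100, 0.10
mean = n * p                      # 10
sd = (n * p * (1 - p)) ** 0.5     # 3.0

p_denominator = norm.cdf((1 - mean) / sd)
print(p_denominator)              # ~0.0013, i.e. the 0.13%

print(0.22 / p_denominator)       # ~163, in the neighborhood of the quoted 170
```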
To get a 170, you have to go pretty deep into that tail. And sure enough, in this case it doesn't look like the deep-tail hypothesis can be raised, since we're talking about a single city producing the D- result, which is going to make for a bad CI. If only the Nephites had had 10,000 cities: then if just 1% of those preserved ***, falling short of the expected 1,000 -- 10% of them -- would let us say the resulting 170 in favor of a fictional Book of Mormon means something. Supposing this LR isn't a complete outlier, I think we'll find that all the very strong evidence comes with a big question mark, and multiplying it out -- I can scarcely imagine that being valid, even if the pieces are independent. But that's just a guess, as unfortunately the papers I've come across don't deal with weird hypothetical examples. But we do have stats people on board who might have an opinion...; )