Floyd Rudmin, Professor of Psychology at the University of Tromso (Norway), provides an interesting argument for why the NSA’s large-scale data mining operations are extremely unlikely to uncover terrorists, unless we assume a very high base rate of terrorists. (Hat tip to Bruce Schneier.) According to Rudmin:

The US Census shows that there are about 300 million people living in the USA. Suppose that there are 1,000 terrorists there as well, which is probably a high estimate. The base-rate would be 1 terrorist per 300,000 people. In percentages, that is .00033%, which is way less than 1%. Suppose that NSA surveillance has an accuracy rate of .40, which means that 40% of real terrorists in the USA will be identified by NSA’s monitoring of everyone’s email and phone calls. This is probably a high estimate, considering that terrorists are doing their best to avoid detection. There is no evidence thus far that NSA has been so successful at finding terrorists. And suppose NSA’s misidentification rate is .0001, which means that .01% of innocent people will be misidentified as terrorists, at least until they are investigated, detained and interrogated. Note that .01% of the US population is 30,000 people. With these suppositions, then the probability that people are terrorists given that NSA’s system of surveillance identifies them as terrorists is only p=0.0132, which is near zero, very far from one. Ergo, NSA’s surveillance system is useless for finding terrorists.
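The arithmetic behind the quoted p=0.0132 is a direct application of Bayes' theorem. The following is a minimal sketch that reproduces it from the figures in the passage; the function and variable names are illustrative and not taken from Rudmin's essay.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(terrorist | flagged) by Bayes' theorem.

    base_rate:        P(terrorist)
    hit_rate:         P(flagged | terrorist)
    false_alarm_rate: P(flagged | not terrorist)
    """
    true_pos = hit_rate * base_rate
    false_pos = false_alarm_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Figures as stated in the quoted passage.
population = 300_000_000
terrorists = 1_000
base_rate = terrorists / population  # 1 in 300,000

p = posterior(base_rate, hit_rate=0.40, false_alarm_rate=0.0001)
print(f"P(terrorist | flagged) = {p:.4f}")  # about 0.0132
```

The false alarms dominate the denominator: even a tiny misidentification rate applied to nearly 300 million innocent people swamps the 40% of a very small group that is correctly flagged.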

If the odds are that low, why would anyone engage in what can only be called an exercise in futility? Leaving possible non-security-related goals aside (e.g., establishing a surveillance infrastructure to monitor various groups of non-terrorist undesirables), Rudmin gives the following answer:

Mass surveillance of the entire population is logically sensible only if there is a higher base-rate. … The whole NSA domestic spying program will seem to work well, will seem logical and possible, if you are paranoid. Instead of presuming there are 1,000 terrorists in the USA, presume there are 1 million terrorists. Americans have gone paranoid before, for example, during the McCarthyism era of the 1950s. Imagining a million terrorists in America puts the base-rate at .00333, and now the probability that a person is a terrorist given that NSA’s system identifies them is p=.99, which is near certainty. But only if you are paranoid. If NSA’s surveillance requires a presumption of a million terrorists, and if in fact there are only 100 or only 10, then a lot of innocent people are going to be misidentified and confidently mislabeled as terrorists.
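To see how strongly the result depends on the presumed base rate, the sketch below recomputes the posterior for several hypothetical terrorist counts. It assumes the same 0.40 detection rate and 0.0001 misidentification rate given in the first quoted passage; the helper name is illustrative. Under those particular rates, one million presumed terrorists yields roughly 0.93 rather than .99, so the quoted figure may rest on rates not reproduced in the excerpt; the qualitative point is the same either way.

```python
POPULATION = 300_000_000
HIT_RATE = 0.40            # P(flagged | terrorist), from the first passage
FALSE_ALARM_RATE = 0.0001  # P(flagged | not terrorist), from the first passage


def posterior_for(terrorists):
    """P(terrorist | flagged) for a presumed number of terrorists."""
    base_rate = terrorists / POPULATION
    true_pos = HIT_RATE * base_rate
    false_pos = FALSE_ALARM_RATE * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Presumed counts mentioned in the passage: 10, 100, 1,000 and 1 million.
for count in (10, 100, 1_000, 1_000_000):
    print(f"{count:>9,} presumed terrorists -> "
          f"P(terrorist | flagged) = {posterior_for(count):.4f}")
```

The posterior only approaches certainty when the presumed base rate is several orders of magnitude higher than Rudmin's estimate; at 10 or 100 actual terrorists it is a small fraction of one percent.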

Rudmin’s theory fits the known facts surprisingly well, without requiring us to assume any form of large-scale conspiracy or “Wag the Dog”-type motivations. Institutions charged with protecting the country, any country, from attack have a long history of overestimating threats. Risk perception is largely shaped by a person’s social and institutional environment and is only loosely correlated with “actual risk,” assuming that there is such a thing as an objective measure of risk. So it is not unreasonable to assume that the NSA is “paranoid,” in Rudmin’s terms, or that, in order to avoid cognitive dissonance, it is likely to assume a higher-than-warranted base-rate. At the same time, we are seeing increased evidence of innocent people getting swept up in the NSA’s surveillance efforts. In other words, the government’s approach is over-inclusive and leads to false positives, which is precisely what we would expect as a result of a higher-than-warranted base-rate.

There are a couple of important issues, however, that Rudmin’s analysis doesn’t address. First, is it necessarily true that 30,000 false positives make a surveillance system “useless for finding terrorists”? Compare that to a situation that I know from experience, large-scale electronic discovery: a document review platform that reliably provided me with 30,000 potentially responsive documents out of a universe of 300 million would be far from useless. For a first-pass review, that’s actually very good. What really counts is what happens next, that is, how the government goes about conducting its further investigation of the 30,000 “potential terrorists.” If those can be whittled down to, say, 3,000 fairly quickly and with minimally intrusive means (e.g., research of additional publicly available or pre-existing, lawfully obtained government records), then we’re probably looking at a fairly effective system. Second, effectiveness alone doesn’t answer the underlying normative question of whether the loss in civil liberties from scanning the records of an entire population is worth the gain of 3,000 leads. Nor does it answer the question of whether those leads could have been obtained by less intrusive and more targeted means, in particular considering the potential for abuse that is inherent in large-scale, vacuum-cleaner surveillance. I understand that these considerations are beyond the scope of Rudmin’s article, but their omission makes his claim less persuasive.
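The first-pass versus second-pass argument can be put in rough numbers. The sketch below uses Rudmin's assumed rates for the first pass and then models a hypothetical second pass that trims the list to 3,000 leads. The share of real terrorists who survive that second pass is not given anywhere in the discussion, so the `retained_fraction` value is purely an optimistic assumption for illustration.

```python
def lead_quality(population=300_000_000, terrorists=1_000,
                 hit_rate=0.40, false_alarm_rate=0.0001,
                 second_pass_total=3_000, retained_fraction=1.0):
    """Expected counts and precision for a two-stage screening sketch.

    retained_fraction is a hypothetical parameter: the share of correctly
    flagged terrorists still on the list after the second pass.
    """
    true_hits = hit_rate * terrorists                           # about 400
    false_hits = false_alarm_rate * (population - terrorists)   # about 30,000
    first_pass_total = true_hits + false_hits
    return {
        "first_pass_total": first_pass_total,
        "first_pass_precision": true_hits / first_pass_total,
        "second_pass_precision":
            true_hits * retained_fraction / second_pass_total,
    }


result = lead_quality()
print(f"first-pass leads:      {result['first_pass_total']:,.0f}")
print(f"first-pass precision:  {result['first_pass_precision']:.3f}")
print(f"second-pass precision: {result['second_pass_precision']:.3f}")
```

Under these assumptions, roughly 1 in 76 first-pass leads is a real terrorist, improving to roughly 1 in 7 or 8 if the second pass loses none of them. If the second pass is itself error-prone (say `retained_fraction=0.5`), the improvement is correspondingly smaller.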


License

This work is published under a Creative Commons Attribution-Noncommercial 2.5 License.


One Response to “Data Mining, Terrorism, Statistics, and Paranoia”  

  1. Floyd Rudmin

    Good evening,

    I just noticed your response to my essay. Thank you for the critical commentary.

    There is a big difference between a document search and a spy search. First, you know certain things about the document, for example, key words, dates, etc. You know that it exists. Second, the document is not actively trying to avoid detection. By contrast, there are no certainties about finding terrorists, not even whether there are any to be discovered. And terrorists that might exist are actively trying not to be discovered.

    The secondary search of the 30,000, in fact, is already included in what NSA does. No matter how complex and multi-staged their identification system is, it is still a system that is subject to Bayes’ Theorem.

    There is, also, of course, the obvious fact that IF….IF…NSA could do this kind of thing well, then there is no terrorist problem. They get caught.

    Part of this problem is definitional. A terrorist can only be known for certain after the fact. So, we have the case of the young man who was talking about a dirty bomb. This is a man who knows nothing about bombs and who has no access to radioactive materials. He has been tried and imprisoned as a terrorist. But the reality is that he is a young man with a big mouth, an active imagination, and a bad lawyer. We are not safer for him being imprisoned.

    While resources are being wasted on this, they are being diverted from sting operations, human resource spies, etc. Plus, our intelligence agencies lose credibility with the public, who ultimately control the budgets and continuing existence of the agencies. Public employees like NSA need to take care that their employers (the rest of us) appreciate their use of our resources. Endangering that good will is another cost of monitoring all US phones and email.

    In my opinion.

    Floyd Rudmin
