Monday, December 11, 2006

Data Mining Can't Improve Our Security

December 8, 2006

by Jim Harper

Jim Harper is director of information policy studies at the Cato Institute and the author of Identity Crisis: How Identification Is Overused and Misunderstood. He is coauthor of the forthcoming Cato policy analysis "Effective Counterterrorism and the Limited Role of Predictive Data Mining."

When the Department of Homeland Security put its Automated Targeting System into effect this week, it added to a growing list of programs that use information about ordinary Americans to search for terrorists. An outgrowth of systems used to track cargo, ATS now assigns a "risk score" to Americans crossing the border, drawing on data about them from a wide variety of databases.

ATS appears to use data mining to single out people as suspected terrorists or criminals. If data mining worked to catch terrorists, a program like ATS would deserve widespread endorsement. Unfortunately, data mining does not have this capability.

Data mining is a technique for extracting knowledge from large sets of data. Scientists, marketers and other researchers use it successfully to identify patterns and accurate generalizations when they do not have or do not need specific leads.

For example, 1-800-FLOWERS has used data mining to distinguish customers who generally buy flowers only once a year (on Valentine's Day) from those who might purchase bouquets and gifts year-round. It markets to the first group less often and to the second group more often. With thousands of customers to study, its researchers get useful information from data mining.

However, despite billions of dollars in investment and unparalleled access to data on U.S. consumer behavior, the direct marketing industry achieves response rates ranging from 5.78 percent for telephone solicitation down to 0.04 percent for direct-response television. Marketers do not know which potential customers will come to a new store, much less what they will buy. Data mining cannot predict such specific information.

Data mining for terrorism prediction has two fundamental flaws:

— First, terrorist acts and their precursors are too rare in our society for there to be patterns to find. There simply is no nugget of information to mine.

— Second, the lack of suitable patterns means that any algorithm used to turn up supposedly suspicious behavior or suspicious people will yield so many false positives as to make it useless. A list of potential terror suspects generated from pattern analysis would not be sufficiently targeted to justify investigating people on the list.
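The second flaw is a matter of base rates, and a little arithmetic makes it concrete. The sketch below is purely illustrative: the population size, the number of actual terrorists, and the 99 percent accuracy figures are hypothetical assumptions chosen to be generous to data mining, not numbers from the article or from any real program. It assumes, against the first flaw, that a usable pattern exists at all, and asks how many innocent people such a test would flag.

```python
from fractions import Fraction


def screening_outcomes(population, true_targets, sensitivity, specificity):
    """Expected results of running one screening test over a whole population.

    sensitivity: share of real targets the test flags (true-positive rate).
    specificity: share of innocent people the test correctly clears.
    Returns expected true positives, false positives, and precision
    (the share of flagged people who are actually targets).
    """
    innocents = population - true_targets
    true_pos = true_targets * sensitivity
    false_pos = innocents * (1 - specificity)
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Hypothetical, deliberately generous assumptions (not from the article):
# 300 million people screened, 3,000 of them actual terrorists, and a
# pattern-matching test that is 99% sensitive and 99% specific.
# Fractions keep the arithmetic exact.
tp, fp, prec = screening_outcomes(
    population=300_000_000,
    true_targets=3_000,
    sensitivity=Fraction(99, 100),
    specificity=Fraction(99, 100),
)

print(f"true positives:  {int(tp):,}")
print(f"false positives: {int(fp):,}")
print(f"share of flagged people who are real targets: {float(prec):.4%}")
print(f"innocents flagged per real target: {float(fp / tp):,.0f}")
```

Even under these optimistic assumptions, the test flags roughly 3 million innocent people alongside about 3,000 real targets, so fewer than one in a thousand people on the resulting list would merit investigation. Real terrorist activity is rarer than this toy figure, and any real test would be far less accurate, which only makes the ratio worse.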

In a paper to be issued by the Cato Institute on Monday, Jeff Jonas, the founder of data analysis firm Systems Research and Development, and I write that using data mining in an attempt to find terrorists would waste national security resources and threaten the privacy and civil liberties of the thousands of innocents whose lawful activities coincide with a purported terror pattern.

Data mining may be useful for targeting common crimes, about which there is enough information to develop relatively accurate patterns. It may reveal potential identity fraud or credit card fraud. A certain transaction pattern might justify, for example, a credit card company calling a customer to see if she made a certain purchase. But using data mining to give government authorities reasonable suspicion of ordinary crime is fraught with difficulty. One thing is certain: They won't catch terrorists this way.

Nearly a year ago, The New York Times revealed that the National Security Agency was monitoring the international phone calls of Americans. During the course of the year, additional revelations emerged of telephone companies providing domestic call logs to the NSA and a European financial network providing records of Americans' financial transactions to the Treasury Department.

In October, National Journal reported that much of the Total Information Awareness program, which Congress had explicitly rejected, had been reconstituted under the name Tangram. We take it as given that the government officials involved are people of good faith trying to protect the country from terrorists. But Americans are right to be concerned that large storehouses of data about their lives are being used without the benefit of a clear legal structure and robust oversight.

They also are right to worry that our national security services might be wasting time and money on data mining, rather than employing effective counter-terrorism methods that are known to work.

The 9/11 Commission report showed how investigators following leads and using traditional investigative techniques could have foiled al-Qaida's plans, although hindsight is 20/20. Had anyone in the national security bureaucracy known the devastating consequences the attacks would have, they would have had the focus to prevent them. That this did not happen is not an indictment of traditional investigative techniques, nor does it call for using data mining on problems it can't solve.

Unfortunately, there is no magic bullet that solves the security conundrums created by terrorism. Data mining is a useful technique in many areas, but not this one.

This article appeared in the St. Louis Post-Dispatch online on December 7, 2006.
