August 07, 2006

Thank AOL for bringing us this example of datamining

Readers might be mighty sick of reading how the boring non-entity SWIFT lost its data virginity in the grubby hands of the US Government. Now we have a change in melody, but the beat remains the same.

AOL released 20 million randomised searches, indexed to 650,000 users, from its Google-rebranded search front-end as an experiment to aid researchers. Unfortunately for them, the bloggers got hold of the data and started to do some research of their own:

....someone typed in "borderline personality disorder" multiple times and then days later there were many queries about "men that are abused by wives." The queries seem to be coming from somewhere in Toledo, Ohio. Months later someone searched for "ohio correctional institute strkyer ohio," then for airline tickets to Detroit Wayne airport and then finally on the words "win him back."

The Internet has been a boon to those who have needed to search difficult subjects. We all know that the doctor says, "visit me," but how many of us do? The net has the answers.

What might not have been clear is that the net has your questions, too. How easy is it to misconstrue dangerous search requests? Well, one could argue that if one is using the net, and not asking a human, there is a good reason. Plenty of room for misinterpretation, we can assume.

Sometimes it is clearer:

Check out the search history for user 17556639; the most recent search is at the bottom of the list. Does this look like the search history of a user wanting to do something bad?

17556639 how to kill your wife
17556639 how to kill your wife
17556639 wife killer
17556639 how to kill a wife

We all want to know that from time to time, but mostly we don't write down those spur-of-the-moment thoughts. User 17556639, would you come quietly with us, please?

The primary point here is that this data is now permanently breached. Once breached, it will be shared. And datamined. Once datamined, expect surprising results, visits by surprising people and surprising levels of abuse.
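
To see how little effort the datamining takes, here is a minimal Python sketch. It assumes each record is a pseudonymous numeric ID followed by the query text, as in the rows quoted above; the name `group_by_user` is made up for illustration, and the real release may well be laid out differently. The random number hides the name, but it still ties every query from one person together.

```python
from collections import defaultdict

# Rows exactly as quoted in the post: pseudonymous ID, then the query text.
# (Assumed layout; the actual released files may use a different format.)
SAMPLE = """\
17556639 how to kill your wife
17556639 how to kill your wife
17556639 wife killer
17556639 how to kill a wife
"""


def group_by_user(lines):
    """Collect queries per pseudonymous ID, keeping their original order.

    Hypothetical helper for illustration only.
    """
    histories = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space: ID on the left, query on the right.
        user_id, _, query = line.partition(" ")
        histories[user_id].append(query)
    return dict(histories)


histories = group_by_user(SAMPLE.splitlines())
for user_id, queries in histories.items():
    print(user_id, len(queries), "queries,", len(set(queries)), "distinct")
```

One pass over the file is enough to rebuild each user's history. From there, picking out a place name or a surname from the queries is ordinary text searching, which is roughly what the bloggers did by hand.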

Got governance? AOL does not, placing it in firm company with the US government. According to today's earlier post, expect firings & hirings to soar at AOL, and conspiracy theorists will suggest that the USG suggested the research angle to the witless at AOL after the subpoena debacle earlier in the year.

Thanks to Dani for heads-up.

Posted by iang at August 7, 2006 08:21 PM

Add to this the ease with which anyone with a decent legal budget can get anyone else's personal data from the ISP in the U. S. of A., and contemplate the consequences:

"And these folks are telling me not to pick my nose!", as the bad kid (always having THAT on his mind) exclaimed in the classic Russian joke.

PS: Unfortunately, the original joke is impossible to translate because of several layers of cultural references, but the dear readers are welcome to invent English jokes with this punchline themselves.

Posted by: Daniel A. Nagy at August 8, 2006 10:00 AM

AOL apologises over search data 'screw-up'
AOL says privacy breach was a mistake
AOL: Breach of Privacy Was a Mistake
One More Missing Computer with Military Vet Data (security breach)
AOL release of users' search histories called a privacy breach
AOL Data Spill Threatens AOLusers With Extinction
AOL apologizes for exposing search data
AOL's disturbing glimpse into users' lives
AOL Removes Search Data on Group of Web Users

Posted by: Lynngram at August 9, 2006 09:20 AM

Copied from:

Greetings. I've written and spoken many times about the sensitivity of search engine query data. We all know about Google's stance in DOJ vs. Google early this year, where Google wisely attempted (for several reasons) to prevent release of such data to a government fishing expedition related to "child protection" legislation. We also know that Gonzales, et al. are merrily pushing mandated data retention laws -- again mainly in the name of child protection -- that would leave Internet users vulnerable to all manner of unreasonable surveillance of their Internet activities. All of this is already enough to be sounding alarm bells regarding the lack of reasonable legislated protections for such data.

The AOL action in releasing the search records of a reported 500K AOL users -- assuming it took place as outlined below -- is probably the most egregious violation of users' search privacy in the history of the Internet, despite the half-hearted attempt at crude anonymization. The unbelievable lack of responsibility or good judgment shown by AOL in this case should be enough to cause any remaining AOL subscribers (or users of their free services) to strongly consider ceasing any further contact with AOL.

Furthermore, we need to accept the fact that search query data is incredibly sensitive and often contains extremely personal data that does not lose its potential for abuse via simplistic forms of anonymization. Nor can we necessarily depend indefinitely on some individual search engines' honest and praiseworthy desires to protect such data (e.g. Google) in the face of intense competition and intrusive government actions.

Search query data can contain the sum total of our work, interests, associations, desires, dreams, fantasies, and even darkest fears.

We must demand that this data be protected.


Posted by: Risks - Lauren Weinstein at August 9, 2006 11:27 AM