It seems like a large part of the ruling hinges on the fact that Google matched the image hash to a hash of a known child pornography image, but didn't require an employee to actually look at that image before reporting it to the police. If they had visually confirmed it was the image they suspected it was based on the hash then no warrant would have been required, but the judge reads that the image hash match is not equivalent to a visual confirmation of the image. Maybe there's some slight doubt in whether or not the image could be a hash collision, which depends on the hash method. It may be incredibly unlikely (near impossible?) for any hash collision depending on the specific hash strategy.
I think it would obviously be less than ideal for Google to require an employee visually inspect child pornography identified by image hash before informing a legal authority like the police. So it seems more likely that the remedy to this situation would be for the police to obtain a warrant after getting the tip but before requesting the raw data from Google.
Would the image hash match qualify as probable cause enough for a warrant? On page 4 the judge stops short of setting precedence on whether it would have or not. Seems likely that it would be a solid probable cause to me, but sometimes judges or courts have a unique interpretation of technology that I don't always share, and leaving it open to individual interpretation can lead to conflicting results.
The hashes involved in stuff like this, as with copyright auto-matching, are perceptual hashes (https://en.wikipedia.org/wiki/Perceptual_hashing), not cryptographic hashes. False matches are common enough that perceptual hashing attacks are already a thing in use to manipulate search engine results (see the example in random paper on the subject https://gangw.cs.illinois.edu/PHashing.pdf).
It seems like that is very relevant information that was not considered by the court. If this was a cryptographic hash I would say with high confidence that this is the same image and so Google examined it - there is a small chance that some unrelated file (which might not even be a picture) matches but odds are the universe will end before that happens and so the courts can consider it the same image for search purposes. However because there are many false positive cases there is reasonable odds that the image is legal and so a higher standard for search is needed - a warrant.
>so the courts can consider it the same image for search purposes
An important part of the ruling seems to be that neither Google nor the police had the original image or any information about it, so the police viewing the image gave them more information than Google matching the hash gave Google: for example, consider how the suspect being in the image would have changed the case, or what might happen if the image turned out not to be CSAM, but showed the suspect storing drugs somewhere, or was even, somehow, something entirely legal but embarrassing to the suspect. This isn't changed by the type of hash.
It shouldn't. Google hasn't otherwise seen the image, so the employee couldn't have witnessed a crime. There are reportedly many perfectly legal images that end up in these almost perfectly unaccountable databases.
That makes sense - if they were using a cryptographic hash then people could get around it by making tiny changes to the file. I’ve used some reverse image search tools, which use perceptual hashing under the hood, to find the original source for art that gets shared without attribution (saucenao pretty solid). They’re good, but they definitely have false positives.
Now you’ve got me interested in what’s going on under the hood, lol. It’s probably like any other statistical model: you can decrease your false negatives (images people have cropped or added watermarks/text to), but at the cost of increased false positives.
Rather simple methods are surprisingly effective [1]. There's sure to be more NN fanciness nowadays (like Apple's proposed NeuralHash), but I've used the algorithms described by [1] to great effect in the not-too-distant past. The HN discussion linked in that article is also worth a read.
This submission is the first I've heard of the concept. Are there OSS implementations available? Could I use this, say, to deduplicate resized or re-jpg-compressed images?
The hash functions used for these purposes are usually not cryptographic hashes. They are "perceptual hashes" that allows for approximate matches (e.g. if the image has been scaled or brightness-adjusted). https://en.wikipedia.org/wiki/Perceptual_hashing
> Maybe there's some slight doubt in whether or not the image could be a hash collision, which depends on the hash method. It may be incredibly unlikely (near impossible?) for any hash collision depending on the specific hash strategy.
If it was a cryptographic hash (apparently not), this mathematical near-certainty is necessary but not sufficient. Like cryptography used for confidentiality or integrity, the math doesn't at all guarantee the outcome; the implementation is the most important factor.
Each entry in the illegal hash database, for example, relies on some person characterizing the original image as illegal - there is no mathematical formula for defining illegal images - and that characterization could be inaccurate. It also relies on the database's integrity, the user's application and its implementation, even the hash calculator. People on HN can imagine lots of things that could go wrong.
If I were a judge, I'd just want to know if someone witnessed CP or not. It might be unpleasant but we're talking about arresting someone for CP, which even sans conviction can be highly traumatic (including time in jail, waiting for bail or trial, as a ~child molestor) and destroy people's lives and reputations. Do you fancy appearing at a bail hearing about your CP charge, even if you are innocent? 'Kids, I have something to tell you ...'; 'Boss, I can't work for a couple weeks because ...'.
It seems like there just needs to be case law about the qualifications of an image hash in order to be counted as probable cause for a warrant. Of course you could make an image hash be arbitrarily good or bad.
I am not at all opposed to any of this "get a damn warrant" pushback from judges.
I am also not at all opposed to Google searching it's cloud storage for this kind of content. There are a lot of things I would mind a cloud provider going on fishing expeditions to find potentially illegal activity, but this I am fine with.
I do strongly object to companies searching content for illegal activity on devices in my possession absent probable cause and a warrant (that they would have to get in a way other than searching my device). Likewise I object to the pervasive and mostly invisible delivery to the cloud of nearly everything I do on devices I possess.
In other words, I want custody of my stuff and for the physical possession of my stuff to be protected by the 4th amendment and not subject to corporate search either. Things that I willingly give to cloud providers that they have custody of I am fine with the cloud provider doing limited searches and the necessary reporting to authorities. The line is who actually has the bits present on a thing they hold.
I think if the hashes were made available to the public, we should just flood the internet with matching but completely innocuous images so they can no longer be used to justify a search
I think it would obviously be less than ideal for Google to require an employee visually inspect child pornography identified by image hash before informing a legal authority like the police. So it seems more likely that the remedy to this situation would be for the police to obtain a warrant after getting the tip but before requesting the raw data from Google.
Would the image hash match qualify as probable cause enough for a warrant? On page 4 the judge stops short of setting precedence on whether it would have or not. Seems likely that it would be a solid probable cause to me, but sometimes judges or courts have a unique interpretation of technology that I don't always share, and leaving it open to individual interpretation can lead to conflicting results.