The SOPA Debate and Congress's Understanding of Child Porn (danwin.com)
132 points by danso on Jan 11, 2012 | 91 comments


I really enjoyed reading this perspective. It reminds me of something Feynman described (can't find the link), wherein some military general asked scientists to develop a tank which would convert rocks and sand into fuel as it drove across the terrain. And Babbage before that - "I cannot comprehend the confusion of ideas..."

If we want to frame SOPA in the context of these kinds of incidents, one optimistic view would be that, as has happened in the past, science will prevail, if we can agree that determining whether or not a copyrighted work is stolen is a question science can actually settle.


Could not agree more. (Well in this case the problem is solvable unlike cryogenics or turning sand into fuel). Btw, the Feynman story you referred to is from "Surely, you're Joking, Mr. Feynman", interesting stories about Feynman's adventures: "One guy in a uniform came to me and told me that the army was glad that physicists were advising the military because it had a lot of problems. One of the problems was that tanks use up their fuel very quickly and thus can't go very far. So the question was how to refuel them as they're going along. Now this guy had the idea that, since the physicists can get energy out of uranium, could I work out a way in which we could use silicon dioxide--sand, dirt--as a fuel? If that were possible, then all this tank would have to do would be to have a little scoop underneath, and as it goes along, it would pick up the dirt and use it for fuel! He thought that was a great idea, and that all I had to do was to work out the details. That was the kind of problem I thought we would be talking about in the meeting the next day. "


Great anecdote... I'm ashamed I didn't fit it in even though I read it recently. It's from his memoir, "Surely You're Joking, Mr. Feynman!"

http://goo.gl/KjW6V


Please don't use link shorteners in HN. Full link:

http://books.google.fi/books?id=7papZR4oVssC&pg=PA289...


In a more physical sense, this is like demanding a machine that can determine from a photograph of your handbag whether it's a cheap knockoff and whether or not you actually own that bag, as opposed to having stolen it or having bought it from someone who did.

Congressman: "That's no excuse. I've seen CSI. You can just go 'enhance, enhance, enhance' and find out anything about it."


I think the quoted handbag example is a great, understandable analogy that should be shared with everyone who lacks the necessary technical understanding. It makes obvious how ridiculous the whole project is.


The other major difference between the two (which today's politicians are incapable of understanding) is that most everyone disapproves of child pornography. Their laws criminalizing it do not make it go away - they are but a tool to enforce the general societal view that child pornography is despicable. On the other hand, copyright is a government edict that very few people care about. You can't legislate societal values - good laws merely codify them.


This ties back to why opponents of copyright as such call it "imaginary property:" there is nothing about the copy that makes it any worse than the original. We can't recognize a copy because then it isn't a copy, in the information theory sense.

Meanwhile, in the physical world, a knock-off handbag can be easily identified as such by airport personnel. This is the world these pro-PIPA politicians live in. Child porn, online as offline, is inherent in the work. You don't need supplementary information to decide if something is child porn. Online, the difference between a copy and the original is in some people's minds, not anchored in reality.


Good point. The misunderstanding is rooted in a fundamental difference between physical objects and ethereal "content".


"But we have the technology, Google has the technology, we have the brainpower in this country, we certainly can figure it out."

Even if this were true, just because Google has the capability to do this doesn't mean every company does. If it costs Google $X million to prevent online piracy on their sites, is it reasonable to expect every web startup to do the same?


SOPA backers can't be responsible for every undercapitalized entrepreneur in America.


For those who missed the joke, sixth paragraph:

http://www.pittsburghlive.com/x/pittsburghtrib/opinion/colum...


For the lazy:

---- "I can't be responsible for every undercapitalized entrepreneur in America," Mrs. Clinton said in 1993, responding to charges that her plan would bankrupt businesses and cut employment.


Anyone downvoting this is probably not old enough to remember Hillary Clinton: https://www.google.com/search?q=hillary+clinton+undercapital...


Every bootstrapping entrepreneur in America can't be responsible for Hollywood.


They are responsible for fostering a vital economy. Since undercapitalized entrepreneurs are the source of many of the US's great companies, congress is responsible for making it so that they continue to be able to operate efficiently.


What `ercid` said isn't wrong. I don't understand why he should be down-voted for missing subtle sarcasm.


Ah shoot, thanks. I didn't read very carefully :-\ Apologies to parent for my misplaced blowhardiness. There's a reason sarcasm is discouraged on HN, though.


Wow! Really good unbiased article on why some people support SOPA/PROTECT-IP. It's hard to find these things these days when everyone is accusing the government of being run by corporations.


It's also worth noting that Dan Nguyen (pronounced "win" even) is an awesome dev who works for ProPublica and has written a book on learning ruby (http://ruby.bastardsbook.com/ ), amongst other things he's accomplished!


"So for child porn, we are able to design a machine that is able to detect child porn. You can detect certain colors that would show up in pornography, you can detect flesh tones."

AFAIK, this is complete BS! I was fairly up to date with research in this field until about five years ago, and such systems are still in their infancy. If you deployed an automated system to detect pornography, the false-alarm rate would be very high. And differentiating child porn from adult porn is also extremely hard.
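To see why the false-alarm rate is so high, here is a toy version of the "flesh tones" heuristic the congressman describes: flag an image if too many pixels fall in a skin-like color range. The RGB thresholds and the 40% cutoff are invented for illustration, not taken from any real system.

```python
def looks_like_skin(r, g, b):
    # A crude skin-color rule of the kind used in early research:
    # reddish pixels with modest green/blue components.
    return r > 95 and g > 40 and b > 20 and r > g and r > b and (r - min(g, b)) > 15

def flag_image(pixels, threshold=0.4):
    """Flag the image if more than `threshold` of its pixels look like skin."""
    skin = sum(1 for p in pixels if looks_like_skin(*p))
    return skin / len(pixels) > threshold

# A close-up portrait: mostly skin-toned pixels, so it gets flagged.
portrait = [(210, 160, 130)] * 800 + [(40, 40, 40)] * 200
# A desert landscape: sand lands in the same color range, so it gets
# flagged too. That is the false alarm in miniature.
desert = [(200, 170, 120)] * 900 + [(90, 140, 200)] * 100

print(flag_image(portrait))  # True
print(flag_image(desert))    # True
```

And this toy only finds skin, not pornography, let alone the age of anyone depicted, which is the point of the thread.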


Yes, this is complete BS. The backbone of all systems currently in use to block URLs leading to child abuse images is the Internet Watch Foundation block list, which is manually compiled by a small team of police-trained analysts.

Companies providing the actual blocking services don't want to even see the URLs on the list (going so far as to ensure that it is encrypted at all times until the URLs can be safely hashed). They certainly don't want to download the contents and have them sitting (even temporarily) on a server somewhere.


I think Google must have something that filters images with mostly exposed skin in order to power its SafeSearch filter, no?

As for telling which is child porn and which isn't: given the ways people usually distribute this material (not through public web pages), it's doubtful that Google really has to do much proactive monitoring to keep it out of the usual Google-indexed places.


I think that most filtering for safesearch is powered by textual analysis and analyzing site links.


I haven't tried it but looks fairly effective: http://www.patrick-wied.at/static/nudejs/


The author seems to say it has a detection rate of about 60%.


Yeah, I was wondering where that came from. The quote was there, and the author seemed to just roll with it.

I used to work for a company that did high-demand image hosting. Filtering porn was a big deal at the time. It was considered to be of such low accuracy as to be useless. I can't imagine how one comes to the conclusion that a sexually exploited child can be located in an image by a machine.


Actually, how hard would it be, really? All you'd need to do is create a central registry of content-owners and licensed content-users, build Shazam-like content-detection, then force everybody to operate on that infrastructure.

Please don't do that, but how hard would it be?


Don't forget to ID every user on every access so you can see what permissions they have. Setting aside the "papers, please" implications of traversing the Internet that way, I can't imagine how difficult that would be to scale. And if it goes wrong, the whole Internet goes down.


Nah, you just need it for content uploaders and copyright holders. No need to implement a global permissioning system.


In case you haven't been paying attention for the past decade or so, "uploaders" are now the general population. Everyone uploads things.


I get the feeling that a lot of people have no idea how bittorrent works.


Yes. Or simply just posting comments online.


It's not that I'm unaware of peer-to-peer sharing, or the fact that uploading is inherently part of browsing. My thinking is, most content hosts are published to be found. An indexing process could find published content, and laws could enforce findability.

It's about as effective as our existing regulatory systems. For instance, a lot of commerce is taxed and regulated, but you could still walk up to somebody and make an exchange. Never air-tight, but it is enforcement.

Is my suggestion wise? I definitely don't think so, but it is feasible. And maybe I'm not considering something correctly -- but, please, don't be rude about my technical prowess. I'm just trying to strengthen our argument.


That's not true. The problem is, for any given file online, I may have permission from the owner to access it, but you may not. I shouldn't be prohibited from accessing it just because you don't have access, so the central service needs to know who is accessing it and know their rights to said file.

Without that, there is no way to tell if a link is really legal or not. And that is what Google was saying is impossible to handle.
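The point above, made concrete: the file's bytes are identical for every requester, so legality can only come from an external rights lookup, never from the content itself. A minimal sketch, with hypothetical users and work IDs:

```python
# The same bytes are served to everyone; nothing in them encodes legality.
song_bytes = b"identical MP3 bytes for every requester"

# External rights database: which user has licensed which work.
# This table is the extra information the bits themselves cannot carry.
licenses = {
    ("alice", "bieber-song"): True,   # bought it on iTunes
    ("bob",   "bieber-song"): False,  # never licensed it
}

def access_is_legal(user, work_id):
    # Default to False: no record means no known license.
    return licenses.get((user, work_id), False)

print(access_is_legal("alice", "bieber-song"))  # True
print(access_is_legal("bob", "bieber-song"))    # False: same bytes, different answer
```

Any filter that looks only at `song_bytes` must give both users the same verdict, which is exactly why per-file detection cannot work.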


Given that any content ever must be recorded in said registry, and you must register (and probably pay a fee!) before ever publishing anything. Including forum posts.

Doubtful. Scary, but doubtful.


What follows is a just a completely hypothetical idea that shouldn't be taken seriously ;p

Let's say we went from "everything has copyright" to "you must register with the copyright office to get a copyright". That has a lot of drawbacks, but for most things, like this post on HN, I couldn't care less whether it's copied elsewhere, yet currently there is a copyright on this post.

Now, in a system like that, what would prevent the copyright office from making publicly available a database of hashes for copyrighted content (videos, music, text) that would allow people at Google and elsewhere to make a reasonable assumption about licensing? This DB could contain a list of URLs that are allowed to host said content or offer more open terms.

In a way Google has already started such a DB for YouTube, but it's prone to abuse, and contesting a copyright claim there isn't exactly simple to do (as seen in the Megaupload affair).


> publicly available a database of hashes for copyrighted content

What about "derived work"? It's relatively easy to make a "copy" that displays or plays the same but has a completely different hash. In fact, it's essentially impossible to get the same hash from an excerpt.

And then there's fair-use.
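The derived-work objection in one small example: a cryptographic hash identifies exact bytes, so any re-encode, trim, or even a single changed byte produces a completely unrelated digest. The "video" here is obviously stand-in data:

```python
import hashlib

# A registered work and a trivially altered copy of it. In practice the
# alteration would be a re-encode or a cropped excerpt; one flipped byte
# is enough to make the point.
original = b"FAKE-VIDEO-DATA " * 1000
altered = original[:-1] + b"X"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()

print(h1 == h2)  # False: the hash registry never sees a match
```

This is why real matching systems use perceptual fingerprints rather than exact hashes: they need "close enough", and cryptographic hashes are designed to destroy exactly that property.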


All valid points, but YouTube is doing something that lets it detect even derived works, though it has access to the full video, so that might not work for everyone. At the very least, with a registration-based system it's clearer who "owns" what.


That's how copyright used to work, but they got rid of that after the Berne convention. Also, Google has some very fancy tech; it's not something just anybody could make work. And even they don't have it working all that great.

And then there's the matter of who has to bear the cost of creating and placing the filters, sites outside our jurisdiction that don't bother with those expensive filters, etc. You end up hobbling the US tech industry for no reason, putting everyone at a competitive disadvantage.


Well... there are a lot of reasons... but let's shift focus from music and movies to something like photos. You can imagine how arduous it would be to maintain a database like that: even if possible from a technical standpoint, it would be an undue burden on photo sellers and buyers. Plus, a one-stop centralized database of everyone who owns everything from "Casablanca" to "Debbie Does Dallas"? That's something that might give the database-fearing crowd some pause.


A massive search index of the open Internet still gives me pause, but there it is!


The endpoints of the internet are not secure. Worse, pretty much all of our wireless protocols are broken right now, so a determined person can hack almost any wireless access point. There was a nice story about that on HN a while ago.

Right now, there's a lot less incentive to do all of that. But change to a system like this and people will just start trading in stolen accounts, much like they already trade in stolen credit cards.


What annoys me to no end is that the counter-argument to the "Google is smart enough to build it" canard is legal, not technological. Child porn is illegal in this country, full stop. If it exists, it's prosecutable. An MP3 of a Justin Bieber song isn't necessarily illegal, and determining whether a particular copy is legal requires information that is not readily available for financial security reasons. So nobody can build a detector, regardless of how smart they are.

Which is exactly the point of SOPA...since we can't stop the flow of copyrighted media on the internet, we can't have the internet.


This is just a friendly suggestion to edit that second sentence - it threw me for a bit.


GOOD CALL. This is why I should not post on HN while waiting for my tests to run.


Except that, like someone pointed out, porn is not necessarily illegal and determining whether it is requires knowing the age of those appearing in it. This information is also not readily available. (Of course, just like it's pretty obvious in certain instances, it's pretty obvious that a full movie available for download is illegal, too.) So the cases aren't that different.


The information about how old a person looks exists in the image itself. You can compare the size and proportions of a person's features to known examples. No one is going to mistake a four-year-old for a grandmother. The information about whether or not someone has permission to copy something does not exist in the file: the infringing copy and the legitimate copy are bit-for-bit identical. You have to get the data on who has permission to do what from somewhere else. The Viacom v. YouTube case proved that even expensive lawyers spending hours doing research can't manage to get even 99% accuracy. They had to remove things that Viacom itself had uploaded (or authorized to be uploaded) from that case. Twice.

Now suppose we have a magic system that's better than expensive copyright lawyers that can manage 99% accuracy (which they didn't) and that we operate at internet scale on the roughly 60 billion web pages Google indexes. Then assume that no less than 20% of the internet is pirated material. Does that sound too high? Good. Your numbers will look even worse if it's lower. Now we do a little math:

  0.2 * 0.99 = 19.8% are pirate pages correctly blocked
  0.8 * 0.99 = 79.2% are innocent pages correctly allowed
  0.2 * 0.01 =  0.2% are pirate pages mistakenly allowed
  0.8 * 0.01 =  0.8% are innocent pages mistakenly blocked
So we've just censored four innocent pages (480 million) for every pirate page that slips through, and there would still be 120,000,000 pirate pages (0.2% of 60 billion) out there that aren't blocked. Feel free to work out the math, but if you want to say that 20% is too high a proportion, you'll have even more false positives, so even more innocent pages are censored for every pirate blocked.

And this assumes that we have something better than ~$300/hr copyright lawyers screening all 60 billion web pages. Any computer program we make won't even be this good.
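The arithmetic above as a quick script, using the figures from the comment (60 billion pages, 20% pirated, 99% accuracy), which are illustrative assumptions, not real data:

```python
pages = 60_000_000_000
pirate_share = 0.20
accuracy = 0.99

# The four outcomes of applying the filter to every page.
pirate_blocked   = pirate_share * accuracy * pages              # correctly blocked
innocent_allowed = (1 - pirate_share) * accuracy * pages        # correctly allowed
pirate_missed    = pirate_share * (1 - accuracy) * pages        # slip through
innocent_blocked = (1 - pirate_share) * (1 - accuracy) * pages  # wrongly censored

print(int(round(pirate_missed)))     # 120000000 pirate pages still up
print(int(round(innocent_blocked)))  # 480000000 innocent pages censored
print(innocent_blocked / pirate_missed)  # 4 innocent censored per pirate missed
```

Lowering `pirate_share` makes the ratio worse, since the pool of innocent pages exposed to the 1% error rate grows while the pirate pool shrinks.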


You are doing a selective comparison.

No one is going to mistake a four-year-old for a grandmother.

True. But plenty of people will mistake a 14, 15, 16, or 17-year old for an 18-year old.

The Viacom v. YouTube case proved that even expensive lawyers spending hours doing research can't manage to get even 99% accuracy.

Again true, but the odds of a randomly selected uploaded full-length movie being illegally uploaded is, I bet, significantly larger than the odds of a randomly selected explicit movie featuring people of illegal age. So the prior coming from what fraction of available content out there is illegal according to the two standards would likely lead to a larger false positive rate for pornography.


Saying that the porn filters don't work well only damages the case for copyright filters further. If the filters don't work well when people want them to, why would they work any better when people start trying to undermine them?


From a technical standpoint:

We can verify authenticity. We can transfer authenticity. I think about BitCoin and the security of the network. Everyone on the network can verify the origin and authenticity of a transaction.

Could something similar be put in place for media? A distributed DRM essentially? While DRM in its current form is a nightmare, isn't the main drawback being tied down to certain platforms? If DRM were decentralized, I don't think I would have a problem with it.

Thoughts?
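One way to read the Bitcoin analogy is as a shared, append-only registry where ownership disputes are settled by earliest registration of a content hash. A single-process sketch of that idea, standing in for what would have to be a decentralized ledger; every name and value here is hypothetical:

```python
import hashlib

registry = {}  # sha256 hex digest -> (owner, sequence number of first claim)
_seq = 0

def register(content: bytes, owner: str):
    """Record a claim over `content`; the earliest claim wins."""
    global _seq
    digest = hashlib.sha256(content).hexdigest()
    if digest not in registry:       # later claims change nothing
        registry[digest] = (owner, _seq)
    _seq += 1
    return registry[digest]

song = b"master recording bytes"
print(register(song, "artist"))   # ('artist', 0): original claim recorded
print(register(song, "pirate"))   # ('artist', 0): later claim is ignored
```

This sidesteps platform lock-in, but it inherits the exact-hash weakness discussed elsewhere in the thread: a re-encoded copy has a different digest and the registry never connects it to the original claim.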


This looks like "what colour are your bits?". http://ansuz.sooke.bc.ca/entry/23

Your concept seems ridden with so many problems and possibilities of abuse that I'll leave it as an exercise to the reader... (For example: how do you "tag" existing files? Existing CDs? In whose interest would this be? Would people bother? Etc.)


To tag existing CDs, we've already seen that problem solved with iTunes Match.

To get people to do it, it would be a part of uploading media files. If SOPA were to pass, I imagine hosting sites would require only "signed" media. It wouldn't be perfect, but any distributed file would be signed at creation. Self signing would be simple enough if it were built into media programs. The key to all this is that nobody controls it, and the standard is open source. Once again looking at BitCoin as a role model.

Also, I don't think it's fair to tag a concept "ridden with problems and possibilities for abuse". It's a concept, not an implementation. And it was a mighty vague one at that.


Yeah, this has been discussed (but not enough). If you've legally licensed something, the owner should provide you with a certificate of some sort. We already have this with those Windows holograms to some extent. It would be expensive to set up, though.


So what Rep. Marino essentially wants is for Google to build a Shazam-like service that doesn’t just identify a song by “listening” to it, but also determines if whoever playing that song has the legal right to do so. Thus, this anti-pirate-Shazam would have to determine from the musical signature of a song such things as whether it came from an iTunes or Amazon MP3 or a CD. And not only that, it would have to determine whether or not the MP3 or CD is a legal or illegal copy.

This isn't as hard as the author makes it sound, and I suspect YouTube already handles it for movie clips. With the aid of an external "copyright registry", Google could check whether the fingerprint of an unknown song is close to one in the registry. Obviously there would be many false positives and false negatives, but that is certainly true of child-porn detection as well. Indeed, it wouldn't take all that much coordination to pull off; a better-funded copyright office could handle it.

I don't agree with Google/internet being forced to do this due to the burden it creates, but it isn't that hard to pull off...
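The "close to one in the registry" step is the key difference from exact hashing. A toy sketch of the idea: reduce a recording to a coarse fingerprint (here, just whether loudness rises or falls between fixed windows) and match by Hamming distance, so a lossy copy still matches. Real systems like YouTube's Content ID are far more sophisticated; the level patterns below are arbitrary made-up data.

```python
import math

def fingerprint(samples, window=10):
    means = [sum(samples[i:i + window]) / window
             for i in range(0, len(samples), window)]
    # One bit per window boundary: did the average level go up or down?
    return [1 if b > a else 0 for a, b in zip(means, means[1:])]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# "Registered" work: window levels follow the digits of pi (arbitrary).
work = [v for v in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] for _ in range(10)]
# A lossy copy: same shape with a small deterministic wobble added.
copy = [s + 0.05 * math.sin(i) for i, s in enumerate(work)]
# An unrelated work with a different level pattern (digits of e).
other = [v for v in [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4] for _ in range(10)]

print(hamming(fingerprint(work), fingerprint(copy)))       # 0: copy still matches
print(hamming(fingerprint(work), fingerprint(other)) > 0)  # True: no match
```

Even granting this works, it only answers "is this the registered song?", not "does this uploader have a license?", which is the half of the problem the fingerprint cannot touch.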


Especially since basically zero major label music or Hollywood movies are legally available for free, so if Google detects such a file it's presumptively infringing. (Oops, is that something you can't say?)


>basically zero major label music or Hollywood movies are legally available for free

Only because it's not true?


TL;DR: if a normal person isn't able to properly classify something as good or bad, then a computer algorithm definitely cannot.

My opinion: pushing for such legislation would make it a criminal act to err on the side of false negatives, which means false positives will be very common.

Even a monkey could understand this point. If it refuses, it's just holding out for bananas.


> TL;DR ... If a normal person isn't able to properly classify something (good / bad), then a computer algorithm definitely cannot.

Technically untrue. Computers are able to correctly classify (and with good accuracy/precision) lots of things that even domain-expert humans have a tough time classifying.

One problem with those systems is that, depending on how they're trained up, humans aren't even able to pick apart how the software is making its decisions (and use that to advance the state of the relevant science, for example).

Creepy and awesome.


Examples required.

A seven-year-old who can read could be trained to properly recognize spam, child porn, (allegedly) copyright infringement, and even to give you purchase advice after seeing your spending habits. A seven-year-old may not have the bandwidth of a supercomputer to process hundreds of thousands of items at once, but he is capable of greater accuracy, because the human brain is the most advanced pattern-matching processor in existence.

You can classify anything by means of statistics, sometimes with surprising results, however my point (and maybe I wasn't making myself clear) is that you'll get a lot of errors of judgment. Which is why algorithms will be trained to err on the side of false positives, because doing otherwise will put the business in jeopardy.


I should have mentioned up front that I strongly agree with your core point about legal penalties and their effects on selection bias.

I was hoping the Wikipedia page on "expert systems" would bail me out on the examples front, but its "disadvantages" section isn't all that clear or complete. It does touch on the issue I mentioned, though.

I think many high-frequency stock trading algorithms are examples. The trades may as well be magical, and as long as the program makes money the owner doesn't much care why.


Given the fact that there are billions of images out there and only a tiny handful are legitimately child pornography, even an exceedingly small false positive rate would put a whole lot of people in the hot seat.

How would you like it if the cops busted down your front door because you accidentally uploaded to Facebook a picture of your new baby having a bath? What if the pageant photo of your daughter was automatically classified as sufficiently pornographic? Handcuffs for everyone.

Once computers are involved, people blindly trust them, and from there there's nothing but trouble.

Google's smart. How could it ever get anything wrong?
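The parent's base-rate point as a quick Bayes calculation. The prevalence and error rates below are invented for illustration; only the order of magnitude matters:

```python
# Assumed prior: 1 in a million uploaded photos is actually illegal.
prior = 1e-6
sensitivity = 0.99  # classifier catches 99% of the real thing
fp_rate = 0.001     # and wrongly flags 0.1% of innocent photos (optimistic)

# P(flagged) = P(flag | guilty) P(guilty) + P(flag | innocent) P(innocent)
p_flag = sensitivity * prior + fp_rate * (1 - prior)
# Bayes: P(guilty | flagged)
p_guilty_given_flag = sensitivity * prior / p_flag

print(round(p_guilty_given_flag, 4))  # ~0.001: 99.9% of flagged users are innocent
```

With billions of images, even that optimistic 0.1% false-positive rate means millions of bath-time baby photos in the review queue, and overwhelmingly innocent people "in the hot seat".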


Sorry, I didn't mean for my comment on machine learning to be taken as any sort of argument that either:

a) it is possible to build an expert system to classify online content into 'infringing' vs. 'non-infringing'

b) were it possible to build such a system, attaching the deployment of such a system to a piece of federal legislation would be a legitimate use of government power


I think this article is empathetic in how it shifts focus to understanding the view of SOPA from "their" position. That's a good thing to do, because without understanding your opponent's argument you can't fully understand your own.

However, as kpanghmc pointed out, even if Google had the capability, that doesn't mean every company does. I'll take that a step further. Even if Google had the capability, the government doesn't have the right to step in and demand that a single private entity build something to stop it, even using government (taxpayer) money. This is how you know the government has gotten too big.

Furthermore, the article, while trying to empathize, omits the fact that this is censorship. Banning web pages or whole websites without due process is unconstitutional and un-American.


In fact, the computer program could only ever get to analyze a copy of the file, and not "the" file itself. The legality of the in-memory copy remains in a quantum superposition, simultaneously legal and illegal until a politician discovers how computers work.

In other news, the police start withdrawing money from every bank account using an ATM and analyzing the $100 bills for signs of the account holders' criminal transactions.


Demand that anti-child-porn politicians answer this anti-Internet question: which is more important, blocking rare child-pornography content, or preserving children's unrestricted access to knowledge and a future with the Internet?

If they choose blocking, they have basically admitted to wishing to kill children's futures just to hide the evidence of abuse.

The ball's in their court.


If this saves even one child from the degradation and abuse of pornography, then any cost is worth it.

</sarcasm>


If this ruins the ability of almost a billion children to freely access knowledge and to express themselves in the future, then I hope the cost is worth it, congressperson.

Speaking of, I wonder how many children there are in the world.


I think I see a hole in this logic. A very well thought article and I agree with the main premise, but allow me to nitpick a crucial point...

[Warning: speculation, I do not work for Google...] Google has developed a very effective algorithm for detecting porn, not child porn. They can use flesh tone analysis for detecting nudity but algorithmic analysis of a single image to determine the age of the person in the picture seems way harder. File names don't work either: a picture of a handbag with a file name of "16yr_old_porn.jpg" is not illegal.

Battling child pornography requires a human to look at stuff. Search results get flagged by normal users (and by paid teams I'd bet). These flags are analyzed and used as patterns to detect other similar images (similar colors, sizes, names, patterns in the image, source urls, watermarks) and this secondary analysis can be done with fancy algorithms but it still requires groups of people to flag things and create these patterns.

If I'm right, then a similar method could be used to help combat infringing material. But as danwin points out, there is a new factor: figuring out whether a particular piece of content is licensed or infringing, which is hard even for a human. Still, some people can look at a video, picture, or song and be pretty sure it is infringing. Clips or full uploads of popular movies on YouTube are obviously infringing. Most MP3s on the web are infringing.

There would be false positives and false negatives (just as with child porn), but you could build a similar kind of pattern database to find similar files, and this same kind of flagging/reporting could be used to identify infringing sites. I know YouTube has an existing algorithm for matching patterns in audio tracks against some database, but I don't think those patterns are based on flags; I think they are hard-coded.

Users could still browse directly to a URL (or IP) hosting questionable content but it would definitely cut down on the easy access and spread of files.

NOTE: I don't like the idea of Google (or any organization) being mandated to implement such a process but it may be better than interfering with DNS mechanisms and lawsuits and court orders. I also don't like how reliant we are on a private organization holding such power over what the world can find in their results. Nearly everyone agrees on child porn but I think one of the main reasons SOPA is controversial is because we don't all agree on the importance of fighting infringement. (Counterfeit goods maybe, but not license violations)

By no means do I claim that SOPA is a good bill. I think doing nothing is better than implementing what is in SOPA.

EDIT: An assumption made here is that we consider the way Google is fighting child pornography is effective, sufficient, and ethical. I think that is a reasonable assumption.

EDIT2: One factor I didn't consider is the quantity and diversity of child pornography versus infringing material. I'm not sure how the algorithm would be affected but it'd probably be slightly less effective.
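The flag-then-pattern workflow described above can be sketched in a few lines: user flags accumulate per item, items crossing a threshold go to human review, and confirmed items seed a pattern set used to pre-screen future uploads. The threshold, IDs, and fingerprint strings are all invented:

```python
from collections import Counter

FLAG_THRESHOLD = 3
flags = Counter()            # item id -> number of user flags
confirmed_patterns = set()   # fingerprints confirmed bad by human reviewers

def user_flags(item_id):
    """Record one user flag; return True when the item should go to review."""
    flags[item_id] += 1
    return flags[item_id] >= FLAG_THRESHOLD

def reviewer_confirms(item_id, pattern):
    # A human decision seeds the automated pattern database.
    confirmed_patterns.add(pattern)

def prescreen(pattern):
    # Cheap automated check applied to future uploads.
    return pattern in confirmed_patterns

print(user_flags("vid42"))  # False
print(user_flags("vid42"))  # False
print(user_flags("vid42"))  # True: third flag triggers human review
reviewer_confirms("vid42", "fp:abc123")
print(prescreen("fp:abc123"))  # True: matching uploads caught automatically
```

The humans stay in the loop for the judgment call (infringing or not), and the machine only amplifies decisions already made, which is roughly how the comment describes the child-porn pipeline working.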


I think the issue is that not only is this burdensome, but it practically invites the enforcer to err on the side of caution, leading to a chilling effect.

For those who haven't taken journalism law classes: in the U.S., a newspaper can print falsehoods, especially about public figures, without losing a libel suit, so long as it does not act with "actual malice", i.e., knowledge of falsity or reckless disregard for the truth. This is not the case in many other countries, including Britain.

Why not make it a suable offense to print any kind of falsehood, which are most definitely detectable by human (and probably machine) readers? Because if a newspaper -- or any media source -- had to worry about being sued because of simple mistakes of fact or, more problematically, because a whistleblower misled them -- then most newspaper businesses would err so far on the side of caution that they would not print anything useful about any public figure.

[insert ridicule for how the corporate-controlled media doesn't do that anyway]

To go back to the SOPA case: a company like Google has to deal with something much harder to ascertain than errors of fact, and so it worries it will have to adopt a "take it down first, ask questions later" policy, which would be detrimental to its search product.


I agree with your premise if this algorithm were enforced by threat of lawsuit. An alternative would be that, instead of lobbying and lawsuits, the RIAA/MPAA pay Google to implement such a thing and in return grant them immunity/amnesty. [insert ridicule for how Big Content would never do such a thing]


"Clips... ...of some popular movies on youtube are obviously infringing"

Or you know, fair use.


"Fair use" is pretty specific. Most interpretations allow you to use a clip in your own work. But if we're talking about a clip of a movie on its own, like the first ten minutes, without any transformative addition or change, I don't see a fair-use defense.


My sole encounter with the Youtube copyright police came from posting a two minute, highly edited visual commentary on the film Inception (http://www.youtube.com/watch?v=B6rLEzrWzCw), pointing out the religious allusions to Matthew 7.24 and the allegorical significance of the final scene.

It was auto-flagged by their system and I had to jump through hoops to get it back online. I can't imagine this sort of thing would survive SOPA. And what other platform is there but the Internet for the sharing of multimedia commentary and criticism?


YouTube isn't the Internet, BTW. I know people have forgotten how to do this, but it is possible to host videos on your own Web site. In that case you probably would have been left alone, or you might have received a DMCA takedown which you could easily defuse with a counter-notice.


Sure, I could have hosted it on one of my servers, but that isn't the same, any more than posting a couple of links in an Apache directory is the same as sharing on Reddit or Facebook. And am I supposed to maintain a full-time blog on film criticism just to share a single video with the audience that might care?

There's no sane reason we should lose the visibility, distribution and community relevance of social platforms simply in exchange for exercising fair use of something. That said, the point was less about Youtube being unfair and more about how these platforms are already heavily biased against the assumption of fair use.


The fact that you were able to get your video back up (and that it remains up at the moment) is a testament that this is a decent, but not perfect, solution.

And it's better than letting Congress enact SOPA, which is my whole point upthread. I know that's not saying much, but SOPA supporters claim that we can't (or won't) offer any alternatives to SOPA. Since Congress is always in the mindset of "we have to do something," let's give them an alternative they can support that shows "something is being done."


I agree. I was responding more to what I read (perhaps incorrectly) as the insinuation that fair use is well-defined and well-protected. I had read this as implying that it would consequently be protected under SOPA as well.


Educators are allowed to use movies in their entirety for face-to-face teaching situations. I am a teacher, and our school librarians are quite well-informed on the specifics of fair use for educators.


Yes. But teachers are not allowed to post these videos to YouTube (or any other public website, right?) claiming fair use.


Youtube has private ("unlisted") videos. I don't see why teachers couldn't use this so they can show it in class and give links to their students to view at home.


How do you know somebody isn't using that clip in their work? Embedding videos in the middle of documents is very popular online...

Fair use may be defined in specific terms (rather debatable... "most interpretations" indeed.), but determining if a particular use is fair use is not a science.


Hmm, I suppose in this hypothetical example, I know that this somebody isn't using that clip in their work because I specified "on its own" and "without any transformative addition/change".

If you are embedding the video in the middle of a document, it's clearly not on its own and has been added to. However, if you embed 10 minutes of a video in the middle of your document, that might be too much to be considered fair use.

I think your point is that a weakness of the algorithm I proposed is that media considered "fair use" could be flagged as infringing. My response is that Google already has procedures for dealing with images/videos that were questionably reported as offensive.


"Most mp3s on the web are infringing."

Quotes like this make your assessment a bit bumpy - do you have any citation to back this up? From what I can see, there is tons and tons of mp3 material on the Internet that is perfectly legal. While infringing material is mostly either trafficked via p2p or server-based filesharing systems. Maybe I'm just understanding "the web" to be more limited than "The Internet", though.


I think you're right, and I concede that point. I can't claim that most are infringing.

I know there are some mp3 sites like beemp3 [https://www.google.com/search?q=related:beemp3.com]. I suppose I was thinking about the ease with which a human can judge that, yes, beemp3 is nearly entirely made up of infringing music. Humans can report these sites, and algorithms can find similar sites (and similar files) and develop a pretty decent database of items to omit from search results. My point is only that it does seem technically feasible.
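The report-then-filter idea above can be sketched in a few lines. This is a toy illustration, not anything Google actually runs: the domain list and URLs are invented, and a real system would also need fingerprint-based file matching and an appeals process.

```python
from urllib.parse import urlparse

# Blocklist seeded from human reports (hypothetical entries).
REPORTED_DOMAINS = {"beemp3.com"}

def is_blocked(url: str) -> bool:
    """Return True if the URL's host is a reported domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in REPORTED_DOMAINS)

def filter_results(urls):
    """Omit results hosted on reported domains."""
    return [u for u in urls if not is_blocked(u)]

results = [
    "http://beemp3.com/download.php?id=123",
    "http://example.org/press-release.html",
]
print(filter_results(results))  # only the example.org result survives
```

The hard part, of course, is everything this sketch waves away: deciding which reports are trustworthy and which files are "similar."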


> REP. MARINO: I only have a limited amount of time here and I appreciate your answer. But we have the technology, Google has the technology, we have the brainpower in this country, we certainly can figure it out.

Wow, that is infuriating; ignorance masquerading as arrogance.


Indeed, that was an incredibly pernicious comment. I can understand how he got into politics.

Sadly, even though it was carefully explained to him, data on whether or not something constitutes copyright infringement does not exist. That's not a technical problem. It's not something you can change with a magical computer program. Infringement rests upon whether or not the person has permission to do whatever it is they're doing. That data is simply not available, so there aren't any numbers for Google (or anyone else) to crunch. He can only say it's easy because he doesn't know anything.

If he worked in sales, I bet he'd promise customers a time machine, collect a huge bonus, then blame the geeks for failing to deliver.


"Here is why blocking child porn is fairly trivial but blocking copyright infringement is impossible..."

Rep. Marino: *wakes up from daydream* "you nerds are smart, you can figure it out"


I heard that. Oh, how I wish someone had stopped him and said, "Sir, I'm sorry, but you have no idea what you're talking about. We don't have the technology." Or something.

/rage...


My first thought after hearing that was, "wow, we are doomed."


s/appreciate/ignore/g
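For anyone who doesn't read sed: the substitution above, applied literally to the Representative's line, reads as follows.

```shell
echo "I only have a limited amount of time here and I appreciate your answer." \
  | sed 's/appreciate/ignore/g'
# I only have a limited amount of time here and I ignore your answer.
```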



