
It’s abrupt only if you are unaware of safety challenges and issues in children’s gaming in the past decade.

Moderating user-generated games is a Kafkaesque joke. It’s not just text, audio, or video. It’s all of those combined in an interactive environment which can include trigger conditions - and one category of games is escaping from mazes.

Since it’s kids, you will end up with maps based on actual schools, combined with violence, in your mod queue.

The list of horrifying stuff that happens frequently is quite long, and it’s unfortunate how unaware most people seem to be about it.

At least then so many people wouldn’t be surprised.


School maps take me back; I made them back in the day myself. Fact is, kids spend so much time at school, and it’s their social life as well. Of course, in my day they were made by kids for kids, not by grooming adults.

Huh?

The context of this conversation is Wikipedia, an encyclopedia with a responsibility to verify and attribute its content.


Yep, things went downhill after social media.

Social media + mobile phones pit the ingenuity of our cleverest minds against the will and habits of the many, to sell ads.


They are selling an impossible product.

If you make an LLM more safe, you are going to shift the weight for defensive actions as well.

There’s no physical way to assign weights to have one and not the other.


> If you make an LLM more safe, you are going to shift the weight for defensive actions as well.
>
> There’s no physical way to assign weights to have one and not the other.

Do you think a human is capable of providing assistance with defense but not offense, over a textual communication channel with another human?

If no, how does a cybersec firm train its employees?

If yes, how can you make the bold claim that it's possible for a human to differentiate between the two cases using incoming text as their basis for judgement, but IMpossible for an LLM to be configured to do the same? Note that if some hypothetical completely-deterministic LLM that always rejects "attack" requests and accepts "defense" ones can exist, then the claim that it's impossible is false. Providing nondeterministic output for a given input is not a hard requirement for language models.
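
A minimal sketch of that last point, assuming nothing beyond a generic next-token model (logits_fn here is a hypothetical stand-in, not any particular vendor's API): greedy argmax decoding has no sampling step, so generation becomes a pure function of the prompt.

    import numpy as np

    def greedy_decode(logits_fn, prompt_ids, max_new_tokens=32, eos_id=0):
        # logits_fn maps a token sequence to next-token logits (the "model").
        # argmax has no sampling step, so the same prompt always produces
        # the same completion -- generation is fully deterministic.
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            next_id = int(np.argmax(logits_fn(ids)))
            ids.append(next_id)
            if next_id == eos_id:
                break
        return ids

Whether such a model judges intent correctly is a separate question; the sketch only shows that determinism is available if you want it.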


> Do you think a human is capable of providing assistance with defense but not offense, over a textual communication channel with another human?
>
> If no, how does a cybersec firm train its employees?

In general, no, humans can’t be sure they are only helping with defensive and not offensive work unless they have more context. IRL, a security engineer would know who they’re working for. If they’re advising Apple, then they’d feel pretty confident that Apple is not turning around and hacking people.


If the task is ill-defined, then it's a bit unfair to make it sound like the problem is that an LLM can't be configured to do something, if a human would have an equally hard time with the same task. The statement "it's impossible to configure the weights to..." should really be something more broad like "it's impossible to...".

I have no comment about whether it's impossible to determine the intentions of a person asking for assistance through a textual conversation with that person.


> IMpossible for an LLM to be configured to do the same?

Because that’s what I am seeing emerge from the various efforts to build LLM safety tools.

> Do you think a human is capable of providing assistance with defense but not offense, over a textual communication channel with another human?

LLM != human? They don’t even use the same reasoning process.


> Because that’s what I am seeing emerge from the various efforts to build LLM safety tools.

Something not having been obtained so far is not a logical argument that it is impossible to obtain that thing.

> LLM != human? They don’t even use the same reasoning process.

There are a finite number of possible input strings of a given length. For any set of input strings, it is possible to build a deterministic mapping that produces "correct" answers, where those correct answers exist. Ergo, for anything a human can do correctly with a certain set of text inputs, it is possible to build an LLM that performs equally well. You can think of this as hardcoding the right answers into the model. The model itself can get very large, but it is always possible (not necessarily feasible).

It's only impossible for an LLM to do something right if we cannot decide what it means for the answer to BE right in a stable way, or if it requires an unbounded amount of input. No real-world tasks require an unbounded input.
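
To make the "hardcoding the right answers" point concrete, a toy sketch (the request strings and verdict labels are invented for illustration): a finite lookup table is itself a deterministic input-to-output mapping, and one exists by construction for any finite input set with agreed-upon correct answers.

    # A degenerate "model": a finite table from input strings to fixed verdicts.
    ANSWERS = {
        "harden sshd against brute-force logins": "ACCEPT: defensive request",
        "write a brute-forcer for sshd logins":   "REJECT: offensive request",
    }

    def hardcoded_model(prompt: str) -> str:
        # Every input in the table gets its pre-agreed "correct" answer;
        # anything else is refused, so behavior is stable and deterministic.
        return ANSWERS.get(prompt, "REJECT: input not in the verified set")

Absurdly infeasible at real scale, but it settles the "possible in principle" question, which is all the argument above needs.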


Most organizations don’t have the resources to let people travel without an actual event on the other end.

I could be wrong, but I suspect visas may be affected as well.

The organizers and attendees will try and coordinate something else. However, throwing something together on Zoom at such short notice is going to be a crap show.

The schedules wouldn’t survive unscathed, since speakers will start dropping out at this point.

If they do manage to get something together, they run the real risk of it being a rickety and frustrating event.


You were wise enough to avoid this; unfortunately for most people it’s “shiny tech!”.

Yet more regulation? We have regulation for these glasses already?

Aren’t there countries that make it mandatory to blot out faces of people on videos if they didn’t consent?


Which examples did they cover in the book?

I’m betting this is going to some ML / data labelling pipeline.

Yeah, moderation may instead be labelling in this case. It’s likely the same type of firm handles both sorts of work on behalf of FAANG.

Sounds plausible.

We could also toss vibe coded mess on top of this and probably get closer to the truth.


The article itself is ambiguous on this point: "At the time of the publication, Meta admitted subcontracted workers might sometimes review content filmed on its smart glasses when people shared it with Meta AI."

That could be moderation, or it could be labelling new examples for training/validation.


This feels like an instance of weasel words. One can scarcely imagine any reason to do content moderation over people’s own private and personally consumed data.

I’ve worked in trust and safety - for me this is stupid, but well below the threshold of impossible.

Hell, I know of a major firm that decided QA was not needed for their trust and safety process.

Another common issue will be Southeast Asian Arabic speakers tasked with labelling Middle Eastern Arabic content, because accents and cultural dialects are not a thing.

I’ve had people at FAANG firms cry on my shoulder, because they couldn’t get access to engineering resources at their own firms.

There was the famous case of Meta executives overriding T&S policy and telling the team what content was newsworthy during the Boston bombing. In a separate incident, they told their team that cartel violence was not newsworthy when friends in London complained about it.

When you say this is fantasy, what do you mean precisely?


What I mean is: I'm not sure what they base their statement that it's "a common practice among other companies" on. Unlikely they are talking about their peer companies. I suppose if you read the sentence literally, there surely exist one or more "other companies" in the broad universe of "other companies" that routinely do this kind of stuff. But I wouldn't think anywhere serious.

I mean, given this happened and it was sent to Sama, it seems pretty clear that the images being generated from this were being sent to a labelling pipeline somewhere.

There’s probably an opt out / opt in clause somewhere in the terms and conditions, which makes it feasible for Meta (and other firms) to use this data.


Meta could at least pretend that they don't intend to capture people in their most intimate and vulnerable moments instead of slobbering on the sideline like "mm... Data..."
