How widely known were the pieces of text? Are we talking about a section of MLK's I Have a Dream speech or hand written birthday cards from your grandma?
I'm using those as the two extremes, but if it's anything by anyone moderately well known (even a lesser known piece of writing), I'm not too surprised that it didn't need the web to figure it out. It's like if you showed me a Wes Anderson film or played me a Bob Dylan song I'd never seen/heard before, I could probably still figure out who it is without looking anything up. I don't think it's surprising that an LLM can do that much better than a human can.
Now, if you're giving it things like personal emails between you and your family and it's able to guess who you are, that's much, much scarier.
As long as there's sufficient online presence otherwise, I see no reason a successful identification couldn't be made. Unless significant effort is put into making those emails different from the online content, and even then there will probably still be some "tells" that an AI can pick up on.
I mean, I tried sending Opus the pieces of text Kelsey was referring to on her blog, just to independently check the identification claim. Presumably those pieces of text first appeared on the web when the blog post was published a week ago, so no model should have memorized the exact text yet. My prompt had to specify no web search, otherwise Opus would try to search the web, though it didn't seem like Opus could find that blog post even when it did try.
I mentioned it in that thread, but when the HERMES bug was first reported, multiple people on Reddit claimed it could also be triggered with OpenClaw-specific file names. It makes me think that instead of just saying, "this approach for defending against 3rd party OAuth isn't working," and rolling things back, they tried to fix forward and continue with the strategy.
I saw this bug mentioned on Reddit a few days ago when it first got reported and someone said it was also triggered by certain file names used in OpenClaw.
I don't think it's as sinister as you're implying. I think it's part of them disallowing 3rd party clients from using the Claude Code subscription, and someone making the bad assumption that certain files in a repo are a good signal that someone is attempting to bypass those rules.
It's still not a good look for Anthropic, but I don't take this as a secret attempt to sabotage a competitor. I take it as them trying to enforce rules that they had very publicly announced.
I don't have the exact ruling in front of me, but IIRC the judge pretty clearly said that training a model was fair use; he declared it "quintessentially transformative".
The case by case basis was about acquisition and possession of the copyrighted material. Anthropic pirated a large number of books and illegally stored digital copies of many that they did purchase legally. The training being protected doesn't give them the right to violate copyright in that way.
Google, for example, purchased print versions of their training material and had a small army of employees digitize them and then delete the digital copies when they were done. That hasn't been challenged AFAIK, but would likely have been found to be not a violation. That's I think what was meant by case by case basis.
It's like if someone breaks into my house and I shoot them with my gun, that's very likely self defense, but if I'm not allowed to own a gun, I may still end up in trouble with the law.
Whether or not you’re pirating and making illegal copies of something depends greatly on the terms under which you’re allowed to make those copies. You can copy GPL-licensed code all day every day so long as you abide by the license. The same is true of the BSD licenses, MIT, ISC, Apache, et cetera.
If you’re copying or making substantially derivative works of them outside the terms of the license, you’re violating the copyright.
> If you’re copying or making substantially derivative works of them outside the terms of the license, you’re violating the copyright
I don't disagree with that.
What I'm saying is that the judge ruled that training a model using copyrighted books wasn't derivative. It was transformative, so the training wasn't a copyright violation.
He then went on to say that the way Anthropic acquired and handled that material was a copyright violation, because Anthropic pirated and copied a large number of books that were not under a license like the ones you mentioned. They downloaded a bunch of books you would find at most bookstores and then actually purchased copies of them much later, once they were accused of violating copyright.
I'm just trying to make that clear because I've heard a lot of people who don't understand that the violation wasn't about the act of training or material they used, it was just how they acquired the training material.
That was one case in front of one judge. It’s weak precedent if it’s precedent at all.
Also, the reasoning behind it being transformative instead of derivative is that the output isn’t supposed to be large, unchanged chunks of the input. There’s no actual guarantee your small model run under OpenClaw won’t recreate whole modules of the input.
IIRC, in SF they're slightly more expensive before tip, but having ridden in them in SF, LA, and AZ I've always felt they were cheaper. Over the long run, they will probably end up being cheaper from the wholesale perspective since eventually the parts and technology cost will come down with time and scale while human wages will continue to rise.
That said, it doesn't really matter if they're cheaper as long as they're comparable.
The cars are newer and nicer (for now); they're almost always cleaner, since rotating one car out for cleaning doesn't mean a driver loses earnings; they're better drivers than the average ride-share driver; you don't feel the need to tip; and multiple of my friends who are women have called out that they feel safer in them because there's no risk of the driver being creepy (or worse).
I don't think Waymo is trying to win on price right now. I think as long as they just stay somewhat competitive on that front the other benefits will continue to draw in customers.
Alphabet/Google/Waymo is a technology business, with emphasis on business. They're not running a charity. They're in it to make money. If it's a $20 Uber ride to somewhere, they're not going to leave money on the table and charge $10 because they don't have a driver to pay. They're going to charge $22 for the premium experience because they know people will pay that.
Of course, but I never claimed that they would do that. At some point though other autonomous services will enter the market and Waymo will have to compete with them on price. Even if that doesn't happen (which seems incredibly unlikely), they're still competing against public transit and people driving themselves (or privately owned self driving cars). Not having to pay a driver means the floor where they can make a profit is lower.
If we're in a world where human-driven Ubers are $30, you're right that Waymo probably won't charge $20 just to be nice, even if it only costs them $10. They might charge $20, though, if their data shows that it would 10x the number of riders or if they're also competing with another autonomous taxi company.
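To make the trade-off concrete, here's the arithmetic under the hypothetical numbers above ($30 market price, ~$10 cost to operate a driverless ride, and the assumption that undercutting to $20 multiplies ridership 10x; all figures are illustrative, not real Waymo economics):

```python
def total_profit(price: float, cost: float, riders: int) -> float:
    """Profit margin per ride times number of rides."""
    return (price - cost) * riders

base_riders = 1_000  # arbitrary baseline for comparison

# Match the human-driven market price of $30.
match_market = total_profit(price=30, cost=10, riders=base_riders)       # $20,000

# Undercut at $20, assuming the lower price 10x's ridership.
undercut = total_profit(price=20, cost=10, riders=base_riders * 10)      # $100,000
```

Under those assumptions, the lower price wins despite the thinner margin, which is why the demand elasticity (does it really 10x ridership?) is the whole question.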
That sounds right. Passenger pays for lower risk, etc. The market sees the company making $2 extra and a competitor will see if they can do it for just $1 extra.
> When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand
Hard disagree on that second part. Take something like using a library to make an HTTP call. I think there are plenty of engineers who have no more than a cursory understanding of what's actually going on under the hood.
It might just be social. When I use the open source HTTP library, much of the reason I use it is because someone has put in the work of making sure it actually works across a diverse set of software and hardware platforms, catching common dumb off-by-one errors, etc.
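To illustrate how much a one-line library call hides: below is a sketch (using only Python's stdlib) of just one small slice of that hidden work, serializing the raw HTTP/1.1 request bytes a library would send for a GET. A real library also handles DNS, TCP and TLS setup, redirects, chunked encoding, retries, and so on (the URL is illustrative):

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request for a plain-HTTP URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",   # required header in HTTP/1.1
        "Connection: close",       # don't keep the connection alive
        "",                        # blank line terminates the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

raw = build_get_request("http://example.com/search?q=http")
```

Most engineers (reasonably) never need to write this by hand, which is exactly the point about abstraction.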
Sure, the LLM theoretically can write perfect code. Just like you could theoretically write perfect code. In real life, though, maintenance is a huge issue.
I believe you, but that doesn't mean the comment you're responding to is wrong. Large layoffs are like trying to do surgery with a butcher knife while wearing an eye patch and a pair of mittens.
Since companies usually don't want to telegraph the layoffs too far in advance, they try to keep the circle of people in the know as small as possible. That means the people making the decisions on who stays and who goes are often multiple levels removed from a lot of the people affected.
I'm really sorry to hear that you got let go and I hope you are able to find a new role soon.
Pretty much. In a prior role I didn't have a real job any longer but the people making the decisions for a fairly small layoff probably didn't know that. Would have been happy to have taken a decent severance package. Hung out for a while more or less.
I thought it's also partly to spare feelings: obscuring the connection between performance and the layoff eases employees' transition to another job. That's why it's sometimes an entire branch at once, from the middle down to the bottom.
Except when it doesn't. From the same article, Seattle and Oakland had supply increase 13% and 6% respectively and both saw average rents go up. Long Beach, CA and Santa Ana, CA both had supply increase by over 10% and saw rents stay basically flat.
More supply generally will help, but it's not a silver bullet.
Maybe more recently it's up, but Oakland has been a supply success story as prices diverged from SF starting around 2021. There's a chart past the fold that illustrates it well.