Hacker News | vorticalbox's comments

I've not had that issue with Codex. I was testing a public API I work on for issues; Codex was happy to attempt to break it, but it did refuse to create a script that would automate the issue it found.

Yeah, I've seen this; I stopped it and asked it about it.

Sometimes they notice bugs or issues and just completely ignore them.


This can result in some funny interactions. I don't know if Claude will say anything, but I've had some models act "surprised" when I commented on something in their thinking, or even deny saying anything about it until I insisted that I can see their reasoning output.

Supposedly (https://www.reddit.com/r/ClaudeAI/comments/1seune4/claude_ch...) they can't even see their own reasoning afterwards.

It depends on the version. For the more recent Claudes they've been keeping it.

I sometimes play about with local models via Ollama/ComfyUI, and more recently ACE-Step to generate music.

That is short bursts of heat, 5-10 minutes during a render; I would not be happy with that for multiple hours a day. I am sure it would have a negative effect on battery health.


Most agents/chats have access to web search. I'm not overly surprised that it can do it, but it is very nice when it actually works.

You can get abliterated versions that have no (or very limited) refusals.

I tend to use Huihuiai versions.


I'm using the Cursor CLI. When I used just its built-in agent it scored 10/16 tokens, but I also have my own custom CLI tool that does tasks for my job; when I used that, it scored 15/16. It missed the token on the Content Negotiation test.

This is how I use Cursor 99% of the time. The other 1% is in Zed.


What happened around January this year (2026) that caused such a climb in usage?


Openclaw


I like Ollama, mostly because the CLI is pretty nice. Its desktop app makes some odd choices, though: if a model supports tools, the UI should give me the "search" option, but it only shows up for cloud models.

I have run LM Studio for a while, but I don't really use local models that much other than to mess about.


You can also use OpenWebUI locally which should give you a nice friendly UX once you set it up.
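For anyone who hasn't set it up before, a minimal sketch of the Docker quick-start from the Open WebUI docs (assuming Docker is installed and Ollama is listening on its default port 11434 on the host):

```shell
# Run Open WebUI in a container; --add-host lets the container
# reach an Ollama instance running on the host machine.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
# Then open http://localhost:3000 in a browser.
```

The port mapping and volume path follow the project's README; adjust them if your setup differs.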


"OpenAI’s GPT-5" is ambiguous. Does that mean GPT-5, 5.1, 5.2, 5.3, or 5.4? Does it include the full model, or the nano/mini variants?


GPT-5 is not ambiguous; it's the official name of the model that was released in August last year.

> All evaluations were done in March - August 2025.


While true, all the others got precise identifiers; for OpenAI it makes it hard to reproduce because I have no idea "which" GPT-5 was used.


It was called just GPT-5 at that point in time.


In that case, what tokenizer version? What was the temperature set to? Top-k? Top-p? FP32? FP16? Quantized? Hopper? Blackwell?

