
Right, but the question is: what is the chance / risk / percentage?

All the examples of hallucination I see on the Internet are either controversial or pushy or edgy.

When asked simple questions about factual, well-covered, non-controversial items, the success rate seems very high.

Do we have a feel for safe and unsafe areas? Are there types of questions, or ways of asking them, that will produce high-confidence answers?



Not so. I have tried asking it a wide variety of non-controversial technical questions (about history, science, laws, word etymologies, etc.) and it consistently just makes up (confident sounding and plausible-to-a-non-expert) bullshit. The biggest problem I have is that the bullshit answers are in no way obviously distinguishable from the precisely correct answers.

(Maybe we have different ideas of "well covered". I am guessing if you ask it about the content of middle- or high-school textbooks it will generally be about as reliable as a typical middle- or high-school textbook. But that's the stuff I don't need an AI research assistant to find out about; I can just look those up in a search engine or wikipedia and quickly get an acceptable answer.)

I think it would be extremely dangerous to the general factual accuracy of claims found on the internet to have this widely adopted as a general learning tool, in its current form. Sort of "citogenesis on steroids". https://xkcd.com/978/


Can we both post some examples?

I ask it simple, well-covered, basic questions: list the types of minor chords, explain why melodic minor is different ascending and descending, what are the types of chords, explain seventh chords, etc.

Please deconstruct m'appelle and s'appelle and tell me what they're made of, with examples. Explain the difference between tu, te, and toi, with examples. When do you reverse the pronoun and verb in French? What are the ways you can formulate a question in French? Please list the reflexive pronouns in French, with examples. What is the difference between secularism and humanism?

Yes, I can find all of these on Google or Wikipedia - that, for me, is the point. I don't ask it any questions that wouldn't be solidly covered in its training set, or that would involve controversy or judgment, or that would be obscure or only covered in a single scientific paper, etc. But it's the seamless and fast way of obtaining meaningful and succinct answers to my specific questions that makes it completely different.

With Google, if I have a question, I am now in googling mode: skipping the first three sponsored results, skipping the next six results that are SEO-optimized stories about how someone is learning French because of their first girlfriend's mother, etc., and then sifting through pages to get a remotely applicable answer. It completely disrupts the learning. With ChatGPT, it's a seamless, integrated part of my learning.


Deleting my previous example, which was too verbose. Here's a better one:

>> what is laguerre geometry?

> Laguerre geometry is a branch of geometry developed by the French mathematician Edmond Laguerre in the 19th century. It is a non-Euclidean geometry that deals with the study of conic sections, curves, and surfaces in 3-dimensional space. Unlike Euclidean geometry, Laguerre geometry allows for the existence of lines and planes with complex, imaginary or dual coordinates. The Laguerre distance between two points is defined as the square root of the cross ratio of four points, and the geometry itself is often referred to as the geometry of cross-ratio invariants. The Laguerre geometry has applications in projective geometry, algebraic geometry, and physics.

The nontrivial parts of this are just wrong. Laguerre geometry is geometry where oriented lines are the basic objects rather than points, and other objects are constructed as envelopes of lines rather than loci of points. It is a kind of dual to inversive geometry (with "laguerre transformations" analogous to "möbius transformations").
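
To make the "oriented objects" picture concrete, here is a minimal sketch of the usual planar formulation (my own summary, so treat the details as assumptions rather than anything quoted above): the basic objects are cycles, i.e. oriented circles with centre c and signed radius r, with oriented lines as a limiting case, and the basic invariant is the tangential distance between two cycles,

    t((c_1, r_1), (c_2, r_2)) = \sqrt{ \lVert c_1 - c_2 \rVert^2 - (r_1 - r_2)^2 }

which vanishes exactly when the two cycles are in oriented contact. Laguerre transformations are, roughly, the cycle maps that preserve this quantity, much as Möbius transformations preserve the cross-ratio in inversive geometry.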


Thanks - genuinely appreciate the example!

If you don't mind, since I don't know the concept myself: where would you place it on the axes of obscurity and controversy?

My own meagre check: in terms of obscurity, there's no Wikipedia entry for it, and a quick Google search does not provide a super quick and simple definition on my ignorant and superficial check. There's plenty on the Laguerre transform, but not on "Laguerre geometry". What there is gives definitions that are not the same as the one you provided, and is not consistently worded. I am not capable of reconciling them and judging whether they are comparable, whether there is more than one way of defining, interpreting, or explaining it, or exactly how wrong the AI definition is. At the same time, a search for Lobachevsky geometry, which is the most obscure one I know of in my limited mathematical awareness, is amply and consistently covered - yet it is still beyond the threshold of obscurity and consistency that I would use for AI. And a search for Euclidean geometry, which is what I'd expect AI to define correctly, is of course plentiful (which is why I'd expect AI to succeed - a bit circular there :)).

So it may simply be that our expectations and needs/uses for LLM AI are different? To your original point, I'm indeed using it for stuff that's basic and common, and it seems to work well there. If that's all it works well for, it's bloody amazing and very useful for many - even if definitely not (yet?) useful for all people, domains, or cases.


The reason it’s topical for me is that I have been trying to do the research to write better Wikipedia article(s) about the subject. The one currently at https://en.wikipedia.org/wiki/Laguerre_transformations doesn’t do an adequate job with the basics IMO. I gathered a long list of resources at https://en.wikipedia.org/wiki/Talk:Laguerre_transformations#...

(As commonly happens, I am currently a bit stuck on making a bunch of diagrams...)

This is not controversial, but it is moderately obscure. (But not that obscure.)

* * *

I don’t have any particular “expectation”. I am just disappointed that the current versions of these tools do not give any indication of their level of certainty. I have asked a wide range of questions that prompted responses which were substantially bullshit, but which, if I hadn’t followed up, might have seemed plausible enough.

Stuff like asking for translations of material written in Latin, asking about word etymologies, asking about court cases, asking for summaries of famous old books, ...

I have had a few conversations like:

“Who discovered thing A”

“It was X, who ....”

“wasn’t it Y?”

“Yes I am sorry. I was incorrect. It was Y, who ...”

“Wasn’t it actually Z?”

“Yes I am sorry. I was incorrect. It was Z, who ...”

If you try you can get it to make up nonsense about a whole string of false discoverers.

It would be better if it instead said “I have no idea about the precise answer, but people P and Q were working on similar topics around that time” or whatever.


Peter Attia recently spoke about a couple of sports examples. In one, he asked “what was so special about the Abu Dhabi Grand Prix?” and it reported a number of wrong facts; most importantly, it named the incorrect winner of the race. In another, he asked who was the best boxer from a list of candidates. He claimed it just recited a bunch of facts without actually answering the question.

To me, this implies that it may struggle when trying to make contextual connections, i.e. it may be better at answering “how” or “what” than explaining “why it matters” or making value judgments.


Thanks; I wonder (and this is just the troubleshooter in me): would we have gotten better results if we had specified the year? (Or was that already done in the original?)


Just as an additional test of contextual understanding, I asked ChatGPT "What controversy surrounded the White Sox?" It correctly identified the 1919 Chicago Black Sox scandal, but then it went on a complete tangent about how the Chicago White Sox logo is controversial because it is racially insensitive and promotes negative stereotypes about Native Americans.

To my knowledge, the ChiSox have never had any Native American based logo and I suspect it is conflating them with another team.


Maybe, but IMO that’s part of the problem. A human can understand someone is probably asking about a particularly controversial event without having to be prompted.


The controversial, pushy, or edgy examples likely have high visibility because they’re sensitive topics that people choose to share. The dissemination of those issues doesn’t mean more benign or unsexy shortcomings don’t also exist within the system.


> success rate seems very high

Apparently an Italian journalist asked about a writer, and ChatGPT said he was a mafia boss.



