Heh.
The LLMs have some well-known weak spots: math, logic, geography, and erroneous information (hallucinations).
I’ve been playing around with the Inflection AI product, “PI” (“personal intelligence”), which I installed on my phone. It’s actually quite good, with the caveat of the LLM weak spots.
You can have really good conversations with it: back-and-forth about things that are on your mind, intellectual conversations, some light-duty advice about relationship frustrations, etc.
I changed the default voice to be British, then changed it to be what I think is a 20-something young American woman.
Showing it to a colleague skeptical about AI, I asked it a classic logic problem that LLMs almost always get wrong:
“Sally (a girl) has 3 brothers who each have two sisters. How many sisters does Sally have?” (The answer is one: there are two sisters in the family, and Sally is one of them, so she has one sister.)
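The counting is easy to verify by just modeling the family directly; a quick sketch in Python (the names here are placeholders, not from the puzzle):

```python
# Three brothers, and each brother must see exactly two sisters,
# so the family has exactly two girls in total.
brothers = ["B1", "B2", "B3"]
girls = ["Sally", "OtherSister"]

# Sally's sisters are the girls in the family other than Sally herself.
sisters_of_sally = [g for g in girls if g != "Sally"]
print(len(sisters_of_sally))  # → 1
```

The trap, of course, is that the brothers’ two sisters include Sally, so she has one sister, not two or three.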
PI got it wrong, of course, but a secondary test is to sort of cross-examine the AI to get to the right answer.
The young (annoying) 20-something:
“Haha, you’re trying to trick me again. You asked me this question before. Sally has three sisters!”
Me: “No, each of her THREE brothers has TWO sisters, and Sally is one of the sisters”
PI: “Oh stop trying to make me feel foolish! She has three brothers, so it has to be three sisters, since Sally is not one of them!”
Me: “Try again - there are five siblings. Three brothers, two sisters. So how many sisters does Sally have… and remember Sally is one of the sisters”
PI: “You just won’t leave this alone, will you? OK, she has two sisters. Are you happy now?”
LOL - clearly, clearly, Inflection has a ways to go on tackling the LLM problems. (From what I read, the computer scientists involved are adding math and logic modules and looking to divert those kinds of questions to the appropriate engine.)
But with the young American woman’s voice and her responses, PI was REALLY hammering home the “dingbat” stereotype. I’m sure they didn’t intend that.
Early days of this AI era.