LLMs Get Reflexive
Self-Reflecting LLMs
My dad shared some fascinating news on GPT-4 (via this video): it performs better with reflexion — if you ask it why it provided an incorrect answer, it can sometimes catch the mistake and self-reflect its way to an improved one. Here is a substack post from the paper’s authors, Noah Shinn and Ashwin Gopinath. One of the provocative points is that GPT-4 crossed a threshold of complexity that lets it improve via reflexion, unlike the earlier GPT models.
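To make the idea concrete, here’s a minimal sketch of the reflexion loop. The `ask_llm` function is a hypothetical stand-in for a real model call (it returns canned answers here so the loop actually runs); the shape of the loop — answer, check, feed the failure back, retry — is the part that matters.

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stub standing in for a real model API call.
    # It deliberately gets the question wrong on the first try.
    canned = {
        "Q: What is 17 * 24?": "398",
        "Your answer '398' was wrong. Reflect on the mistake, "
        "then answer again.\nQ: What is 17 * 24?": "408",
    }
    return canned.get(prompt, "unknown")

def solve_with_reflexion(question: str, check) -> str:
    answer = ask_llm(question)
    if check(answer):
        return answer
    # Reflexion step: tell the model its answer was wrong and ask it
    # to reflect before retrying.
    retry_prompt = (
        f"Your answer '{answer}' was wrong. "
        f"Reflect on the mistake, then answer again.\n{question}"
    )
    return ask_llm(retry_prompt)

print(solve_with_reflexion("Q: What is 17 * 24?", lambda a: a == "408"))
```

In the paper the check comes from an evaluator (e.g. unit tests for code tasks), and the reflection text is kept in memory across attempts rather than discarded.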
A related paper describes a similar paradigm called Dialog-Enabled Resolving Agents (DERA), where the model challenges itself through a dialog between agents. Particularly interesting to me, the paper uses clinical documentation examples to show how DERA improves quality and reduces hallucinations.
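A toy sketch of the two-role pattern, as I understand it: a Researcher agent flags problems and a Decider agent owns the artifact and revises it. Both roles are stubbed with simple rules here (in the paper each role is an LLM), and the "grounded in the source notes" check is a crude stand-in for hallucination detection.

```python
# Facts actually present in the (toy) clinical source notes.
SOURCE_NOTES = {"fever", "cough"}

def researcher(summary: list[str]) -> list[str]:
    # Flag any claim not grounded in the source notes.
    return [claim for claim in summary if claim not in SOURCE_NOTES]

def decider(summary: list[str], challenges: list[str]) -> list[str]:
    # Accept the challenges and drop the unsupported claims.
    return [claim for claim in summary if claim not in challenges]

draft = ["fever", "cough", "chest pain"]  # "chest pain" is hallucinated
for _ in range(3):  # a few rounds of dialog
    challenges = researcher(draft)
    if not challenges:
        break
    draft = decider(draft, challenges)
print(draft)  # -> ['fever', 'cough']
```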
On a separate GPT note, I’d also like to read the HuggingGPT paper as an exploration of executive orchestration among task-specific AI models. I took a glance at the pipelines available.
Exercise:
Off-topically: today I shave the mustache. It’s caterpillared too far.
I need to get back in the saddle and get some cycling miles under my belt. I’m happy I kept up with lifting while in LA, but I didn’t get in the cardio/jogging I was hoping to. That’s a definite downside of working with people out of a house rather than an office: there’s no real going back to your own place, so no room to get in independent work.
This weekend is a good opportunity, though I need to balance it against basketball: full court ball for three hours wrings me out.
| Exercise | Set | Weight (lbs) | Reps |
|---|---|---|---|
| DBell Overhead Press | 1 | 102.5 | 5 |
| DBell Overhead Press | 2 | 90 | 8 |
| DBell Overhead Press | 3 | 90 | 8 |
| Reverse Fly | 1 | 50 | 10 |
| Reverse Fly | 2 | 50 | 11 |
| Reverse Fly | 3 | 50 | 9 |
| Lateral Raise | 1 | 50 | 8 |
| Lateral Raise | 2 | 50 | 8 |
| Lateral Raise | 3 | 50 | 7 |
| Front Raise | 1 | 50 | 8 |
| Front Raise | 2 | 50 | 9 |
| Front Raise | 3 | 50 | 8 |
Next time I’ll drop the lateral raise weight down to 45 lbs – it was super sloppy at 50 lbs.