These findings validate the notion that letting models reason in a more representative latent space is better than making them talk to themselves via Chain of Thought.

Your Next ‘Large’ Language Model Might Not Be Large After All
A 27M-parameter model just outperformed giants like DeepSeek R1, o3-mini, and Claude 3.7 on reasoning tasks