Improving Language Models through Implicit Learning of Self-Improvement
Researchers from the University of Illinois and Google have proposed a novel approach that enables Large Language Models (LLMs) to implicitly learn self-improvement from human preference data. The approach, named Proactive Interactive Training (PIT), could revolutionise the way LLMs adapt to niche domains and under-served use cases.
PIT enables LLMs to refine themselves without direct human oversight, reducing the cost of human intervention and broadening access to LLMs. By leveraging preference signals embedded implicitly in interaction and preference data, the LLM can evolve and improve its behaviour autonomously.
Key elements of how PIT achieves this include autonomous detection and anticipation of user needs, iterative prompt refinement informed by feedback, a self-evolving learning loop, and context-aware proactive assistance.
The LLM-powered system monitors user interactions and behaviour to detect when assistance or improvement is needed, inferring implicit human preferences about the quality and relevance of its outputs. The LLM then refines its own prompts or internal instructions based on this preference feedback, without direct retraining on human-labelled data.
The model continuously updates its internal strategies for reasoning, output generation, and task planning by leveraging feedback and preference comparisons on its own outputs, forming a closed feedback loop akin to reinforcement learning but operating in the prompt and reasoning space rather than through weight-level retraining.
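To make this loop concrete, here is a minimal sketch of how such a preference-driven refinement cycle could be wired up. The helper names (`generate`, `reward_model`, `REFINE_PROMPT`) and the overall structure are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: the function names and prompt text below are
# assumptions, not the PIT authors' code.

def generate(instructions: str, query: str) -> str:
    """Placeholder for an LLM call conditioned on a set of instructions."""
    raise NotImplementedError

def reward_model(query: str, response: str) -> float:
    """Placeholder for a reward model trained on human preference data."""
    raise NotImplementedError

REFINE_PROMPT = (
    "Revise your working instructions so that your next answer to the "
    "user is more helpful and accurate than the previous one:\n"
)

def self_improve(instructions: str, query: str, rounds: int = 3) -> str:
    """Closed feedback loop in prompt space: propose revised instructions,
    compare the resulting response against the current best using the
    preference-trained reward model, and keep the revision only if it wins."""
    best_response = generate(instructions, query)
    best_score = reward_model(query, best_response)

    for _ in range(rounds):
        revised_instructions = generate(REFINE_PROMPT + instructions, query)
        candidate = generate(revised_instructions, query)
        score = reward_model(query, candidate)
        if score > best_score:  # implicit preference comparison on own outputs
            instructions, best_response, best_score = (
                revised_instructions, candidate, score
            )
    return best_response
```

The design choice worth noting is that nothing here updates model weights: the preference signal is used only to decide which self-generated instructions and responses to keep.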
Instead of waiting for explicit human corrections, the LLM agent proactively initiates actions and offers improved suggestions aligned with evolving human goals and demonstrated preferences captured in interaction data.
PIT improves response quality significantly, outperforming both the original LLM samples and the prompting method Self-Refine in human evaluations. The approach maximises the quality gap between the original response and an improved response generated with the original as a reference point.
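Reading that last point as an objective, one plausible formalisation is to reward an improved response only by how much it beats the original it was conditioned on. The sketch below is a hedged paraphrase of that idea; the names and exact form are assumptions rather than the paper's formula.

```python
def quality_gap(reward_model, prompt: str, original: str, improved: str) -> float:
    """Hedged reading of the training signal described above: the improved
    response, generated with the original as a reference, is rewarded only
    to the extent that it scores higher than the original itself."""
    return reward_model(prompt, improved) - reward_model(prompt, original)
```

Maximising this gap, rather than the raw score of the improved response, keeps the focus on genuine improvement rather than on restating answers that were already good.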
This approach contrasts with static LLMs that are trained once on annotated datasets and then left unchanged, enabling a more dynamic, evolving AI agent capable of continual learning from human preferences in an implicit, interaction-based manner.
The key insight from the research is that the preference data used to train the LLM already provides implicit guidance on what constitutes an improvement in quality. This allows training a reward model to judge quality gaps without hand-engineering criteria into prompts.
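As a concrete illustration of that insight, a reward model can be fitted directly to preference pairs with a standard pairwise (Bradley-Terry style) loss, so that "better" is learned from the data rather than hand-written into a prompt. The snippet below is a generic PyTorch sketch; the paper's exact reward-model architecture and loss may differ.

```python
import torch.nn.functional as F

def pairwise_preference_loss(reward_model, prompts, preferred, rejected):
    """Standard Bradley-Terry style objective: push the score of the
    human-preferred response above the score of the rejected one.
    `reward_model` is assumed to map (prompt, response) batches to scalars."""
    r_pos = reward_model(prompts, preferred)   # shape: (batch,)
    r_neg = reward_model(prompts, rejected)    # shape: (batch,)
    return -F.logsigmoid(r_pos - r_neg).mean()
```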
By reducing reliance on direct human oversight, this work lowers the cost of keeping LLMs aligned and broadens access to them. As LLMs grow more capable and are deployed in sensitive real-world applications, techniques like PIT will be critical for ensuring these models continue to align with human values as they learn from experience.