AI pioneer OpenAI debuts two new models, gpt-oss-120b and gpt-oss-20b, enabling local processing on Snapdragon PCs and NVIDIA RTX GPUs.
OpenAI, the leading AI research organization, has announced the release of two new generative AI models: gpt-oss-120b and gpt-oss-20b. These models are designed for text-only tasks and can work with cloud-based models for additional functionality.
System Requirements for gpt-oss-20b on Snapdragon-Powered PCs
For those interested in running the gpt-oss-20b model on Snapdragon-powered PCs, certain system requirements must be met. The system needs at least 16GB of total memory (RAM + GPU VRAM); 24GB or more is preferable for stable, faster performance. A GPU that supports efficient 4-bit floating-point (FP4) processing significantly improves performance, and Snapdragon processors with integrated AI accelerators enable optimized, low-latency on-device execution. The model itself is approximately 12.2GB and must be loaded entirely into memory for reasonable performance.
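The sizing guidance above can be sketched as a quick check. The 12.2GB model size and the 16GB/24GB thresholds come from the figures here; the function itself is purely illustrative:

```python
# Rough memory-fit check for running gpt-oss-20b locally.
# Figures from the article: ~12.2 GB model size, 16 GB minimum
# total memory (RAM + GPU VRAM), 24 GB preferred.
MODEL_SIZE_GB = 12.2

def memory_verdict(ram_gb: float, vram_gb: float = 0.0) -> str:
    """Classify a system's memory headroom for gpt-oss-20b."""
    total = ram_gb + vram_gb
    if total < 16:
        return "below minimum"
    if total < 24:
        return "meets minimum (expect tight headroom)"
    return "preferred (stable, faster performance)"

print(memory_verdict(16))      # a 16 GB Snapdragon laptop
print(memory_verdict(16, 16))  # 16 GB RAM plus a 16 GB discrete GPU
```

Note that the 16GB floor is a total across RAM and VRAM, which is why a discrete GPU can move a borderline system into the comfortable range.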
Performance Considerations
While gpt-oss-20b can run on Snapdragon-powered PCs, performance is constrained by available RAM and GPU capability. For instance, a Surface Laptop with Snapdragon X and 16GB of RAM managed to run the model, but only at about 10 tokens per second and with little headroom for other tasks.
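To put that figure in context, a back-of-the-envelope sketch: the 10 tokens-per-second rate is the article's observation, while the response lengths are illustrative assumptions.

```python
# Estimate wall-clock time to generate a response at a given
# decode speed; ~10 tok/s was observed on a 16 GB Snapdragon X laptop.
def generation_seconds(num_tokens: int, tokens_per_second: float = 10.0) -> float:
    return num_tokens / tokens_per_second

# A ~500-token answer at 10 tok/s takes about 50 seconds.
print(generation_seconds(500))        # 50.0
# The same answer at a hypothetical 50 tok/s would take 10 seconds.
print(generation_seconds(500, 50.0))  # 10.0
```

In other words, 10 tokens per second is usable for short interactive replies but noticeably slow for long-form output.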
Open-Weight Models and Chain-of-Thought Reasoning
Both gpt-oss-120b and gpt-oss-20b are open-weight models, meaning their trained parameters (weights) are publicly available. Additionally, both models use chain-of-thought reasoning, a more "human" approach to problem-solving in which the model works through intermediate steps before producing an answer.
Training Details
NVIDIA shared the news of the training of both gpt-oss-120b and gpt-oss-20b in a blog post. The more powerful gpt-oss-120b model required 2.1 million hours of training on NVIDIA H100 GPUs, while gpt-oss-20b required around one-tenth of that time.
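The scale gap is easy to make concrete with simple arithmetic. The 2.1 million GPU-hour figure comes from NVIDIA's post and the one-tenth ratio is the approximation above; the 1,000-GPU cluster size is a hypothetical chosen purely for illustration.

```python
# GPU-hours reported in NVIDIA's post, plus the article's
# "around one-tenth" ratio for the smaller model.
HOURS_120B = 2_100_000       # H100 GPU-hours for gpt-oss-120b
HOURS_20B = HOURS_120B / 10  # approximate figure for gpt-oss-20b

# Hypothetical: spread across a 1,000-GPU cluster, 2.1M GPU-hours
# is 2,100 wall-clock hours, or about 87.5 days.
days_on_1000_gpus = HOURS_120B / 1_000 / 24

print(f"gpt-oss-20b: ~{HOURS_20B:,.0f} H100 GPU-hours")
print(f"gpt-oss-120b on 1,000 GPUs: ~{days_on_1000_gpus:.1f} days")
```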
Partnerships and Availability
OpenAI has partnered with several development platforms, including Azure, Hugging Face, vLLM, Ollama, AWS, and Fireworks, for its new models. The gpt-oss-20b model can be downloaded through Hugging Face. Coinciding with the release of gpt-oss-120b and gpt-oss-20b, Ollama released a new app that makes it easier to use local LLMs on a Windows 11 PC.
On-Device Inference and Future Developments
Qualcomm discussed the release of gpt-oss-20b, noting that this is the first time OpenAI has made a model available for on-device inference. This opens up possibilities for private, low-latency, on-device AI use. As these models continue to evolve, we can expect further advancements in the field.
- To optimize the performance of gpt-oss-20b on Snapdragon-powered PCs, consider a system with at least 16GB of total memory (RAM + GPU VRAM), preferably 24GB or more for smoother and faster execution.
- A capable GPU that supports efficient floating-point processing enhances performance, though Snapdragon processors with built-in AI accelerators can facilitate low-latency AI experiences on-device.
- The gpt-oss-20b model is approximately 12.2GB and must be loaded fully into memory for reasonable performance.
- Performance on Snapdragon-powered PCs is constrained by available RAM and GPU capabilities, resulting in reduced speeds compared to more robust systems.
- OpenAI's gpt-oss-20b model can be downloaded through Hugging Face, while Ollama has released an app that makes it easier to use local LLMs on a Windows 11 PC.
- Both gpt-oss-120b and gpt-oss-20b are open-weight models, with their training parameters accessible to the public, and employ chain-of-thought reasoning for problem-solving.
- gpt-oss-20b is the first OpenAI model made available for on-device inference, opening up private, low-latency, on-device AI use.