Mistake in Ollama May Hamper AI Performance in Windows 11 – Here's a Solution
Ollama, a popular local AI deployment platform, offers users the flexibility to customise the context length of their AI models, providing a more tailored experience based on specific use cases and hardware capabilities.
Changing Context Length in Ollama
To adjust the context length in Ollama, you can utilize either the graphical user interface (GUI) or the command-line interface (CLI).
Via GUI
Within the Ollama settings, you'll find a slider that allows you to switch between preset values such as 4k, 8k, and up to 128k tokens. While this method is straightforward, it is limited to fixed intervals.
Via CLI
For precise control and the ability to save the context length setting permanently, you can use Ollama's command-line interface. To set the context length, follow these steps:
- Run the model in Ollama's command-line interface with `ollama run <model>`.
- Set the context length with the `/set parameter num_ctx <value>` command. For example, to set 8,192 tokens, enter `/set parameter num_ctx 8192`.
- After setting the context length, save it as a new model version with `/save <new-model-name>`.
This approach allows you to keep multiple model versions with different context sizes for various tasks; a sample session is shown below.
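For illustration, a minimal interactive session might look like the following sketch; the model tag llama3 and the saved name llama3-8k are placeholders, and the interactive prompt may differ slightly between Ollama versions.

```
# Start an interactive session with a model (the tag "llama3" is only an example)
ollama run llama3

# Inside the interactive prompt, raise the context window to 8,192 tokens
/set parameter num_ctx 8192

# Save the current model plus this setting under a new name (name is arbitrary)
/save llama3-8k

# Leave the interactive session
/bye
```

From then on, `ollama run llama3-8k` loads the model with the larger context window already applied, without having to set `num_ctx` again.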
Considerations for Choosing Context Length
When deciding on the appropriate context length, consider the following factors:
- Longer contexts enable handling larger inputs or longer conversations but increase memory and computational costs.
- Extremely long contexts (100k+ tokens) require compatible models and potentially specialized hardware; you can check a model's trained context length as shown after this list.
- Fine-tuning or model architecture may affect the maximum usable context length, as seen with GPT-4.1 fine-tuned models limiting context below base model capacities.
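Before pushing `num_ctx` toward those extremes, it helps to check what context length the model was actually trained with. As a rough sketch (the exact output layout varies between Ollama releases), the `ollama show` command prints the model's details, including its context length:

```
# Show model details such as architecture, parameter count, and context length
ollama show llama3

# On Windows 11 you can filter the output to the relevant line, for example:
ollama show llama3 | findstr /i "context"
```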
Performance and Context Length
The performance report generated after a response lists a number of metrics, including the eval rate, which measures generation speed in tokens per second. Changing the context length also affects GPU utilization in Ollama: a larger context window reserves more memory, so reducing the context length frees VRAM and can increase the share of the model that runs on the GPU.
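As a sketch of where those numbers come from (the field names and figures below are illustrative and can differ between Ollama versions), running the model with the `--verbose` flag prints a per-response report:

```
# Run the model with per-response timing statistics enabled
ollama run llama3 --verbose

# After each reply, Ollama prints a summary along these lines:
#   total duration:    4.21s
#   prompt eval rate:  310.5 tokens/s
#   eval count:        256 token(s)
#   eval rate:         41.7 tokens/s   <- generation speed in tokens per second
```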
Balancing Context Length and Performance
The optimal context length balances your workload requirements against resource constraints. For tasks that involve processing very long documents, consider increasing the context length toward the higher end (e.g., 8k to 128k tokens), provided your model and hardware support it. For smaller or latency-sensitive tasks, shorter context lengths improve performance.
Monitoring CPU and GPU Usage
To see the split of CPU and GPU usage for a loaded model in Ollama, exit the interactive session with `/bye` and then run `ollama ps` in the terminal, as in the example below. The performance report generated after a response also provides insight into GPU utilization.
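As a rough illustration (the columns and values below are only indicative), `ollama ps` lists each loaded model with a PROCESSOR column that shows how the model is split between CPU and GPU memory:

```
ollama ps

# Example output (values are illustrative):
# NAME         ID              SIZE      PROCESSOR          UNTIL
# llama3-8k    a1b2c3d4e5f6    6.2 GB    28%/72% CPU/GPU    4 minutes from now
```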
Conclusion
With Ollama's flexible options for adjusting the context length, primarily via the num_ctx parameter, with the GUI for presets and the CLI for fine-grained control and persistent saving, users can optimise their AI models for various tasks. By carefully considering the context length and its impact on performance, users can make informed decisions and achieve the best results for their specific needs.
- For a more tailored AI model experience on Windows 11, users might consider the hardware specifications that would support extended context lengths provided by Ollama.
- Microsoft's Edge browser could potentially benefit from adjusted context lengths in Ollama, improving performance when handling longer web content.
- The release of Windows 11 may provide an opportunity for developers to optimize their applications using the context length settings in Ollama, improving functionality and responsiveness on PCs.
- The Surface Laptop, a product from Microsoft, could be a suitable candidate for running extended context length models, given its modern hardware capabilities.
- Technology enthusiasts would find the flexibility of adjusting the context length in Ollama, coupled with the option to save multiple versions with different context sizes, an exciting feature that can be utilized in various programming and AI-related projects.