A Common Ollama Mistake May Hamper AI Performance in Windows 11 – Here's the Fix
Ollama, a popular local AI deployment platform, offers users the flexibility to customise the context length of their AI models, providing a more tailored experience based on specific use cases and hardware capabilities.
Changing Context Length in Ollama
To adjust the context length in Ollama, you can utilize either the graphical user interface (GUI) or the command-line interface (CLI).
Via GUI
Within the Ollama settings, you'll find a slider that allows you to switch between preset values such as 4k, 8k, and up to 128k tokens. While this method is straightforward, it is limited to fixed intervals.
Via CLI
For precise control and the ability to save the context length setting permanently, you can use Ollama's command-line interface. To set the context length, follow these steps:
- Run the model in Ollama's command-line interface, for example with `ollama run llama3`.
- Set the context length with the `/set parameter num_ctx` command. For example, to set 8,192 tokens: `/set parameter num_ctx 8192`.
- After setting the context length, save this as a new model version with `/save <new-model-name>` so the setting persists across sessions.
This approach allows you to keep multiple versions with different context sizes for various tasks.
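The same persistent setup can also be scripted non-interactively with a Modelfile, Ollama's declarative model definition format. A minimal sketch, in which the base model `llama3` and the new name `llama3-8k` are placeholders you would replace with your own:

```
# Modelfile – derive a model variant with a larger context window
FROM llama3
PARAMETER num_ctx 8192
```

Build it with `ollama create llama3-8k -f Modelfile`; afterwards, `ollama run llama3-8k` starts with the 8,192-token context already applied, which is convenient for keeping several variants side by side.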
Considerations for Choosing Context Length
When deciding on the appropriate context length, consider the following factors:
- Longer contexts enable handling larger inputs or longer conversations but increase memory and computational costs.
- Extremely long contexts (100k+ tokens) require compatible models and potentially specialized hardware.
- Fine-tuning or model architecture may affect the maximum usable context length, as seen with GPT-4.1 fine-tuned models limiting context below base model capacities.
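The memory cost mentioned in the first point above can be made concrete with a back-of-the-envelope estimate of the key/value cache that grows with context length. This sketch assumes a Llama-3-8B-like architecture (32 layers, 8 KV heads, head dimension 128, fp16 values); the formula and defaults are illustrative assumptions, not Ollama internals:

```python
def kv_cache_bytes(ctx_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:
    """Rough size of the KV cache for a given context length.

    Each token stores one key and one value vector (factor 2) per
    layer, each of size n_kv_heads * head_dim * dtype_bytes.
    """
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * ctx_len

# With these assumed parameters, 8k of context costs about 1 GiB of
# cache on top of the model weights, while 128k costs about 16 GiB.
for ctx in (4096, 8192, 131072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
```

This is why a context slider jump from 8k to 128k can silently push a model that fit comfortably in VRAM onto the CPU.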
Performance and Context Length
The performance report generated after a response details a number of metrics, including the eval rate, which measures generation speed in tokens per second. Context length also affects GPU utilization in Ollama: reducing the context length shrinks the model's memory footprint, so more of it can stay in VRAM, which can increase GPU utilization and throughput.
Balancing Context Length and Performance
The optimal context length balances your workload requirements against resource constraints. For tasks involving processing very long documents, consider increasing the context length towards the higher end (e.g., 8k to 128k tokens) if supported by your model and hardware in Ollama. For smaller or latency-sensitive tasks, lower context lengths improve performance.
Monitoring CPU and GPU Usage
To see the split of CPU and GPU usage for a model in Ollama, exit the interactive session (with the `/bye` command) and run `ollama ps` in the terminal; the PROCESSOR column shows how the loaded model is divided between CPU and GPU. The performance report generated after a response also provides insights into GPU utilization.
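The CPU/GPU split reported by `ollama ps` can likewise be read by a script. The sample output below is an illustrative assumption about the command's tabular format; column spacing and values will differ on your machine:

```python
# Illustrative stand-in for captured `ollama ps` output.
SAMPLE_PS_OUTPUT = """\
NAME            ID              SIZE      PROCESSOR          UNTIL
llama3:latest   365c0bd3c000    6.7 GB    48%/52% CPU/GPU    4 minutes from now
"""

def processor_split(ps_output: str) -> dict[str, str]:
    """Map each loaded model name to its PROCESSOR column value.

    Uses the header's column positions to slice each data row, since
    the PROCESSOR value itself contains spaces.
    """
    lines = ps_output.strip().splitlines()
    header, rows = lines[0], lines[1:]
    start = header.index("PROCESSOR")
    end = header.index("UNTIL")
    return {row.split()[0]: row[start:end].strip() for row in rows}

print(processor_split(SAMPLE_PS_OUTPUT))
# -> {'llama3:latest': '48%/52% CPU/GPU'}
```

A split like 48%/52% means roughly half the model's layers spilled out of VRAM; lowering `num_ctx` is one way to pull more of them back onto the GPU.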
Conclusion
With Ollama's flexible options for adjusting the context length, using the GUI for presets and the CLI for fine control and persistent saving via the `num_ctx` parameter, users can optimise their AI models for various tasks. By carefully weighing context length against its impact on performance, users can make informed decisions and achieve the best results for their specific needs.