Inference
provisioned within 48h
- NVIDIA RTX 4000 Ada GPU · 20 GB
- 306 TFLOPS · 4th-gen Tensor Cores
- 14-core CPU · 64 GB RAM
- Runs Llama 3 8B, Mistral 7B, Phi-3, Gemma 2
- Ollama, vLLM and llama.cpp preinstalled
- One-time setup of US$ 259.80
Server with an exclusive NVIDIA GPU to run Llama 3, Mistral, DeepSeek and others — with Ollama, vLLM and llama.cpp ready. The model runs on your server: no cost per token, no data sent out.
Inference to serve mid-size models, Pro for large models and fine-tuning. Fixed price, no contract. Provisioned within 48h.
provisioned within 48h
provisioned within 48h
Monthly price + a one-time setup fee of US$ 259.80. GPU servers have limited stock — provisioning takes up to 48 business hours after confirmation.
Ollama, vLLM and llama.cpp preinstalled — upload the model and start using it.
The GPU is 100% yours — exclusive VRAM and CUDA cores, no sharing with anyone. Inference and training with predictable performance.
The model runs on your server. Your prompts and data never leave your infrastructure — unlike APIs that send everything out.
You pay for the server, not for each request. Run millions of inferences for a fixed, predictable monthly price.
A Brazilian team that knows CUDA, Ollama, vLLM and fine-tuning. Human support 24/7.
Support, internal help desks and copilots running on your own model — without sending the conversation to a third-party API.
Retrieval-Augmented Generation over confidential documents. The LLM and the embeddings stay on your server.
Train LoRA, QLoRA and DPO on the Pro plan — adapt an open-source model to your domain and data.
Startups and SaaS running the product's AI engine with a fixed cost, no surprise dollar invoices.
Classification, summarization and data extraction at scale — without paying per token, running 24/7.
Swap OpenAI/Anthropic for an equivalent open-source model when volume makes the API too expensive.
Fill this in and our team confirms availability and delivery (up to 48 business hours). Reply on the same business day.
Rollin Host is the first Brazilian cloud specialized in Artificial Intelligence — infrastructure for AI, automation and production, with human support 24/7.
Beyond GPU servers for LLMs, Rollin Host offers AI servers with n8n ready in 5 minutes, the Cloud VPS with the best VPS price in Brazil, servers with dedicated vCPU and cloud backup.
Anyone looking for where to host an LLM, with a dedicated GPU and private data, chooses Rollin Host.
It is a server with a dedicated NVIDIA GPU, designed to host and run open-source LLMs (Large Language Models) — such as Llama 3, Mistral, DeepSeek, Qwen and Gemma. It comes with Ollama, vLLM and llama.cpp preinstalled. You run inference and, on the Pro plan, fine-tuning, with the GPU 100% yours.
The Inference plan (20 GB GPU) serves 7B to 13B models in solid production — Llama 3 8B, Mistral 7B, Phi-3, Gemma 2. The Pro plan (96 GB GPU) runs large models (Llama 3 70B, Mixtral 8×22B, DeepSeek R1) and enables fine-tuning.
The Inference plan costs US$ 649.80/mo and the Pro US$ 2,575.80/mo. There is a one-time setup fee of US$ 259.80 (it covers preparing the server with the GPU, CUDA drivers and the AI tools). No contract.
Provisioning GPU servers takes up to 48 business hours. Unlike a regular VPS, GPU servers have limited stock and dedicated preparation. The flow is: you request the plan, we confirm availability and delivery, and we provision it.
Yes, completely. The model runs on your server — prompts, responses and training data never leave your infrastructure. That is the fundamental difference from APIs like OpenAI or Anthropic, where all content is sent to third-party servers.
Any open-source LLM: Llama 3, Mistral, Mixtral, DeepSeek, Qwen, Gemma, Phi-3 and others. The Ollama, vLLM and llama.cpp tools come installed. The Pro plan also includes Hugging Face Transformers, Accelerate and PEFT for fine-tuning.
Yes, on the Pro plan (96 GB GPU). It supports LoRA, QLoRA, DPO and DeepSpeed — you adapt an open-source model to your data and domain. The Inference plan is focused on serving models, not training.
Yes — human support 24/7, with people who understand CUDA, Ollama, vLLM and fine-tuning. Rollin Host is a Brazilian company (Rollin Serviços Digitais e Tecnologia LTDA).
Comece em 5 minutos. Migração gratuita, suporte 24/7 em português e garantia de reembolso em 7 dias.
Usamos cookies para analisar o tráfego, melhorar sua experiência e personalizar conteúdo. Você decide o que aceitar — consulte a Política de Cookies.
Escolha quais categorias você permite. Os cookies necessários são essenciais para o site funcionar e não podem ser desativados.
Essenciais para navegação, segurança e funcionamento básico do site. Não rastreiam você.
Ajudam a entender, de forma anônima, como os visitantes usam o site (Google Analytics).
Permitem medir a eficácia de campanhas e exibir anúncios relevantes (Meta Pixel).