Cloud server preconfigured with Ollama, Open WebUI and RAG. Run Llama 3, Mistral and Mixtral on your server — no per-message billing, no rate limit, no data sent outside.
Rollin Host AI Cloud Server is a VPS preconfigured with Ollama, Open WebUI, pgvector and Qdrant to run open-source LLMs (Llama 3, Mistral, Mixtral, Qwen) on your own server. 5 plans from US$ 12 to US$ 140/month, with up to 120 GB of RAM to run Mixtral 8x22B. Automatic 5-minute setup, OpenAI-compatible REST API, 100% private data (LGPD) and 24/7 human support.
5 plans · automatic preinstall
Choose by the model you want to run
Each plan comes with a VPS sized for the recommended model. You can switch models at any time.
Self-hosted AI makes sense when privacy, predictable cost and independence matter more than GPT's latest feature.
Total privacy
Your data never leaves the server. Zero sending to OpenAI, Anthropic or Google. Critical for healthcare, legal, financial and any sensitive data.
Fixed monthly cost
You pay only the server. No per-token billing, no end-of-month surprises. US$ 12 or US$ 120 flat, regardless of 1k or 100M tokens.
No rate limit · no queue
Dedicated model, dedicated processing. Run big batches without waiting for OpenAI quota or paying premium tier.
Open-source models
Llama 3, Mistral, Mixtral, Qwen, DeepSeek — the whole open-source family runs natively via Ollama. Switch models in seconds.
Included stack
Everything preconfigured · access from browser in 5 minutes
Automatic setup: you sign up, get credentials, open Open WebUI and you're already chatting with AI.
Ollama
Open-source model manager. Download, run and switch models with 1 command. Supports Llama 3, Mistral, Mixtral, Qwen, Phi, Gemma, DeepSeek and dozens more.
Open WebUI
ChatGPT-like interface for your team in the browser. Saved chats, multi-user, document upload for RAG, shared prompts.
pgvector / Qdrant
Vector database for RAG. Index your documents and the AI answers based on your content, citing the source. Plus+ has dedicated Qdrant.
n8n · automation
Plus and Master include n8n integrated. Connect your AI to Gmail, WhatsApp, Sheets, CRM, ERP — visual workflows without code.
Compatibility
Which models run on each tier
Ollama has 100+ models. The table shows the sweet spot per tier.
Plan
Recommended models
Approx. speed
Use cases
Start
Llama 3.2 3B · Phi-3 Mini · Gemma 2B
15-30 tok/s
Simple chatbot, data extraction, classification
Pro
Llama 3 8B · Mistral 7B · Qwen 2.5 7B
8-15 tok/s
Customer support, RAG over docs, light agents
Plus
Llama 3 8B · Qwen 2.5 14B · CodeLlama 13B
5-10 tok/s
Corporate RAG, multi-project, n8n automation
Master
Mixtral 8x7B · Llama 3 70B Q4 · Qwen 2.5 32B
3-8 tok/s
Complex analysis, multiple models simultaneously
Enterprise
Llama 3 70B · Mixtral 8x22B · DeepSeek V3
2-5 tok/s
Compliance, isolation, audit
When self-hosted wins
Self-hosted vs OpenAI API · when it makes sense
Simple math: at how many tokens/month the server's fixed cost beats OpenAI's variable cost.
Volume
OpenAI GPT-4o-mini
Rollin IA Cloud
Veredito
100k tokens/month
US$ 1.5
US$ 12
OpenAI wins (low volume)
1M tokens/month
US$ 16
US$ 12
Tied (Start covers it)
10M tokens/month
US$ 160
US$ 26
Self-hosted Pro 6× cheaper
100M tokens/month
US$ 1,600
US$ 78
Self-hosted Master 20× cheaper
Sensitive data
Not applicable
US$ 120
Self-hosted Enterprise · only viable option
Why choose Rollin AI Cloud over Together.ai, Replicate or RunPod
It is a VPS preconfigured with Ollama, Open WebUI, pgvector and Qdrant to run open-source LLMs (Llama 3, Mistral, Mixtral, Qwen, DeepSeek) on your own server. 5 plans from US$ 12 to US$ 140/month, with automatic 5-minute setup. You access via browser (ChatGPT-like) and via OpenAI-compatible REST API.
How much does it cost to run self-hosted AI on Rollin Host?
From US$ 12/month (Start, 12 GB RAM, Llama 3.2 3B) to US$ 140/month (Enterprise, 120 GB RAM, Mixtral 8x22B). Most popular is Pro at US$ 26/month — 24 GB RAM, runs Llama 3 8B with RAG via pgvector. No lock-in, no per-token billing.
Is it slow without GPU?
Yes, slower than dedicated GPU. Llama 3 8B on CPU does 8-15 tokens/second — usable for chat and automation, slow for realtime streaming.
Is self-hosted worth it vs OpenAI?
Worth it from ~1 million tokens/month (Start US$ 12 already matches US$ 16 of OpenAI at the same volume). At 10M tokens/month, Pro is 6x cheaper; at 100M tokens, Master is 20x cheaper. For sensitive data (LGPD), self-hosted is the only viable option.
Can I switch models later?
Yes, at no cost. Ollama has 100+ models. Switch by command, only limit is disk space.
How does RAG work?
From the Pro plan it comes with pgvector. Upload documents, the system generates embeddings and indexes. AI answers citing the source.
Can I integrate with WhatsApp / n8n?
Yes, from Plus it comes with n8n with native Ollama node. Common workflows: classify tickets, respond to leads, transcribe audios.
Is fine-tuning possible?
Full fine-tuning needs GPU. Light LoRA (3B-7B) runs on Master/Enterprise with a few training hours.
What data stays on the server?
EVERYTHING: prompts, responses, RAG documents, history. Zero sending to third parties.
Can I connect via API (like OpenAI)?
Yes. Ollama exposes a REST API 100% compatible with OpenAI. Just point the SDK to http://your-server/v1.
How do I migrate from OpenAI to the AI Cloud Server?
In 3 steps: 1) sign up for the plan (Pro is most common to start), 2) change the OpenAI SDK base_url to your server URL, 3) pick the equivalent Ollama model (Llama 3 8B replaces GPT-3.5; Mixtral replaces GPT-4 in many cases). Average migration time: 1-2 hours.
Does the server crash if the model hangs?
No. Ollama runs as an isolated process, monitored via systemd. Restarts on its own if it hangs.
Is Rollin Host reliable for self-hosted AI?
Yes — Rollin Serviços Digitais e Tecnologia LTDA is a Brazilian company with a Tier III datacenter in São Paulo, NF-e, billing in BRL and 24/7 human support. First Brazilian cloud specialized in AI.
Pronto pra hospedar seu projeto de IA?
Comece em 5 minutos. Migração gratuita, suporte 24/7 em português e garantia de reembolso em 7 dias.