AI with your data · zero send to OpenAI/Google

Your own AI, on your server.

Cloud server preconfigured with Ollama, Open WebUI and RAG. Run Llama 3, Mistral and Mixtral on your server — no per-message billing, no rate limit, no data sent outside.

See the 5 plans Why self-hosted?

Rollin Host AI Cloud Server is a VPS preconfigured with Ollama, Open WebUI, pgvector and Qdrant to run open-source LLMs (Llama 3, Mistral, Mixtral, Qwen) on your own server. 5 plans from US$ 16 to US$ 143/month, with up to 96 GB of RAM to run Llama 3 70B Q6 or Mixtral 8x22B Q4. Automatic 5-minute setup, OpenAI-compatible REST API, 100% private data (LGPD) and 24/7 human support.

5 plans · automatic preinstall

Choose by the model you want to run

Each plan comes with a VPS sized for the recommended model. You can switch models at any time.

Start

IA Cloud Start

US$ 16.18/mo

no contract · cancel anytime

Get started Talk to a human

8 GB RAM · 4 vCPU AMD EPYC · 75 GB NVMe
Ollama + Open WebUI pré-instalados
Roda Phi-3 Mini · Gemma 2B · Llama 3.2 3B Q4
API REST 100% compatível com OpenAI
HTTPS automático com seu subdomínio
Backup diário dos modelos e chats
Suporte humano 24/7 em PT-BR

Mais escolhido

IA Cloud Pro

US$ 23.45/mo

no contract · cancel anytime

Get started Talk to a human

12 GB RAM · 6 vCPU AMD EPYC · 100 GB NVMe
Tudo do Start + pgvector pra RAG
Roda Llama 3.2 3B · Mistral 7B Q4 · Qwen 2.5 7B Q4
Upload de documentos com indexação automática
Multi-usuário no Open WebUI com permissões
API com chave própria por equipe

Plus

IA Cloud Plus

US$ 41.64/mo

no contract · cancel anytime

Get started Talk to a human

24 GB RAM · 8 vCPU AMD EPYC · 200 GB NVMe
Tudo do Pro + Qdrant + n8n
Roda Llama 3 8B · Mistral 7B FP16 · Qwen 2.5 14B Q4
Workflows n8n com nó Ollama nativo
RAG corporativo multi-projeto
Backup criptografado em disco

Master

IA Cloud Master

US$ 108.91/mo

no contract · cancel anytime

Get started Talk to a human

64 GB RAM · 16 vCPU AMD EPYC · 300 GB NVMe
Tudo do Plus + LiteLLM proxy multi-modelo
Roda Mixtral 8x7B · Llama 3 70B Q4 · Qwen 32B
Múltiplos modelos em paralelo (8B-13B) ou 1 grande
Rate limiting e balanceamento próprios
Suporte prioritário com SLA 4h

Enterprise

IA Cloud Enterprise

US$ 143.45/mo

no contract · cancel anytime

Get started Talk to a human

96 GB RAM · 18 vCPU AMD EPYC · 350 GB NVMe
Tudo do Master + logs auditáveis LGPD
Roda Llama 3 70B Q6 · Mixtral 8x22B Q4 · DeepSeek V3 Q4
Isolamento de rede + criptografia em disco
Conformidade pra bancos / jurídico / saúde
SLA 99,9% contratual · onboarding dedicado

Why self-hosted

Four reasons to stop paying OpenAI per message

Self-hosted AI makes sense when privacy, predictable cost and independence matter more than GPT's latest feature.

Total privacy

Your data never leaves the server. Zero sending to OpenAI, Anthropic or Google. Critical for healthcare, legal, financial and any sensitive data.

Fixed monthly cost

You pay only the server. No per-token billing, no end-of-month surprises. US$ 16 or US$ 143 flat, regardless of 1k or 100M tokens.

No rate limit · no queue

Dedicated model, dedicated processing. Run big batches without waiting for OpenAI quota or paying premium tier.

Open-source models

Llama 3, Mistral, Mixtral, Qwen, DeepSeek — the whole open-source family runs natively via Ollama. Switch models in seconds.

Included stack

Everything preconfigured · access from browser in 5 minutes

Automatic setup: you sign up, get credentials, open Open WebUI and you're already chatting with AI.

Ollama

Open-source model manager. Download, run and switch models with 1 command. Supports Llama 3, Mistral, Mixtral, Qwen, Phi, Gemma, DeepSeek and dozens more.

Open WebUI

ChatGPT-like interface for your team in the browser. Saved chats, multi-user, document upload for RAG, shared prompts.

pgvector / Qdrant

Vector database for RAG. Index your documents and the AI answers based on your content, citing the source. Plus+ has dedicated Qdrant.

n8n · automation

Plus and Master include n8n integrated. Connect your AI to Gmail, WhatsApp, Sheets, CRM, ERP — visual workflows without code.

Compatibility

Which models run on each tier

Ollama has 100+ models. The table shows the sweet spot per tier.

Plan	Recommended models	Approx. speed	Use cases
Start (8 GB)	Phi-3 Mini · Gemma 2B · Llama 3.2 3B Q4	20-40 tok/s	Simple chatbot, data extraction, classification
Pro (12 GB)	Llama 3.2 3B · Mistral 7B Q4 · Qwen 7B Q4	12-22 tok/s	Customer support, RAG over docs, light agents
Plus (24 GB)	Llama 3 8B · Mistral 7B FP16 · Qwen 14B Q4	8-15 tok/s	Corporate RAG, multi-project, n8n automation
Master (64 GB)	Mixtral 8x7B · Llama 3 70B Q4 · Qwen 32B	4-9 tok/s	Complex analysis, multiple models simultaneously
Enterprise (96 GB)	Llama 3 70B Q6 · Mixtral 8x22B Q4 · DeepSeek V3 Q4	2-6 tok/s	Compliance, isolation, audit

When self-hosted wins

Self-hosted vs OpenAI API · when it makes sense

Simple math: at how many tokens/month the server's fixed cost beats OpenAI's variable cost.

Volume	OpenAI GPT-4o-mini	Rollin IA Cloud	Veredito
100k tokens/month	US$ 1.5	US$ 16	OpenAI wins (low volume)
1M tokens/month	US$ 16	US$ 16	Tied (Start covers it)
10M tokens/month	US$ 160	US$ 23	Self-hosted Pro 7× cheaper
100M tokens/month	US$ 1,600	US$ 109	Self-hosted Master 15× cheaper
Sensitive data	Not applicable	US$ 143	Self-hosted Enterprise · only viable option

Why choose Rollin AI Cloud over Together.ai, Replicate or RunPod

Feature	Rollin AI Cloud	Together.ai	Replicate	RunPod
Billing	Fixed monthly (US$ 16-143)	Per token	Per second	Per GPU hour
Data stays on	Your server (LGPD)	Their infra (US)	Their infra (US)	Allocated pod
Stack included	Ollama + WebUI + RAG	API only	API only	You install
Runs on CPU	Yes (all plans)	GPU only	GPU only	GPU only
BR billing	NF-e + PIX	USD	USD	USD
Human support	24/7	English only	English only	English only

AI Cloud Server in numbers

DatacenterSão Paulo, Brazil (Tier III)
HardwareAMD EPYC + NVMe RAID
Plans5 (Start US$ 16 → Enterprise US$ 143)
Max RAM96 GB (runs Llama 3 70B Q6 or Mixtral 8x22B Q4)
Preinstalled stackOllama + Open WebUI + pgvector + Qdrant + n8n
SetupAutomatic in 5 minutes (cloud-init)
Supported models100+ via Ollama
APIREST 100% OpenAI-compatible

Frequently asked questions

Folks new to self-hosted AI usually ask:

Another question? Open a ticket.

What is Rollin Host's AI Cloud Server?

It is a VPS preconfigured with Ollama, Open WebUI, pgvector and Qdrant to run open-source LLMs (Llama 3, Mistral, Mixtral, Qwen, DeepSeek) on your own server. 5 plans from US$ 16 to US$ 143/month, with automatic 5-minute setup. You access via browser (ChatGPT-like) and via OpenAI-compatible REST API.

How much does it cost to run self-hosted AI on Rollin Host?

From US$ 16/month (Start, 8 GB RAM, Phi-3 Mini) to US$ 143/month (Enterprise, 96 GB RAM, Llama 3 70B Q6 or Mixtral 8x22B Q4). Most popular is Pro at US$ 23/month — 12 GB RAM, runs Llama 3.2 3B with RAG via pgvector. No lock-in, no per-token billing.

Is it slow without GPU?

Yes, slower than dedicated GPU. Llama 3 8B on CPU does 8-15 tokens/second — usable for chat and automation, slow for realtime streaming.

Is self-hosted worth it vs OpenAI?

Worth it from ~1 million tokens/month (Start US$ 16 already matches US$ 16 of OpenAI at the same volume). At 10M tokens/month, Pro is 7x cheaper; at 100M tokens, Master is 15x cheaper. For sensitive data (LGPD), self-hosted is the only viable option.

Can I switch models later?

Yes, at no cost. Ollama has 100+ models. Switch by command, only limit is disk space.

How does RAG work?

From the Pro plan it comes with pgvector. Upload documents, the system generates embeddings and indexes. AI answers citing the source.

Can I integrate with WhatsApp / n8n?

Yes, from Plus it comes with n8n with native Ollama node. Common workflows: classify tickets, respond to leads, transcribe audios.

Is fine-tuning possible?

Full fine-tuning needs GPU. Light LoRA (3B-7B) runs on Master/Enterprise with a few training hours.

What data stays on the server?

EVERYTHING: prompts, responses, RAG documents, history. Zero sending to third parties.

Can I connect via API (like OpenAI)?

Yes. Ollama exposes a REST API 100% compatible with OpenAI. Just point the SDK to http://your-server/v1.

How do I migrate from OpenAI to the AI Cloud Server?

In 3 steps: 1) sign up for the plan (Pro is most common to start), 2) change the OpenAI SDK base_url to your server URL, 3) pick the equivalent Ollama model (Llama 3 8B replaces GPT-3.5; Mixtral replaces GPT-4 in many cases). Average migration time: 1-2 hours.

Does the server crash if the model hangs?

No. Ollama runs as an isolated process, monitored via systemd. Restarts on its own if it hangs.

Is Rollin Host reliable for self-hosted AI?

Yes — Rollin Serviços Digitais e Tecnologia LTDA is a Brazilian company running on Tier III international datacenters (Europe and US) with a CDN in Brazil, NF-e, billing in BRL and 24/7 human support. First Brazilian cloud specialized in AI.

Pronto pra hospedar seu projeto de IA?

Comece em 5 minutos. Migração gratuita, suporte 24/7 em português e garantia de reembolso de 7 dias (30 dias em hospedagem de sites e WordPress).

Contratar agora Falar no WhatsApp