Migração 100% grátis + 1 mês grátis com cupom MIGRAR1MES · novos clientes em planos até R$ 200/mês Migrar agora
AI with your data · zero send to OpenAI/Google

Your own AI, on your server.

Cloud server preconfigured with Ollama, Open WebUI and RAG. Run Llama 3, Mistral and Mixtral on your server — no per-message billing, no rate limit, no data sent outside.

Rollin Host AI Cloud Server is a VPS preconfigured with Ollama, Open WebUI, pgvector and Qdrant to run open-source LLMs (Llama 3, Mistral, Mixtral, Qwen) on your own server. 5 plans from US$ 12 to US$ 140/month, with up to 120 GB of RAM to run Mixtral 8x22B. Automatic 5-minute setup, OpenAI-compatible REST API, 100% private data (LGPD) and 24/7 human support.

5 plans · automatic preinstall

Choose by the model you want to run

Each plan comes with a VPS sized for the recommended model. You can switch models at any time.

Why self-hosted

Four reasons to stop paying OpenAI per message

Self-hosted AI makes sense when privacy, predictable cost and independence matter more than GPT's latest feature.

Total privacy

Your data never leaves the server. Zero sending to OpenAI, Anthropic or Google. Critical for healthcare, legal, financial and any sensitive data.

Fixed monthly cost

You pay only the server. No per-token billing, no end-of-month surprises. US$ 12 or US$ 120 flat, regardless of 1k or 100M tokens.

No rate limit · no queue

Dedicated model, dedicated processing. Run big batches without waiting for OpenAI quota or paying premium tier.

Open-source models

Llama 3, Mistral, Mixtral, Qwen, DeepSeek — the whole open-source family runs natively via Ollama. Switch models in seconds.

Included stack

Everything preconfigured · access from browser in 5 minutes

Automatic setup: you sign up, get credentials, open Open WebUI and you're already chatting with AI.

Ollama

Open-source model manager. Download, run and switch models with 1 command. Supports Llama 3, Mistral, Mixtral, Qwen, Phi, Gemma, DeepSeek and dozens more.

Open WebUI

ChatGPT-like interface for your team in the browser. Saved chats, multi-user, document upload for RAG, shared prompts.

pgvector / Qdrant

Vector database for RAG. Index your documents and the AI answers based on your content, citing the source. Plus+ has dedicated Qdrant.

n8n · automation

Plus and Master include n8n integrated. Connect your AI to Gmail, WhatsApp, Sheets, CRM, ERP — visual workflows without code.

Compatibility

Which models run on each tier

Ollama has 100+ models. The table shows the sweet spot per tier.

PlanRecommended modelsApprox. speedUse cases
Start Llama 3.2 3B · Phi-3 Mini · Gemma 2B 15-30 tok/s Simple chatbot, data extraction, classification
Pro Llama 3 8B · Mistral 7B · Qwen 2.5 7B 8-15 tok/s Customer support, RAG over docs, light agents
Plus Llama 3 8B · Qwen 2.5 14B · CodeLlama 13B 5-10 tok/s Corporate RAG, multi-project, n8n automation
Master Mixtral 8x7B · Llama 3 70B Q4 · Qwen 2.5 32B 3-8 tok/s Complex analysis, multiple models simultaneously
Enterprise Llama 3 70B · Mixtral 8x22B · DeepSeek V3 2-5 tok/s Compliance, isolation, audit
When self-hosted wins

Self-hosted vs OpenAI API · when it makes sense

Simple math: at how many tokens/month the server's fixed cost beats OpenAI's variable cost.

VolumeOpenAI GPT-4o-miniRollin IA CloudVeredito
100k tokens/month US$ 1.5 US$ 12 OpenAI wins (low volume)
1M tokens/month US$ 16 US$ 12 Tied (Start covers it)
10M tokens/month US$ 160 US$ 26 Self-hosted Pro 6× cheaper
100M tokens/month US$ 1,600 US$ 78 Self-hosted Master 20× cheaper
Sensitive data Not applicable US$ 120 Self-hosted Enterprise · only viable option

Why choose Rollin AI Cloud over Together.ai, Replicate or RunPod

FeatureRollin AI CloudTogether.aiReplicateRunPod
Billing Fixed monthly (US$ 12-140) Per token Per second Per GPU hour
Data stays on Your server (LGPD) Their infra (US) Their infra (US) Allocated pod
Stack included Ollama + WebUI + RAG API only API only You install
Runs on CPU Yes (all plans) GPU only GPU only GPU only
BR billing NF-e + PIX USD USD USD
Human support 24/7 English only English only English only

AI Cloud Server in numbers

  • DatacenterSão Paulo, Brazil (Tier III)
  • HardwareAMD EPYC + NVMe RAID
  • Plans5 (Start US$ 12 → Enterprise US$ 140)
  • Max RAM120 GB (runs Mixtral 8x22B)
  • Preinstalled stackOllama + Open WebUI + pgvector + Qdrant + n8n
  • SetupAutomatic in 5 minutes (cloud-init)
  • Supported models100+ via Ollama
  • APIREST 100% OpenAI-compatible
Frequently asked questions

Folks new to self-hosted AI usually ask:

Another question? Open a ticket.

What is Rollin Host's AI Cloud Server?

It is a VPS preconfigured with Ollama, Open WebUI, pgvector and Qdrant to run open-source LLMs (Llama 3, Mistral, Mixtral, Qwen, DeepSeek) on your own server. 5 plans from US$ 12 to US$ 140/month, with automatic 5-minute setup. You access via browser (ChatGPT-like) and via OpenAI-compatible REST API.

How much does it cost to run self-hosted AI on Rollin Host?

From US$ 12/month (Start, 12 GB RAM, Llama 3.2 3B) to US$ 140/month (Enterprise, 120 GB RAM, Mixtral 8x22B). Most popular is Pro at US$ 26/month — 24 GB RAM, runs Llama 3 8B with RAG via pgvector. No lock-in, no per-token billing.

Is it slow without GPU?

Yes, slower than dedicated GPU. Llama 3 8B on CPU does 8-15 tokens/second — usable for chat and automation, slow for realtime streaming.

Is self-hosted worth it vs OpenAI?

Worth it from ~1 million tokens/month (Start US$ 12 already matches US$ 16 of OpenAI at the same volume). At 10M tokens/month, Pro is 6x cheaper; at 100M tokens, Master is 20x cheaper. For sensitive data (LGPD), self-hosted is the only viable option.

Can I switch models later?

Yes, at no cost. Ollama has 100+ models. Switch by command, only limit is disk space.

How does RAG work?

From the Pro plan it comes with pgvector. Upload documents, the system generates embeddings and indexes. AI answers citing the source.

Can I integrate with WhatsApp / n8n?

Yes, from Plus it comes with n8n with native Ollama node. Common workflows: classify tickets, respond to leads, transcribe audios.

Is fine-tuning possible?

Full fine-tuning needs GPU. Light LoRA (3B-7B) runs on Master/Enterprise with a few training hours.

What data stays on the server?

EVERYTHING: prompts, responses, RAG documents, history. Zero sending to third parties.

Can I connect via API (like OpenAI)?

Yes. Ollama exposes a REST API 100% compatible with OpenAI. Just point the SDK to http://your-server/v1.

How do I migrate from OpenAI to the AI Cloud Server?

In 3 steps: 1) sign up for the plan (Pro is most common to start), 2) change the OpenAI SDK base_url to your server URL, 3) pick the equivalent Ollama model (Llama 3 8B replaces GPT-3.5; Mixtral replaces GPT-4 in many cases). Average migration time: 1-2 hours.

Does the server crash if the model hangs?

No. Ollama runs as an isolated process, monitored via systemd. Restarts on its own if it hangs.

Is Rollin Host reliable for self-hosted AI?

Yes — Rollin Serviços Digitais e Tecnologia LTDA is a Brazilian company with a Tier III datacenter in São Paulo, NF-e, billing in BRL and 24/7 human support. First Brazilian cloud specialized in AI.

Pronto pra hospedar seu projeto de IA?

Comece em 5 minutos. Migração gratuita, suporte 24/7 em português e garantia de reembolso em 7 dias.