100% free migration + 1 free month with coupon MIGRAR1MES · for new customers on plans up to R$ 200/month
AI with your data · nothing sent to OpenAI/Google

Your own AI, on your server.

Cloud server preconfigured with Ollama, Open WebUI and RAG. Run Llama 3, Mistral and Mixtral on your own server: no per-message billing, no rate limits and no data sent outside.

5 plans · automatic preinstallation

Choose by the model you want to run

Each plan comes with a VPS sized for the recommended model. You can switch models at any time.

Why self-hosted

Four reasons to stop paying OpenAI per message

Self-hosted AI makes sense when privacy, predictable costs and independence matter more than having GPT's latest features.

Total privacy

Your data never leaves the server. Nothing is sent to OpenAI, Anthropic or Google. This is critical for healthcare, legal and financial data, or any other sensitive information.

Fixed monthly cost

You pay only for the server. No per-token billing and no end-of-month surprises: a flat monthly fee from US$ 12 to US$ 120 depending on the plan, whether you process 1k or 100M tokens.

No rate limit · no queue

Dedicated model, dedicated processing. Run large batches without waiting on OpenAI quotas or paying for a premium tier.

Open-source models

Llama 3, Mistral, Mixtral, Qwen, DeepSeek: the whole open-source family runs natively via Ollama. Switch models in seconds.

Included stack

Everything preconfigured · access from browser in 5 minutes

Automatic setup: sign up, receive your credentials, open Open WebUI and you are already chatting with the AI.

Ollama

Open-source model manager. Download, run and switch models with a single command. Supports Llama 3, Mistral, Mixtral, Qwen, Phi, Gemma, DeepSeek and dozens more.
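
Besides the CLI, Ollama serves a REST API (port 11434 by default). The sketch below sends one prompt to the `/api/generate` endpoint with streaming turned off, using only the Python standard library. It assumes Ollama is reachable at `localhost:11434` on your server and that a model tagged `llama3` has already been pulled; the helper names are illustrative, not part of any Ollama SDK.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port on the same host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the model's complete reply text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False, Ollama returns a single JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

Calling `generate("llama3", "Summarize RAG in one sentence.")` returns the answer as a string; from the shell, `ollama run llama3` opens an interactive session with the same model.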

Open WebUI

A ChatGPT-like browser interface for your team: saved chats, multiple users, document upload for RAG and shared prompts.

pgvector / Qdrant

Vector database for RAG. Index your documents and the AI answers from your own content, citing its sources. The Plus plan and above include a dedicated Qdrant instance.
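
At query time, RAG ranks stored chunks by how similar their embedding vectors are to the question's embedding, then places the best matches, tagged with their source, into the prompt. The sketch below does this ranking in plain Python with cosine similarity; the toy vectors and the `build_prompt` wording are hypothetical, and in production the embeddings come from an embedding model while pgvector or Qdrant perform the search server-side.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Rank (source, text, vector) entries by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), source, text) for source, text, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits) -> str:
    """Put the retrieved chunks, tagged with their source, in front of the question."""
    context = "\n".join(f"[{source}] {text}" for _, source, text in hits)
    return (
        "Answer using only the context below and cite the [source] you used.\n"
        f"{context}\n\nQuestion: {question}"
    )
```

In pgvector the same ranking is expressed in SQL with the `<=>` cosine-distance operator, and Qdrant supports cosine as a collection distance metric.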

n8n · automation

The Plus and Master plans include integrated n8n. Connect your AI to Gmail, WhatsApp, Sheets, CRM and ERP through visual, no-code workflows.

Compatibility

Which models run on each tier

Ollama offers 100+ models. The table shows the sweet spot for each tier.

Plan | Recommended models | Approx. speed | Use cases
Start | Llama 3.2 3B · Phi-3 Mini · Gemma 2B | 15-30 tok/s | Simple chatbot, data extraction, classification
Pro | Llama 3 8B · Mistral 7B · Qwen 2.5 7B | 8-15 tok/s | Customer support, RAG over docs, light agents
Plus | Llama 3 8B · Qwen 2.5 14B · CodeLlama 13B | 5-10 tok/s | Corporate RAG, multi-project, n8n automation
Master | Mixtral 8x7B · Llama 3 70B Q4 · Qwen 2.5 32B | 3-8 tok/s | Complex analysis, multiple models simultaneously
Enterprise | Llama 3 70B · Mixtral 8x22B · DeepSeek V3 | 2-5 tok/s | Compliance, isolation, audit

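
To turn the speed column into something concrete, divide the length of an answer by the generation speed. The sketch below uses the ranges from the tier table; the 300-token answer length in the example is a hypothetical figure, and the estimate ignores the time spent reading the prompt before the first token appears, which grows with long RAG contexts.

```python
# Speed ranges (tokens/s, low to high) taken from the tier table.
TIER_SPEEDS = {
    "Start": (15, 30),
    "Pro": (8, 15),
    "Plus": (5, 10),
    "Master": (3, 8),
    "Enterprise": (2, 5),
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def answer_time_range(tier: str, num_tokens: int) -> tuple[float, float]:
    """Best-case and worst-case seconds for one answer on a given tier."""
    low, high = TIER_SPEEDS[tier]
    # The higher speed gives the shorter (best-case) time.
    return generation_seconds(num_tokens, high), generation_seconds(num_tokens, low)
```

For example, `answer_time_range("Pro", 300)` gives roughly 20 to 37.5 seconds, fine for a chat reply or a background automation, but noticeably slower than a hosted GPU API.
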
When self-hosted wins

Self-hosted vs OpenAI API · when it makes sense

The simple math: the monthly token volume at which the server's fixed cost beats OpenAI's per-token cost.

Volume | OpenAI API (~US$ 16 per 1M tokens) | Rollin IA Cloud | Verdict
100k tokens/month | US$ 1.5 | US$ 12 | OpenAI wins (low volume)
1M tokens/month | US$ 16 | US$ 12 | Tied (Start covers it)
10M tokens/month | US$ 160 | US$ 26 | Self-hosted Pro 6× cheaper
100M tokens/month | US$ 1,600 | US$ 78 | Self-hosted Master 20× cheaper
Sensitive data | Not applicable | US$ 120 | Self-hosted Enterprise · only viable option

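
The break-even point is where the flat server price equals the per-token API bill. The sketch below assumes a single blended rate of about US$ 16 per 1M tokens, the rate implied by the OpenAI column; real API prices vary by model and are usually different for input and output tokens, so plug in your own rate.

```python
def api_cost(tokens_per_month: float, usd_per_million: float) -> float:
    """Monthly API bill at a single blended per-token rate."""
    return tokens_per_month / 1_000_000 * usd_per_million


def break_even_tokens(flat_usd_per_month: float, usd_per_million: float) -> float:
    """Monthly token volume at which the flat server price equals the API bill."""
    return flat_usd_per_month / usd_per_million * 1_000_000


def cheaper_option(tokens_per_month: float, flat_usd: float, usd_per_million: float) -> str:
    """Return which option costs less for this volume."""
    if api_cost(tokens_per_month, usd_per_million) < flat_usd:
        return "API"
    return "self-hosted"
```

With these assumptions the US$ 12 Start plan breaks even at 750k tokens/month (`break_even_tokens(12, 16)`), and at 100M tokens the API bill of US$ 1,600 is about 20.5 times the US$ 78 Master plan.
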
Frequently asked questions

Folks new to self-hosted AI usually ask:

Another question? Open a ticket.

Is it slow without GPU?

Yes, it is slower than a dedicated GPU. Llama 3 8B on CPU generates 8-15 tokens/second, which is usable for chat and automation but slow for real-time streaming.

Can I switch models later?

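Yes, at no extra cost. Ollama offers 100+ models and you switch with a single command. The practical limits are disk space for downloaded models and RAM for the model you run, which is why each tier has recommended models.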

How does RAG work?

Starting with the Pro plan, pgvector is included. You upload documents, the system generates embeddings and indexes them, and the AI answers citing the source.
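
Before embeddings are generated, documents are typically split into overlapping chunks so each piece fits the embedding model and a retrieved chunk can be traced back to its file. The sketch below shows character-based chunking with overlap; the 500/50 defaults and the record fields are hypothetical choices, not necessarily what this stack uses, and many pipelines split by tokens or paragraphs instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_document(source: str, text: str, chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Attach the source name and position to each chunk so answers can cite it."""
    return [
        {"source": source, "chunk_id": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]
```

Each record would then be embedded and stored in the vector database along with its `source` and `chunk_id`, which is what lets the answer point back to the original document.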

Can I integrate with WhatsApp / n8n?

Yes. Starting with the Plus plan, n8n is included with its native Ollama node. Common workflows: classifying tickets, responding to leads and transcribing audio.

Is fine-tuning possible?

Full fine-tuning requires a GPU. Lightweight LoRA fine-tuning of 3B-7B models runs on the Master and Enterprise plans with a few hours of training.
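
LoRA is light because it freezes each original weight matrix W (d × k) and trains only two small matrices, B (d × r) and A (r × k), so each adapted matrix adds r(d + k) trainable parameters instead of d·k. The sketch below computes that count; the 4096 × 4096 matrix and rank 8 are hypothetical example values.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight: B (d x r) plus A (r x k)."""
    return r * (d + k)


def lora_fraction(d: int, k: int, r: int) -> float:
    """LoRA's trainable parameters as a fraction of the full matrix's parameters."""
    return lora_params(d, k, r) / (d * k)
```

For a 4096 × 4096 matrix at rank 8 this is 65,536 trainable parameters versus about 16.8 million, or 1/256 of the matrix, which is why LoRA training fits on machines where full fine-tuning does not.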

What data stays on the server?

Everything: prompts, responses, RAG documents and chat history. Nothing is sent to third parties.

Can I connect via API (like OpenAI)?

Yes. Ollama exposes an OpenAI-compatible REST API, so in most cases you just point the OpenAI SDK to http://your-server/v1.
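
The OpenAI-compatible route accepts the familiar chat-completions request body. The sketch below calls it with the Python standard library; `http://your-server/v1` is the placeholder from the answer above, and `llama3` assumes that model is installed on your server.

```python
import json
import urllib.request

# Placeholder base URL: replace "your-server" with your server's address.
BASE_URL = "http://your-server/v1"


def build_chat_request(model: str, messages: list[dict], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions POST request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, messages: list[dict], base_url: str = BASE_URL, timeout: float = 300.0) -> str:
    """Send the conversation and return the assistant's reply text."""
    req = build_chat_request(model, messages, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the official `openai` Python package, the equivalent is `OpenAI(base_url="http://your-server/v1", api_key="ollama")`: the SDK requires a key, but Ollama ignores its value.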

Does the server crash if the model hangs?

No. Ollama runs as an isolated process monitored via systemd, which restarts it automatically if the process crashes or exits.

Ready to host your AI project?

Get started in 5 minutes. Free migration, 24/7 support in Portuguese and a 7-day money-back guarantee.