Janitor AI
Proxy & Model Guide
Every method available to connect Janitor AI to a powerful external model — from completely free options to self-hosted setups. Pick the section that matches your situation.
Lite Router
Fastest free setup. One URL, one key, many models. No payment required to get started. Takes five minutes.
DeepSeek Direct API
Cheapest paid option. No daily caps, highest reliability. ~$0.14/million tokens. $5 lasts thousands of messages.
Free Providers
Google AI Studio and OpenRouter both offer free API access with generous limits. Great for variety and redundancy.
Paid Providers
NanoGPT is the top pick — wide selection, competitive pricing. Alternatives like Chutes AI also covered.
AIClient-2-API
Self-hosted local server. Bridges CLI access from multiple pool providers into a usable API endpoint.
Extras
Troubleshooting desk, model comparisons, AI parameters explained, OOC prompting, and LoreBary.
Glossary — Learn the Language
Before touching any settings, understand these words. They will appear everywhere in this guide and across every Janitor AI community.
API Key — a secret token that authorises your requests (e.g., sk-abc123...). Treat it like a password — never share it publicly.
Rate Limit — a cap on how many requests you can send in a given period. Exceeding it returns a 429 Too Many Requests error. Paid tiers usually remove or raise this limit significantly.
Lite Router — Best Free Start
The fastest way to get a working proxy right now. Lite Router aggregates many models behind one URL and one key. No payment required.
What is Lite Router? Free
Lite Router acts as a middleman: Janitor AI sends your request to Lite Router's endpoint, Lite Router forwards it to the actual model, and returns the response. Your key is what authorises the flow. You pick any model from their list without changing your proxy URL — just swap the model name.
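Under the hood this is a standard OpenAI-compatible chat-completions call. Here is a minimal sketch of the request that gets built on your behalf — the payload shape is the standard format, and the key and message are placeholders. Note that swapping models changes only the model field, never the URL:

```python
def build_chat_request(api_key: str, model: str, user_message: str):
    """Build headers and JSON body for an OpenAI-compatible
    chat-completions call, as a proxy like Lite Router expects."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # your key authorises the flow
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # swap this string to change models
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, body

headers, body = build_chat_request("sk-example", "glm-4.7", "Hello!")
# Picking a different model changes only the model field:
headers2, body2 = build_chat_request("sk-example", "deepseek-chat", "Hello!")
```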
Creating an Account & API Key
Never share your API key. Anyone who has it can make requests under your account, consuming your quota or causing your access to be revoked. Do not post it in Discord, Reddit, or anywhere public.
Proxy Setup in JanitorAI
Open JanitorAI → hamburger menu (top-left) → API Settings → Proxy tab → Create New. Fill in the fields exactly as below:
| Field | What to Enter |
|---|---|
| Proxy Name | Anything you like (e.g., "Lite Router") |
| Model Name | The exact model ID from the Available Models tab (e.g., glm-4.7). Spelling and casing must match. |
| Proxy URL | https://api.literouter.com/v1/chat/completions |
| API Key | The key you copied from your Lite Router dashboard |
Save → Refresh the page. If you get an error, double-check the model name spelling and ensure your API key was pasted with no extra spaces.
DeepSeek Official API
The cheapest reliable paid option. No daily caps, fastest possible response times, and highest reliability because you are going directly to the source.
Approximately $0.14 per million tokens for DeepSeek V3 (Chat). A $5 top-up realistically lasts thousands of messages. Minimum deposit is $2. Compared to OpenAI or Anthropic's direct pricing, this is an order of magnitude cheaper.
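The arithmetic behind that claim, assuming roughly 500 tokens per message exchange (an assumption — actual usage varies with context size and response length):

```python
price_per_million = 0.14     # USD per million tokens (DeepSeek V3 Chat)
deposit = 5.00               # USD top-up
tokens_per_message = 500     # assumed tokens per exchange; varies widely

total_tokens = deposit / price_per_million * 1_000_000
messages = total_tokens / tokens_per_message
# Roughly 35.7 million tokens — comfortably thousands of messages.
```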
Step-by-Step Setup
Create a DeepSeek account, add credit, and generate an API key from the dashboard — it starts with sk-.
| Field | What to Enter |
|---|---|
| Config Name | Official DeepSeek (or anything) |
| Model Name | deepseek-chat for creative roleplay, deepseek-reasoner for logic-focused |
| Proxy URL | https://api.deepseek.com/v1/chat/completions |
| API Key | Your DeepSeek key (starts with sk-) |
Save → Refresh → Test a message. Enable Text Streaming in your Janitor generation settings for the best experience.
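With Text Streaming on, the reply arrives as a stream of server-sent events rather than one JSON blob. A sketch of how such a stream is assembled, assuming the standard OpenAI-style chunk format:

```python
import json

def collect_stream_text(sse_lines):
    """Assemble the reply text from OpenAI-style streaming chunks.
    Each chunk is a 'data: {json}' line; 'data: [DONE]' ends the stream."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        parts.append(delta)
    return "".join(parts)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello!
```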
Free Providers
Both Google AI Studio and OpenRouter offer free API access with generous limits. Good options for variety or as a fallback when Lite Router is busy.
Google AI Studio Free Tier
| Field | What to Enter |
|---|---|
| Proxy Name | Google Gemini (or anything) |
| Model Name | gemini-1.5-flash or gemini-2.5-flash |
| Proxy URL | https://generativelanguage.googleapis.com/v1beta/openai/ |
| API Key | Your Google AI Studio key |
OpenRouter Free Tier
One account, access to dozens of models. Several are permanently free.
Go to openrouter.ai/keys. Click Create Key, name it, copy it.
Browse openrouter.ai/models and filter by Free. Copy the full model ID exactly (e.g., google/gemini-flash-1.5).
| Field | What to Enter |
|---|---|
| Proxy Name | OpenRouter (or anything) |
| Model Name | The full model ID from their list (e.g., meta-llama/llama-3.1-8b-instruct:free) |
| Proxy URL | https://openrouter.ai/api/v1/chat/completions |
| API Key | Your OpenRouter API key |
Note: Free models on OpenRouter often have rate limits or smaller context windows. If one stops working, simply swap the model name in your proxy settings for another free model from their list.
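That swap-on-failure advice can be scripted. A sketch with a placeholder send function — fake_send here only simulates the API; the model IDs are examples from OpenRouter's list:

```python
FREE_MODELS = [
    "meta-llama/llama-3.1-8b-instruct:free",  # IDs from openrouter.ai/models
    "google/gemini-flash-1.5",
]

def send_with_fallback(send, prompt, models=FREE_MODELS):
    """Try each free model in turn; move on when one is rate-limited.
    `send(model, prompt)` stands in for your real API call and should
    raise RuntimeError on a 429 response."""
    last_error = None
    for model in models:
        try:
            return model, send(model, prompt)
        except RuntimeError as err:  # e.g. 429 Too Many Requests
            last_error = err
    raise RuntimeError(f"all free models exhausted: {last_error}")

# Simulated run: the first model is rate-limited, the second succeeds.
def fake_send(model, prompt):
    if "llama" in model:
        raise RuntimeError("429 Too Many Requests")
    return f"reply from {model}"

model, reply = send_with_fallback(fake_send, "Hi")
```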
Paid Providers
When free options hit their limits, paid providers give you full access without caps. NanoGPT is the top recommendation.
NanoGPT Top Pick
NanoGPT uses a flexible credit system — top up as little or as much as you want and spend it across any of their models. Their monthly plans also give unified access to almost all major models (Gemini, Claude, OpenAI) from one account, often at a fraction of maintaining separate subscriptions.
| Field | What to Enter |
|---|---|
| Proxy Name | NanoGPT (or anything) |
| Model Name | Model ID from their list (e.g., gpt-4o, claude-3-5-sonnet) |
| Proxy URL | https://nano-gpt.com/api/v1/chat/completions |
| API Key | Your NanoGPT API key |
Other Paid Alternatives
Chutes AI: Competitive alternative to NanoGPT with a similar credit-based model. Specialises in open-source and some proprietary models. Setup is identical: account → add credits → generate key → paste into Janitor AI proxy with their base URL. Find the exact model IDs in their documentation.
OpenRouter (with credits): By adding credits to your free OpenRouter account, you unlock premium models including GPT-4o and Claude. Proxy setup remains exactly the same as the free tier — you simply gain access to the expanded model list.
AIClient-2-API — Self-Hosted
A self-hosted approach that lets you run a local server bridging many providers' CLI access into a usable API endpoint. Requires some terminal comfort. Highly effective once set up.
github.com/justlovemaki/AIClient-2-API — Simulates Gemini CLI, Grok, and Kiro client requests, compatible with the OpenAI API format. Supports thousands of Gemini requests per day and free use of the built-in Claude model via Kiro.
Prerequisites
Verify: open Command Prompt and run node --version. You should see something like v20.x.x.
Installation
Windows tip: use Command Prompt (search "cmd" in Start menu). To paste: Right-Click inside the black window.
git clone https://github.com/justlovemaki/AIClient-2-API.git
Wait until the text stops moving and the blinking cursor returns.
cd AIClient-2-API
npm install
This takes a minute. Wait for the cursor to return.
npm start
Do not close this terminal window. The server only works while this window is open. If you see "Listening on port 3000", you have succeeded.
Connecting to JanitorAI
While the terminal is running, open your browser on the same machine and go to http://localhost:3000. Default password: admin123.
The red (0/0 available) indicator should turn green and show (1/1 available) once authenticated.
Current Pool URLs
Gemini CLI OAuth — Same models as Google AI Studio but with far more generous quotas:
http://localhost:3000/gemini-cli-oauth/v1/chat/completions
Claude Kiro OAuth — Claude models via Kiro:
http://localhost:3000/claude-kiro-oauth/v1/chat/completions
When using Claude models through this pool, turn OFF Text Streaming in your Janitor Generation / Proxy settings to prevent response bugs.
| Field | What to Enter |
|---|---|
| Proxy Name | AIClient Local |
| Model Name | Model from the Web UI (e.g., claude-opus-4, gemini-2.5-flash) |
| Proxy URL | http://localhost:3000/ + the specific pool endpoint |
| API Key | The key generated in the Configuration tab |
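The two pool URLs differ only in their path segment. A tiny sketch of how the full Proxy URL is composed from the local base plus a pool endpoint:

```python
BASE = "http://localhost:3000"

POOLS = {
    "gemini": "/gemini-cli-oauth/v1/chat/completions",
    "claude": "/claude-kiro-oauth/v1/chat/completions",
}

def pool_url(pool: str) -> str:
    """Join the local server base with the chosen pool endpoint."""
    return BASE + POOLS[pool]

print(pool_url("gemini"))
# http://localhost:3000/gemini-cli-oauth/v1/chat/completions
```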
Maintenance & Restart
As long as the CMD terminal prompt ends in \AIClient-2-API> and npm start is running, your proxy works. To restart after closing:
Win → cmd → Enter
cd AIClient-2-API → Enter
npm start → Enter
Troubleshooting AIClient
| Error | Cause | Fix |
|---|---|---|
| "Connection refused" | Server is not running | Navigate to the folder in CMD and run npm start again |
| "Unauthorised" | OAuth session expired | Run gen Auth again in the web UI to re-authenticate |
| "Wrong port" | Server started on a different port | Check the terminal output when npm start ran and update your proxy URL accordingly |
Is It Safe to Use?
A common concern when running open-source tools: "Can the developer see my API keys or messages?" Here is the technical reality.
It is a Local-First Tool
When you run those command scripts, you are telling your laptop to act as a tiny private post office — just for you. There is no "AIClient-2-API" central cloud server that your data passes through first. Your browser talks to your laptop, and your laptop talks directly to Google or Anthropic. The developer is not sitting in the middle watching your traffic.
The Community Audit Effect
The repository has over 6,800 GitHub stars and is actively watched by developers worldwide. For a "phone home" exploit to survive in a project this popular, the developer would need to hide a secret line of code that sends your keys to a private server. In a project with this level of visibility, other developers inspect every code change. Such a line would be spotted and the project removed from GitHub almost immediately. This is not a hypothetical protection — it is how the open-source ecosystem self-polices.
What the Scripts Actually Do
Files like install-and-run.bat or install-and-run.sh are automation shortcuts. They check if Node.js is installed, download the necessary code libraries (npm install), and start the local server — so you do not have to type long commands every time. Nothing more.
What to Keep an Eye On
Do not share your config or static folder. This is where your specific API keys are stored on your machine.
Only download from the official repo. Clones or re-uploads on other sites may have been tampered with. Only use the link: github.com/justlovemaki/AIClient-2-API
Watch the CMD window. If you ever see it trying to connect to an unknown IP address that is not Google, Anthropic, or OpenAI, that is a red flag worth investigating.
Model Guide — Which is Best for You?
A detailed breakdown of every major model family available through the proxies above. Performance notes are based on community usage and published benchmarks.
DeepSeek R1 · V3 · V3.2
DeepSeek V3 (Chat): Designed for natural conversational tasks. Does not surface its internal "thinking" process before answering — responses feel immediate and immersive. Faster generation and lower cost than the Reasoner. The community's top pick for creative roleplay because it never breaks immersion with visible reasoning steps.
DeepSeek R1 (Reasoner): Exposes its chain-of-thought (CoT) reasoning before producing a final response. You may see a "thinking" block appear first. This can feel distracting during fast-paced RP, but it produces noticeably smarter, more internally consistent decisions over very long chats. Ideal when a character needs to behave with exceptional logical coherence across 100+ messages.
DeepSeek V3.2: The latest hybrid model. It introduces DeepSeek Sparse Attention (DSA), dramatically improving performance on very long contexts while maintaining output quality, and can operate in both chat and reasoning modes via a flag. Best for users who want cutting-edge performance and are comfortable with a slightly less stable model.
Use deepseek-chat for immersive roleplay. Use deepseek-reasoner when you need a character to make unusually logical, consistent decisions across a long story arc.
Further reading: DeepSeek Tutorial for Actual Dummies (Reddit)
GLM Series 4.7 · 5.0 · 5.1 (Zhipu AI / Z.ai)
GLM-4.7: A capable all-rounder. Works well for most RP scenarios without needing special prompting. Good general performance and stable output. Slightly cheaper and faster than 5.0 if you do not need extended long-term coherence.
GLM-5.0: A significant jump — roughly 20% overall improvement over 4.7 across benchmarks. It can autonomously handle complex multi-step scenarios and maintain character consistency across very long chats (100+ messages). The GLM to use for epic, ongoing storylines with intricate world-building.
GLM-5.1: Optimised primarily for coding agents (Claude Code, OpenClaw) — it delivers a 28% coding benchmark improvement over 5.0. Not a dramatic roleplay leap beyond 5.0, but its improved logical reasoning can help characters make more believable decisions in mechanically complex scenarios. Mostly relevant if you also use coding tools.
Claude Opus · Sonnet · Haiku (Anthropic)
Claude Opus: Anthropic's most capable model. Exceptional creative writing, emotional depth, and nuance. The best model available for romantic, literary, or emotionally complex narratives. Stays in character better than any other model over very long chats. Context window of 200k tokens. Most expensive at approximately $3/million tokens input — but for high-quality RP, many consider it worth every cent.
Claude Sonnet: A faster, more affordable Claude. Still delivers very strong creative writing and character consistency for most RP scenarios. The practical choice when Opus feels too expensive for everyday use. Moderate cost with 200k context.
Claude Haiku: The cheapest and fastest Claude model. Good for short conversations, quick testing, or casual chats. Less creative depth than Opus or Sonnet for extended narratives, but very capable for simple scenarios.
Best balance: Claude Sonnet — handles most RP well without breaking the bank.
Testing / casual: Claude Haiku.
Further reading: Which Claude model is best for JAI RP? (Reddit)
Gemini Flash · Pro · 2.0 (Google)
Gemini Flash: Fast, free, and capable for short to medium RP. Smaller 32k context window means it will forget things in very long sessions. Best for casual chats, testing proxy setups, and situations where cost is the top priority.
Gemini 1.5 Pro: Larger 128k context window and significantly better reasoning and writing quality than Flash. A free tier is available but with lower rate limits. Best for longer, more involved RP sessions when you want free access with more depth.
Gemini 2.0 Pro: The best Gemini for roleplay. Improved reasoning, writing quality, and instruction-following. Paid tier but still significantly cheaper than Claude. 128k context. A solid alternative when you want a Google model with proper RP quality.
Best free for longer sessions: Gemini 1.5 Pro (rate-limited).
Best paid Gemini: Gemini 2.0 Pro — cheaper than Claude, better than Flash.
Model Quick Reference
Ranked by overall suitability for Janitor AI roleplay. Intelligence and creativity ratings are community consensus.
| Model | Intelligence | Creativity | Context | Cost | Best For |
|---|---|---|---|---|---|
| Claude Opus (claude-3-opus-20240229) | ★★★★★ | ★★★★★ | 200k | Expensive | Emotional, literary, romantic RP. King of prose. |
| Claude Sonnet (claude-3-sonnet) | ★★★★★ | ★★★★★ | 200k | Moderate | Balanced quality + cost. Still expensive vs alternatives. |
| GLM-5.0 (Zhipu AI) | ★★★★★ | ★★★★★ | 128k | Cheap | Very long, consistent story arcs. 100+ message coherence. |
| DeepSeek V3 Chat (deepseek-chat) · Best All-Round | ★★★★★ | ★★★★★ | 128k | Very cheap | Long, complex story-driven RP. The best all-rounder for most users. |
| DeepSeek R1 (deepseek-reasoner) | ★★★★★ | ★★★★★ | 128k | Very cheap | Logical, consistent character decisions. Shows reasoning steps. |
| GLM-4.7 (Zhipu AI) | ★★★★★ | ★★★★★ | 128k | Cheap | General RP. Comparable to DeepSeek R1 for most scenarios. |
| Gemini 2.0 Pro (Google) | ★★★★★ | ★★★★★ | 128k | Cheap | Solid paid alternative. Less creative than Claude or DeepSeek. |
| Gemini 2.0 Flash (Google) | ★★★★★ | ★★★★★ | 32k | Free | Casual short chats. Good for testing. Not for deep RP. |
| GPT-4o (OpenAI) · Avoid | ★★★★★ | ★★★★★ | 128k | Moderate | Heavy filtering, frequent refusals, character inconsistency. Not suited for JanitorAI RP. |
Recommendation Summary
Best quality: Claude Opus.
Best all-round value: DeepSeek V3 (Chat).
Best for long, logic-heavy arcs: GLM-5.0 or DeepSeek R1 (Reasoner).
Best free: Gemini 2.0 Flash (via OpenRouter free tier).
Avoid: GPT-4o (ChatGPT) — Heavy filtering and poor character consistency make it the worst option for JanitorAI.
Core Setup — Any Proxy
These steps are the foundation for configuring any proxy in Janitor AI. Follow them once for each proxy you want to add.
Troubleshooting Desk
When routing through a proxy, things occasionally break. Here is how to fix the most common issues immediately.
| Error | What it Means | Fix |
|---|---|---|
| 401 Unauthorized | Your API key is missing, typed wrong, or expired. | Double-check your API URL and re-paste your key in API Settings. Ensure no accidental spaces at the beginning or end. |
| 429 Rate Limited | You are sending messages faster than the proxy allows, or you have run out of credits. | Wait 30–60 seconds. If using a paid proxy, check your account balance. If free tier, switch to another provider temporarily. |
| 500 / 504 Timeout | The proxy server crashed or the model took too long to generate. | Refresh the page, wait a few minutes, and try again. Try lowering your Max New Tokens setting. |
| Context Length Exceeded | Chat history + character definition is larger than the model's memory. | Go to Generation Settings. Lower the Context Size slider, or clear some old chat history. |
| Endless Looping / Repetition | The AI is stuck repeating phrases or actions. | Raise Temperature and Repetition Penalty. Manually edit the AI's last message to break the pattern. |
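The table above maps naturally to a lookup. A small sketch that turns a proxy's HTTP status code into the suggested fix — the messages paraphrase the table, and codes 500 and 504 share one remedy:

```python
FIXES = {
    401: "Re-paste your API key in API Settings; check for stray spaces.",
    429: "Wait 30-60 seconds, check your balance, or switch provider.",
    500: "Refresh and retry; try lowering Max New Tokens.",
    504: "Refresh and retry; try lowering Max New Tokens.",
}

def suggest_fix(status_code: int) -> str:
    """Map a proxy HTTP status code to the fix from the table above."""
    return FIXES.get(status_code, "Unknown error - check the provider's status page.")

print(suggest_fix(429))
```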
AI Models Explained
When people talk about "70B" or "128k context" they're describing a model's capabilities. Here is what those numbers actually mean for roleplay.
Parameters — The "B" Numbers
This is the size of the model's "brain." More parameters generally means more intelligence, nuance, and creative range — but also slower generation and higher cost.
| Size Range | Characteristics | Suited For |
|---|---|---|
| Small (8B – 14B) | Fast, lightweight, cheaper to run. Can lose the plot after a while and may begin repeating itself. | Simple, short conversations. Testing. Not extended RP. |
| Medium (30B – 70B) | Solid balance of speed and intelligence. Understands nuance, stays in character longer. | Most everyday roleplay scenarios. |
| Large (100B+) | Very smart, very creative, excellent long-term lore retention. Slower and more expensive. | Complex narratives, intricate world-building, emotional depth. |
Context Window
The AI's short-term memory. It determines how much of your conversation history the model can "see" when generating its next response.
- A 4k context window remembers roughly the last 3,000 words. Anything earlier is forgotten.
- A 32k or 128k window remembers entire long roleplay sessions, keeping details consistent across hundreds of messages.
- For deep roleplay, bigger is almost always better. DeepSeek has 128k; Claude has 200k.
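The word estimates above follow from a common rule of thumb of about 0.75 words per token — an approximation, not an exact conversion:

```python
WORDS_PER_TOKEN = 0.75  # rule-of-thumb conversion; an assumption, not exact

def window_in_words(context_tokens: int) -> int:
    """Rough word capacity of a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

def messages_remembered(context_tokens: int, tokens_per_message: int = 500) -> int:
    """How many ~500-token exchanges fit before the oldest falls out."""
    return context_tokens // tokens_per_message

print(window_in_words(4_000))        # 3000 -- matches the "roughly 3,000 words" rule
print(messages_remembered(128_000))  # 256
```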
Censorship vs Uncensored
"Censored" (or "aligned") models have been trained to refuse certain content — violence, explicit themes, dark narratives — and may interrupt your RP with refusals or lectures. "Uncensored" (often called "base") models follow your instructions without applying moral filters. For unrestricted roleplay, uncensored models are strongly preferred. Most free models on OpenRouter are censored to some degree.
Quick Model Family Vibe Check
| Family | The Vibe | Best For |
|---|---|---|
| Claude (Anthropic) | The brilliant novelist. Highly descriptive and emotionally intelligent, with strict safety filters requiring clever prompting or proxies. | Deep emotional storytelling and complex narratives. |
| DeepSeek / GLM | The logical powerhouses. Excellent at following complex multi-step instructions and maintaining consistent character logic. | Intricate world-building and strict adherence to roleplay formats. |
| Gemini (Google) | The capable generalist. Good all-round performance with a free tier. Less creative flair than Claude but reliable. | Everyday RP when cost matters. Testing setups. |
| Llama 3 (Meta) | The energetic crowd-pleaser. Great conversational flow, fast, and easy to run on cheaper hardware. | Fast-paced dialogue and action scenes. |
Prompts & OOC Guide
A prompt is the director's script you hand to the AI. It dictates how the AI should write — not just what it should write.
When you hit send, the AI sees a hidden document combining: the Character Definition, Scenario, Persona, chat history, and your System Prompt (jailbreak / custom prompt). A well-written prompt forces the AI to drop its default robotic register and adopt the voice you need.
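Conceptually, that hidden document is assembled into the messages array sent to the model. A sketch of the idea — the field order and exact format here are assumptions for illustration, not Janitor AI's actual internals:

```python
def assemble_prompt(system_prompt, character_def, scenario, persona,
                    history, user_message):
    """Combine the pieces a Janitor-style frontend merges before sending
    to the model (the exact internal format is an assumption)."""
    context = "\n\n".join([character_def, scenario, persona])
    messages = [{"role": "system", "content": f"{system_prompt}\n\n{context}"}]
    messages += history  # prior chat turns
    messages.append({"role": "user", "content": user_message})
    return messages

msgs = assemble_prompt(
    "Write vivid third-person prose.",
    "Character: a stoic knight.",
    "Scenario: a besieged castle.",
    "Persona: a travelling healer.",
    [{"role": "assistant", "content": "The gates shudder."}],
    "I run to the wall.",
)
```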
The Power of OOC (Out of Character)
OOC is a cheat code. It lets you pause the roleplay and speak directly to the "AI Director" to correct course — change pacing, fix hallucinations, force a scene transition — without breaking your character's dialogue.
How to use it: Wrap your instructions in square brackets at the end of your message.
OOC instructions are highly effective because the model processes them as a direct meta-instruction separate from the in-character content. Use them whenever the AI drifts off-track, breaks character, or needs a specific narrative adjustment.
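A trivial helper showing the bracket convention — the wording of the instruction is just an example:

```python
def with_ooc(message: str, instruction: str) -> str:
    """Append an OOC note in square brackets, as described above."""
    return f"{message} [OOC: {instruction}]"

print(with_ooc(
    "She draws her blade.",
    "Slow the pacing down and keep replies under 200 words.",
))
```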
Top Tier Prompt Resources
Rentry: The underground goldmine. Advanced roleplayers host preset guides and system prompts here — pages that are often unindexed by Google. These can be massive (5,000+ words) covering everything from physics simulation to emotional subtext handling. Find links shared in power-user Discord servers by searching "LLM Rentry."
GitHub: Developers publish curated LLM prompt collections and persona-layering techniques. Search "llm prompts" on GitHub for hundreds of community-maintained collections.
LoreBary — Sofia's Plugin Hub
LoreBary is a server-side plugin bridge for Janitor AI. It lets you inject powerful prompt expansions and lorebooks using short activation codes rather than pasting thousands of words into your prompt field.
lorebary.com — Browse available plugins, lorebooks, and prompt templates maintained by the community.
Why codes exist: They act as server-side shortcuts to keep your prompt field clean, prevent context bloat, and handle dynamic injection without manual rewriting each session.