
Step 1: Choose Your Model
Pick a model that suits your hardware and goals. Popular choices:
- Llama 3 (Meta)
- Mistral
- Gemma
- GPT4All
All of these are available in GGUF format, a quantized model format designed for efficient local inference.
Step 2: Install Ollama
Ollama is a user-friendly tool for running LLMs locally. On Linux, install it with the official script (macOS and Windows installers are available from ollama.com):
curl -fsSL https://ollama.com/install.sh | sh
Then run your model:
ollama run llama3
This downloads and launches the model locally.
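If you want a specific model size or quantization, you can pull a tagged variant first. The exact tags vary by model, so check the Ollama model library for what's available; for example:

ollama pull llama3:8b
ollama list

ollama list confirms which models are installed locally.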
Step 3: Set Up Your Python Project
Create a folder like waifu-chatbot, and inside it:
- main.py — your Python script
- requirements.txt — dependencies
In requirements.txt, add:
fastapi
uvicorn
requests
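Optionally, create and activate a virtual environment first so these dependencies stay isolated from your system Python:

python -m venv .venv
source .venv/bin/activate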
Install them:
pip install -r requirements.txt
Step 4: Build a Local API with FastAPI
Here’s a basic main.py to send prompts to your waifu:
from fastapi import FastAPI, Request
import requests

app = FastAPI()

@app.post("/chat")
async def chat(request: Request):
    data = await request.json()
    prompt = data.get("prompt", "")
    # Ollama streams newline-delimited JSON by default; "stream": False
    # makes it return a single JSON object so response.json() works.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
    )
    return response.json()
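The endpoint simply forwards the prompt to Ollama and returns its raw JSON; with streaming disabled, the generated text sits in the "response" field of that object.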
Run it with:
uvicorn main:app --reload
Step 5: Customize Your Waifu
You can give your waifu a distinct personality, with no model training required, by:
- Prepending a system prompt like: "You are a cute anime waifu who loves cats and ramen."
- Using prompt engineering to shape responses
- Saving chat history to simulate memory (see the sketch below)
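Here's a minimal sketch combining the system prompt and chat history ideas. It switches from /api/generate to Ollama's /api/chat endpoint, which accepts a list of role-tagged messages; the persona string and the in-memory history list are illustrative assumptions, not requirements:

import requests

# Assumed persona; swap in whatever personality you like.
SYSTEM_PROMPT = "You are a cute anime waifu who loves cats and ramen."

# Simple in-memory history; a real app might persist this to a file or database.
history = [{"role": "system", "content": SYSTEM_PROMPT}]

def chat_with_waifu(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama3", "messages": history, "stream": False},
    )
    reply = response.json()["message"]["content"]
    # Remember the reply so the next turn has conversational context.
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat_with_waifu("Hi waifu!"))
print(chat_with_waifu("What did I just say?"))

Because the full history list is sent with every request, the model "remembers" earlier turns; for long sessions you'd eventually want to trim or summarize it.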
Step 6: Test It!
Use curl or Postman to send a prompt:
curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"prompt": "Hi waifu!"}'
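You can also test from Python (assuming the FastAPI server from Step 4 is running on port 8000):

import requests

resp = requests.post("http://localhost:8000/chat", json={"prompt": "Hi waifu!"})
print(resp.json().get("response"))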