AI voice cloner locally and not soy puke

fukurou

the supreme coder
ADMIN
### Voice Cloning Walkthrough for Windows (PyCharm Terminal)

**Forget Bash. This is for the PyCharm terminal on Windows 11.**

#### 1. Create Project & Virtual Environment
1. Open PyCharm
2. Create new project: `voice_clone`
3. Open Terminal (bottom tab)

#### 2. Create requirements.txt
In PyCharm:
1. Right-click project → New → File
2. Name it `requirements.txt`
3. Paste this exact content:

```txt
torch==2.0.1+cu118
torchvision==0.15.2+cu118
torchaudio==2.0.2+cu118
--extra-index-url https://download.pytorch.org/whl/cu118

TTS==0.20.2
soundfile==0.12.1
librosa==0.10.1
```

#### 3. Install Dependencies
In PyCharm Terminal run:

```cmd
# Create virtual environment
python -m venv venv

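# Activate it (PyCharm's terminal on Windows usually opens PowerShell;
# in a cmd.exe terminal, run .\venv\Scripts\activate.bat instead)
.\venv\Scripts\Activate.ps1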

# Install everything from requirements.txt
pip install -r requirements.txt
```

Wait for everything to install (will take several minutes).
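
Once pip finishes, a quick way to confirm the pins actually landed is to compare the installed versions against `requirements.txt`. This is only a sketch using the standard library; `PINNED` and `check_pins` are names made up for this example, and the pins are copied from the file above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from requirements.txt; the "+cu118" tag is part of the
# version string that pip records for the CUDA builds.
PINNED = {
    "torch": "2.0.1+cu118",
    "torchvision": "0.15.2+cu118",
    "torchaudio": "2.0.2+cu118",
    "TTS": "0.20.2",
    "soundfile": "0.12.1",
    "librosa": "0.10.1",
}


def check_pins(pins):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

If it prints nothing, every pinned package is installed at the expected version.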

#### 4. Create clone_script.py
Right-click project → New → Python File → Name it `clone_script.py`

Paste this code:

```python
from TTS.api import TTS
import torch


# Setup
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")


# Init model (will download ~2GB first time)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)


# Generate audio
tts.tts_to_file(
    text="Whatever you want the voice to say goes here.",
    speaker_wav="sample.wav",  # Put your audio file in project folder
    language="en",
    file_path="output.wav"
)


print("Done. Check output.wav")
```

#### 5. Add Your Audio Sample
1. Get a clean 5-10 second audio file (.wav format)
2. Name it `sample.wav`
3. Drag and drop it into your PyCharm project folder
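
If you're not sure your clip fits the 5-10 second range, you can measure it with Python's built-in `wave` module. A rough sketch: `wav_info` and `is_good_sample` are hypothetical helper names, and `wave` only reads plain PCM `.wav` files, so other encodings will raise an error here.

```python
import wave


def wav_info(path):
    """Return (duration_seconds, sample_rate, channels) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels


def is_good_sample(path, min_seconds=5.0, max_seconds=10.0):
    """True if the clip length is inside the suggested 5-10 second range."""
    duration, _rate, _channels = wav_info(path)
    return min_seconds <= duration <= max_seconds


if __name__ == "__main__":
    print(wav_info("sample.wav"))
    print("length OK:", is_good_sample("sample.wav"))
```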

#### 6. Run It
In PyCharm Terminal (make sure `(venv)` is showing):

```cmd
python clone_script.py
```

The first run downloads the model (~2GB). Later runs skip the download, but the model still has to load, and generation is much slower on CPU than on GPU.

#### 7. Find Your Output
- Look in your project folder for `output.wav`
- Play it with any media player

**Notes:**
- Use forward slashes in paths: `"C:/Users/Name/audio.wav"`
- If you get errors, delete the `venv` folder and restart from step 3
- GPU makes it fast, CPU makes it slow but works
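
On the forward-slash note: `pathlib` can do the conversion for you, and it can also build a path next to your script so the code doesn't depend on where the terminal was opened. A small sketch; both function names are invented for this example.

```python
from pathlib import Path, PureWindowsPath


def to_forward_slashes(windows_path):
    """Convert a Windows-style path string to the forward-slash form."""
    return PureWindowsPath(windows_path).as_posix()


def beside_script(filename, script_file):
    """Build a path to `filename` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / filename


if __name__ == "__main__":
    print(to_forward_slashes(r"C:\Users\Name\audio.wav"))
    print(beside_script("sample.wav", __file__))
```

Pass `str(beside_script("sample.wav", __file__))` as `speaker_wav` if you want the sample found regardless of the current working directory.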
 

fukurou

the supreme coder
ADMIN
### Complete Walkthrough: New Project Setup

I'll break it down step-by-step for a **brand new PyCharm project**.

#### Step 1: Create Your New Project
1. Open PyCharm
2. Click **"New Project"**
3. Name it `voice_app` (or whatever you want)
4. Click **"Create"**

#### Step 2: Set Up the Terminal
1. In PyCharm, open the **Terminal** (bottom tab)
2. You should see something like: `(venv) PS C:\Users\You\voice_app>`
3. If you don't see `(venv)`, run this first:
```cmd
python -m venv venv
.\venv\Scripts\activate.ps1
```
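
If the prompt is ambiguous, you can also ask Python itself whether it is running from a virtual environment. A minimal sketch (the function name `in_virtualenv` is made up here): inside a venv, `sys.prefix` points at the venv folder while `sys.base_prefix` still points at the base installation.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # The two prefixes are equal only when running the base interpreter
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```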

#### Step 3: Install Everything
**In the PyCharm Terminal, run these commands:**

```cmd
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install TTS soundfile librosa
```

#### Step 4: Create Your Voice Cloner File
1. In PyCharm, right-click on your project name (`voice_app`)
2. Select **New → Python File**
3. Name it `voice_cloner.py`
4. Paste this code:

```python
import torch
from TTS.api import TTS
import os


class VoiceCloner:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading AI model on {self.device}...")
        self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
        self.voice_loaded = False
        
    def load_voice(self, audio_path):
        """Load a voice from an audio file"""
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        self.reference_audio = audio_path
        self.voice_loaded = True
        print(f"Voice loaded: {os.path.basename(audio_path)}")
    
    def speak(self, text, output_path="output.wav", language="en"):
        """Make the cloned voice say anything"""
        if not self.voice_loaded:
            raise ValueError("Load a voice first with load_voice()")
        
        self.tts.tts_to_file(
            text=text,
            speaker_wav=self.reference_audio,
            language=language,
            file_path=output_path
        )
        print(f"Saved: {output_path}")
        return output_path


# Create one instance to use everywhere in your project
cloner = VoiceCloner()
```

#### Step 5: Add Your Voice Sample
1. Find an audio file (.wav) of the voice you want to clone
2. **Drag and drop** the file into your PyCharm project folder
3. Remember the filename (e.g., `my_voice.wav`)

#### Step 6: Create Your Main Program
1. Right-click on your project → **New → Python File**
2. Name it `main.py`
3. Paste this code:

```python
from voice_cloner import cloner

# Load your voice sample (do this once)
cloner.load_voice("my_voice.wav")  # ← Change this to your filename

# Now use it anywhere in your code
def main():
    while True:
        text = input("Enter text to speak (or 'quit' to exit): ")
        if text.lower() == 'quit':
            break
        cloner.speak(text, "output.wav")
        print("Audio generated!")

if __name__ == "__main__":
    main()
```

#### Step 7: Run It
**In the PyCharm Terminal:**
```cmd
python main.py
```

**What happens:**
1. First run: Downloads the AI model (takes a few minutes)
2. Loads your voice sample
3. Lets you type any text; each line is generated and saved to `output.wav`, replacing the previous audio
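
The loop in `main.py` always writes to the same `output.wav`, so each new line replaces the last one. If you'd rather keep every clip, one option is to hand `cloner.speak()` a fresh filename each time. A sketch, with `next_output_path` and the `outputs` folder being assumptions of this example:

```python
from pathlib import Path


def next_output_path(folder="outputs", stem="output"):
    """Return the first unused path like outputs/output_001.wav."""
    directory = Path(folder)
    directory.mkdir(parents=True, exist_ok=True)
    index = 1
    while True:
        candidate = directory / f"{stem}_{index:03d}.wav"
        if not candidate.exists():
            return candidate
        index += 1
```

In the loop you would then call `cloner.speak(text, str(next_output_path()))` instead of passing `"output.wav"`.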

#### Step 8: Use in Other Projects
For your **next project**, just:
1. Copy the `voice_cloner.py` file into the new project
2. Add your voice sample file
3. Use the same import code:
```python
from voice_cloner import cloner
cloner.load_voice("my_voice.wav")
cloner.speak("Your text here")
```

That's it. One file, one voice sample, and you can use it anywhere. No separate setups or complicated paths.
 

fukurou

the supreme coder
ADMIN
You're 100% right. That was overcomplicated. Let's fuse everything into **one single self-contained project** that handles its own setup.

### All-in-One Voice Cloner Project

#### Step 1: Create New Project
1. Open PyCharm → **New Project** → Name it `voice_cloner_app` → **Create**

#### Step 2: Create JUST ONE FILE
Right-click project → **New → Python File** → Name it `voice_cloner.py`

Paste this **complete** code:

```python
import os
import subprocess
import sys

# The +cu118 builds of torch are hosted on the PyTorch index, not on PyPI
TORCH_INDEX = "https://download.pytorch.org/whl/cu118"


class SelfInstallingVoiceCloner:
    def __init__(self):
        self.check_dependencies()
        # Import here, after the check: importing at the top of the file
        # would raise ImportError before the installer ever ran
        import torch
        from TTS.api import TTS

        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading AI model on {self.device}...")
        self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
        self.reference_audio = None
        self.voice_loaded = False

    def check_dependencies(self):
        """Install everything automatically if missing"""
        try:
            import TTS  # only checking that the package is importable
            print("All dependencies already installed.")
        except ImportError:
            print("Installing required packages...")
            requirements = [
                "torch==2.0.1+cu118",
                "torchvision==0.15.2+cu118",
                "torchaudio==2.0.2+cu118",
                "TTS==0.20.2",
                "soundfile==0.12.1",
                "librosa==0.10.1"
            ]

            # One pip call, with the extra index so the +cu118 pins resolve
            subprocess.check_call([
                sys.executable, "-m", "pip", "install",
                "--extra-index-url", TORCH_INDEX,
                *requirements,
            ])

            print("Installation complete. Please restart the script.")
            sys.exit(0)  # install succeeded; run the script again

    def load_voice(self, audio_path):
        """Load a voice from audio file"""
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        self.reference_audio = audio_path
        self.voice_loaded = True
        print(f"Voice loaded: {os.path.basename(audio_path)}")

    def speak(self, text, output_path="output.wav", language="en"):
        """Make cloned voice say anything"""
        if not self.voice_loaded:
            raise ValueError("Load a voice first with load_voice()")

        self.tts.tts_to_file(
            text=text,
            speaker_wav=self.reference_audio,
            language=language,
            file_path=output_path
        )
        print(f"Audio saved: {output_path}")
        return output_path


# Create global instance (this loads the model when the file is imported)
cloner = SelfInstallingVoiceCloner()


if __name__ == "__main__":
    # Demo usage
    cloner.load_voice("my_voice.wav")  # ← Put your audio file in project folder
    cloner.speak("Voice cloning setup complete!", "test.wav")
```

#### Step 3: Add Your Voice Sample
1. Get your `.wav` audio file
2. **Drag and drop** it into the PyCharm project folder
3. **Edit the `cloner.load_voice("my_voice.wav")` line** near the bottom of the code: change `"my_voice.wav"` to your actual filename

#### Step 4: RUN IT
**Just run the file directly in PyCharm:**
- Right-click `voice_cloner.py` → **Run 'voice_cloner'**

**What happens:**
1. First run: Installs all dependencies automatically
2. Asks you to restart the script (just run it again)
3. Second run: Loads AI model + your voice + generates test audio
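
The first-run check can also be done without importing anything heavy: `importlib.util.find_spec` only looks a module up. A sketch of that idea; `missing_modules` is a hypothetical helper, and the names in the usage part are the import names of the packages installed above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    needed = ["torch", "torchaudio", "TTS", "soundfile", "librosa"]
    print("Missing:", missing_modules(needed) or "nothing")
```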

#### Step 5: Use It in Other Scripts
Now create **another file** in the same project:

**File: `my_script.py`**
```python
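from voice_cloner import cloner


# Importing creates the cloner and loads the model, but not a voice:
# the load_voice() call in voice_cloner.py only runs when that file is run directly
cloner.load_voice("my_voice.wav")  # ← Change this to your filename

cloner.speak("This is so much simpler", "output1.wav")
cloner.speak("No separate installation needed", "output2.wav")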
```

### How It Works:
- **One file** handles everything: installation + voice cloning
- **No terminal commands needed** - it installs dependencies automatically
- **Self-contained** - everything happens within this project
- **No virtual environment setup** required
- **Just run the file** and it works

This is what you wanted: **one project that handles its own setup completely**. No separate steps, no terminal commands, just run the file.
 

fukurou

the supreme coder
ADMIN
```python
# DiTTS_clone.py

import os
import winsound  # Windows-only standard library module for playing .wav files

from voice_cloner import cloner  # assumes cloner is a SelfInstallingVoiceCloner instance
from LivinGrimoire import Skill


class DiTTS_clone(Skill):
    def __init__(self):
        super().__init__()
        self.set_skill_type(3)  # continuous skill
        self.set_skill_lobe(2)  # output lobe
        self.voice_sample = "my_voice.wav"  # your cloned voice sample
        self.sounds_dir = "sounds"
        os.makedirs(self.sounds_dir, exist_ok=True)

        cloner.load_voice(self.voice_sample)

    def input(self, ear: str, skin: str, eye: str):
        if not ear:
            return

        filename = self.__sanitize_filename(ear)
        path = os.path.join(self.sounds_dir, f"{filename}.wav")

        # Generate the clip only once; later requests reuse the cached file
        if not os.path.isfile(path):
            cloner.speak(ear, path)

        # The cloner has no play() method, so play the file with winsound
        # (this call blocks until playback finishes)
        winsound.PlaySound(path, winsound.SND_FILENAME)

    def __sanitize_filename(self, txt: str) -> str:
        return txt.translate(str.maketrans('', '', "?':,\n")).replace(" ", "_")
```
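
The sanitizer in the skill strips `?`, `'`, `:`, `,` and newlines, but Windows also rejects `"`, `*`, `<`, `>`, `|`, `/` and `\` in filenames, and long sentences make long names. One possible alternative is a short slug plus a hash of the full text. A sketch only; `cache_filename` and its `max_length` parameter are invented for this example and are not part of LivinGrimoire.

```python
import hashlib
import re


def cache_filename(text, max_length=40):
    """Map arbitrary text to a Windows-safe .wav filename."""
    # Keep letters, digits, and underscores; collapse every other run to "_"
    slug = re.sub(r"\W+", "_", text.strip().lower()).strip("_")[:max_length]
    # A short hash of the full text keeps truncated or similar phrases distinct
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
    return f"{slug}_{digest}.wav" if slug else f"{digest}.wav"


if __name__ == "__main__":
    print(cache_filename('What is "this"?'))
```

The same text always maps to the same name, so the cache lookup with `os.path.isfile` keeps working.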
 