hushtype

Real-time voice dictation for Windows -- type anywhere with your voice using GPU-accelerated OpenAI Whisper speech-to-text.

hushtype is a system-wide voice typing tool that turns your microphone into a keyboard. Press Ctrl+Alt+V to start dictating into any application -- text editors, browsers, terminals, chat apps, IDEs, email. It runs silently in the background with a minimal status indicator and supports 40+ voice commands for editing, navigation, and formatting.

Unlike transcription tools that process pre-recorded audio files, hushtype is a live input method -- speech appears as typed text in real time, wherever your cursor is. All processing runs locally on your GPU. No audio data ever leaves your machine.

Features

Speech Recognition

OpenAI Whisper speech-to-text engine (turbo model by default)
NVIDIA CUDA GPU acceleration for fast, low-latency transcription
Silero VAD (Voice Activity Detection) with ONNX optimization -- 4-5x faster than the default PyTorch version
Streaming dictation via RealtimeSTT with early transcription on silence
Configurable silence duration -- control how long to wait before finalizing speech
Hallucination filtering -- strips known Whisper ghost phrases (silence artifacts)
NVIDIA Broadcast virtual microphone auto-detection for AI noise suppression
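Hallucination filtering can be pictured as a normalize-and-compare step on each finished utterance. The sketch below assumes a small hypothetical phrase list; hushtype's actual list lives in its source and may differ. Whisper is commonly reported to emit phrases like "Thank you." or "Thanks for watching!" on near-silent audio.

```python
import re

# Hypothetical ghost-phrase list; hushtype's real list may differ.
GHOST_PHRASES = {
    "thank you",
    "thanks for watching",
    "thank you for watching",
}


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    text = re.sub(r"[^\w\s']", "", text.lower())
    return " ".join(text.split())


def filter_hallucination(text: str) -> str:
    """Return '' if the whole utterance is a known ghost phrase, else the stripped text."""
    if _normalize(text) in GHOST_PHRASES:
        return ""
    return text.strip()
```

Only whole-utterance matches are dropped, so a real sentence such as "Thank you for the report." still gets typed.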

Voice Commands

40+ built-in voice commands for hands-free control. Most commands have natural aliases (e.g., "scratch that" and "undo" both trigger undo).

Category	Commands	What it does
Undo/Redo	"scratch that", "undo", "delete that", "delete last"	Undo last action
	"redo"	Redo last action
Selection	"select", "select word", "select that", "select last"	Select previous word
	"select all"	Select all text
	"select line"	Select current line
Deletion	"delete word", "backspace word"	Delete previous word
	"delete line", "clear line"	Delete current line
Navigation	"new line", "enter", "next line"	Insert new line
	"new paragraph", "next paragraph"	Insert blank line
	"go to start", "go to beginning"	Jump to document start
	"go to end"	Jump to document end
	"go up", "go down"	Move cursor up/down
Clipboard	"copy", "copy that"	Copy selection
	"paste", "paste that"	Paste (terminal-aware)
	"cut", "cut that"	Cut selection
Formatting	"tab", "indent"	Insert tab
	"outdent", "unindent"	Remove tab
File	"save", "save file", "save that"	Save file (Ctrl+S)
Control	"stop listening", "pause", "stop"	Pause dictation
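Alias handling amounts to a lookup table in which several phrases point to the same action. The sketch below is hypothetical: the action labels and the dictionary shape are assumptions, and hushtype's own VOICE_COMMANDS dictionary may be structured differently (for example, mapping directly to key combinations).

```python
import re
from typing import Optional

# Hypothetical alias -> action table mirroring a few rows of the table above.
# Actions are plain labels here; a real implementation would map them to keystrokes.
VOICE_COMMANDS = {
    "scratch that": "undo", "undo": "undo", "delete that": "undo", "delete last": "undo",
    "redo": "redo",
    "new line": "newline", "enter": "newline", "next line": "newline",
    "copy": "copy", "copy that": "copy",
    "stop listening": "pause", "pause": "pause", "stop": "pause",
}


def match_command(utterance: str) -> Optional[str]:
    """Return the action if the whole utterance is a command, else None."""
    # Whisper adds capitalization and punctuation ("Scratch that."), so normalize first.
    key = re.sub(r"[^\w\s]", "", utterance.lower())
    key = " ".join(key.split())
    return VOICE_COMMANDS.get(key)
```

Because the match is exact, an ordinary sentence that merely contains "copy" is typed as text rather than executed.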

Smart Typing

Auto-spacing -- automatically adds spaces between utterances
Context-aware spacing reset -- navigation and editing commands reset auto-spacing, so the next utterance is typed at the cursor without a leading space
Terminal-aware paste -- automatically uses Ctrl+Shift+V in Windows Terminal, ConEmu, PuTTY, mintty, and other terminal emulators
Lossless clipboard -- saves and restores your clipboard contents (full Unicode support) so dictation never overwrites what you copied
Enter key tracking -- pressing Enter on your keyboard also resets auto-spacing for natural line-start behavior
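Terminal-aware paste boils down to choosing a shortcut from the foreground window's class name. The class names below are assumptions about how these terminals register their windows (verify them with a tool such as Spy++ on your system), and the list is not necessarily the one hushtype uses.

```python
# Assumed window class names for common terminal emulators.
TERMINAL_CLASSES = {
    "CASCADIA_HOSTING_WINDOW_CLASS",  # Windows Terminal
    "VirtualConsoleClass",            # ConEmu
    "PuTTY",                          # PuTTY
    "mintty",                         # mintty (Git Bash, Cygwin, MSYS2)
}


def paste_keys(window_class: str) -> tuple[str, ...]:
    """Return the paste shortcut to send for the given foreground window class."""
    if window_class in TERMINAL_CLASSES:
        return ("ctrl", "shift", "v")
    return ("ctrl", "v")
```

On Windows, the class name can be read with pywin32 as win32gui.GetClassName(win32gui.GetForegroundWindow()).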

Visual Feedback

Always-on-top status indicator -- small draggable window showing current state
Color-coded states: green (LISTENING), orange (WORKING), gray (PAUSED)
System-wide hotkey: Ctrl+Alt+V to toggle listening on/off
Audio feedback -- beep tones on toggle (high pitch on, low pitch off)

Reliability

Automatic recorder restart on connection loss or repeated errors
Graceful shutdown with resource cleanup on Ctrl+C
Log file (hushtype.log) for diagnostics, truncated each session
Debounced hotkey to prevent double-toggles
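A debounced toggle simply ignores presses that arrive too soon after the last accepted one. This is a minimal sketch: the 0.5-second window and the class name are assumptions, and the clock is injectable so the behavior can be tested without real key presses.

```python
import time
from typing import Callable


class DebouncedToggle:
    """Flip a listening flag, ignoring presses within `window` seconds of the
    last accepted press (e.g. a held or bouncing Ctrl+Alt+V)."""

    def __init__(self, window: float = 0.5, clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.clock = clock
        self.listening = False
        self._last = float("-inf")  # time of the last accepted press

    def press(self) -> bool:
        """Handle one hotkey press; return True if it toggled the state."""
        now = self.clock()
        if now - self._last < self.window:
            return False  # too soon: treat as a duplicate of the previous press
        self._last = now
        self.listening = not self.listening
        return True
```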

Requirements

Windows 10 or 11 (uses Win32 APIs for system-wide input)
NVIDIA GPU with CUDA support (RTX 20-series or newer recommended)
Microphone (built-in, USB, or NVIDIA Broadcast virtual mic)
~4 GB VRAM for the turbo model (~1 GB for tiny/base/small)
Python 3.10+ (only needed if running from source)

Installation

Option 1: Download the executable (easiest)

Download hushtype.exe from the latest release and double-click to run. No Python or setup required.

On first run, the default turbo Whisper model (~1.5 GB) downloads automatically. This is a one-time download. Your firewall may prompt you to allow the connection -- this is only for the model download. After that, all speech recognition runs locally on your GPU.

Option 2: Run from source

Requires Python 3.10+ and an NVIDIA GPU with CUDA.

# 1. Install PyTorch with CUDA (check https://pytorch.org for your CUDA version)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124

# 2. Clone and install
git clone https://github.com/turqoisehex/hushtype.git
cd hushtype
pip install -r requirements.txt

# 3. Run
python hushtype.py
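If hushtype fails to start from source, a common cause is a CPU-only PyTorch wheel. The diagnostic below is not part of hushtype; it is a small sketch that uses only standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_name, torch.version.cuda) to confirm the GPU is visible.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether the installed PyTorch build can use an NVIDIA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        # Often means the CPU-only wheel was installed instead of the CUDA one.
        return f"PyTorch {torch.__version__} found, but CUDA is not available"
    return f"CUDA {torch.version.cuda} on {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_status())
```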

NVIDIA Broadcast (optional)

If you have NVIDIA Broadcast, hushtype automatically detects and uses its virtual microphone for AI-powered noise and echo suppression. No configuration needed -- just make sure Broadcast is running.
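Auto-detection can be as simple as scanning the input devices for a name containing "NVIDIA Broadcast". The sketch assumes device info dictionaries in the shape PyAudio returns (with "name" and "maxInputChannels" keys) and assumes the virtual mic is named something like "Microphone (NVIDIA Broadcast)"; hushtype's actual matching rule may differ.

```python
from typing import Optional


def find_broadcast_mic(devices: list[dict]) -> Optional[int]:
    """Return the index of the first input device whose name mentions NVIDIA Broadcast."""
    for index, info in enumerate(devices):
        is_input = info.get("maxInputChannels", 0) > 0  # skip output-only devices
        if is_input and "nvidia broadcast" in info.get("name", "").lower():
            return index
    return None  # caller falls back to the system default microphone
```

With PyAudio, the list can be built as [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())], so the returned position is also the device index.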

Usage

hushtype [options]

Options:
  --model MODEL        Whisper model: tiny, base, small, medium, large-v3, turbo (default: turbo)
  --silence SECONDS    Seconds of silence before finalizing speech (default: 2.5)
  --sensitivity VALUE  VAD sensitivity 0.0-1.0, higher = more sensitive (default: 0.55)
  --offline            Disable all network access (model must already be cached)
  --cache-dir DIR      Custom HuggingFace cache directory for Whisper models
  --version            Show version and exit
  --help               Show help and exit
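The options above map naturally onto Python's argparse. This is a sketch of an equivalent parser, not hushtype's actual code; the range check on --sensitivity and the version string are assumptions (the version is a placeholder).

```python
import argparse

MODELS = ["tiny", "base", "small", "medium", "large-v3", "turbo"]


def sensitivity(value: str) -> float:
    """argparse type that accepts only floats in [0.0, 1.0]."""
    x = float(value)
    if not 0.0 <= x <= 1.0:
        raise argparse.ArgumentTypeError("sensitivity must be between 0.0 and 1.0")
    return x


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="hushtype", description="Real-time voice dictation for Windows")
    p.add_argument("--model", choices=MODELS, default="turbo", help="Whisper model")
    p.add_argument("--silence", type=float, default=2.5, metavar="SECONDS",
                   help="seconds of silence before finalizing speech")
    p.add_argument("--sensitivity", type=sensitivity, default=0.55, metavar="VALUE",
                   help="VAD sensitivity 0.0-1.0, higher = more sensitive")
    p.add_argument("--offline", action="store_true", help="disable all network access")
    p.add_argument("--cache-dir", metavar="DIR", help="custom HuggingFace cache directory")
    p.add_argument("--version", action="version", version="%(prog)s 0.0.0")  # placeholder
    return p  # argparse adds --help automatically
```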

Quick start

Run hushtype.exe (or python hushtype.py)
Wait for "Model loaded. Ready." (first run downloads the model)
A small status indicator appears in the bottom-right corner showing PAUSED
Press Ctrl+Alt+V -- indicator turns green, showing LISTENING
Speak naturally -- text appears wherever your cursor is
Say "stop listening" or press Ctrl+Alt+V again to pause
Press Ctrl+C in the terminal to quit

Tuning speech detection

If speech is cutting off too early, increase the silence threshold:

hushtype --silence 3.0

If hushtype triggers on background noise, lower VAD sensitivity:

hushtype --sensitivity 0.4

For smaller/faster models (less accurate but lower VRAM usage):

hushtype --model small    # ~1 GB VRAM
hushtype --model tiny     # ~1 GB VRAM, fastest
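These flags most likely end up as recorder settings in RealtimeSTT. The mapping below is an assumption about how hushtype wires them, and the keyword names are RealtimeSTT's AudioToTextRecorder parameters as commonly documented; check them against your installed version (download_root in particular).

```python
from typing import Any, Optional


def recorder_kwargs(model: str = "turbo", silence: float = 2.5,
                    sensitivity: float = 0.55, cache_dir: Optional[str] = None) -> dict[str, Any]:
    """Translate hushtype-style CLI values into RealtimeSTT recorder keyword arguments."""
    kwargs: dict[str, Any] = {
        "model": model,                            # --model
        "device": "cuda",                          # GPU-only, per the Requirements
        "silero_use_onnx": True,                   # ONNX Silero VAD
        "silero_sensitivity": sensitivity,         # --sensitivity
        "post_speech_silence_duration": silence,   # --silence
    }
    if cache_dir:
        kwargs["download_root"] = cache_dir        # --cache-dir (assumed parameter name)
    return kwargs
```

Usage would look like AudioToTextRecorder(**recorder_kwargs(silence=3.0)) after from RealtimeSTT import AudioToTextRecorder.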

Using an existing model cache

If you already have Whisper models downloaded elsewhere, point hushtype to that directory:

hushtype --cache-dir "C:\path\to\huggingface"

How It Works

Silero VAD continuously monitors the microphone for speech activity
When speech is detected, audio is streamed to faster-whisper (CTranslate2-optimized Whisper)
Transcribed text is matched against the voice command table
If no command matches, the text is inserted via clipboard paste (handles Unicode and is instant regardless of length)
Auto-spacing adds a leading space between consecutive utterances
The previous clipboard contents are saved and restored after each paste
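The command-match, auto-spacing, and clipboard steps above can be sketched as one small class. Everything here is hypothetical (class name, callables, the 0.05-second settle delay); OS calls such as reading the clipboard or sending the paste shortcut are injected so the logic can be tested without Windows.

```python
import time
from typing import Callable, Optional


class Dictation:
    """Hypothetical per-utterance pipeline: command lookup, then auto-spaced
    clipboard paste with save/restore."""

    def __init__(self, commands: dict[str, Callable[[], None]],
                 get_clipboard: Callable[[], Optional[str]],
                 set_clipboard: Callable[[Optional[str]], None],
                 send_paste: Callable[[], None],
                 settle: float = 0.05):
        self.commands = commands
        self.get_clipboard = get_clipboard
        self.set_clipboard = set_clipboard
        self.send_paste = send_paste
        self.settle = settle      # seconds to let the target app read the clipboard
        self.need_space = False   # True after a paste, until a command resets it

    def handle(self, text: str) -> None:
        text = text.strip()
        if not text:
            return
        # Exact command match, ignoring case and trailing punctuation.
        action = self.commands.get(text.lower().rstrip(".!?"))
        if action is not None:
            action()
            self.need_space = False  # next utterance starts at the cursor, no space
            return
        chunk = " " + text if self.need_space else text
        saved = self.get_clipboard()   # remember what the user had copied
        try:
            self.set_clipboard(chunk)
            self.send_paste()          # e.g. Ctrl+V, or Ctrl+Shift+V in terminals
            time.sleep(self.settle)    # the target app reads the clipboard asynchronously
        finally:
            self.set_clipboard(saved)  # restore even if sending the paste fails
        self.need_space = True
```

The settle delay matters: restoring the clipboard before the target application has read it would paste the old contents instead of the dictated text.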

Building the Executable

The executable is built automatically by GitHub Actions when a version tag is pushed. To build locally:

pip install pyinstaller
pyinstaller hushtype.spec
# Output: dist/hushtype.exe

Privacy

All speech recognition runs locally on your GPU using faster-whisper
No audio data is sent to any server -- ever
No telemetry, no analytics, no tracking
On first run, the Whisper model is downloaded from HuggingFace (the only network request)
Use --offline to block all network access after the model is cached
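One piece of an offline mode is telling huggingface_hub (which faster-whisper uses to fetch models) not to make HTTP requests. HF_HUB_OFFLINE is a real huggingface_hub environment variable; how hushtype actually enforces --offline is not described here, so treat this as a sketch of the model-loading side only.

```python
import os


def enable_offline_mode() -> None:
    """Ask huggingface_hub to load models only from the local cache."""
    # Set this before importing faster-whisper / huggingface_hub: current
    # versions read the variable when the library is first imported.
    os.environ["HF_HUB_OFFLINE"] = "1"
```

If the model is not yet cached, loading then fails instead of silently downloading, which matches the "model must already be cached" note on --offline.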

FAQ

Q: Why Windows only? hushtype uses Win32 APIs (GetAsyncKeyState, GetForegroundWindow, win32clipboard) for system-wide hotkey detection, terminal window class identification, and clipboard management. These are Windows-specific. Cross-platform support is a possible future addition.

Q: What GPU do I need? Any NVIDIA GPU with CUDA support. The turbo model uses ~4 GB VRAM. Smaller models (tiny, base, small) use ~1 GB. CPU-only mode is not currently supported.

Q: My firewall is asking about hushtype -- is it safe? Yes. The only network request is the one-time download of the Whisper model from HuggingFace on first run (~1.5 GB). After that, hushtype runs fully offline. Once the model is cached, you can run with --offline to block network access entirely.

Q: Does it work with any microphone? Yes. Any Windows audio input device works. If NVIDIA Broadcast is installed, hushtype automatically uses the Broadcast virtual microphone for AI noise suppression.

Q: Can I add custom voice commands? Not yet via config file, but the VOICE_COMMANDS dictionary in hushtype.py is straightforward to edit. Custom command support via config file is planned.

Q: Why clipboard paste instead of simulated keystrokes? Clipboard paste handles Unicode characters, accented text, and special symbols that SendKeys or pyautogui.write() cannot type. It is also instantaneous regardless of text length. Your clipboard contents are automatically saved and restored after each paste.

Q: Is my audio sent to the cloud? No. All speech recognition runs locally on your GPU using faster-whisper. No audio data leaves your machine. See the Privacy section.

Q: How do I stop hushtype? Say "stop listening" to pause dictation (press Ctrl+Alt+V to resume). Press Ctrl+C in the terminal window to quit entirely.

Contributing

Contributions are welcome! This is a side project maintained in spare time, so please be patient with response times.

See CONTRIBUTING.md for development setup and guidelines. Check out issues labeled "good first issue" for beginner-friendly tasks.

Keywords

voice dictation, speech to text, whisper, voice typing, dictation software, speech recognition, stt, voice input, nvidia gpu, cuda, windows voice dictation, real-time transcription, live transcription, voice commands, hands-free typing, accessibility, python speech to text, openai whisper, faster-whisper, voice control, dictation app, system-wide dictation, microphone to text, talk to type, voice to text windows, speech to text windows, gpu transcription, local speech recognition, offline speech to text, voice input method, whisper turbo, silero vad, realtime stt

License

MIT
