diff --git a/fern/apis/api/generators.yml b/fern/apis/api/generators.yml
index bdd9f4416..6793192ed 100644
--- a/fern/apis/api/generators.yml
+++ b/fern/apis/api/generators.yml
@@ -11,7 +11,7 @@ groups:
   python-sdk:
     generators:
       - name: fernapi/fern-python-sdk
-        version: 4.37.1
+        version: 4.55.5
         api:
           settings:
             prefer-undiscriminated-unions-with-literals: true
diff --git a/fern/calls/call-concurrency.mdx b/fern/calls/call-concurrency.mdx
index 69c4bbb19..97615c674 100644
--- a/fern/calls/call-concurrency.mdx
+++ b/fern/calls/call-concurrency.mdx
@@ -10,31 +10,91 @@ description: Learn how concurrency slots work, how to stay within the default li
 Call concurrency represents how many Vapi calls can be active at the same time. Each call occupies one slot, similar to using a finite set of phone lines.
 
 **In this guide, you'll learn to:**
-- Understand the default concurrency allocation and when it is usually sufficient
+- Understand how concurrency is allocated at the account level
 - Keep outbound and inbound workloads within plan limits
-- Increase reserved capacity directly from the Vapi Dashboard
+- Scale capacity through Dashboard add-ons, startup programs, or enterprise plans
 - Inspect concurrency data through API responses and analytics queries
+- Apply best practices for high-volume operations
 
-## What is concurrency?
+## How concurrency works
 
-Every Vapi account includes **10 concurrent call slots** by default. When all slots are busy, new outbound dials or inbound connections wait until a slot becomes free.
+Every Vapi account includes **10 concurrent call slots** by default. When all slots are busy, new outbound dials or inbound connections are blocked until a slot becomes free.
 
-<CardGroup cols={2}>
-  <Card title="Inbound agents" icon="phone" iconType="solid">
-    Rarely hit concurrency caps unless traffic surges (launches, seasonal spikes).
+### Account-level allocation
+
+Concurrency is an **account-level limit**, not a per-phone-number or per-assistant limit. All calls across your organization draw from the same pool:
+
+<CardGroup cols={3}>
+  <Card title="Inbound calls" icon="phone" iconType="solid">
+    Phone calls and SIP connections that reach your assistants.
   </Card>
-  <Card title="Outbound agents" icon="phone-outgoing" iconType="solid">
-    More likely to reach limits when running large calling batches.
+  <Card title="Outbound calls" icon="phone-outgoing" iconType="solid">
+    Calls your application initiates through the API or campaigns.
+  </Card>
+  <Card title="Web calls" icon="globe" iconType="solid">
+    Browser-based sessions using the Vapi Web SDK.
   </Card>
 </CardGroup>
 
-These limits ensure the underlying compute stays reliable for every customer. Higher concurrency requires reserving additional capacity, which Vapi provides through custom or add-on plans.
+With the default limit, a single pool of 10 slots is shared across all three call types. If 6 inbound calls and 3 outbound calls are active, only 1 slot remains for any new call, regardless of type.
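+
+For illustration, with 9 calls active against a limit of 10, the `subscriptionLimits` object covered later in this guide might report values like the ones below. This is a sketch, not captured output: `concurrencyLimit` is an assumed field name, and the exact numbers depend on when the response is generated.
+
+```json
+{
+  "subscriptionLimits": {
+    "concurrencyBlocked": false,
+    "concurrencyLimit": 10,
+    "remainingConcurrentCalls": 1
+  }
+}
+```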
+
+### Organization pool sharing
+
+When multiple organizations share the same billing subscription, they share the same concurrency pool. A call on Organization A counts against the same limit as a call on Organization B.
+
+<Note>
+If your inbound and outbound workloads compete for slots, consider separating them into different organizations under the same subscription. This does not increase total capacity, but lets you monitor and manage each workload independently.
+</Note>
+
+## Scaling your concurrency
+
+Vapi offers several paths to increase concurrency beyond the default 10 slots.
+
+| Tier | Concurrent slots | How to access |
+|---|---|---|
+| Default | 10 | Included with every account |
+| Self-serve add-ons | Up to hundreds | Dashboard billing page, **$10 per line per month** |
+| Startup program | 100 | Apply through the Vapi startup program |
+| Enterprise | Custom | Contact Vapi sales for dedicated capacity |
+| Platform scale | 1,000+ | Available for enterprise customers with reserved infrastructure |
+
+### Purchase add-on lines
+
+You can raise your concurrency limit without contacting support:
+
+<Steps>
+  <Step title="Open the billing page">
+    Go to the [Vapi Dashboard](https://dashboard.vapi.ai/settings/billing) and navigate to **Settings > Billing**.
+  </Step>
+  <Step title="Find reserved concurrency">
+    Locate the **Reserved Concurrency (Call Lines)** section on the billing page.
+  </Step>
+  <Step title="Increase the limit">
+    Add the number of extra lines you need. Each additional line costs **$10 per month**.
+  </Step>
+  <Step title="Confirm the change">
+    Changes apply immediately. New slots are available for your next call.
+  </Step>
+</Steps>
+
+### Startup program
+
+The Vapi startup program provides **100 concurrent call lines** to qualifying early-stage companies. Visit the [Vapi startup program](https://vapi.ai/startup) page to check eligibility and apply.
+
+### Enterprise plans
+
+Enterprise customers can negotiate custom concurrency allocations that include:
+- Dedicated infrastructure for consistent performance at scale
+- Concurrency levels above 1,000 simultaneous calls
+- Priority support for capacity planning
+
+Contact [Vapi sales](https://vapi.ai/enterprise) to discuss enterprise concurrency needs.
 
 ## Managing concurrency
 
 ### Outbound campaigns
 
-Batch long lead lists into smaller chunks (for example, 50–100 numbers) and run those batches sequentially. This keeps your peak concurrent calls near the default limit while still working through large sets quickly.
+Batch long lead lists into smaller chunks (for example, 50-100 numbers) and run those batches sequentially. This keeps your peak concurrent calls near the default limit while still working through large sets quickly.
 
 ### High-volume operations
 
@@ -47,16 +107,13 @@ If you regularly exceed **50,000 minutes per month**, talk with Vapi about:
 Use billing reports to pair minute usage with concurrency spikes so you can upgrade before calls are blocked.
 </Tip>
 
-## Increase your concurrency limit
-
-You can raise or reserve more call lines without contacting support:
+### Separate inbound and outbound workloads
 
-1. Open the [Vapi Dashboard](https://dashboard.vapi.ai/settings/billing).
-2. Navigate to **Settings → Billing**.
-3. Find **Reserved Concurrency (Call Lines)**.
-4. Increase the limit or purchase add-on concurrency lines.
+For production deployments where inbound availability is critical, create separate organizations for inbound and outbound traffic under the same subscription. This approach:
 
-Changes apply immediately, so you can scale ahead of known traffic surges.
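+- Shows when outbound campaign surges are consuming slots that incoming customer calls need (the pool is still shared, so separation alone does not reserve inbound capacity)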
+- Gives each workload its own concurrency monitoring
+- Simplifies capacity planning for each use case
 
 ## View concurrency in call responses
 
@@ -146,8 +203,50 @@ curl 'https://api.vapi.ai/analytics' \
 
 Adjust the `timeRange.step` to inspect usage by hour, day, or week. Peaks that align with campaign launches, seasonality, or support events highlight when you should reserve additional call lines.
 
+## Video walkthrough
+
+Watch a walkthrough of concurrency management and scaling in the Vapi Dashboard:
+
+<iframe
+  width="560"
+  height="315"
+  src="https://www.youtube.com/embed/iWZD_HBk9zQ"
+  title="Vapi Concurrency Management"
+  frameBorder="0"
+  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+  allowFullScreen
+/>
+
+## FAQ
+
+<AccordionGroup>
+  <Accordion title="Is the concurrency limit per phone number or per account?">
+    Concurrency is an **account-level limit**. All calls across every phone number, assistant, and call type (inbound, outbound, web) share the same pool of slots.
+  </Accordion>
+
+  <Accordion title="What happens when all slots are full?">
+    The response for a new call includes `concurrencyBlocked: true`, and the call does not connect. For inbound calls, the caller hears a busy signal or gets your telephony provider's default handling. Monitor the `remainingConcurrentCalls` field to avoid this.
+  </Accordion>
+
+  <Accordion title="How quickly do add-on lines activate?">
+    Add-on lines purchased through the Dashboard billing page activate **immediately**. No restart or configuration change is needed.
+  </Accordion>
+
+  <Accordion title="Do web calls count against my concurrency limit?">
+    Yes. Web calls initiated through the Vapi Web SDK consume concurrency slots from the same account-level pool as phone-based inbound and outbound calls.
+  </Accordion>
+
+  <Accordion title="Can I share concurrency across multiple organizations?">
+    Organizations under the same billing subscription share a single concurrency pool. A call on one organization reduces available slots for all organizations on that subscription.
+  </Accordion>
+
+  <Accordion title="How do I monitor concurrency in real time?">
+    Check the `subscriptionLimits` object returned with every `POST /call` response. For historical trends, use the Analytics API with the `subscription` table and `concurrency` column.
+  </Accordion>
+</AccordionGroup>
+
 ## Next steps
 
-- **[Call queue management](mdc:docs/calls/call-queue-management):** Build a Twilio queue to buffer calls when you hit concurrency caps.
-- **[Outbound campaign planning](mdc:docs/outbound-campaigns/overview):** Design outbound strategies that pair batching with analytics.
-- **[Enterprise plans](mdc:docs/enterprise/plans):** Review larger plans that include higher default concurrency.
+- **[Call queue management](/calls/call-queue-management):** Build a Twilio queue to buffer calls when you hit concurrency caps.
+- **[Outbound campaign planning](/outbound-campaigns/overview):** Design outbound strategies that pair batching with analytics.
+- **[Voice latency optimization](/voice-latency):** Reduce response times across the voice AI pipeline.
diff --git a/fern/docs.yml b/fern/docs.yml
index f114ec8fe..2a92f3a42 100644
--- a/fern/docs.yml
+++ b/fern/docs.yml
@@ -368,6 +368,9 @@ navigation:
           - page: IVR navigation
             path: ivr-navigation.mdx
             icon: fa-light fa-phone-office
+          - page: Voice latency optimization
+            path: voice-latency.mdx
+            icon: fa-light fa-gauge-high
           - section: Testing
             collapsed: true
             icon: fa-light fa-clipboard-check
diff --git a/fern/voice-latency.mdx b/fern/voice-latency.mdx
new file mode 100644
index 000000000..4996db8da
--- /dev/null
+++ b/fern/voice-latency.mdx
@@ -0,0 +1,298 @@
+---
+title: Voice latency optimization
+subtitle: Reduce response times across the voice AI pipeline
+slug: voice-latency
+description: Learn how to minimize latency in voice AI agents by optimizing each stage of the audio pipeline, from speech recognition to text-to-speech output.
+---
+
+## Overview
+
+Voice AI latency is the time between when a user finishes speaking and when the assistant begins responding. Vapi targets **sub-500ms response times** for production voice agents. Every stage of the pipeline contributes to total latency, and optimizing each one compounds into noticeably faster conversations.
+
+**In this guide, you'll learn to:**
+- Understand how audio moves through the voice pipeline and where latency accumulates
+- Select and configure LLM, STT, and TTS providers for minimal delay
+- Optimize tool calls, squad handoffs, and network configuration
+- Monitor latency and apply Vapi platform features that reduce response times
+
+## Voice AI pipeline
+
+Each voice interaction passes through a series of processing stages. Latency at any stage delays the entire response.
+
+```
+Microphone → Network → Telephony → STT → LLM → TTS → Network → Playback
+```
+
+<CardGroup cols={2}>
+  <Card title="Audio capture" icon="microphone" iconType="solid">
+    User speech is captured by the microphone, transmitted over the network, and received by the telephony layer.
+  </Card>
+  <Card title="Speech-to-text (STT)" icon="ear-listen" iconType="solid">
+    The transcriber converts audio to text. Model choice and endpointing configuration directly affect speed.
+  </Card>
+  <Card title="LLM processing" icon="brain" iconType="solid">
+    The language model generates a response. Model selection, prompt size, and caching have the largest impact on this stage.
+  </Card>
+  <Card title="Text-to-speech (TTS)" icon="volume-high" iconType="solid">
+    The TTS engine converts the LLM response into audio for playback to the user.
+  </Card>
+</CardGroup>
+
+The total time from the end of user speech to the start of assistant audio is called **time-to-first-byte (TTFB)**. Reducing each stage's contribution to TTFB is the primary goal of latency optimization.
+
+## LLM optimization
+
+The LLM stage typically accounts for the largest portion of response latency. Model selection, prompt design, and caching are the highest-impact optimizations.
+
+### Model selection
+
+Choose models that balance quality with speed for your use case:
+
+| Model | Relative speed | Best for |
+|---|---|---|
+| GPT-4.1 | Fast | General-purpose voice agents with good quality-to-speed ratio |
+| GPT-4.1-mini | Faster | High-volume agents where speed is the priority |
+| GPT-4.1-nano | Fastest | Simple tasks like routing, FAQs, and data collection |
+| Claude 3.5 Sonnet | Fast | Complex reasoning tasks that need high accuracy |
+| Gemini 2.0 Flash | Fast | Multimodal use cases and fast general responses |
+
+<Tip>
+Start with GPT-4.1 for most voice agents. It provides a strong balance of response quality and low latency. Switch to a smaller model only if latency requirements demand it and output quality remains acceptable.
+</Tip>
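+
+As a minimal sketch, the recommendation above corresponds to an assistant `model` block like the following. The `provider` and `model` identifiers shown are assumptions; confirm the exact values against the API reference before using them.
+
+```json
+{
+  "model": {
+    "provider": "openai",
+    "model": "gpt-4.1"
+  }
+}
+```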
+
+### Prompt caching
+
+Prompt caching stores your system prompt in the LLM provider's cache so it does not need to be re-processed on every turn. On cache hits, this can reduce LLM latency by **40-80%**.
+
+Prompt caching activates automatically when:
+- Your system prompt exceeds the provider's minimum token threshold (typically 1,024 tokens for OpenAI)
+- The same prompt is used across multiple turns or calls
+
+<Note>
+Prompt caching is provider-dependent. OpenAI, Anthropic, and Google each have different minimum prompt sizes and caching behavior. Check your provider's documentation for specifics.
+</Note>
+
+### Prompt design for speed
+
+Keep prompts concise to reduce processing time:
+
+- Put critical instructions at the beginning of the system prompt
+- Remove redundant or overly verbose instructions
+- Use structured formats (numbered lists, short sentences) over long paragraphs
+- Avoid including large reference documents inline when a tool call can retrieve them on demand
+
+## STT optimization
+
+The speech-to-text stage converts user audio into text for the LLM. Transcriber choice and endpointing configuration determine how quickly this handoff occurs.
+
+### Transcriber selection
+
+Vapi supports multiple STT providers. Select based on your language and accuracy needs:
+
+| Provider | Strengths | Considerations |
+|---|---|---|
+| Deepgram | Low latency, strong English accuracy | Best default for English-primary agents |
+| Talkscriber | Fast endpointing, good multilingual support | Good for non-English deployments |
+| Gladia | Multilingual, real-time streaming | Useful for agents serving multiple languages |
+| AssemblyAI | High accuracy for complex audio | Slightly higher latency in exchange for accuracy gains |
+
+### Endpointing configuration
+
+Endpointing determines when the system decides the user has finished speaking. Faster endpointing reduces wait time but risks cutting off mid-sentence.
+
+For English conversations, use smart endpointing:
+
+```json
+{
+  "startSpeakingPlan": {
+    "smartEndpointingPlan": {
+      "provider": "livekit"
+    },
+    "waitSeconds": 0.4
+  }
+}
+```
+
+For non-English languages, use transcription-based endpointing:
+
+```json
+{
+  "startSpeakingPlan": {
+    "transcriptionEndpointingPlan": {
+      "onPunctuationSeconds": 0.1,
+      "onNoPunctuationSeconds": 1.5,
+      "onNumberSeconds": 0.5
+    },
+    "waitSeconds": 0.4
+  }
+}
+```
+
+See **[Voice pipeline configuration](/customization/voice-pipeline-configuration)** for the complete endpointing reference.
+
+### Transcriber fallbacks
+
+Configure fallback transcribers to maintain availability if your primary provider has an outage or degradation:
+
+```json
+{
+  "transcriber": {
+    "provider": "deepgram",
+    "model": "nova-3",
+    "fallbackPlan": {
+      "transcribers": [
+        {
+          "provider": "talkscriber",
+          "model": "whisper"
+        }
+      ]
+    }
+  }
+}
+```
+
+Fallbacks add resilience without affecting latency during normal operation. The fallback provider activates only when the primary fails.
+
+## TTS optimization
+
+The text-to-speech stage converts the LLM response into audio. Voice selection and formatting directly affect both latency and speech quality.
+
+### Voice selection
+
+Choose TTS providers and voices optimized for low latency:
+
+| Provider | Strengths | Considerations |
+|---|---|---|
+| Vapi Voices (Beta) | Optimized for Vapi pipeline, lowest latency | Limited voice selection during beta |
+| ElevenLabs | High quality, natural-sounding voices | Higher latency than streaming-first providers |
+| Deepgram | Fast streaming TTS | Good balance of speed and quality |
+| PlayHT | Low-latency streaming, voice cloning | Good for custom brand voices |
+| OpenAI | Consistent quality | Moderate latency |
+
+<Tip>
+Vapi Voices (Beta) are purpose-built for the Vapi pipeline and deliver the lowest TTS latency. Try them first for latency-sensitive deployments, then switch to another provider if they do not meet your voice quality requirements.
+</Tip>
+
+### Voice formatting and pronunciation
+
+Improve TTS output quality without adding latency:
+
+- **SSML tags:** Use SSML for precise control over pronunciation, pauses, and emphasis where supported by your TTS provider
+- **Pronunciation guides:** Add phonetic spellings for product names, technical terms, and proper nouns in your system prompt
+- **Sentence structure:** Short, declarative sentences render faster and sound more natural than long, complex ones
+
+## Tool call optimization
+
+Tool calls (function calls) add latency because the pipeline pauses while waiting for the external service to respond. The strategies below minimize this impact.
+
+### Reduce tool response time
+
+- Host tool endpoints geographically close to Vapi's infrastructure
+- Set aggressive timeouts (2-3 seconds) and return fallback responses when tools are slow
+- Cache frequently requested data to avoid redundant API calls
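+
+The timeout guidance above could be expressed on a function tool roughly as follows. This is a hedged sketch: the tool name `lookup_order` and the URL are hypothetical, and `server.timeoutSeconds` is assumed to be the setting that controls how long the assistant waits for the tool response.
+
+```json
+{
+  "type": "function",
+  "function": {
+    "name": "lookup_order",
+    "description": "Look up an order by its ID."
+  },
+  "server": {
+    "url": "https://example.com/tools/lookup-order",
+    "timeoutSeconds": 3
+  }
+}
+```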
+
+### Minimize unnecessary tool calls
+
+- Preload commonly needed data into the system prompt or conversation context
+- Combine multiple related lookups into a single tool call where possible
+- Use the LLM's context window to avoid re-fetching information already present in the conversation
+
+### Async tool patterns
+
+For tools that do not need to block the conversation (like logging or CRM updates), configure them as async server events instead of blocking tool calls. This lets the assistant continue speaking while background tasks complete.
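+
+One way this might look in a tool definition is sketched below. The tool name `log_call_outcome` and the URL are hypothetical, and the `async` flag is an assumption about how non-blocking behavior is configured; check the tools reference for the supported option.
+
+```json
+{
+  "type": "function",
+  "async": true,
+  "function": {
+    "name": "log_call_outcome",
+    "description": "Record the call outcome in the CRM."
+  },
+  "server": {
+    "url": "https://example.com/tools/log-call-outcome"
+  }
+}
+```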
+
+## Squad design
+
+Squads let you connect multiple assistants that hand off conversations between each other. Each handoff introduces latency, so squad architecture matters for response time.
+
+### Keep squads small
+
+Limit squads to **2-5 members**. Each additional member increases routing complexity and potential handoff latency.
+
+### Optimize handoff triggers
+
+- Use clear, specific conditions for transfers rather than broad intent matching
+- Pre-warm destination assistants when a transfer is likely
+- Avoid chains of transfers (A to B to C) by routing directly to the correct specialist
+
+### Minimize context transfer size
+
+When transferring a conversation between squad members, only pass the context the receiving assistant needs. Large context payloads increase the handoff duration.
+
+## Network and infrastructure
+
+Network conditions between the caller, Vapi, and your backend services affect end-to-end latency.
+
+### Reduce network hops
+
+- Deploy tool call backends in regions close to Vapi's infrastructure (US East)
+- Use direct connections rather than routing through multiple proxies or API gateways
+- Minimize TLS handshake overhead by using persistent connections where possible
+
+### Jitter buffers
+
+Vapi uses adaptive jitter buffers to smooth out network inconsistencies. These add a small, deliberate delay to prevent audio dropouts caused by packet timing variations. The buffer size adjusts automatically based on network conditions.
+
+For callers on unstable connections (mobile networks, international calls), slightly higher jitter buffering prevents audio quality issues at the cost of a few milliseconds of additional latency.
+
+## Platform features for low latency
+
+Vapi includes several platform-level features that reduce latency with little or no additional configuration.
+
+### Weekly release channel
+
+The Weekly release channel gives you access to the latest platform optimizations and latency improvements before they reach the stable channel. Enable it in your Dashboard settings to receive performance improvements as they ship.
+
+<Note>
+The Weekly channel includes newer features that may have less testing than the stable release. Test thoroughly in a staging environment before enabling for production traffic.
+</Note>
+
+### Background audio prefetch
+
+Vapi begins TTS processing as soon as the first tokens arrive from the LLM, streaming audio generation in parallel with continued LLM output. This pipeline parallelism is automatic and requires no configuration.
+
+### Monitoring latency
+
+Track your agent's latency performance using Vapi's built-in analytics:
+
+- **Call logs:** Review per-call latency breakdowns in the Dashboard call detail view
+- **Analytics API:** Query aggregate latency metrics across calls to identify trends
+- **Webhook events:** Monitor `speech-update` and `conversation-update` events for real-time latency data
+
+## Optimization checklist
+
+Use this checklist to audit your voice agent's latency configuration:
+
+<Steps>
+  <Step title="Select an appropriate LLM">
+    Choose GPT-4.1 or equivalent for general use. Use smaller models for simple tasks.
+  </Step>
+  <Step title="Verify prompt caching">
+    Confirm whether your system prompt meets the provider's minimum token threshold for cache activation.
+  </Step>
+  <Step title="Configure endpointing">
+    Use smart endpointing for English, transcription-based for other languages.
+  </Step>
+  <Step title="Choose a low-latency TTS voice">
+    Try Vapi Voices (Beta) first. Fall back to streaming providers like Deepgram or PlayHT.
+  </Step>
+  <Step title="Optimize tool calls">
+    Set timeouts, cache data, and use async patterns for non-blocking operations.
+  </Step>
+  <Step title="Review squad size">
+    Limit squads to 2-5 members with direct routing to specialists.
+  </Step>
+  <Step title="Check network proximity">
+    Deploy backends near Vapi infrastructure. Minimize proxy layers.
+  </Step>
+  <Step title="Enable Weekly release channel">
+    Opt in to receive the latest latency optimizations.
+  </Step>
+</Steps>
+
+## Additional resources
+
+- **[Voice pipeline configuration](/customization/voice-pipeline-configuration):** Configure endpointing, interruptions, and conversation timing in detail.
+- **[Understanding call concurrency](/calls/call-concurrency):** Scale concurrent call capacity to handle production traffic.
+- **[Vapi engineering blog - Achieving sub-500ms voice AI](https://vapi.ai/blog/achieving-sub-500ms-voice-ai):** Deep dive into Vapi's latency optimization approach.
+- **[Vapi engineering blog - Real-time audio streaming](https://vapi.ai/blog/real-time-audio-streaming-at-scale):** Technical details on audio pipeline architecture.