LangChain OpenTutorial

SpeechToSQL

Last updated 28 days ago

Overview

The Speech to SQL system is a powerful tool that converts spoken language into SQL queries. It combines advanced speech recognition with natural language processing to enable hands-free database interactions.

Key Features:

  • Real-time Speech Processing: Captures voice input directly from the microphone and processes it as soon as recording stops, supporting various microphone configurations.

  • Accurate Speech Recognition: Uses the Whisper model (through faster-whisper) for speech-to-text conversion of clearly spoken English queries.

  • SQL Query Generation: Transforms natural language questions into properly formatted SQL queries.

System Requirements:

  • Python 3.8 or higher

  • Working microphone

Installation and Setup

Before we begin, let's install all necessary packages. This tutorial requires several Python packages for speech processing, SQL operations, and machine learning:

  1. LangChain Components:

    • langchain-community: Core LangChain functionality and community components

    • langchain-openai: OpenAI integration

    • langchain-core: Essential LangChain components

  2. Database and API:

    • openai: For OpenAI API access

    • sqlalchemy: For database operations

    • python-dotenv: For environment variable management

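    • torch: Optional; faster-whisper runs on CTranslate2 and does not require it, and the installation cell below does not install it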

  3. Audio Processing:

    • sounddevice: For audio capture

    • numpy: For data processing

    • wavio: For audio file handling

    • faster-whisper: For speech recognition

  4. Additional dependencies:

    • blosc2: For data compression

    • cython: For Python-C integration

    • black: For code formatting

After running the installation cell, you may need to restart the kernel for the changes to take effect. We'll verify the installation in the next step.

Windows Users: Important Note

If you encounter a permission error during installation such as "Access is denied", you have two options:

  1. Use the --user option with pip (recommended):

    • This installs packages in your user directory, avoiding permission issues

    • The installation cell below does not pass this option by default, so add '--user' to the pip arguments if you need it

  2. Alternative: Run Jupyter as Administrator:

    • Only if the first option doesn't work

    • Right-click on Jupyter Notebook

    • Select "Run as administrator"

    • Then try the installation again

After installation, you'll need to restart the kernel regardless of which method you use.
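As a minimal sketch of the first option, the helper below builds the pip argument list with an optional --user flag. The name build_pip_command and its user parameter are illustrative assumptions and are not part of the tutorial's installation cell. Note that pip refuses --user inside most virtual environments, so this is mainly useful for a system-wide Python on Windows.

```python
import sys

def build_pip_command(package, user=False):
    """Return the argument list for installing one package with pip."""
    cmd = [sys.executable, "-m", "pip", "install", package, "--quiet"]
    if user:
        # --user installs into the per-user site-packages directory
        cmd.append("--user")
    return cmd

# The result can be passed to subprocess.check_call(...)
print(build_pip_command("sounddevice==0.4.6", user=True)[1:])
```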

Verification

A verification cell follows the installation cell. Run it after installation and a kernel restart to ensure everything is set up correctly.

Run the following cells to install all required packages:

%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages with pinned versions
import subprocess
import sys

def install_packages():
    # Note: the langchain-community and langchain-openai pins below are older
    # releases than the langchain-core range; if pip reports a dependency
    # conflict, adjust these pins so the three packages agree.
    packages = [
        'langchain-core>=0.3.29,<0.4.0',
        'langchain-community==0.0.24',
        'langchain-openai==0.0.5',
        'openai==1.12.0',
        'sqlalchemy==2.0.27',
        'python-dotenv==1.0.1',
        'sounddevice==0.4.6',
        'numpy==1.24.3',
        'wavio==0.0.8',
        'faster-whisper==0.10.0',
        'blosc2~=2.0.0',
        'cython>=0.29.21',
        'black>=22.3.0'
    ]

    failed = []  # packages that pip could not install
    for package in packages:
        try:
            subprocess.check_call([sys.executable, '-m', 'pip', 'install', package, '--quiet'])
        except subprocess.CalledProcessError:
            print(f"Failed to install {package}")
            failed.append(package)

    # Report success only when every package was installed
    if failed:
        print(f"✗ Installation finished with {len(failed)} failed package(s).")
    else:
        print("✓ Installation complete!")

install_packages()
✓ Installation complete!

Important Note About Package Installation

After running the installation cell, you might see a message like:

'Note: you may need to restart the kernel to use updated packages.'

This is normal! Here's what you need to do:

  1. First, look for the "✓ Installation complete!" message to confirm the installation worked

  2. Then, restart the Jupyter kernel to ensure all packages are properly loaded:

    • Click on the "Kernel" menu at the top

    • Select "Restart Kernel..."

    • Click "Restart" when prompted

After restarting the kernel, run the following verification cell to make sure everything is ready to use:

try:
    import sounddevice as sd
    import numpy as np
    from faster_whisper import WhisperModel
    print("✓ All set! Let's move on to the next step.")
except ImportError as e:
    print(f"✗ Something's missing ({e.name}). Please try running the installation command again.")
✓ All set! Let's move on to the next step.

Verifying Package Installation

After installing the packages and restarting the kernel, check the output of the verification cell above.

If you see a ✗ mark, it means that a package wasn't installed correctly. Try these steps:

  1. Run the installation cell again

  2. Restart the kernel

  3. Run the verification cell again

If you still see errors, make sure you have sufficient permissions and a stable internet connection.
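The verification cell above prints a single mark for all imports. To see which package is missing, a per-module check along these lines may help. This is a sketch: the REQUIRED_MODULES list and the check_modules helper are assumptions made for illustration, and the import names differ from the pip names for some packages (for example, faster-whisper is imported as faster_whisper).

```python
import importlib.util

# Import names (not pip names) of the packages this tutorial relies on
REQUIRED_MODULES = ["sounddevice", "numpy", "wavio", "faster_whisper", "dotenv", "sqlalchemy"]

def check_modules(module_names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}

for name, ok in check_modules(REQUIRED_MODULES).items():
    print(f"{'✓' if ok else '✗'} {name}")
```

This only checks that each module can be located; it does not import it, so a package that is present but broken would still show a ✓.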

# Import necessary libraries
import sounddevice as sd
import numpy as np
import wavio
import os
import time
from faster_whisper import WhisperModel
from dotenv import load_dotenv

# Load environment variables
load_dotenv(override=True)
True

Audio Device Configuration

A crucial first step is selecting the correct audio input device. Let's identify and configure your system's microphone.

Note: You'll see a filtered list of input devices only, making it easier to choose the correct microphone.

def list_audio_input_devices():
    """Display only audio input devices with clear formatting."""
    print("\nAvailable Audio Input Devices:")
    print("=" * 50)
    
    input_devices = []
    for idx, device in enumerate(sd.query_devices()):
        if device['max_input_channels'] > 0:  # Only show input devices
            # Skip duplicate devices (different APIs)
            device_name = device['name'].split(',')[0]  # Remove API information
            if not any(d['name'].startswith(device_name) for d in input_devices):
                input_devices.append({
                    'index': idx,
                    'name': device_name,
                    'channels': device['max_input_channels'],
                    'sample_rate': device['default_samplerate']
                })
                
                print(f"Device {idx}: {device_name}")
                print(f"  Channels: {device['max_input_channels']}")
                print(f"  Sample Rate: {device['default_samplerate']}Hz")
                print("-" * 50)
    
    return input_devices

# List available input devices
input_devices = list_audio_input_devices()
    Available Audio Input Devices:
    ==================================================
    Device 0: Microsoft Sound Mapper - Input
      Channels: 2
      Sample Rate: 44100.0Hz
    --------------------------------------------------
    Device 1: Microphone Array (Intel® Smart Sound Technology for Digital Microphones)
      Channels: 2
      Sample Rate: 44100.0Hz
    --------------------------------------------------
    Device 4: Primary Sound Capture Driver
      Channels: 2
      Sample Rate: 44100.0Hz
    --------------------------------------------------
    Device 8: Realtek ASIO
      Channels: 2
      Sample Rate: 44100.0Hz
    --------------------------------------------------
    Device 13: PC Speaker (Realtek HD Audio output with SST)
      Channels: 2
      Sample Rate: 48000.0Hz
    --------------------------------------------------
    Device 14: Input 1 (Realtek HD Audio Mic input with SST)
      Channels: 2
      Sample Rate: 48000.0Hz
    --------------------------------------------------
    Device 15: Input 2 (Realtek HD Audio Mic input with SST)
      Channels: 4
      Sample Rate: 16000.0Hz
    --------------------------------------------------
    Device 16: Stereo Mix (Realtek HD Audio Stereo input)
      Channels: 2
      Sample Rate: 48000.0Hz
    --------------------------------------------------
    Device 18: Headset (@System32\drivers\bthhfenum.sys
      Channels: 1
      Sample Rate: 8000.0Hz
    --------------------------------------------------
    Device 19: Microphone Array 1 ()
      Channels: 2
      Sample Rate: 48000.0Hz
    --------------------------------------------------
    Device 20: Microphone Array 2 ()
      Channels: 4
      Sample Rate: 16000.0Hz
    --------------------------------------------------
    Device 22: Headset Microphone (@System32\drivers\bthhfenum.sys
      Channels: 1
      Sample Rate: 8000.0Hz
    --------------------------------------------------

def test_audio_device(device_index, duration=1):
    """
    Test if an audio device works properly.
    Args:
        device_index (int): The index of the device to test
        duration (float): Test duration in seconds
    Returns:
        bool: True if device works, False otherwise
    """
    try:
        print(f"Testing audio device {device_index}...")
        with sd.InputStream(device=device_index, channels=1, samplerate=16000):
            # Keep the stream open for the requested test duration
            sd.sleep(int(duration * 1000))
            print("✓ Device initialized successfully")
            return True
    except Exception as e:
        print(f"✗ Device test failed: {str(e)}")
        return False

Audio Device Selection and Testing

After viewing the available devices above, you'll need to select and test your microphone. Choose a device with input channels (marked as "Channels: X" where X > 0).

Important Tips:

  • Choose a device with a clear, specific name (avoid generic names like "Default Input")

  • Prefer devices with 1 or 2 input channels

  • If using a USB microphone, make sure it's properly connected

  • Test the device before proceeding to actual recording
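The tips above can be sketched as a small selection helper that works on the list returned by list_audio_input_devices(). The helper name pick_input_device and the GENERIC_NAME_HINTS keywords are assumptions chosen for illustration; the keywords only match English device names, so adapt them to your system.

```python
# Assumed markers of generic devices; adjust for your OS language
GENERIC_NAME_HINTS = ("default", "mapper", "primary")

def pick_input_device(devices):
    """Pick the first device with 1-2 channels and a non-generic name, else fall back to the first device."""
    for device in devices:
        name = device["name"].lower()
        is_generic = any(hint in name for hint in GENERIC_NAME_HINTS)
        if device["channels"] in (1, 2) and not is_generic:
            return device
    return devices[0] if devices else None

# Example with two entries shaped like the dictionaries built earlier
sample = [
    {"index": 0, "name": "Microsoft Sound Mapper - Input", "channels": 2, "sample_rate": 44100.0},
    {"index": 14, "name": "Input 1 (Realtek HD Audio Mic input with SST)", "channels": 2, "sample_rate": 48000.0},
]
print(pick_input_device(sample)["index"])
```

The chosen device should still be passed to test_audio_device() before recording.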

# Let's test the first available input device as default
if input_devices:
    default_device = input_devices[0]
    print(f"\nTesting default device: {default_device['name']}")
    if test_audio_device(default_device['index']):
        # Set as default device
        os.environ['DEFAULT_DEVICE'] = str(default_device['index'])
        os.environ['SAMPLE_RATE'] = str(int(default_device['sample_rate']))
        print(f"\nDefault device set to: {default_device['name']}")
        print(f"Sample rate: {default_device['sample_rate']}Hz")
    else:
        print("\nPlease select a different device and try again.")
else:
    print("\nNo input devices found. Please check your microphone connection.")
    Testing default device: Microsoft Sound Mapper - Input
    Testing audio device 0...
    ✓ Device initialized successfully
    
    Default device set to: Microsoft Sound Mapper - Input
    Sample rate: 44100.0Hz

Speech Recognition Setup

Now let's set up the speech recognition component using the Whisper model.

Note: The first time you run this, it will download the Whisper model. This might take a few minutes depending on your internet connection.

def initialize_whisper():
    """Initialize the Whisper model."""
    try:
        # Initialize Whisper model with base configuration
        model = WhisperModel(
            model_size_or_path="base",  # Using 'base' model for faster CPU processing
            device="cpu",
            compute_type="int8"  # Optimized for CPU
        )
        print("✓ Whisper model initialized successfully")
        return model
    except Exception as e:
        print(f"✗ Error initializing Whisper model: {str(e)}")
        print("Please make sure all packages are installed correctly.")
        return None

model = initialize_whisper()
✓ Whisper model initialized successfully

Basic Usage

Let's implement the core components for speech-to-SQL conversion. We'll create a robust system that can:

  1. Record audio from your microphone

  2. Convert speech to text

  3. Transform the text into SQL queries

Step 1: Record Audio from Your Microphone

The AudioRecorder class records audio input from the user's microphone and saves it as a temporary audio file.

# 1. Record audio from your microphone
import sounddevice as sd
import numpy as np
import wavio
import tempfile

class AudioRecorder:
    def __init__(self):
        self._samplerate = 16000
        self.audio_data = []
        self.recording = False
        self.stream = None

    def start_recording(self, device_id=0):
        """Start recording audio"""
        try:
            self.stream = sd.InputStream(
                channels=1,
                samplerate=self._samplerate,
                callback=self._audio_callback,
                device=device_id
            )
            self.audio_data = []
            self.recording = True
            self.stream.start()
            print("Recording started. Speak now!")
            return True
        except Exception as e:
            print(f"Recording failed: {str(e)}")
            return False

    def _audio_callback(self, indata, frames, time, status):
        if self.recording:
            self.audio_data.append(indata.copy())

    def stop_and_process(self):
        """Stop recording and save audio data"""
        if self.stream:
            self.stream.stop()
            self.stream.close()
            self.recording = False
            if len(self.audio_data) > 0:
                audio = np.concatenate(self.audio_data)
                with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
                    wavio.write(tmpfile.name, audio, self._samplerate, sampwidth=2)
                return tmpfile.name
        return None

Step 2: Convert Speech to Text

We use the Whisper model for accurate transcription of recorded audio into text.

# 2. Convert speech to text

from faster_whisper import WhisperModel

def initialize_whisper():
    """Initialize the Whisper model (base size, CPU, int8); the language is chosen at transcription time"""
    return WhisperModel("base", device="cpu", compute_type="int8")

class AudioProcessor:
    def __init__(self, model):
        self.model = model

    def transcribe_audio(self, audio_file):
        """Transcribe audio to text using Whisper with English language enforcement"""
        try:
            segments, _ = self.model.transcribe(audio_file, language="en")  
            return " ".join([segment.text for segment in segments])
        except Exception as e:
            print(f"Transcription failed: {str(e)}")
            return None
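The sample run later in this tutorial prints the transcription with a leading space, which suggests that segment text can carry surrounding whitespace. A small normalization step, sketched below, joins segment strings with single spaces. The helper name join_segment_texts is an assumption; it operates on plain strings, so you would pass it the text of each segment.

```python
def join_segment_texts(texts):
    """Join segment strings into one line, trimming each piece and skipping empty ones."""
    cleaned = (text.strip() for text in texts)
    return " ".join(part for part in cleaned if part)

print(join_segment_texts([" Find Top 10 Customers", " by Revenue"]))
# Find Top 10 Customers by Revenue
```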

Step 3: Transform Text into SQL Queries

We use the LangChain library to transform natural language text into SQL queries.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class SQLQueryGenerator:
    def __init__(self):
        # temperature=0 keeps the generated SQL as deterministic as possible
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0)
        self.template = ChatPromptTemplate.from_messages([
            ("system", "You are an SQL query generator."),
            ("human", "{query_text}")
        ])

    def generate_sql(self, query_text):
        try:
            prompt = self.template.format_messages(query_text=query_text)
            result = self.llm.invoke(prompt)
            return result.content
        except Exception as e:
            return f"Error: {str(e)}"
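The system prompt above gives the model no information about your database, so it has to guess table and column names. One possible refinement, sketched here under assumptions, is to build the system message from a schema description and ask for SQL only. The helper build_system_prompt and the example_schema dictionary are hypothetical and exist only for illustration.

```python
def build_system_prompt(schema):
    """Build a system prompt that lists tables and columns and asks for SQL only."""
    lines = ["You are an SQL query generator.", "Use only these tables and columns:"]
    for table, columns in schema.items():
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append("Return a single SQL statement with no explanation.")
    return "\n".join(lines)

# Hypothetical schema, used only for illustration
example_schema = {
    "customers": ["customer_id", "customer_name"],
    "orders": ["order_id", "customer_id", "revenue"],
}
print(build_system_prompt(example_schema))
```

The returned string could take the place of the system message text in SQLQueryGenerator, provided it contains no curly braces that the prompt template would treat as variables.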

Step 4: Putting It All Together

Finally, we combine all the components into a single process that listens for audio input, transcribes it, and generates an SQL query.

import os
import time

def process_speech_to_sql(duration=5):
    """Main function for speech-to-SQL conversion. Returns the SQL text, or None on failure."""
    print("\n=== Starting Speech to SQL Process ===")
    print("Recording will start in:")
    for i in range(3, 0, -1):
        print(f"{i}...")
        time.sleep(1)

    # Step 1: Record Audio (use the device selected earlier, falling back to device 0)
    recorder = AudioRecorder()
    device_id = int(os.environ.get("DEFAULT_DEVICE", 0))
    if not recorder.start_recording(device_id=device_id):
        return None
    print(f"\nSpeak your query now... ({duration} seconds)")
    time.sleep(duration)
    audio_file = recorder.stop_and_process()
    print(f"Saved audio file: {audio_file}")
    if not audio_file:
        return None

    # Step 2: Speech-to-Text
    model = initialize_whisper()
    processor = AudioProcessor(model)
    print("Processing audio...")
    text = processor.transcribe_audio(audio_file)
    print(f"Transcribed Text: {text}")
    if not text:
        return None

    # Step 3: SQL Query Generation
    sql_generator = SQLQueryGenerator()
    sql_query = sql_generator.generate_sql(text)
    print(f"Generated SQL Query: {sql_query}")
    return sql_query

Let's try it out! Run this command to start recording:

query_text = process_speech_to_sql(duration=10)  # 10 seconds recording
    === Starting Speech to SQL Process ===
    Recording will start in:
    3...
    2...
    1...
    Recording started. Speak now!
    
    Speak your query now... (10 seconds)
    Saved audio file: C:\Users\rhkre\AppData\Local\Temp\tmp47m8xvlx.wav
    Processing audio...
    Transcribed Text:  Find Top 10 Customers by Revenue
    Generated SQL Query: To find the top 10 customers by revenue, you would typically need a table that contains customer information and a table that records transactions or orders, including the revenue generated by each transaction. Assuming you have a `customers` table and an `orders` table, where the `orders` table includes a `customer_id` and a `revenue` column, you can use the following SQL query:
    
    ```sql
    SELECT c.customer_id, c.customer_name, SUM(o.revenue) AS total_revenue
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
    ORDER BY total_revenue DESC
    LIMIT 10;
    ```
    
    This query does the following:
    - Joins the `customers` table with the `orders` table on the `customer_id`.
    - Groups the results by each customer to calculate the total revenue for each customer.
    - Orders the results in descending order based on the total revenue.
    - Limits the results to the top 10 customers. 
    
    Make sure to replace `customer_name` and `revenue` with the actual column names used in your database schema if they differ.
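The response above mixes explanation with the query and wraps the SQL in a fenced sql block. If you only need the statement itself, you can pull it out with a regular expression. The sketch below assumes the model keeps using that fenced format; the helper name extract_sql is an assumption, and it falls back to the whole response when no fence is found.

```python
import re

FENCE = "`" * 3  # three backticks, the fence marker used in the model's response

def extract_sql(response_text):
    """Return the contents of the first fenced block, or the stripped text if there is none."""
    pattern = re.compile(FENCE + r"(?:sql)?\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)
    match = pattern.search(response_text)
    return match.group(1).strip() if match else response_text.strip()

sample = "Here is the query:\n" + FENCE + "sql\nSELECT 1;\n" + FENCE + "\nThis query returns 1."
print(extract_sql(sample))
# SELECT 1;
```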

Example Queries

Here are some example queries you can try with the system:

  1. "Show sales figures for the last quarter"

  2. "Find top 10 customers by revenue"

  3. "List all products with inventory below 100 units"

  4. "Calculate total sales by region"

  5. "Get employee performance metrics for 2023"

These queries demonstrate the range of SQL operations our system can handle.

Advanced Usage and Troubleshooting

Common Issues and Solutions

  1. No audio device found

    • Check if your microphone is properly connected

    • Try unplugging and reconnecting your microphone

    • Verify microphone permissions in your OS settings

  2. Poor recognition accuracy

    • Speak clearly and at a moderate pace

    • Minimize background noise

    • Keep the microphone at an appropriate distance

  3. Device initialization errors

    • Try selecting a different audio device

    • Restart your Python kernel

    • Check if another application is using the microphone
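For device initialization errors, trying each candidate device in turn can save some manual work. The sketch below takes a list of device indices and a tester callable such as test_audio_device from earlier; the helper name first_working_device is an assumption made for illustration.

```python
def first_working_device(device_indices, tester):
    """Return the first index for which tester(index) is truthy, or None if none works."""
    for index in device_indices:
        try:
            if tester(index):
                return index
        except Exception as exc:
            # Keep going if a device raises instead of returning False
            print(f"Device {index} raised: {exc}")
    return None

# Usage, assuming input_devices and test_audio_device are defined as above:
# working = first_working_device([d["index"] for d in input_devices], test_audio_device)
```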


References

  • Faster Whisper Documentation > Python API Reference

  • SoundDevice Documentation > Python API Reference

  • Wavio Documentation > Audio File Handling

  • NumPy Documentation > Audio Processing

  • Author: Dooil Kwak

  • Peer Review: Ilgyun Jeong, Jaehun Choi

  • Proofread: Juni Lee

  • This is a part of LangChain Open Tutorial