pacing.core.transcription_interfaces
Transcription interfaces for the PACING platform.
These interfaces define how audio is converted to text. Implementations can use various speech-to-text services (Deepgram, Whisper, Google Speech, etc.) or mock transcribers for testing.
- class pacing.core.transcription_interfaces.ITranscriber[source]
Abstract interface for speech-to-text transcription.
This interface allows the system to support multiple transcription backends without changing the core logic. The transcriber is responsible for:
Converting audio chunks to text
Providing confidence scores for transcriptions
Handling partial (streaming) transcriptions
Speaker diarization (if supported)
Design Philosophy:
- Transcribers should be stateless or manage their own state
- They should handle their own buffering and context management
- Confidence scores must be normalized to [0.0, 1.0]
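The interface contract above can be sketched as follows. This is a minimal, hypothetical reconstruction for illustration: the field layout of `TranscriptionResult` (here `text`, `confidence_score`, `is_final`, `speaker_id`) and the default body of `get_model_info()` are assumptions based on the descriptions in this document, not the actual source.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class TranscriptionResult:
    """Hypothetical minimal shape; the real class may carry more fields."""
    text: str
    confidence_score: float  # must be normalized to [0.0, 1.0]
    is_final: bool = True
    speaker_id: Optional[str] = None  # populated only with diarization support

class ITranscriber(ABC):
    """Abstract speech-to-text backend, per the design philosophy above."""

    @abstractmethod
    async def transcribe_chunk(self, audio_chunk, sample_rate: int,
                               is_final: bool = False) -> TranscriptionResult:
        """Convert one chunk of audio samples to text."""

    @abstractmethod
    def supports_speaker_diarization(self) -> bool:
        """Whether speaker_id will be populated in results."""

    def get_model_info(self) -> dict:
        # Non-abstract: subclasses may override with real model metadata.
        return {"name": "unknown", "version": "unknown", "language": "unknown"}
```

Because `transcribe_chunk` and `supports_speaker_diarization` are abstract, concrete backends (Deepgram, Whisper, a test mock) must implement both, while `get_model_info()` has a usable default.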
- get_model_info() → dict[source]
Get information about the transcription model.
- Returns:
Model metadata (name, version, language, etc.)
- Return type:
dict
- abstract supports_speaker_diarization() → bool[source]
Check if this transcriber supports speaker diarization.
- Returns:
True if speaker_id will be populated in TranscriptionResult
- Return type:
bool
- abstract async transcribe_chunk(audio_chunk: ndarray, sample_rate: int, is_final: bool = False) → TranscriptionResult[source]
Transcribe a single audio chunk.
- Parameters:
audio_chunk – Audio samples (typically float32 or int16)
sample_rate – Sample rate in Hz
is_final – Whether this is the final chunk in a sequence
- Returns:
The transcription with confidence score
- Return type:
TranscriptionResult
Notes
For streaming transcription, is_final=False produces partial results
Implementations should handle silence gracefully
Empty audio should return empty text with high confidence
- async transcribe_stream(audio_stream: AsyncIterator[ndarray], sample_rate: int) → AsyncIterator[TranscriptionResult][source]
Transcribe a stream of audio chunks.
This is a convenience method that processes an audio stream and yields transcription results. The default implementation calls transcribe_chunk() for each audio chunk.
- Parameters:
audio_stream – Async iterator of audio chunks
sample_rate – Sample rate in Hz
- Yields:
TranscriptionResult – Transcriptions as they become available
Example
async for result in transcriber.transcribe_stream(audio_stream, 16000):
    print(f"{result.text} (confidence: {result.confidence_score})")