pacing.core.transcription_interfaces

Transcription interfaces for the PACING platform.

These interfaces define how audio is converted to text. Implementations can use various speech-to-text services (Deepgram, Whisper, Google Speech, etc.) or mock transcribers for testing.

class pacing.core.transcription_interfaces.ITranscriber[source]

Abstract interface for speech-to-text transcription.

This interface allows the system to support multiple transcription backends without changing the core logic. The transcriber is responsible for:

  1. Converting audio chunks to text

  2. Providing confidence scores for transcriptions

  3. Handling partial (streaming) transcriptions

  4. Speaker diarization (if supported)

Design Philosophy:

  • Transcribers should be stateless or manage their own state

  • They should handle their own buffering and context management

  • Confidence scores must be normalized to [0.0, 1.0]
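The responsibilities above can be illustrated with a minimal mock transcriber for testing. This is a standalone sketch that mirrors the interface without importing it; the `TranscriptionResult` fields shown here (`text`, `confidence_score`, `is_final`) are assumptions — the real dataclass in `pacing.core` may carry more (e.g. `speaker_id`, timestamps).

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical minimal result shape for illustration; the real
# TranscriptionResult in pacing.core may differ.
@dataclass
class TranscriptionResult:
    text: str
    confidence_score: float  # normalized to [0.0, 1.0] per the interface contract
    is_final: bool = True

class MockTranscriber:
    """A stateless test double: fixed text for non-silent audio."""

    def get_model_info(self) -> dict:
        return {"name": "mock", "version": "0.1", "language": "en"}

    def supports_speaker_diarization(self) -> bool:
        return False

    async def transcribe_chunk(
        self, audio_chunk: np.ndarray, sample_rate: int, is_final: bool = False
    ) -> TranscriptionResult:
        # Per the interface notes: empty or silent audio returns
        # empty text with high confidence.
        if audio_chunk.size == 0 or not np.any(audio_chunk):
            return TranscriptionResult("", 1.0, is_final)
        return TranscriptionResult("hello world", 0.95, is_final)
```

A mock like this lets the rest of the pipeline be exercised without a network-backed speech-to-text service.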

get_model_info() dict[source]

Get information about the transcription model.

Returns:

Model metadata (name, version, language, etc.)

Return type:

dict

abstract supports_speaker_diarization() bool[source]

Check if this transcriber supports speaker diarization.

Returns:

True if speaker_id will be populated in TranscriptionResult

Return type:

bool
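Since `speaker_id` is only populated when diarization is supported, callers should guard on this method before trusting speaker labels. A sketch, using hypothetical stand-in types (`FakeResult`, `NoDiarization`) in place of the real `pacing.core` classes:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for illustration only; the real
# TranscriptionResult and transcriber live in pacing.core.
@dataclass
class FakeResult:
    text: str
    speaker_id: Optional[str] = None

class NoDiarization:
    def supports_speaker_diarization(self) -> bool:
        return False

def label_for(result, transcriber) -> str:
    # Only trust speaker_id when the backend reports diarization support.
    if transcriber.supports_speaker_diarization() and result.speaker_id is not None:
        return f"[speaker {result.speaker_id}] {result.text}"
    return result.text
```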

abstract async transcribe_chunk(audio_chunk: ndarray, sample_rate: int, is_final: bool = False) TranscriptionResult[source]

Transcribe a single audio chunk.

Parameters:
  • audio_chunk – Audio samples (typically float32 or int16)

  • sample_rate – Sample rate in Hz

  • is_final – Whether this is the final chunk in a sequence

Returns:

The transcription with confidence score

Return type:

TranscriptionResult

Notes

  • For streaming transcription, is_final=False produces partial results

  • Implementations should handle silence gracefully

  • Empty audio should return empty text with high confidence
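The partial/final pattern from the notes can be sketched as a driver loop: every chunk but the last is sent with `is_final=False`, and the final call lets the implementation flush any buffered context. The helper name and chunking scheme here are illustrative; only `transcribe_chunk()` comes from the interface.

```python
import numpy as np

# Illustrative driver for chunked transcription against any
# ITranscriber implementation (passed in as `transcriber`).
async def transcribe_in_chunks(transcriber, audio: np.ndarray,
                               sample_rate: int, chunk_size: int = 1600):
    chunks = [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]
    results = []
    for i, chunk in enumerate(chunks):
        # All chunks but the last produce partial results; the final
        # call signals the backend to finalize.
        result = await transcriber.transcribe_chunk(
            chunk, sample_rate, is_final=(i == len(chunks) - 1)
        )
        results.append(result)
    return results
```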

async transcribe_stream(audio_stream: AsyncIterator[ndarray], sample_rate: int) AsyncIterator[TranscriptionResult][source]

Transcribe a stream of audio chunks.

This is a convenience method that processes an audio stream and yields transcription results. The default implementation calls transcribe_chunk() for each audio chunk.

Parameters:
  • audio_stream – Async iterator of audio chunks

  • sample_rate – Sample rate in Hz

Yields:

TranscriptionResult – Transcriptions as they become available

Example

async for result in transcriber.transcribe_stream(audio_stream, 16000):
    print(f"{result.text} (confidence: {result.confidence_score})")
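The default delegation described above can be sketched as an async generator. Whether the real default marks the stream's last chunk with `is_final=True` is not stated here, so this sketch (an assumption, written as a free function rather than a method) leaves every chunk partial.

```python
from typing import AsyncIterator

import numpy as np

# Sketch of the documented default: iterate the audio stream and
# delegate each chunk to transcribe_chunk() on `transcriber`.
async def default_transcribe_stream(transcriber,
                                    audio_stream: AsyncIterator[np.ndarray],
                                    sample_rate: int):
    async for chunk in audio_stream:
        yield await transcriber.transcribe_chunk(chunk, sample_rate, is_final=False)
```

Backends with their own streaming protocol (e.g. a websocket-based service) would override this method rather than rely on per-chunk delegation.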