Go To MPAI-MMC AI Modules

1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Audio Segmentation (MMC-AUS):

Receives     Audio Object             Audio to be segmented.
Identifies   Speech Time              Time of audio segmentation.
Extracts     Target Speech Objects    Speech Object to be extracted.
Detects      Speech Overlap           Data Type about speech overlap.
Produces     Speech Time              Duration of speech segment.
             Speech Overlap           Data Type about speech overlap.
             Speech Objects           Each Speech Object includes a Speaker’s Turn, i.e., one or more adjacent utterances from the same Speaker.

2     Reference Model

Figure 1 depicts the Reference Model of the Audio Segmentation (MMC-AUS) AIM.

Figure 1 – Reference Model of Audio Segmentation (MMC-AUS) AIM

3     Input/Output Data

Table 1 specifies the Input and Output Data of the Audio Segmentation (MMC-AUS) AIM.

Table 1 – I/O Data of the Audio Segmentation (MMC-AUS) AIM

Input             Description
Audio Object      Input Audio in a file.

Output            Description
Speaker Time      Time at which one or more Speakers start speaking.
Speech Overlap    Number of overlapping speakers.
Speech Object     Speech Object containing the utterance(s) of the Speaker(s).
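
To make the "number of overlapping speakers" output concrete, the following is a minimal sketch of how overlap regions could be derived from a list of speech segments. The `(start, end, speaker)` tuple layout and the function name `overlap_regions` are assumptions made for illustration. The actual Speech Overlap Data Type is defined by the MPAI schema, not by this code.

```python
from typing import Dict, List, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, speaker_label)
Segment = Tuple[float, float, str]


def overlap_regions(segments: List[Segment]) -> List[Tuple[float, float, int]]:
    """Return (start, end, n_speakers) for each interval with 2+ active speakers."""
    events = []
    for start, end, speaker in segments:
        events.append((start, 1, speaker))   # a speaker starts talking
        events.append((end, -1, speaker))    # a speaker stops talking
    # At equal timestamps, handle ends (-1) before starts (+1) so that
    # segments that merely touch are not reported as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active: Dict[str, int] = {}  # speaker -> number of currently open segments
    regions: List[Tuple[float, float, int]] = []
    prev_t = None
    for t, delta, speaker in events:
        n = len(active)
        if prev_t is not None and t > prev_t and n >= 2:
            if regions and regions[-1][1] == prev_t and regions[-1][2] == n:
                # Contiguous interval with the same speaker count: extend it.
                regions[-1] = (regions[-1][0], t, n)
            else:
                regions.append((prev_t, t, n))
        active[speaker] = active.get(speaker, 0) + delta
        if active[speaker] == 0:
            del active[speaker]
        prev_t = t
    return regions


example = [(0.0, 5.0, "A"), (3.0, 8.0, "B"), (4.0, 6.0, "C"), (8.0, 10.0, "A")]
regions = overlap_regions(example)
```

In this example, speakers A and B overlap from 3.0 s, C joins at 4.0 s, and the overlap ends at 6.0 s when only B is still speaking. The segment of A that starts exactly when B ends at 8.0 s is not counted as an overlap.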

4     SubAIMs

No SubAIMs

5     JSON Metadata

https://schemas.mpai.community/MMC/V2.3/AIMs/AudioSegmentation.json

6     Profiles

No Profiles.

7     Reference Software

7.1    Disclaimers

  1. This MMC-AUS Reference Software Implementation is released with the BSD-3-Clause licence.
  2. The purpose of this MMC-AUS Reference Software is to show a working Implementation of MMC-AUS, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2    Guide to the code

MMC-AUS splits the input WAV file into speech segments, called speakers’ turns, each belonging to one, still unidentified, speaker. See “start and end times of each speaker’s turn, as well as the speaker labels” at https://www.aimodels.fyi/models/huggingFace/speaker-diarization-pyannote. A turn is defined as a sequence of one or more speech segments belonging to the same speaker. See https://dokumen.pub/speech-recognition-technology-and-applications-9798886971798.html.
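
The definition of a turn given above can be sketched as a simple merge of consecutive segments that carry the same speaker label. This is an illustrative sketch only, not the code of the Reference Software. The `(start, end, speaker)` tuple layout, the `SPEAKER_00`-style labels, and the function name `merge_into_turns` are assumptions.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, speaker_label)
Segment = Tuple[float, float, str]


def merge_into_turns(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments of the same speaker into one turn."""
    turns: List[Segment] = []
    # Sorting the tuples orders them by start time first.
    for start, end, speaker in sorted(segments):
        if turns and turns[-1][2] == speaker:
            # Same speaker as the previous turn: extend that turn.
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            # A different speaker takes over: open a new turn.
            turns.append((start, end, speaker))
    return turns


segments = [
    (0.0, 1.5, "SPEAKER_00"),
    (1.8, 3.0, "SPEAKER_00"),
    (3.2, 4.0, "SPEAKER_01"),
    (4.1, 5.0, "SPEAKER_00"),
]
turns = merge_into_turns(segments)
```

Here the first two segments collapse into a single turn for `SPEAKER_00`, while the last segment opens a new turn because another speaker spoke in between. The labels are anonymous because the speaker is still unidentified at this stage.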

This Reference Software for the MMC-AUS AI Module is intended for developers who are familiar with Python, Docker, RabbitMQ, and downloading models from HuggingFace.

The MMC-AUS Reference Software is available at the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
  3. requirements.txt: dependencies installed in the Docker image
  4. README.md: commands for cloning https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb and https://huggingface.co/pyannote/segmentation
  5. diar_conf.yaml: YAML file that sets up a diarization pipeline. Copy it to $AI_FW_DIR/confs/mmc_aus.

Library: https://github.com/pyannote/pyannote-audio
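
As a rough orientation for readers new to the library, the sketch below shows one way a pyannote.audio diarization pipeline could be loaded from a local YAML configuration and its result flattened into plain tuples. It is not taken from the Reference Software. The function names `diarize` and `to_records` are hypothetical, and the sketch assumes that the installed pyannote.audio version accepts a local configuration file path in `Pipeline.from_pretrained` and that the models named in the configuration have already been downloaded.

```python
from typing import List, Tuple


def diarize(wav_path: str, config_path: str):
    """Run a pyannote.audio pipeline described by a local YAML config.

    Assumption: config_path points to a file such as diar_conf.yaml.
    """
    # Imported lazily so that to_records() can be used without pyannote installed.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(config_path)
    return pipeline(wav_path)  # returns a pyannote.core.Annotation


def to_records(annotation) -> List[Tuple[float, float, str]]:
    """Flatten an Annotation into (start_seconds, end_seconds, speaker_label)."""
    return [
        (segment.start, segment.end, label)
        for segment, _track, label in annotation.itertracks(yield_label=True)
    ]
```

A call might look like `to_records(diarize("input.wav", "diar_conf.yaml"))`, where both file names are placeholders. Keeping the conversion in a separate function means that later processing depends only on plain tuples rather than on pyannote types.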

7.3    Acknowledgements

This version of the MMC-AUS Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8     Conformance Testing

Table 2 provides the Conformance Testing Method for Audio Segmentation (MMC-AUS) AIM.

If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for Audio Segmentation (MMC-AUS) AIM

Input     Audio Object      Shall validate against Audio Object schema.
                            Audio Data shall conform with Audio Qualifier.
Output    Speaker Time      Shall validate against Time schema.
          Speech Overlap    Shall validate against Speech Overlap schema.
          Speech Object     Shall validate against Speech Object schema.

9     Performance Assessment
