Go To MPAI-MMC AI Modules

1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Speaker Identity Recognition (MMC-SIR):

Receives:
  1. Auxiliary Text related to the Speech Object.
  2. Speech Object for which the Speaker ID is requested.
  3. Speech Time for which the Speaker ID is requested.
  4. Speech Overlap signalling which parts of the Speech Object contain overlapping speech.
  5. Speech Scene Geometry of the scene where the Speaker is located.

Produces:
  1. Speaker Identifier.

2     Reference Model

The Reference Architecture is depicted in Figure 1.

Figure 1 – The Speaker Identity Recognition AIM

3    Input/Output Data

Table 1 specifies the Input and Output Data of the Speaker Identity Recognition AIM.

Table 1 – I/O Data of the Speaker Identity Recognition AIM

Input                    Description
Auxiliary Text           Text with content related to the Speaker ID.
Speech Object            Speech Object emitted by the Speaker.
Speech Time              The start and end time of the Speech.
Speech Overlap           Information about overlapping Speech.
Speech Scene Geometry    Information about the Speech Object location.

Output                   Description
Speaker Identifier       The Identifier of the Speaker of the Speech Object.
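The I/O Data of Table 1 can be pictured as a pair of simple records. The following is a minimal sketch only: the class names, field names, field types, the use of seconds for Speech Time, and the `validate` helper are all assumptions made for illustration. The normative formats are those defined by the MPAI data type specifications and the JSON Metadata of this AIM.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class SpeakerIdentityRecognitionInput:
    """Hypothetical container mirroring the Input rows of Table 1."""

    auxiliary_text: Optional[str]                  # text related to the Speaker ID
    speech_object: Sequence[float]                 # speech samples (assumed representation)
    speech_time: Tuple[float, float]               # (start, end) of the Speech, assumed in seconds
    speech_overlap: Sequence[Tuple[float, float]]  # overlapping intervals (assumed representation)
    speech_scene_geometry: Optional[dict]          # Speech Object location (assumed representation)

    def validate(self) -> None:
        """Check only that every time interval is well ordered."""
        start, end = self.speech_time
        if start < 0 or end <= start:
            raise ValueError("speech_time must satisfy 0 <= start < end")
        for o_start, o_end in self.speech_overlap:
            if o_end <= o_start:
                raise ValueError("each overlap interval must satisfy start < end")


@dataclass
class SpeakerIdentityRecognitionOutput:
    """Hypothetical container mirroring the Output row of Table 1."""

    speaker_identifier: str


# Illustrative usage with made-up values.
example_input = SpeakerIdentityRecognitionInput(
    auxiliary_text=None,
    speech_object=[0.0, 0.1, -0.1],
    speech_time=(0.0, 1.5),
    speech_overlap=[(0.5, 0.8)],
    speech_scene_geometry=None,
)
example_input.validate()
example_output = SpeakerIdentityRecognitionOutput(speaker_identifier="speaker_a")
```

Keeping the input and output as separate records mirrors the Input and Output parts of Table 1 and makes it explicit that the AIM produces a single Speaker Identifier.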

4     SubAIMs

No SubAIMs.

5     JSON Metadata

https://schemas.mpai.community/MMC/V2.2/AIMs/SpeakerIdentityRecognition.json

6     Profiles

No Profiles.

7     Reference Software

7.1    Disclaimers

  1. This MMC-SIR Reference Software Implementation is released under the BSD-3-Clause licence.
  2. The purpose of this MMC-SIR Reference Software is to show a working Implementation of MMC-SIR, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2    Guide to the MMC-SIR code

MMC-SIR performs speaker verification with a pretrained ECAPA-TDNN model: it identifies the speaker of each speech segment by comparing the segment with a dataset of short clips of human speech.
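The comparison step described above can be sketched as follows. This is an illustration under stated assumptions, not the code of the Reference Software: it assumes that each speech segment and each reference clip has already been turned into a fixed-length embedding vector (for example by the pretrained ECAPA-TDNN model), that similarity is measured with the cosine score, and that a segment is left unidentified when no score reaches a threshold. The function names, the toy three-dimensional embeddings, and the threshold value are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def identify_speaker(
    segment_embedding: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.25,  # assumed value; tune on real data
) -> Tuple[Optional[str], float]:
    """Return (speaker_id, score) for the best-matching enrolled speaker.

    speaker_id is None when no enrolled speaker reaches the threshold.
    """
    best_id: Optional[str] = None
    best_score = -1.0  # lowest possible cosine score
    for speaker_id, reference in enrolled.items():
        score = cosine_similarity(segment_embedding, reference)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is None or best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy usage: real embeddings would have many more dimensions.
enrolled_speakers = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
}
matched_id, matched_score = identify_speaker([0.9, 0.1, 0.0], enrolled_speakers)
print(matched_id, round(matched_score, 3))
```

Returning `None` below the threshold keeps the sketch from assigning a known identity to a speaker who is absent from the reference set. Whether the Reference Software handles unknown speakers this way is not stated here.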

The MMC-SIR Reference Software is available at the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
  3. requirements.txt: dependencies installed in the Docker image
  4. README.md: commands for cloning https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb

Library: https://github.com/speechbrain/speechbrain

7.3    Acknowledgements

This version of the MMC-SIR Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8     Conformance Testing

9     Performance Assessment
