1 Functions | 2 Reference Model | 3 Input/Output Data |
4 SubAIMs | 5 JSON Metadata | 6 Profiles |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
Speaker Identity Recognition (MMC-SIR):
Receives | Auxiliary Text related to the Speech Object. |
| Speech Object whose Speaker ID is requested. |
| Speech Time for which a Speaker ID is requested. |
| Speech Overlap signalling which parts of the Speech Object contain overlapping speech. |
| Speech Scene Geometry of the scene where the Speaker is located. |
Produces | Speaker Identifier. |
2 Reference Model
The Reference Architecture is depicted in Figure 1.
Figure 1 – The Speaker Identity Recognition AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Speaker Identity Recognition AIM.
Table 1 – I/O Data of the Speaker Identity Recognition AIM
Input | Description |
Auxiliary Text | Text with content related to Speaker ID. |
Speech Object | Speech Object emitted by the Speaker. |
Speech Time | The start and end time of the Speech. |
Speech Overlap | Information about overlapping Speech. |
Speech Scene Geometry | Information about Speech Object location. |
Output | Description |
Speaker Identifier | Identifier of the Speaker who emitted the Speech Object. |
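As an illustration only, the I/O Data in Table 1 could be held in memory as the Python dataclasses below. All class names, field names, units (seconds, samples) and encodings are assumptions made for this sketch; the normative format is the one defined by the AIM's JSON Metadata. The hypothetical `segment()` helper shows one way a Speech Time could select the part of the Speech Object to be identified.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical in-memory representation of the MMC-SIR I/O Data.
# Names, types and units are illustrative assumptions, not the normative
# format defined by the AIM's JSON Metadata schema.


@dataclass
class SpeechTime:
    start: float  # seconds from the beginning of the Speech Object (assumed unit)
    end: float    # seconds from the beginning of the Speech Object (assumed unit)


@dataclass
class SpeakerIdentityRecognitionInput:
    speech_object: List[float]  # mono audio samples (assumed encoding)
    sample_rate: int            # samples per second (assumed)
    speech_time: SpeechTime
    auxiliary_text: Optional[str] = None
    # Intervals with overlapping speech (assumed encoding of Speech Overlap).
    speech_overlap: List[SpeechTime] = field(default_factory=list)
    # Speech Scene Geometry kept as an opaque placeholder in this sketch.
    speech_scene_geometry: Optional[dict] = None

    def segment(self) -> List[float]:
        """Return the samples covered by speech_time, clamped to the signal."""
        first = max(0, int(round(self.speech_time.start * self.sample_rate)))
        last = min(len(self.speech_object),
                   int(round(self.speech_time.end * self.sample_rate)))
        return self.speech_object[first:last]


@dataclass
class SpeakerIdentityRecognitionOutput:
    speaker_identifier: str
```

For example, with 2 s of audio at 8 kHz and a Speech Time of 0.5 s to 1.0 s, `segment()` returns the 4000 samples in that interval.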
4 SubAIMs
No SubAIMs
5 JSON Metadata
https://schemas.mpai.community/MMC/V2.2/AIMs/SpeakerIdentityRecognition.json
6 Profiles
No Profiles.
7 Reference Software
7.1 Disclaimers
- This MMC-SIR Reference Software Implementation is released under the BSD-3-Clause licence.
- The purpose of this MMC-SIR Reference Software is to show a working Implementation of MMC-SIR, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to the MMC-SIR code
MMC-SIR uses a pretrained ECAPA-TDNN speaker verification model to identify the speaker of each speech segment by comparing the segment with a dataset of short clips of human speech from known speakers.
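The comparison step can be sketched as follows, under stated assumptions: each speech segment and each dataset clip has already been turned into a fixed-length speaker embedding (in the Reference Software, presumably by the ECAPA-TDNN model), embeddings are compared by cosine similarity, and the best-scoring known speaker is returned only if its score reaches a threshold. The function names, the toy 3-dimensional embeddings and the threshold value of 0.25 are hypothetical and are not taken from the MMC-SIR code.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    segment_embedding: Sequence[float],
    enrolled: Dict[str, List[Sequence[float]]],
    threshold: float = 0.25,  # hypothetical decision threshold
) -> Optional[str]:
    """Return the ID of the best-matching known speaker, or None.

    `enrolled` maps each speaker ID to the embeddings of that speaker's
    clips (a hypothetical layout of the reference dataset).
    """
    best_id: Optional[str] = None
    best_score = -math.inf
    for speaker_id, clips in enrolled.items():
        for clip in clips:
            score = cosine_similarity(segment_embedding, clip)
            if score > best_score:
                best_id, best_score = speaker_id, score
    # Reject the match if even the best score is too low.
    return best_id if best_score >= threshold else None
```

With `enrolled = {"alice": [[1.0, 0.0, 0.0]], "bob": [[0.0, 1.0, 0.0]]}`, the embedding `[0.9, 0.1, 0.0]` is attributed to `"alice"`, while `[0.0, 0.0, 1.0]` matches nobody and yields `None`. Scoring a speaker by their best clip is one design choice; averaging the clip embeddings per speaker first is a common alternative.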
The MMC-SIR Reference Software is found at the MPAI GitLab site. It contains:
- src: a folder with the Python code implementing the AIM
- Dockerfile: the Docker file specifying only the libraries required to build the Docker image and run the container
- requirements.txt: dependencies installed in the Docker image
- README.md: commands for cloning the pretrained model repository https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
Library: https://github.com/speechbrain/speechbrain
7.3 Acknowledgements
This version of the MMC-SIR Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).