1 Functions | 2 Reference Model | 3 Input/Output Data |
4 SubAIMs | 5 JSON Metadata | 6 Profiles |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
Speaker Identity Recognition (MMC-SIR):
Receives | Auxiliary Text | Text related to the Speech. |
Speech Object | Speech whose Speaker ID is requested. |
Speech Time | Time interval during which the Speaker ID is requested. |
Speech Overlap | Data signaling which parts of the Speech Data contain overlapping speech. |
Speech Scene Geometry | Arrangement of the Speech Objects in the scene where the Speech whose Speaker is to be identified is located. |
Produces | Speaker Identifier | ID of the Speaker. |
2 Reference Model
The Reference Architecture of Speaker Identity Recognition (MMC-SIR) is depicted in Figure 1.
Figure 1 – The Speaker Identity Recognition (MMC-SIR) AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Speaker Identity Recognition (MMC-SIR) AIM.
Table 1 – I/O Data of the Speaker Identity Recognition (MMC-SIR) AIM
Input | Description |
Auxiliary Text | Text with content related to Speaker ID. |
Speech Object | Speech Object emitted by the Speaker. |
Speech Time | The start and end time of the Speech. |
Speech Overlap | Information about overlapping Speech. |
Speech Scene Geometry | Information about Speech Object location. |
Output | Description |
Speaker Identifier | Identifier of the Speaker of the Speech Object. |
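As an illustration of how the Speech Time and Speech Overlap inputs relate to the Speech Object, the sketch below holds the inputs in a hypothetical in-memory structure, cuts out the requested interval and checks whether it contains overlapping speech. The class names, field names and the representation of Speech as a list of mono samples are assumptions made for this example; they are not the MMC-SIR JSON Metadata, and Speech Scene Geometry is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SpeechTime:
    """Hypothetical container for Speech Time: start and end in seconds."""
    start_s: float
    end_s: float


@dataclass
class SirInput:
    """Hypothetical view of the MMC-SIR inputs (names are illustrative only)."""
    speech_samples: List[float]      # Speech Object as mono PCM samples
    sample_rate: int                 # samples per second of speech_samples
    speech_time: SpeechTime          # interval whose Speaker ID is requested
    auxiliary_text: Optional[str] = None
    # Speech Overlap as (start_s, end_s) intervals containing overlapping speech
    overlaps: List[Tuple[float, float]] = field(default_factory=list)


def requested_segment(inp: SirInput) -> List[float]:
    """Return the samples of the Speech Object that fall inside Speech Time."""
    start = max(0, int(round(inp.speech_time.start_s * inp.sample_rate)))
    end = min(len(inp.speech_samples), int(round(inp.speech_time.end_s * inp.sample_rate)))
    if end <= start:
        raise ValueError("Speech Time does not overlap the Speech Object")
    return inp.speech_samples[start:end]


def overlaps_requested(inp: SirInput) -> bool:
    """True if any overlapping-speech interval intersects Speech Time."""
    t = inp.speech_time
    return any(s < t.end_s and e > t.start_s for s, e in inp.overlaps)
```

For example, a 2 s Speech Object sampled at 8 kHz with Speech Time 0.5 s to 1.0 s yields a 4000-sample segment; an implementation might treat segments flagged by overlaps_requested with extra caution.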
4 SubAIMs
No SubAIMs
5 JSON Metadata
https://schemas.mpai.community/MMC/V2.3/AIMs/SpeakerIdentityRecognition.json
6 Profiles
No Profiles.
7 Reference Software
7.1 Disclaimers
- This MMC-SIR Reference Software Implementation is released with the BSD-3-Clause licence.
- The purpose of this MMC-SIR Reference Software is to show a working Implementation of MMC-SIR, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to the MMC-SIR code
MMC-SIR performs speaker verification with a pretrained ECAPA-TDNN model: it identifies the speaker of each speech segment by comparing the segment with a reference dataset consisting of short clips of human speech.
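The comparison step can be pictured as nearest-neighbour matching of speaker embeddings. The sketch below is not the Reference Software code: it assumes that an embedding vector has already been computed for the segment and for each reference clip (for example with an ECAPA-TDNN encoder such as the SpeechBrain model cited below), scores speakers by cosine similarity, and returns no identity when the best score is below a threshold. The function names, the enrolment dictionary layout and the 0.25 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def identify_speaker(
    segment_emb: Sequence[float],
    enrolled: Dict[str, List[Sequence[float]]],
    threshold: float = 0.25,  # hypothetical decision threshold
) -> Optional[str]:
    """Return the enrolled speaker ID whose reference clips best match the
    segment embedding, or None if no score reaches the threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for speaker_id, clips in enrolled.items():
        # Score each speaker by its best-matching reference clip.
        score = max((cosine(segment_emb, c) for c in clips), default=-1.0)
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id if best_score >= threshold else None
```

Averaging the reference embeddings of each speaker instead of taking the best clip is an equally plausible design; the threshold would in practice be tuned on held-out data.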
The MMC-SIR Reference Software is found at the MPAI GitLab site. It contains:
- src: a folder with the Python code implementing the AIM
- Dockerfile: the Docker file specifying only the libraries required to build the Docker image and run the container
- requirements.txt: dependencies installed in the Docker image
- README.md: commands for cloning https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
Library: https://github.com/speechbrain/speechbrain
7.3 Acknowledgements
This version of the MMC-SIR Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).
8 Conformance Testing
Table 2 provides the Conformance Testing Method for MMC-SIR AIM.
If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against that schema, if present, and conform with the relevant Qualifier, if present.
Table 2 – Conformance Testing Method for MMC-SIR AIM
Input | Auxiliary Text | Shall validate against Text Object schema. Auxiliary Text Data shall conform with Text Qualifier. |
Speech Object | Shall validate against Speech Object schema. Speech Data shall conform with Speech Qualifier. |
Speech Time | Shall validate against Time schema. |
Speech Overlap | Shall validate against Speech Overlap schema. Speech Data shall conform with Speech Qualifier. |
Speech Scene Geometry | Shall validate against Speech Scene Geometry schema. |
Output | Speaker Identifier | Shall validate against Instance ID schema. |
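The method in Table 2, together with the rule on referenced schemas, can be organised as a table-driven check. In the sketch below, the schema and Qualifier validators are hypothetical callables supplied by the tester (in practice they would wrap a JSON Schema validator and a Qualifier check), and the "refs", "schema" and "qualifier" keys used to represent referenced data are an assumed convention, not part of the MPAI schemas.

```python
from typing import Callable, Dict, Optional, Tuple

Validator = Callable[[dict], bool]

# Table 2 encoded as: I/O name -> (schema name, Qualifier name or None).
TABLE_2: Dict[str, Tuple[str, Optional[str]]] = {
    "Auxiliary Text": ("Text Object", "Text Qualifier"),
    "Speech Object": ("Speech Object", "Speech Qualifier"),
    "Speech Time": ("Time", None),
    "Speech Overlap": ("Speech Overlap", "Speech Qualifier"),
    "Speech Scene Geometry": ("Speech Scene Geometry", None),
    "Speaker Identifier": ("Instance ID", None),
}


def conforms(data: dict, schema: str, qualifier: Optional[str],
             schemas: Dict[str, Validator], qualifiers: Dict[str, Validator]) -> bool:
    """Check one data item against its schema and Qualifier, then recurse
    into any data it references (hypothetical "refs" convention)."""
    # The primary schema must be supplied; a missing one raises KeyError.
    if not schemas[schema](data):
        return False
    # The Qualifier is checked only if a validator for it is present.
    if qualifier is not None and qualifier in qualifiers and not qualifiers[qualifier](data):
        return False
    for ref in data.get("refs", []):
        ref_schema = ref.get("schema")
        # Secondary data is validated only if its schema is present.
        if ref_schema in schemas:
            if not conforms(ref, ref_schema, ref.get("qualifier"), schemas, qualifiers):
                return False
    return True


def conformance_report(io_data: Dict[str, dict], schemas: Dict[str, Validator],
                       qualifiers: Dict[str, Validator]) -> Dict[str, bool]:
    """Pass/fail for every Table 2 entry that appears in io_data."""
    return {name: conforms(io_data[name], *TABLE_2[name], schemas, qualifiers)
            for name in TABLE_2 if name in io_data}
```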
9 Performance Assessment
Performance Assessment of an MMC-SIR AIM Implementation shall be performed using a dataset of speech segments, all in the same language, in which the Identity of the Speaker of each segment is provided with reference to a Taxonomy.
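The two dataset requirements (a single language, and every Speaker Identity drawn from the Taxonomy) can be checked before an assessment is run. This is a minimal sketch assuming a hypothetical representation of the dataset as (segment ID, language, speaker ID) tuples and of the Taxonomy as a set of speaker IDs.

```python
from typing import Iterable, Set, Tuple


def check_assessment_dataset(segments: Iterable[Tuple[str, str, str]],
                             taxonomy: Set[str]) -> str:
    """Validate (segment_id, language, speaker_id) tuples and return the
    single language of the dataset; raise ValueError otherwise."""
    languages = set()
    for seg_id, lang, speaker in segments:
        languages.add(lang)
        # Every reference identity must come from the Taxonomy.
        if speaker not in taxonomy:
            raise ValueError(f"segment {seg_id}: speaker {speaker!r} not in Taxonomy")
    # All segments must share exactly one language.
    if len(languages) != 1:
        raise ValueError(f"expected one language, found {sorted(languages)}")
    return languages.pop()
```

The returned language is the value to record in the Performance Assessment Report.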
The Performance Assessment Report of an MMC-SIR AIM Implementation shall include:
- The Identifier of the MMC-SIR AIM.
- The Identifier of the speech segment dataset.
- The language of the speech segment dataset.
- The Taxonomy of Speaker Identifiers.
- The Performance of the MMC-SIR AIM, expressed as the Accuracy of the Speaker Identifiers it provides, computed on all speech segments of the speech segment dataset identified above.
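A minimal sketch of the Accuracy figure, assuming Accuracy is the fraction of dataset segments for which the AIM's Speaker Identifier equals the reference identity. Whether segments for which the AIM returns no identifier count as errors is not stated above; this sketch assumes they do. The function name and dictionary layout are hypothetical.

```python
from typing import Dict, Optional


def speaker_id_accuracy(reference: Dict[str, str],
                        predicted: Dict[str, Optional[str]]) -> float:
    """Fraction of all dataset segments whose predicted Speaker Identifier
    equals the reference one; missing or None predictions count as errors."""
    if not reference:
        raise ValueError("empty dataset")
    correct = sum(1 for seg, spk in reference.items() if predicted.get(seg) == spk)
    return correct / len(reference)


ref = {"s1": "a", "s2": "b", "s3": "a", "s4": "c"}
pred = {"s1": "a", "s2": "a", "s3": "a"}  # s4 left unanswered
print(speaker_id_accuracy(ref, pred))  # → 0.5
```

Dividing by the size of the whole dataset, rather than by the number of answered segments, keeps the figure computed "on all speech segments" as the report requires.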