1 Function | 2 Reference Model | 3 Input/Output Data |
4 SubAIMs | 5 JSON Metadata | 6 Profiles |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
The Speaker Identity Recognition (MMC-SIR) AIM receives an input speech and produces the identifier of the Entity producing the input speech. The MMC-SIR AIM may also receive auxiliary text connected with the input speech, the start and end time for which the identifier of the speaker Entity is requested, the Speech Overlap data type signaling whether more than one speaker has produced the input speech, and the Geometry of the Speech Scene:
Receives | Auxiliary Text | Text related to the Speech. |
| Speech Object | Speech whose Speaker is requested. |
| Speech Time | Time interval for which the Speaker ID is requested. |
| Speech Overlap | Data signaling which parts of the Speech Data have overlapping speech. |
| Speech Scene Geometry | Disposition of the Speech Data in the scene that contains the Speech whose Speaker is to be identified. |
Produces | Speaker Identifier | ID of the Speaker. |
2 Reference Model
The Reference Architecture of Speaker Identity Recognition (MMC-SIR) is depicted in Figure 1.
Figure 1 – The Speaker Identity Recognition (MMC-SIR) AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Speaker Identity Recognition (MMC-SIR) AIM.
Table 1 – I/O Data of the Speaker Identity Recognition (MMC-SIR) AIM
Input | Description |
Auxiliary Text Object | Text with content related to Speaker ID. |
Speech Object | Speech Object emitted by the Speaker. |
Speech Time | The start and end time of the Speech. |
Speech Overlap | Information about overlapping Speech. |
Speech Scene Geometry | Information about Speech Object location. |
Output | Description |
Speaker Identifier | The identifier of the Speaker of the Speech Object. |
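The rows of Table 1 can be pictured as a pair of simple data containers. The following is a minimal sketch, not part of the specification: all class and field names are hypothetical, the time unit (seconds) is an assumption, and the Speech Overlap and Speech Scene Geometry payloads are left as opaque dictionaries because their structure is defined by their own schemas. Only the Speech Object is treated as mandatory, since Section 1 states that the other inputs may also be received.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechTime:
    """Start and end time of the Speech (seconds are an assumption)."""
    start: float
    end: float

    def __post_init__(self) -> None:
        # Reject intervals that end before they begin.
        if self.end < self.start:
            raise ValueError("end time must not precede start time")


@dataclass
class SpeakerIdentityRecognitionInput:
    """Hypothetical container mirroring the Input rows of Table 1."""
    speech_object: bytes                           # Speech Object emitted by the Speaker
    auxiliary_text: Optional[str] = None           # Auxiliary Text Object
    speech_time: Optional[SpeechTime] = None       # Speech Time
    speech_overlap: Optional[dict] = None          # Speech Overlap (opaque here)
    speech_scene_geometry: Optional[dict] = None   # Speech Scene Geometry (opaque here)


@dataclass
class SpeakerIdentityRecognitionOutput:
    """Hypothetical container mirroring the Output row of Table 1."""
    speaker_identifier: str                        # Speaker Identifier


# Example: a request carrying only speech and a time interval.
request = SpeakerIdentityRecognitionInput(
    speech_object=b"\x00\x01",
    speech_time=SpeechTime(start=0.0, end=2.5),
)
response = SpeakerIdentityRecognitionOutput(speaker_identifier="speaker-001")
```

An actual implementation would exchange these data as JSON validated against the schemas referenced in this document rather than as in-memory Python objects.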
4 SubAIMs
No SubAIMs.
5 JSON Metadata
https://schemas.mpai.community/MMC/V2.4/AIMs/SpeakerIdentityRecognition.json
6 Profiles
No Profiles.
7 Reference Software
7.1 Disclaimers
- This MMC-SIR Reference Software Implementation is released with the BSD-3-Clause licence.
- The purpose of this MMC-SIR Reference Software is to show a working Implementation of MMC-SIR, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to the MMC-SIR code
MMC-SIR uses a pretrained ECAPA-TDNN speaker verification model to identify the speaker of each speech segment by comparing the segment with a dataset consisting of short clips of human speech.
The MMC-SIR Reference Software is available on the MPAI GitLab site. It contains:
- src: a folder with the Python code implementing the AIM
- Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
- requirements.txt: dependencies installed in the Docker image
- README.md: commands for cloning https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
Library: https://github.com/speechbrain/speechbrain
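The comparison step described in 7.2 can be pictured with a minimal sketch. It assumes that each speech segment and each enrolled speaker clip has already been reduced to a fixed-length embedding (a role that the pretrained ECAPA-TDNN model would play), and that embeddings are compared by cosine similarity against a decision threshold. The function names, the toy three-dimensional embeddings, and the threshold value of 0.5 are hypothetical; this is not the Reference Software's code.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    segment_embedding: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.5,  # illustrative value, not taken from the source
) -> Tuple[Optional[str], float]:
    """Return the best-matching enrolled speaker ID and its score.

    The ID is None when no enrolled speaker reaches the threshold.
    """
    best_id: Optional[str] = None
    best_score = -1.0
    for speaker_id, reference_embedding in enrolled.items():
        score = cosine_similarity(segment_embedding, reference_embedding)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy enrolled set: one reference embedding per known speaker.
enrolled_speakers = {
    "spk_a": [1.0, 0.0, 0.0],
    "spk_b": [0.0, 1.0, 0.0],
}
match_id, match_score = identify_speaker([0.9, 0.1, 0.0], enrolled_speakers)
unknown_id, unknown_score = identify_speaker([0.0, 0.0, 1.0], enrolled_speakers)
```

In this toy run the first segment is closest to `spk_a`, while the second is orthogonal to both references and is therefore left unidentified. Returning no identifier below the threshold is a design choice of the sketch; an implementation could instead always return the nearest enrolled speaker.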
7.3 Acknowledgements
This version of the MMC-SIR Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).
8 Conformance Testing
Table 2 provides the Conformance Testing Method for the MMC-SIR AIM.
If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
Table 2 – Conformance Testing Method for MMC-SIR AIM
Input | Text Object | Shall validate against Text Object schema. Auxiliary Text Data shall conform with Text Qualifier. |
| Speech Object | Shall validate against Speech Object schema. Speech Data shall conform with Speech Qualifier. |
| Speech Time | Shall validate against Time schema. |
| Speech Overlap | Shall validate against Speech Overlap schema. Speech Data shall conform with Speech Qualifier. |
| Speech Scene Geometry | Shall validate against Speech Scene Geometry schema. |
Output | Speaker Identifier | Shall validate against Instance ID schema. |
9 Performance Assessment
Performance Assessment of an MMC-SIR AIM Implementation shall be performed using a dataset of speech segments, all in the same language, in which the Identity of the Speaker of each segment is provided with reference to a Taxonomy.
The Performance Assessment Report of an MMC-SIR AIM Implementation shall include:
- The Identifier of the MMC-SIR AIM.
- The Identifier of the speech segment dataset.
- The language of the speech segment dataset.
- The Taxonomy of Speaker Identifiers.
- The Performance of the MMC-SIR AIM expressed as the Accuracy of the Identifiers provided by the MMC-SIR AIM, computed on all speech segments of the dataset identified in the second item of this list.
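The Accuracy figure in the report can be illustrated with a short sketch. It assumes that Accuracy means the fraction of speech segments for which the Identifier returned by the AIM equals the reference Identifier in the dataset; this document does not spell out the formula, so that definition, the function name, and the toy identifiers are assumptions.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of segments whose predicted Speaker Identifier matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(reference) == 0:
        raise ValueError("the dataset must contain at least one speech segment")
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


# Toy example: four segments, three identified correctly.
reference_ids = ["spk_a", "spk_b", "spk_a", "spk_a"]
predicted_ids = ["spk_a", "spk_b", "spk_c", "spk_a"]
score = accuracy(predicted_ids, reference_ids)
```

Here `score` evaluates to 0.75, because three of the four predicted Identifiers agree with the reference.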