Object and Scene Description (MPAI-OSD) is a project to develop a standard specifying technologies for describing objects and localising them in space. These technologies are used in use cases across several MPAI standards.

Figure 1 gives two examples of the types of output assumed for the Audio and Visual Scene Descriptors.

Figure 1 – Audio and Visual Scene Description

Figure 2 shows one solution to the problem of assigning identifiers to the Objects extracted from an audio-visual scene, in particular identifying those Objects that are audio-visual, such as a human and their speech.

Figure 2 – Audio-Visual Alignment
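
To make the identifier-assignment problem concrete, the following is a minimal sketch of one possible alignment strategy: pairing audio and visual objects by spatial proximity and giving each pair a shared identifier. It is not the method specified by MPAI-OSD. The `SceneObject` class, the `align_audio_visual` function, the `max_distance` threshold, the Cartesian positions, and the `AV-`/`A-`/`V-` identifier scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional


@dataclass
class SceneObject:
    """Hypothetical object extracted from a scene (not an MPAI data type)."""
    local_id: str                          # label given by the audio or visual extractor
    position: tuple[float, float, float]   # assumed Cartesian position in metres
    av_id: Optional[str] = None            # identifier assigned by the alignment step


def align_audio_visual(audio, visual, max_distance=0.5):
    """Greedily pair audio and visual objects, nearest pairs first.

    Pairs no farther apart than max_distance share an "AV-n" identifier.
    Unpaired objects receive a modality-specific "A-n" or "V-n" identifier.
    """
    # Enumerate every audio/visual pair, sorted by distance.
    candidates = sorted(
        (dist(a.position, v.position), i, j)
        for i, a in enumerate(audio)
        for j, v in enumerate(visual)
    )
    used_audio, used_visual = set(), set()
    pair_count = 0
    for distance, i, j in candidates:
        if distance > max_distance:
            break  # all remaining pairs are even farther apart
        if i in used_audio or j in used_visual:
            continue  # one of the two objects is already paired
        pair_count += 1
        audio[i].av_id = visual[j].av_id = f"AV-{pair_count}"
        used_audio.add(i)
        used_visual.add(j)

    # Objects with no counterpart keep a single-modality identifier.
    for prefix, objects in (("A", audio), ("V", visual)):
        n = 0
        for obj in objects:
            if obj.av_id is None:
                n += 1
                obj.av_id = f"{prefix}-{n}"
    return audio, visual


# Usage: a speech source located next to a human, plus two unrelated objects.
audio = [SceneObject("speech1", (1.0, 0.0, 2.0)),
         SceneObject("music1", (-3.0, 0.0, 1.0))]
visual = [SceneObject("human1", (1.1, 0.1, 2.0)),
          SceneObject("chair1", (0.0, 0.0, 4.0))]
align_audio_visual(audio, visual)
for obj in audio + visual:
    print(obj.local_id, "->", obj.av_id)
```

In this sketch the speech and the human end up with the same identifier because they are about 14 cm apart, while the music source and the chair keep audio-only and visual-only identifiers. A greedy nearest-first match is used only for brevity; an optimal assignment or additional cues (such as lip movement) could be substituted under the same interface.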

Another example is provided by Figure 3.

Figure 3 – Visual Spatial Object Identification

Figure 4 shows the Conversation with Personal Status use case, an example that makes use of all the (Composite) AI Modules described above.

Figure 4 – Reference Model of Conversation with Personal Status (MPAI-CPS)

