MPAI has already issued a Technical Specification covering a subset of the CAV-HCI Data Types in [11]. This section draws heavily on that specification.
Data obtained from Audio, Visual, and LiDAR sensors is used by the HCI; however, its specification is delegated to the Environment Sensing Subsystem.
1.1.1 Personal Preferences
1.1.1.1 Definition
Personal Preferences include passenger-specific preferences that give an HCI access to information facilitating human-HCI interaction. This is particularly useful when the passenger uses a rented CAV.
1.1.1.2 Functional Requirements
The data in the Personal Preferences should include:
- Language.
- Seat position.
- Mirror position.
- Display characteristics.
- Preferred driving style.
- Preferential routes.
- Preferred information sources.
- Preferred entertainment sources.
- …
1.1.1.3 Syntax
https://schemas.mpai.community/CAV2/V1.0/data/PersonalPreferences.json
1.1.1.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-PPR-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
humanID | N4 Bytes | ID of the human the Personal Profile refers to. |
PersonalPreferenceID | N5 Bytes | ID of Personal Profile. |
PersonalPreferences | N6 Bytes | Set of Personal Preferences. |
· Language | N7 Bytes | Preferred Language. |
· Seat position | N8 Bytes | Preferred seat position. |
· Mirror position | N9 Bytes | Preferred mirror position. |
· Display characteristics | N10 Bytes | Preferred display characteristics. |
· Preferred driving style | N11 Bytes | Preferred driving style. |
· Preferential routes | N12 Bytes | Preferred routes. |
DescrMetadata | N13 Bytes | Descriptive Metadata |
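For illustration only, the sketch below assembles a hypothetical Personal Preferences instance using the labels of the table above; property names, nesting, and values are assumptions, and the normative structure is the JSON Schema referenced under Syntax.

import json

# Hypothetical Personal Preferences instance (illustrative values only).
personal_preferences = {
    "Header": "CAV-PPR-V1.0",                              # Standard + Version + "." + Subversion
    "humanID": "human-0001",                               # assumed identifier format
    "PersonalPreferenceID": "pref-0001",
    "PersonalPreferences": {
        "Language": "it-IT",
        "SeatPosition": {"height_cm": 12, "recline_deg": 20},   # assumed encoding
        "MirrorPosition": {"left_deg": -15, "right_deg": 15},   # assumed encoding
        "DisplayCharacteristics": {"brightness": 0.7},
        "PreferredDrivingStyle": "comfort",
        "PreferentialRoutes": ["route-home-office"],
    },
    "DescrMetadata": {},
}
print(json.dumps(personal_preferences, indent=2))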
1.1.1.5 Data Formats
The following Data Formats are identified:
- Language Preference.
- Seat position.
- Mirror position.
- Display characteristics.
- Preferred driving style.
- Preferred routes.
1.1.1.6 To Respondents
Respondents are invited to:
- Comment and elaborate on or extend the functional requirements of the Personal Preferences identified above.
- Propose representation formats of the Personal Preferences.
- Propose new Personal Preferences and their formats.
1.1.2 Personal Profile
This specification is shared with the planned Technical Specification: MPAI-Metaverse Model (MPAI-MMM) – Technologies (MMM-TEC) V1.0.
1.1.2.1 Definition
Data identifying a human.
1.1.2.2 Functional Requirements
Personal Profile includes humanID and First Name, Last Name, Age, Nationality, and Email of the human.
1.1.2.3 Syntax
https://schemas.mpai.community/CAV2/V1.0/data/PersonalProfile.json
1.1.2.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-PPR-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Byte | Minor version – 1 or 2 characters |
humanID | N5 Bytes | ID of the human the Personal Profile refers to. |
PersonalProfileID | N6 Bytes | ID of Personal Profile. |
PersonalProfile | N7 Bytes | The number of Bytes composing the Personal Profile. |
· First Name | N8 Bytes | The human’s given name |
· Last Name | N9 Bytes | The human’s family name |
· Age | N10 Bytes | The human’s age |
· Nationality | N11 Bytes | The human’s country |
· Email | N12 Bytes | The human’s email address |
DescrMetadata | N13 Bytes | Descriptive Metadata |
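A similarly hypothetical Personal Profile instance is sketched below; the same caveat applies that the referenced JSON Schema is normative and the values are illustrative.

import json

# Hypothetical Personal Profile instance (illustrative values only).
personal_profile = {
    "Header": "CAV-PPR-V1.0",          # header string as listed in the table above
    "humanID": "human-0001",           # assumed identifier format
    "PersonalProfileID": "profile-0001",
    "PersonalProfile": {
        "FirstName": "Jane",
        "LastName": "Doe",
        "Age": 34,
        "Nationality": "IT",
        "Email": "jane.doe@example.com",
    },
    "DescrMetadata": {},
}
print(json.dumps(personal_profile, indent=2))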
1.1.2.5 Data Formats
No Data Formats involved, save for Text for which MPAI has selected ISO/IEC 10646.
1.1.2.6 To Respondents
MPAI requests comments on the Functional Requirements of Personal Profile.
1.1.3 Input Text
1.1.3.1 Definition
Textual information represented using a Character Set.
1.1.3.2 Functional Requirements
The Character Set should be able to represent the characters for most of the currently used languages.
MPAI has selected the Text Format [15].
1.1.3.3 Data Formats
No new Data Formats required.
1.1.3.4 To Respondents
Respondents are invited to comment or elaborate on the MPAI selection of Text.
1.1.4 Spatial Attitude
1.1.4.1 Definition
A Data Type representing an Object’s Position, Orientation and their velocity and acceleration.
1.1.4.2 Functional Requirements
- The Position of an Object is that of a representative point in the Object.
- Cartesian and Polar Coordinate Systems are supported.
- The following media types are supported: Audio; Visual; Audio-Visual; Haptic; Smell; RADAR; LiDAR; Ultrasound.
- Error is measured as a percentage of the value of each of CartPosition, SpherPosition, Orientation, CartVelocity, SpherVelocity, OrientVelocity, CartAccel, SpherAccel, OrientAccel.
- Error is assumed to be the same for the three components of each value set.
1.1.4.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/SpatialAttitude.json
https://schemas.mpai.community/OSD/V1.1/data/CoordinateTypeID.json
https://schemas.mpai.community/OSD/V1.1/data/ObjectTypeID.json
https://schemas.mpai.community/OSD/V1.1/data/MediaTypeID.json
1.1.4.4 Semantics
Table 7 provides the semantics of the components of the Spatial Attitude. The following should be noted:
- Each of Position, Velocity, and Acceleration is provided either in Cartesian (X,Y,Z) or Spherical (r,φ,θ) Coordinates.
- The Euler angles are indicated by (α,β,γ).
Table 7 – Components of the Spatial Attitude
Label | Size | Description |
Header | N1 Bytes | |
· Standard-Spatial Attitude | 9 Bytes | The characters “OSD-OSA-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
ObjectSpatialAttitudeID | N4 Bytes | Identifier of Object Spatial Attitude. |
General | | |
· CoordType | bit 0 | 0: Cartesian, 1: Spherical |
· ObjectTypeID | bit 1-2 | 00: Digital Human; 01: Generic; 10 and 11: reserved |
· MediaTypeID | bit 3-5 | 000: Audio; 001: Visual; 010: Audio-Visual; 011: Haptic; 100: Smell; 101: RADAR; 110: LiDAR; 111: Ultrasound |
· Precision | bit 6 | 0: single precision; 1: double precision |
· Reserved | bit 7 | reserved |
· SpatialAttitudeMask | 2 Bytes | 3×3 matrix of booleans (by rows) |
Position and Orientation | | |
· CartPosition (X,Y,Z) | 12/24 Bytes | Array (in metres) |
· SpherPosition (r,φ,θ) | 12/24 Bytes | Array (in metres and degrees) |
· Orient (α,β,γ) | 12/24 Bytes | Array (in degrees) |
Velocity of Position and Orientation | | |
· CartVelocity (X,Y,Z) | 12/24 Bytes | Array (in metres/second) |
· SpherVelocity (r,φ,θ) | 12/24 Bytes | Array (in metres/second and degrees/second) |
· OrientVelocity (α,β,γ) | 12/24 Bytes | Array (in degrees/second) |
Acceleration of Position and Orientation | | |
· CartAccel (X,Y,Z) | 12/24 Bytes | Array (in metres/second²) |
· SpherAccel (r,φ,θ) | 12/24 Bytes | Array (in metres/second² and degrees/second²) |
· OrientAccel (α,β,γ) | 12/24 Bytes | Array (in degrees/second²) |
Errors | | |
· ErrCartPosition | N5 Bytes | Err/CartPosition*100 |
· ErrSpherPosition | N6 Bytes | Err/SpherPosition*100 |
· ErrOrientation | N7 Bytes | Err/Orientation*100 |
· ErrCartVelocity | N8 Bytes | Err/CartVelocity*100 |
· ErrSpherVelocity | N9 Bytes | Err/SpherVelocity*100 |
· ErrOrientVelocity | N10 Bytes | Err/OrientVelocity*100 |
· ErrCartAccel | N11 Bytes | Err/CartAccel*100 |
· ErrSpherAccel | N12 Bytes | Err/SpherAccel*100 |
· ErrOrientAccel | N13 Bytes | Err/OrientAccel*100 |
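For illustration only, the fragment below sketches a Cartesian-coordinate Spatial Attitude following the components of Table 7; the bit-level encodings of the table are shown as symbolic values for readability, the property names are assumptions, and the normative structure is the referenced SpatialAttitude.json schema.

import json

# Hypothetical Spatial Attitude of a Visual Object (illustrative values only).
spatial_attitude = {
    "Header": "OSD-OSA-V1.1",
    "ObjectSpatialAttitudeID": "sa-0001",   # assumed identifier
    "CoordType": "Cartesian",               # table encodes this as bit 0
    "ObjectTypeID": "Generic",
    "MediaTypeID": "Visual",
    "CartPosition": [12.5, -3.0, 0.0],      # metres
    "Orientation": [0.0, 0.0, 45.0],        # Euler angles in degrees
    "CartVelocity": [1.2, 0.0, 0.0],        # metres/second
    "CartAccel": [0.0, 0.0, 0.0],           # metres/second^2
    "ErrCartPosition": 2.0,                 # percent of the value
}
print(json.dumps(spatial_attitude, indent=2))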
1.1.4.5 Data Types
Object Types, Media Types, and Coordinate Types are required.
1.1.4.6 To Respondents
Respondents are requested to comment on Functional Requirements and Object Type ID, Media Type ID, and Coordinate Type ID.
1.1.5 Point of View
1.1.5.1 Definition
Position and Orientation of an Object, e.g., a Virtual Human in a Virtual Environment.
1.1.5.2 Functional Requirements
The static subset of Spatial Attitude.
1.1.5.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/SpatialAttitude.json
https://schemas.mpai.community/OSD/V1.1/data/CoordinateTypeID.json
https://schemas.mpai.community/OSD/V1.1/data/ObjectTypeID.json
https://schemas.mpai.community/OSD/V1.1/data/MediaTypeID.json
1.1.5.4 Semantics
Table 7 provides the semantics of the components of Point of View. The following should be noted:
- Each of Position, Velocity, and Acceleration is provided either in Cartesian (X,Y,Z) or Spherical (r,φ,θ) Coordinates.
- The Euler angles are indicated by (α,β,γ).
Table 8 – Semantics of Point of View
Label | Size | Description |
Header | N1 Bytes | Point of View Data Type header |
· Standard-Point of View | 9 Bytes | The characters “OSD-OPV-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
PointOfViewID | N4 Bytes | Identifier of Object Point of View. |
General | | |
· CoordType | bit 0 | 0: Cartesian, 1: Spherical |
· ObjectType | bit 1-2 | 00: Digital Human; 01: Generic; 10 and 11: reserved |
· MediaType | bit 3-5 | 000: Audio; 001: Visual; 010: Audio-Visual; 011: Haptic; 100: Smell; 101: RADAR; 110: LiDAR; 111: Ultrasound |
· Precision | bit 6 | 0: single precision; 1: double precision |
· Reserved | bit 7 | reserved |
· PointOfViewMask | 2 Bytes | Vector of booleans |
Position and Orientation | | |
· CartPosition (X,Y,Z) | 12/24 Bytes | Array (in metres) |
· SpherPosition (r,φ,θ) | 12/24 Bytes | Array (in metres and degrees) |
· Orient (α,β,γ) | 12/24 Bytes | Array (in degrees) |
Errors | | |
· ErrCartPosition | N5 Bytes | Err/CartPosition*100 |
· ErrSpherPosition | N6 Bytes | Err/SpherPosition*100 |
· ErrOrientation | N7 Bytes | Err/Orientation*100 |
DescrMetadata | N8 Bytes | Descriptive Metadata |
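A Point of View carries only the static subset of a Spatial Attitude. For illustration only, a hypothetical instance is sketched below; the property names are assumptions and the referenced schemas remain normative.

import json

point_of_view = {
    "Header": "OSD-OPV-V1.1",
    "PointOfViewID": "pov-0001",       # assumed identifier
    "CoordType": "Cartesian",
    "CartPosition": [0.0, 1.4, 0.2],   # metres
    "Orientation": [0.0, 10.0, 0.0],   # Euler angles in degrees
}
print(json.dumps(point_of_view, indent=2))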
1.1.5.5 Data Types
Object Types, Media Types, and Coordinate Types are required.
1.1.5.6 To Respondents
Respondents are requested to comment on Functional Requirements and Object Type ID, Media Type ID, and Coordinate Type ID.
1.1.6 Audio Object
1.1.6.1 Definition
An Object with Audio perceptibility attributes.
1.1.6.2 Functional Requirements
An Audio Object supports:
- The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
- The ID of the Audio Object.
- The ID(s) of Parent Object(s) supporting two cases:
- The Parent Object has spawned two (or more) Objects.
- Two (or more) Parent Objects have merged into one.
- The Audio Object-specific Data:
- The ID of Audio Object Data Format.
- The length in Bytes of the Audio Object.
- The URI of the Data of the Audio Object.
- The Audio Object Space-Time Attributes.
- The Audio Object Attributes:
- The ID of Audio Object Attributes’ Data Formats.
- The length in Bytes of the Audio Object Attributes.
- The URI of the Data of the Audio Object Attributes.
MPAI has specified Audio Object in Technical Specification: Context-based Audio Enhancement (MPAI-CAE) – Use Cases (CAE-USC) V2.2. It is reported here for the reader’s convenience.
1.1.6.3 Syntax
https://schemas.mpai.community/CAE1/V2.2/data/AudioAttributeID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
https://schemas.mpai.community/CAE1/V2.2/data/AudioAttributeID.json
https://schemas.mpai.community/CAE1/V2.2/data/AudioAttributeFormatID.json
1.1.6.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard-AudioObject | 9 Bytes | The characters “CAE-AOB-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
AudioObjectID | N5 Bytes | Identifier of the Audio Object. |
ParentAudioObjects | N6 Bytes | Identifier(s) of Parent Audio Objects. |
AudioObjectData | N7 Bytes | Data associated to Audio Object. |
· AudioObjectFormatID | N8 Bytes | Audio Object Format Identifier |
· AudioObjectLength | N9 Bytes | Number of Bytes in Audio Object |
· AudioObjectDataURI | N10 Bytes | URI of Data of Audio Object |
AudioObjectSpaceTimeAttributes | N11 Bytes | Space-Time of Audio Object |
AudioObjectAttributes[] | N14 Bytes | Attributes of Audio Object |
· AudioObjectAttributeID | N15 Bytes | ID of Attribute of Audio Object |
· AudioObjectAttributeFormatID | N16 Bytes | ID of Attribute Format of Audio Object |
· AudioObjectAttributeLength | N17 Bytes | Number of Bytes in Audio Object Attribute |
· AudioObjectAttributeDataURI | N18 Bytes | URI of Data of Audio Object Attribute |
DescrMetadata | N19 Bytes | Descriptive Metadata |
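For illustration only, the sketch below builds a hypothetical Audio Object whose data and attribute data are referenced by URI, following the fields of the table above; names and URIs are assumptions and the referenced schemas are normative.

import json

# Hypothetical Audio Object pointing to externally stored audio data.
audio_object = {
    "Header": "CAE-AOB-V2.2",
    "MInstanceID": "minstance-01",
    "AudioObjectID": "ao-0001",
    "ParentAudioObjects": [],
    "AudioObjectData": {
        "AudioObjectFormatID": "format-wav",     # assumed Format ID value
        "AudioObjectLength": 480000,             # Bytes
        "AudioObjectDataURI": "https://example.com/audio/ao-0001.wav",
    },
    "AudioObjectAttributes": [
        {
            "AudioObjectAttributeID": "attr-loudness",   # assumed Attribute ID
            "AudioObjectAttributeFormatID": "fmt-01",
            "AudioObjectAttributeLength": 16,
            "AudioObjectAttributeDataURI": "https://example.com/audio/ao-0001-loudness.json",
        }
    ],
    "DescrMetadata": {},
}
print(json.dumps(audio_object, indent=2))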
1.1.6.5 Data Formats
Audio Object requires the specification of
- Audio Formats
- Audio Attributes
- Audio Attribute Formats.
1.1.6.6 To Respondents
MPAI advises Respondents that the Audio Object Functional Requirements have been developed considering the needs of the various application domains of its Technical Specifications. The current draft specification supports them all. An application not needing some functionalities is allowed to drop them.
MPAI requests:
- Comments on Functional Requirements and their support by the JSON Syntax.
- Comments on existing Audio Formats, Attributes, and Attribute Formats and Proposal for new entries.
1.1.7 Visual Object
1.1.7.1 Definition
An Object with Visual perceptibility attributes.
1.1.7.2 Functional Requirements
A Visual Object supports:
- The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
- The ID of the Visual Object.
- The ID(s) of Parent Object(s) supporting two cases:
- The Parent Object has spawned two (or more) Objects.
- Two (or more) Parent Objects have merged into one.
- The Visual Object-specific Data:
- The ID of Visual Object Data Format.
- The length in Bytes of the Visual Object.
- The URI of the Data of the Visual Object.
- The Visual Object Space-Time Attributes.
- The Visual Object Attributes:
- The ID of Visual Object Attributes’ Data Formats.
- The length in Bytes of the Visual Object Attributes.
- The URI of the Data of the Visual Object Attributes.
1.1.7.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/VisualObject.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
https://schemas.mpai.community/PAF/V1.1/data/VisualAttributeID.json
https://schemas.mpai.community/PAF/V1.1/data/VisualAttributeFormatID.json
1.1.7.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard-Visual Object | 9 Bytes | The characters “CAE-VOB-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
VisualObjectID | N5 Bytes | Identifier of the Visual Object. |
ParentVisualObjects | N6 Bytes | Identifier(s) of Parent Visual Objects. |
VisualObjectData | N7 Bytes | Data associated to Visual Object. |
· VisualObjectFormatID | N8 Bytes | Visual Object Format Identifier |
· VisualObjectLength | N9 Bytes | Number of Bytes in Visual Object |
· VisualObjectDataURI | N10 Bytes | URI of Data of Visual Object |
VisualObjectSpaceTimeAttributes | N11 Bytes | Space-Time of Visual Object |
VisualObjectAttributes[] | N14 Bytes | Attributes of Visual Object |
· VisualObjectAttributeID | N15 Bytes | ID of Attribute of Visual Object |
· VisualObjectAttributeFormatID | N16 Bytes | ID of Attribute Format of Visual Object |
· VisualObjectAttributeLength | N17 Bytes | Number of Bytes in Visual Object Attribute |
· VisualObjectAttributeDataURI | N18 Bytes | URI of Data of Visual Object Attribute |
DescrMetadata | N19 Bytes | Descriptive Metadata |
1.1.7.5 Data Formats
Visual Object requires the specification of
- Visual Formats
- Visual Attributes
- Visual Attribute Formats.
1.1.7.6 To Respondents
MPAI advises Respondents that the Visual Object Functional Requirements have been developed considering the needs of the various application domains of its Technical Specifications. The current draft specification supports them all. An application not needing some functionalities is allowed to drop them.
MPAI requests:
- Comments on Functional Requirements and their support by the JSON Syntax.
- Comments on existing Visual Formats, Attributes, and Attribute Formats and Proposal for new entries.
1.1.8 Audio-Visual Object
1.1.8.1 Definition
A Data Type representing an object with both Audio and Visual perceptibility attributes.
1.1.8.2 Functional Requirements
Audio-Visual Object supports:
- The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
- The IDs of the Audio Objects and their Space-Time Attributes in the Audio-Visual Scene, possibly different from those of the Objects themselves.
- The IDs of the Visual Objects and their Space-Time Attributes in the Audio-Visual Scene, possibly different from those of the Objects themselves.
- The Audio-Visual Object’s Space-Time Attributes.
1.1.8.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/AudioVisualObject.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.8.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard-Item | 9 Bytes | The characters “OSD-AVO-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
AVObjectID | N5 Bytes | Identifier of AV Object. |
AudioObjectID | N6 Bytes | Identifier of Audio Object |
AudioObjectSpaceTimeAttributes | N7 Bytes | Space-Time Attributes of Audio Object |
VisualObjectID | N8 Bytes | Identifier of Visual Object |
VisualObjectSpaceTimeAttributes | N9 Bytes | Space-Time Attributes of Visual Object |
DescrMetadata | N10 Bytes | Descriptive Metadata |
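The sketch below, illustrative only, pairs an Audio and a Visual Object into a hypothetical Audio-Visual Object following the table above; the Space-Time encodings shown are assumptions and the referenced schemas are normative.

import json

# Hypothetical Audio-Visual Object pairing an Audio and a Visual Object.
av_object = {
    "Header": "OSD-AVO-V1.1",
    "MInstanceID": "minstance-01",
    "AVObjectID": "avo-0001",
    "AudioObjectID": "ao-0001",
    "AudioObjectSpaceTimeAttributes": {"CartPosition": [2.0, 0.0, 1.0]},   # assumed Space-Time sketch
    "VisualObjectID": "vo-0001",
    "VisualObjectSpaceTimeAttributes": {"CartPosition": [2.0, 0.0, 1.0]},  # assumed Space-Time sketch
    "DescrMetadata": {},
}
print(json.dumps(av_object, indent=2))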
1.1.8.5 Data Formats
Audio-Visual Object requires the specification of
- Audio-Visual Formats
- Audio-Visual Attributes
- Audio-Visual Attribute Formats.
1.1.8.6 To Respondents
MPAI requests:
- Comments on Functional Requirements and their support by the JSON Syntax.
- Comments on existing Audio-Visual Formats, Attributes, and Attribute Formats and Proposal for new entries.
1.1.9 Audio-Visual Basic Scene Descriptors
1.1.9.1 Definition
A Data Type including the Objects of an Audio-Visual Scene and their arrangement in the Scene.
1.1.9.2 Functional Requirements
Audio-Visual Basic Scene Descriptors include:
- The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
- The ID of the Audio-Visual Scene Descriptors.
- The number of Objects in the Scene.
- The Space-Time Attributes of the Scene Descriptors.
- The Audio Objects including their inherent ID, Spatial Attitude, and Scene-specific Attributes.
- The Visual Objects including their inherent ID, Spatial Attitude, and Scene-specific Attributes.
- The AV Objects including their inherent ID, Spatial Attitude, and Scene-specific Attributes.
1.1.9.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/AudioVisualBasicSceneDescriptors.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.9.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard-AVScene | 9 Bytes | The characters “OSD-AVS-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
AVBasicSceneDescriptorsID | N5 Bytes | Identifier of the AV Basic Scene Descriptors. |
ObjectCount | N6 Bytes | Number of Objects in Scene |
AVBasicSceneDescriptorsSpaceTime | N7 Bytes | Space-Time of the AV Basic Scene Descriptors |
AVSceneAudioObjects[] | N8 Bytes | Set of Audio Objects |
· AVSceneAudioObjectID | N9 Bytes | ID of the Audio Object in the Scene |
· AVSceneAudioObjectSpaceTime | N10 Bytes | Space-Time of the Audio Object in the Scene |
· AVSceneAudioObjectAttributes[] | N11 Bytes | Scene-specific Attributes of the Audio Object |
o AVSceneAudioObjectAttributeID | N12 Bytes | ID of the Attribute of the Audio Object |
o AVSceneAudioObjectAttributeFormatID | N13 Bytes | ID of the Attribute Format of the Audio Object |
o AVSceneAudioObjectAttributeDataLength | N14 Bytes | Length of the Attribute Data of the Audio Object |
o AVSceneAudioObjectAttributeDataURI | N15 Bytes | URI of the Attribute Data of the Audio Object |
AVSceneVisualObjects[] | N16 Bytes | Set of Visual Objects |
· AVSceneVisualObjectID | N17 Bytes | ID of the Visual Object in the Scene |
· AVSceneVisualObjectSpaceTime | N18 Bytes | Space-Time of the Visual Object in the Scene |
· AVSceneVisualObjectAttributes[] | N19 Bytes | Scene-specific Attributes of the Visual Object |
o AVSceneVisualObjectAttributeID | N20 Bytes | ID of the Attribute of the Visual Object |
o AVSceneVisualObjectAttributeFormatID | N21 Bytes | ID of the Attribute Format of the Visual Object |
o AVSceneVisualObjectAttributeDataLength | N22 Bytes | Length of the Attribute Data of the Visual Object |
o AVSceneVisualObjectAttributeDataURI | N23 Bytes | URI of the Attribute Data of the Visual Object |
AVSceneAVObjects[] | N24 Bytes | Set of AV Objects |
· AVSceneAVObjectID | N25 Bytes | ID of the AV Object in the Scene |
· AVSceneAVObjectSpaceTime | N26 Bytes | Space-Time of the AV Object in the Scene |
· AVSceneAVObjectAttributes[] | N27 Bytes | Scene-specific Attributes of the AV Object |
o AVSceneAVObjectAttributeID | N28 Bytes | ID of the Attribute of the AV Object |
o AVSceneAVObjectAttributeFormatID | N29 Bytes | ID of the Attribute Format of the AV Object |
o AVSceneAVObjectAttributeDataLength | N30 Bytes | Length of the Attribute Data of the AV Object |
o AVSceneAVObjectAttributeDataURI | N31 Bytes | URI of the Attribute Data of the AV Object |
DescrMetadata | N32 Bytes | Descriptive Metadata |
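For illustration only, the sketch below assembles a hypothetical AV Basic Scene containing one Audio and one Visual Object, following the fields of the table above; the nesting and Space-Time encoding are assumptions and the referenced schemas are normative.

import json

# Hypothetical AV Basic Scene with one Audio and one Visual Object.
av_basic_scene = {
    "Header": "OSD-AVS-V1.1",
    "MInstanceID": "minstance-01",
    "AVBasicSceneDescriptorsID": "avbs-0001",
    "ObjectCount": 2,
    "AVBasicSceneDescriptorsSpaceTime": {"Time": "2025-01-01T10:00:00Z"},   # assumed encoding
    "AVSceneAudioObjects": [
        {"AVSceneAudioObjectID": "ao-0001",
         "AVSceneAudioObjectSpaceTime": {"CartPosition": [2.0, 0.0, 1.0]},
         "AVSceneAudioObjectAttributes": []}
    ],
    "AVSceneVisualObjects": [
        {"AVSceneVisualObjectID": "vo-0001",
         "AVSceneVisualObjectSpaceTime": {"CartPosition": [2.0, 0.0, 1.0]},
         "AVSceneVisualObjectAttributes": []}
    ],
    "AVSceneAVObjects": [],
    "DescrMetadata": {},
}
print(json.dumps(av_basic_scene, indent=2))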
1.1.9.5 Data Formats
No new Data Formats
1.1.9.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the Functional Requirements.
- Propose extensions to or alternative formats for the Audio-Visual Basic Scene Descriptors.
1.1.10 Audio-Visual Scene Descriptors
1.1.10.1 Definition
A Data Type including the Objects and Basic Scenes of an Audio-Visual Scene and their arrangement in the Scene.
1.1.10.2 Functional Requirements
Audio-Visual Scene Descriptors include Basic Scenes in addition to Objects.
1.1.10.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/AudioVisualSceneDescriptors.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
https://schemas.mpai.community/OSD/V1.1/data/AVBasicSceneDescriptors.json
https://schemas.mpai.community/OSD/V1.1/data/AVSceneDescriptors.json
1.1.10.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard-AVScene | 9 Bytes | The characters “OSD-AVS-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
AVBasicSceneDescriptorsID | N5 Bytes | Identifier of the AV Basic Scene Descriptors. |
ObjectCount | N6 Bytes | Number of Objects in Scene |
AVBasicSceneDescriptorsSpaceTime | N7 Bytes | Space-Time of the AV Basic Scene Descriptors |
AVSceneAudioObjects[] | N8 Bytes | Set of Audio Objects |
· AVSceneAudioObjectID | N9 Bytes | ID of the Audio Object in the Scene |
· AVSceneAudioObjectSpaceTime | N10 Bytes | Space-Time of the Audio Object in the Scene |
· AVSceneAudioObjectAttributes[] | N11 Bytes | Scene-specific Attributes of the Audio Object |
o AVSceneAudioObjectAttributeID | N12 Bytes | ID of the Attribute of the Audio Object |
o AVSceneAudioObjectAttributeFormatID | N13 Bytes | ID of the Attribute Format of the Audio Object |
o AVSceneAudioObjectAttributeDataLength | N14 Bytes | Length of the Attribute Data of the Audio Object |
o AVSceneAudioObjectAttributeDataURI | N15 Bytes | URI of the Attribute Data of the Audio Object |
AVSceneVisualObjects[] | N16 Bytes | Set of Visual Objects |
· AVSceneVisualObjectID | N17 Bytes | ID of the Visual Object in the Scene |
· AVSceneVisualObjectSpaceTime | N18 Bytes | Space-Time of the Visual Object in the Scene |
· AVSceneVisualObjectAttributes[] | N19 Bytes | Scene-specific Attributes of the Visual Object |
o AVSceneVisualObjectAttributeID | N20 Bytes | ID of the Attribute of the Visual Object |
o AVSceneVisualObjectAttributeFormatID | N21 Bytes | ID of the Attribute Format of the Visual Object |
o AVSceneVisualObjectAttributeDataLength | N22 Bytes | Length of the Attribute Data of the Visual Object |
o AVSceneVisualObjectAttributeDataURI | N23 Bytes | URI of the Attribute Data of the Visual Object |
AVSceneAVObjects[] | N24 Bytes | Set of AV Objects |
· AVSceneAVObjectID | N25 Bytes | ID of the AV Object in the Scene |
· AVSceneAVObjectSpaceTime | N26 Bytes | Space-Time of the AV Object in the Scene |
· AVSceneAVObjectAttributes[] | N27 Bytes | Scene-specific Attributes of the AV Object |
o AVSceneAVObjectAttributeID | N28 Bytes | ID of the Attribute of the AV Object |
o AVSceneAVObjectAttributeFormatID | N29 Bytes | ID of the Attribute Format of the AV Object |
o AVSceneAVObjectAttributeDataLength | N30 Bytes | Length of the Attribute Data of the AV Object |
o AVSceneAVObjectAttributeDataURI | N31 Bytes | URI of the Attribute Data of the AV Object |
DescrMetadata | N32 Bytes | Descriptive Metadata |
1.1.10.5 Data Formats
No new Data Formats
1.1.10.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the Functional Requirements.
- Propose extensions to or alternative formats for the Audio-Visual Scene Descriptors.
1.1.11 Audio Scene Descriptors
The specification of Audio (Basic) Scene Descriptors is derived from that of the Audio-Visual (Basic) Scene Descriptors by removing Visual Objects and Audio-Visual Objects. The Headers are, respectively:
Audio Basic Scene Descriptors
"Header": {
  "type": "string",
  "pattern": "^CAE-ABS-V[0-9]{1,2}[.][0-9]{1,2}$"
},
Audio Scene Descriptors
"Header": {
  "type": "string",
  "pattern": "^CAE-ASD-V[0-9]{1,2}[.][0-9]{1,2}$"
},
1.1.12 Visual Scene Descriptors
The specification of Visual (Basic) Scene Descriptors is derived from that of the Audio-Visual (Basic) Scene Descriptors by removing Audio Objects and Audio-Visual Objects. The Headers are, respectively:
Visual Basic Scene Descriptors
"Header": {
  "type": "string",
  "pattern": "^OSD-VBS-V[0-9]{1,2}[.][0-9]{1,2}$"
},
Visual Scene Descriptors
"Header": {
  "type": "string",
  "pattern": "^OSD-VSD-V[0-9]{1,2}[.][0-9]{1,2}$"
},
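As a quick check of the Header patterns above, the fragment below validates illustrative Header strings against the regular expressions; the example values are hypothetical.

import re

patterns = {
    "Audio Basic Scene Descriptors": r"^CAE-ABS-V[0-9]{1,2}[.][0-9]{1,2}$",
    "Audio Scene Descriptors": r"^CAE-ASD-V[0-9]{1,2}[.][0-9]{1,2}$",
    "Visual Basic Scene Descriptors": r"^OSD-VBS-V[0-9]{1,2}[.][0-9]{1,2}$",
    "Visual Scene Descriptors": r"^OSD-VSD-V[0-9]{1,2}[.][0-9]{1,2}$",
}

# Example Header values (illustrative); each must match its pattern.
examples = {
    "Audio Basic Scene Descriptors": "CAE-ABS-V2.2",
    "Audio Scene Descriptors": "CAE-ASD-V2.2",
    "Visual Basic Scene Descriptors": "OSD-VBS-V1.1",
    "Visual Scene Descriptors": "OSD-VSD-V1.1",
}

for name, header in examples.items():
    assert re.fullmatch(patterns[name], header), f"{header} does not match"
    print(f"{name}: {header} OK")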
1.1.13 Body Descriptors
1.1.13.1 Definition
Body Descriptors digitally represent the body of:
- A human having a multimodal conversation with the HCI, used when the HCI estimates the human’s Personal Status.
- An Avatar embodying the HCI, carried in the Portable Avatar generated by the Personal Status Display AIM and rendered by the Audio-Visual Scene Rendering AIM.
1.1.13.2 Functional Requirements
MPAI has specified HAnim as the format of Body Descriptors.
1.1.13.3 Syntax
The JSON Syntax of HAnim is being developed.
1.1.13.4 Semantics
The Semantics of HAnim is being developed.
1.1.13.5 Data Formats
Body Descriptors are a Data Format.
1.1.13.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Body Descriptors.
- Propose extensions to the identified technologies.
- Propose new technologies.
1.1.14 Face Descriptors
1.1.14.1 Definition
The Face Descriptors are the Descriptors of the face of:
- A human having a multimodal conversation with the HCI when it estimates the human’s Personal Status.
- The Avatar embodying the HCI, carried in the Portable Avatar generated by the Personal Status Display AIM and rendered by the Audio-Visual Scene Rendering AIM.
- A human for the purpose of finding their Instance ID.
1.1.14.2 Functional Requirements
MPAI has specified the format of Face Descriptors based on the Action Units of the Facial Action Coding System (FACS), originally developed by Carl-Herman Hjortsjö, adopted by Paul Ekman and Wallace V. Friesen (1978), and updated by Ekman, Friesen, and Joseph C. Hager (2002).
1.1.14.3 Syntax
https://schemas.mpai.community/PAF/V1.1/PortableAvatarFormat.json
1.1.14.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “PAF-FAD-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
FaceDescriptorsID | N5 Bytes | Identifier of the Face Descriptors. |
FaceDescriptors | N6 Bytes | Set of Face Descriptors expressed as Action Units (see below). |
Action Units | Description | Facial muscle generating the Action |
1 | Inner Brow Raiser | Frontalis, pars medialis |
2 | Outer Brow Raiser | Frontalis, pars lateralis |
4 | Brow Lowerer | Corrugator supercilii, Depressor supercilii |
5 | Upper Lid Raiser | Levator palpebrae superioris |
6 | Cheek Raiser | Orbicularis oculi, pars orbitalis |
7 | Lid Tightener | Orbicularis oculi, pars palpebralis |
9 | Nose Wrinkler | Levator labii superioris alaquae nasi |
10 | Upper Lip Raiser | Levator labii superioris |
11 | Nasolabial Deepener | Zygomaticus minor |
12 | Lip Corner Puller | Zygomaticus major |
13 | Cheek Puffer | Levator anguli oris (a.k.a. Caninus) |
14 | Dimpler | Buccinator |
15 | Lip Corner Depressor | Depressor anguli oris (a.k.a. Triangularis) |
16 | Lower Lip Depressor | Depressor labii inferioris |
17 | Chin Raiser | Mentalis |
18 | Lip Puckerer | Incisivii labii superioris and Incisivii labii inferioris |
20 | Lip stretcher | Risorius with platysma |
22 | Lip Funneler | Orbicularis oris |
23 | Lip Tightener | Orbicularis oris |
24 | Lip Pressor | Orbicularis oris |
25 | Lips part | Depressor labii inferioris or relaxation of Mentalis, or Orbicularis oris |
26 | Jaw Drop | Masseter, relaxed Temporalis and internal Pterygoid |
27 | Mouth Stretch | Pterygoids, Digastric |
28 | Lip Suck | Orbicularis oris |
41 | Lid droop | Relaxation of Levator palpebrae superioris |
42 | Slit | Orbicularis oculi |
43 | Eyes Closed | Relaxation of Levator palpebrae superioris; Orbicularis oculi, pars palpebralis |
44 | Squint | Orbicularis oculi, pars palpebralis |
45 | Blink | Relaxation of Levator palpebrae superioris; Orbicularis oculi, pars palpebralis |
46 | Wink | Relaxation of Levator palpebrae superioris; Orbicularis oculi, pars palpebralis |
61 | Eyes turn left | Lateral rectus, medial rectus |
62 | Eyes turn right | Lateral rectus, medial rectus |
63 | Eyes up | Superior rectus, Inferior oblique |
64 | Eyes down | Inferior rectus, Superior oblique |
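For illustration only, the sketch below shows one way Face Descriptors could list active Action Units from the table above; whether intensities are carried, and how, is an assumption, and the normative structure is the referenced schema.

import json

# Hypothetical Face Descriptors listing active FACS Action Units (illustrative).
face_descriptors = {
    "Header": "PAF-FAD-V1.1",
    "MInstanceID": "minstance-01",
    "FaceDescriptorsID": "fd-0001",
    "FaceDescriptors": [
        {"ActionUnit": 6, "Name": "Cheek Raiser", "Intensity": 0.8},        # intensity field is assumed
        {"ActionUnit": 12, "Name": "Lip Corner Puller", "Intensity": 0.6},
    ],
}
print(json.dumps(face_descriptors, indent=2))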
1.1.14.5 Data Formats
Action Units are a Data Format.
1.1.14.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Face Descriptors.
- Propose extensions to the identified technologies or new ones.
1.1.15 Face ID
1.1.15.1 Definition
The Identifier of the human inferred from a Face Object captured from a target human. The scope of the ID could cover the members of an authorised group, such as the members of a family, specific employees of a company, or the customers of a car rental company.
1.1.15.2 Functional Requirements
MPAI has specified the format of Instance ID that is agnostic of the nature of the Object to be Identified. Face is treated in the same way as any other object that is identified as a member of a class of objects.
1.1.15.3 Syntax
No Syntax provided as Instance ID is sufficient for the identified needs.
1.1.15.4 Semantics
No Semantics provided.
1.1.15.5 Data Formats
Instance ID is one Data Format.
1.1.15.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Instance ID format for Face Identifier.
- Propose extensions to the identified technologies or new ones.
1.1.16 Speaker ID
1.1.16.1 Definition
The Identifier of the human inferred from their utterances. The Speaker ID may be derived by analysing speech segments of the speaker under consideration. The scope of the ID may cover the members of an authorised group, such as the members of a family, specific employees of a company, or the customers of a car rental company.
1.1.16.2 Functional Requirements
MPAI has specified the format of Instance ID that is agnostic of the nature of the Object to be Identified. Speech is treated in the same way as any other object that is identified as a member of a class of objects.
1.1.16.3 Syntax
No Syntax provided as Instance ID is sufficient for the identified needs.
1.1.16.4 Semantics
No Semantics provided.
1.1.16.5 Data Formats
Instance ID is one Data Format.
1.1.16.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Instance ID format for Speaker Identifier.
- Propose extensions to the identified technologies or new ones.
1.1.17 Audio Object ID
1.1.17.1 Definition
The Identifier uniquely associated with a particular class of audio objects, e.g., a voice, a piece of music, a coded audio sound (e.g., a siren), a natural sound, etc.
1.1.17.2 Functional Requirements
MPAI has specified the format of Instance ID that is agnostic of the nature of the Object to be Identified.
1.1.17.3 Syntax
No Syntax provided as Instance ID is sufficient for the identified needs.
1.1.17.4 Semantics
No Semantics provided.
1.1.17.5 Data Formats
Instance ID is one Data Format.
1.1.17.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Instance ID format.
- Propose extensions to the identified technologies or new ones.
1.1.18 Visual Object ID
1.1.18.1 Definition
The Identifier uniquely associated with a particular class of visual objects, e.g., human, hammer, screwdriver, etc.
1.1.18.2 Functional Requirements
MPAI has specified the format of Instance ID that is agnostic of the nature of the Object to be Identified.
1.1.18.3 Syntax
No Syntax provided as Instance ID is sufficient for the identified needs.
1.1.18.4 Semantics
No Semantics provided.
1.1.18.5 Data Formats
Instance ID is one Data Format.
1.1.18.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Instance ID format.
- Propose extensions to the identified technologies or new ones.
1.1.19 Meaning
1.1.19.1 Definition
The digital representation of an input text as syntactic and semantic information.
1.1.19.2 Functional Requirements
Meaning is used to extract information from text to help the Entity Dialogue Processing AIM to produce a response.
MPAI has specified Meaning (aka Text Descriptors).
1.1.19.3 Syntax
https://schemas.mpai.community/MMC/V2.2/TextDescriptors.json
1.1.19.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “MMC-TXD-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
MInstanceID | N4 Bytes | Identifier of M-Instance. |
TextDescriptors | N5 Bytes | Set of Text Descriptors. |
· POS_tagging | N6 Bytes | Results of POS (Part of Speech, e.g., noun, verb, etc.) tagging including information on the question’s POS tagging set and tagged results. |
· NE_tagging | N7 Bytes | Results of NE (Named Entity e.g., Person, Organisation, Fruit, etc.) tagging results including information on the question’s tagging set and tagged results. |
· Dependency_tagging | N8 Bytes | Results of dependency (structure of the sentence, e.g., subject, object, head of relation, etc.) tagging including information on the question’s dependency tagging set and tagged results. |
· SRL_tagging | N9 Bytes | Results of SRL (Semantic Role Labelling) tagging results including information on the question’s SRL tagging set and tagged results. SRL indicates the semantic structure of the sentence such as agent, location, patient role, etc. |
DescrMetadata | N10 Bytes | Descriptive Metadata |
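For illustration only, the sketch below encodes hypothetical Text Descriptors (Meaning) for the sentence “Take me home”; the tagging values and their encodings are assumptions, and the normative structure is the referenced TextDescriptors.json schema.

import json

# Hypothetical Text Descriptors (Meaning) for the sentence "Take me home".
meaning = {
    "Header": "MMC-TXD-V2.2",
    "MInstanceID": "minstance-01",
    "TextDescriptors": {
        "POS_tagging": [["Take", "VERB"], ["me", "PRON"], ["home", "NOUN"]],
        "NE_tagging": [],                                   # no Named Entities in this sentence
        "Dependency_tagging": [["Take", "ROOT"], ["me", "obj"], ["home", "advmod"]],
        "SRL_tagging": [{"predicate": "Take", "patient": "me", "goal": "home"}],
    },
    "DescrMetadata": {},
}
print(json.dumps(meaning, indent=2))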
1.1.19.5 Data Formats
The Text Descriptors specified above as Meaning are a Data Format for a specific type of Descriptors.
1.1.19.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI specification of Text Descriptors as Meaning.
- Propose extensions to it or a new specification.
1.1.20 Personal Status
1.1.20.1 Definition
Personal Status (PS) indicates a set of three Factors (Cognitive State, Emotion, Social Attitude) conveyed by one or more Modalities (Text, Speech, Face, and Gesture Modalities).
1.1.20.2 Functional Requirements
Personal Status (PS) is used to assign a label, according to a given classification, to the internal state of an Entity – human or Machine.
MPAI has developed the full specification of Personal Status.
1.1.20.3 Syntax
https://schemas.mpai.community/MMC/V2.2/data/PersonalStatus.json
https://schemas.mpai.community/OSD/V1.1/data/Time.json
1.1.20.4 Semantics
1.1.20.5 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI Personal Status format.
- Propose extensions of the labels or new sets of labels for the Factors, or new technologies.
1.1.21 Avatar Model
1.1.21.1 Definition
The Model of an Avatar selected as the human representation of the HCI.
1.1.21.2 Functional Requirements
Avatar Model is an element of MPAI’s Portable Avatar, which can be:
- Generated by the Personal Status Display.
- Rendered by the Audio-Visual Scene Rendering.
1.1.21.3 Syntax
No specific syntax.
1.1.21.4 Semantics
No specific semantics.
1.1.21.5 Data Formats
An Avatar Format can be expressed as a glTF data file.
1.1.21.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the MPAI-specified technologies identified in this subsection.
- Propose extensions to the identified technologies or new ones.
1.1.22 Speech Model
1.1.22.1 Definition
A Data Type able to generate speech according to a specified set of features.
1.1.22.2 Functional Requirements
The generated speech can be perceived as:
- Having been generated by a specific human.
- Having a specific language or dialect intonation.
- Not having any specific connotation.
It can be implemented as a Neural Network trained to generate utterances with specific Speech Descriptors.
1.1.22.3 Syntax
No syntax provided.
1.1.22.4 Semantics
No Semantics provided.
1.1.22.5 Data Formats
No entry here.
1.1.22.6 To Respondents
Respondents are requested to:
- Comment on the characterisation of Speech Model.
- Propose Speech Model Formats.
1.1.23 Output Audio
1.1.23.1 Definition
Output Audio is Audio information produced by a digital device such as the Audio-Visual Rendering AIM.
1.1.23.2 Functional Requirements
Output Audio can be:
- Single channel
- Multiple channels
- …
1.1.23.3 Syntax
No Syntax provided.
1.1.23.4 Semantics
No Semantics provided.
1.1.23.5 Data Formats
Data Formats and Attributes are required.
1.1.23.6 To Respondents
Proposals of Formats and Attributes are requested.
1.1.24 Output Visual
1.1.24.1 Definition
Output Visual is Visual information produced by a digital device such as the Audio-Visual Rendering AIM.
1.1.24.2 Functional Requirements
No Functional Requirements provided.
1.1.24.3 Syntax
No Syntax provided.
1.1.24.4 Semantics
No Semantics provided.
1.1.24.5 Data Formats
Data Formats and Attributes are required.
1.1.24.6 To Respondents
Proposals of Formats and Attributes are requested.
1.1.25 HCI-AMS Messages
1.1.25.1 Definition
The HCI-AMS Messages request that the Autonomous Motion Subsystem (AMS) execute specified actions.
1.1.25.2 Functional Requirements
HCIs send Messages to the AMS, based on messages from humans or a Remote HCI, to:
- Request possible Routes connecting the current place and the destination, which may include:
- Desired arrival time.
- Stops between the current place and the destination.
- Request to:
- Execute a Route.
- Suspend a Route.
- Resume a Route.
- Change a Route.
- Request to see/hear an M-Location corresponding to a U-Location from a Point of View.
1.1.25.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
https://schemas.mpai.community/OSD/V1.1/data/SpatialAttitude.json
https://schemas.mpai.community/CAV2/V1.0/data/RouteCommand.json
1.1.25.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-HAM-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
HCIAMSMessageID | N4 Bytes | Identifier of HCI-AMS Message. |
HCIAMSMessageData | N5 Bytes | Data in HCI-AMS Message. |
· DestinationAndArrivalTime | N6 Bytes | Route endpoint and arrival time. |
· IntermediateStops | N7 Bytes | Stops between origin and destination |
· SelectedRoute | N8 Bytes | ID of Route |
· RouteCommands | N9 Bytes | “Execute”, “Suspend”, “Resume”, “Change” |
DescrMetadata | N10 Bytes | Descriptive Metadata |
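For illustration only, the sketch below builds a hypothetical HCI-AMS Message asking the AMS to execute a Route, following the fields of the table above; the encodings of destination, time, and stops are assumptions and the referenced schemas are normative.

import json

# Hypothetical HCI-AMS Message asking the AMS to execute a Route (illustrative).
hci_ams_message = {
    "Header": "CAV-HAM-V1.0",
    "HCIAMSMessageID": "msg-0001",
    "HCIAMSMessageData": {
        "DestinationAndArrivalTime": {
            "Destination": "parking-lot-3",          # assumed encoding of the Route endpoint
            "ArrivalTime": "2025-01-01T18:30:00Z",
        },
        "IntermediateStops": ["charging-station-7"],
        "SelectedRoute": "route-0002",
        "RouteCommands": "Execute",                  # one of "Execute", "Suspend", "Resume", "Change"
    },
    "DescrMetadata": {},
}
print(json.dumps(hci_ams_message, indent=2))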
1.1.25.5 Data Formats
No Format required.
1.1.25.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the Functional Requirements of the HCI-AMS Messages identified above.
- Extend the list of Functional Requirements.
1.1.26 Ego-Remote HCI Messages
1.1.26.1 Definition
Information exchanged between the HCI of the Ego CAV and a peer HCI of a Remote CAV.
1.1.26.2 Functional Requirements
- Messages from a Remote HCI to the Ego HCI have the same payload as Messages from the Ego HCI to a Remote HCI.
- The Ego HCI may:
- Send messages to a Remote HCI.
- Request from a Remote HCI or a CAV-aware entity (e.g., a Roadside Unit, a Store-and-Forward entity, etc.) the Data of a particular M-Location.
- Select the appropriate Level of Detail for transmission of the requested M-Location Data.
1.1.26.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/AudioVisualSceneDescriptors.json
1.1.26.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “MMM-ERH-V” |
· Version | N2 Bytes | Major version – 1 or 2 characters |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 characters |
EgoToRemoteHCIMessageID | N4 Bytes | Identifier of Ego-Remote HCI Message. |
EgoToRemoteHCIMessageData | N5 Bytes | Data of Ego-Remote HCI Message. |
· MLocationRequest | N6 Bytes | Request for the M-Location corresponding to a U-Location. |
o ULocationID | N7 Bytes | ID of the intended U-Location. |
o MLocationFormatID | N8 Bytes | ID of M-Location Format |
· FullEnvironmentRepresentation | N9 Bytes | AV Scene Descriptors with semantics. |
· GenericMessage | N10 Bytes | Payload of a Generic Message. |
DescrMetadata | N11 Bytes | Descriptive Metadata. |
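For illustration only, the sketch below assembles a hypothetical Ego-Remote HCI Message requesting the M-Location corresponding to a U-Location; the identifier and Format ID values are assumptions and the referenced schema is normative.

import json

# Hypothetical Ego-Remote HCI Message requesting the M-Location of a U-Location.
ego_remote_message = {
    "Header": "MMM-ERH-V1.0",
    "EgoToRemoteHCIMessageID": "erh-0001",
    "EgoToRemoteHCIMessageData": {
        "MLocationRequest": {
            "ULocationID": "uloc-4711",                  # assumed U-Location identifier
            "MLocationFormatID": "AVSceneDescriptors",   # assumed Format ID value
        },
    },
    "DescrMetadata": {},
}
print(json.dumps(ego_remote_message, indent=2))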
1.1.26.5 Data Formats
MPAI has defined U-Location as a spatial volume in the Universe.
The payload of a Generic Message may have other formats.
1.1.26.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the functional requirements of the Ego-Remote HCI Messages identified above.
- Propose Data Formats of U-Location and Generic Message.