5.5.1    Personal Preferences

5.5.2    Personal Profile

5.5.3    Input Text

5.5.4    Spatial Attitude

5.5.5    Point of View

5.5.6    Audio Object

5.5.7    Visual Object

5.5.8    Audio-Visual Object

5.5.9    Audio-Visual Basic Scene Descriptors

5.5.10  Audio-Visual Scene Descriptors

5.5.11  Audio Scene Descriptors

5.5.12  Visual Scene Descriptors

5.5.13  Body Descriptors

5.5.14  Face Descriptors

5.5.15  Face ID

5.5.16  Speaker ID

5.5.17  Audio Object ID

5.5.18  Visual Object ID

5.5.19  Meaning

5.5.20  Personal Status

5.5.21  Avatar Model

5.5.22  Speech Model

5.5.23  Output Audio

5.5.24  Output Visual

5.5.25  HCI-AMS Messages

5.5.26  Ego-Remote HCI Messages

 

MPAI has already issued a Technical Specification of a subset of CAV-HCI Data Types in [11]. This section draws heavily on that specification.

Data obtained from Audio, Visual, and LiDAR sensors are used by the HCI. However, their specification is delegated to the Environment Sensing Subsystem.

1.1.1        Personal Preferences

1.1.1.1       Definition

Personal Preferences include passenger-specific preferences that give an HCI access to information facilitating human-HCI interaction. This is particularly useful when the passenger uses a rented CAV.

1.1.1.2       Functional Requirements

The data in the Personal Preferences should include:

  1. Language
  2. Seat position.
  3. Mirror position.
  4. Display characteristics.
  5. Preferred driving style.
  6. Preferential routes.
  7. Preferred information sources.
  8. Preferred entertainment sources

1.1.1.3       Syntax

https://schemas.mpai.community/CAV2/V1.0/data/PersonalPreferences.json

1.1.1.4       Semantics

Label Size Description
Header N1 Bytes
·         Standard 9 Bytes The characters “CAV-PPR-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
humanID N4 Bytes ID of the human the Personal Preferences refer to.
PersonalPreferenceID N5 Bytes ID of the Personal Preferences.
PersonalPreferences N6 Bytes Set of Personal Preferences.
·         Language N7 Bytes Preferred Language.
·         Seat position N8 Bytes Preferred seat position.
·         Mirror position N9 Bytes Preferred mirror position.
·         Display characteristics N10 Bytes Preferred display characteristics.
·         Preferred driving style N11 Bytes Preferred driving style.
·         Preferential routes N12 Bytes Preferred routes.
DescrMetadata N13 Bytes Descriptive Metadata.
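The Header convention above (a 9-character standard code followed by a 1-2 character major version, a dot-separator, and a 1-2 character minor version) recurs throughout this section. It can be sketched as follows; the helper names are illustrative, not part of the specification.

```python
import re

# Illustrative helpers for the Header layout described above:
# a 9-character standard code (e.g., "CAV-PPR-V") followed by a
# 1-2 character major version, a ".", and a 1-2 character minor
# version. The function names are not part of the specification.
HEADER_RE = re.compile(r"^(?P<std>.{9})(?P<major>[0-9]{1,2})\.(?P<minor>[0-9]{1,2})$")

def build_header(standard: str, major: int, minor: int) -> str:
    assert len(standard) == 9, "standard code must be exactly 9 characters"
    return f"{standard}{major}.{minor}"

def parse_header(header: str) -> dict:
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"malformed header: {header!r}")
    return {"standard": m.group("std"),
            "major": int(m.group("major")),
            "minor": int(m.group("minor"))}

assert build_header("CAV-PPR-V", 1, 0) == "CAV-PPR-V1.0"
assert parse_header("CAV-PPR-V1.0")["standard"] == "CAV-PPR-V"
```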

1.1.1.5       Data Formats

The following Data Formats are identified:

  • Language Preference.
  • Seat position.
  • Mirror position.
  • Display characteristics.
  • Preferred driving style.
  • Preferred routes.

1.1.1.6       To Respondents

Respondents are invited to:

  1. Comment and elaborate on or extend the functional requirements of the Personal Preferences identified above.
  2. Propose representation formats of the Personal Preferences.
  3. Propose new Personal Preferences and their formats.

1.1.2        Personal Profile

This specification is shared with the planned Technical Specification: MPAI-Metaverse Model (MPAI-MMM) – Technologies (MMM-TEC) V1.0.

1.1.2.1       Definition

Data identifying a human.

1.1.2.2       Functional Requirements

Personal Profile includes humanID and First Name, Last Name, Age, Nationality, and Email of the human.

1.1.2.3       Syntax

https://schemas.mpai.community/CAV2/V1.0/data/PersonalProfile.json

1.1.2.4       Semantics

Label Size Description
Header N1 Bytes
·         Standard 9 Bytes The characters “CAV-PPR-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
humanID N4 Bytes ID of the human the Personal Profile refers to.
PersonalProfileID N5 Bytes ID of the Personal Profile.
PersonalProfile N6 Bytes Set of Personal Profile data.
·         First Name N7 Bytes The human’s given name.
·         Last Name N8 Bytes The human’s family name.
·         Age N9 Bytes The human’s age.
·         Nationality N10 Bytes The human’s nationality.
·         Email N11 Bytes The human’s email address.
DescrMetadata N12 Bytes Descriptive Metadata.
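As a purely illustrative example, a Personal Profile instance following the semantics above might look like this in JSON; the property names and values here are assumptions, and the schema referenced in the Syntax clause is authoritative.

```python
import json

# A purely illustrative Personal Profile instance; property names follow
# the semantics table above, but the schema referenced in the Syntax
# clause is authoritative and the values are invented.
profile = {
    "Header": "CAV-PPR-V1.0",
    "humanID": "human-0001",
    "PersonalProfileID": "profile-0001",
    "PersonalProfile": {
        "FirstName": "Jane",
        "LastName": "Doe",
        "Age": 34,
        "Nationality": "IT",
        "Email": "jane.doe@example.com",
    },
}
# Round-trip through JSON text, as it would travel between components.
assert json.loads(json.dumps(profile))["PersonalProfile"]["Age"] == 34
```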

1.1.2.5       Data Formats

No Data Formats involved, save for Text for which MPAI has selected ISO/IEC 10646.

1.1.2.6       To Respondents

MPAI requests comments on the Functional Requirements of Personal Profile.

1.1.3        Input Text

1.1.3.1       Definition

Textual information represented using a Character Set.

1.1.3.2       Functional Requirements

The Character Set should be able to represent the characters for most of the currently used languages.

MPAI has selected the Text Format [15].

1.1.3.3       Data Formats

No new Data Formats required.

1.1.3.4       To Respondents

Respondents are invited to comment or elaborate on the MPAI selection of Text.

1.1.4        Spatial Attitude

1.1.4.1       Definition

A Data Type representing an Object’s Position and Orientation, and their velocity and acceleration.

1.1.4.2       Functional Requirements

  • The Position of an Object is that of a representative point in the Object.
  • Cartesian and Polar Coordinate Systems are supported.
  • The following media types are supported: Audio; Visual; Audio-Visual; Haptic; Smell; RADAR; LiDAR; Ultrasound.
  • Error is measured as a percentage of the value of each of CartPosition, SpherPosition, Orientation, CartVelocity, SpherVelocity, OrientVelocity, CartAccel, SpherAccel, OrientAccel.
  • Error is assumed to be the same for the three components of each value set.

1.1.4.3       Syntax

https://schemas.mpai.community/OSD/V1.1/data/SpatialAttitude.json

https://schemas.mpai.community/OSD/V1.1/data/CoordinateTypeID.json

https://schemas.mpai.community/OSD/V1.1/data/ObjectTypeID.json

https://schemas.mpai.community/OSD/V1.1/data/MediaTypeID.json

 

1.1.4.4       Semantics

Table 7 provides the semantics of the components of the Spatial Attitude. The following should be noted:

  1. Each of Position, Velocity, and Acceleration is provided either in Cartesian (X,Y,Z) or Spherical (r,φ,θ) Coordinates.
  2. The Euler angles are indicated by (α,β,γ).

 

Table 7 – Components of the Spatial Attitude

 

Label Size Description
Header N1 Bytes
·         Standard-Spatial Attitude 9 Bytes The characters “OSD-OSA-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
ObjectSpatialAttitudeID N4 Bytes Identifier of the Object Spatial Attitude.
General    
·         CoordType bit 0 0: Cartesian, 1: Spherical
·         ObjectTypeID bit 1-2 00: Digital Human

01: Generic

10 and 11: reserved

·         MediaTypeID bit 3-5 000: Audio; 001: Visual; 010: Audio-Visual; 011: Haptic; 100: Smell; 101: RADAR; 110 LiDAR; 111: Ultrasound.
·         Precision bit 6 0: single precision; 1: double precision
·         Reserved bit 7 reserved
·         SpatialAttitudeMask 2 Bytes 3×3 matrix of booleans (by rows); rows: Cartesian, Spherical, Orientation; columns: Position, Velocity, Acceleration.
Position and Orientation
·         CartPosition (X,Y,Z) 12/24 Bytes Array (in metres)
·         SpherPosition (r,φ,θ) 12/24 Bytes Array (in metres and degrees)
·         Orient (α,β,γ) 12/24 Bytes Array (in degrees)
Velocity of Position and Orientation
·         CartVelocity (X,Y,Z) 12/24 Bytes Array (in metres/second)
·         SpherVelocity (r,φ,θ) 12/24 Bytes Array (in metres/second and degrees/second)
·         OrientVelocity (α,β,γ) 12/24 Bytes Array (in degrees/second)
Acceleration of Position and Orientation
·         CartAccel (X,Y,Z) 12/24 Bytes Array (in metres/second²)
·         SpherAccel (r,φ,θ) 12/24 Bytes Array (in metres/second² and degrees/second²)
·         OrientAccel (α,β,γ) 12/24 Bytes Array (in degrees/second²)
Errors
·         ErrCartPosition N5 Bytes Err/CartPosition*100
·         ErrSpherPosition N6 Bytes Err/SpherPosition*100
·         ErrOrientation N7 Bytes Err/Orientation*100
·         ErrCartVelocity N8 Bytes Err/CartVelocity*100
·         ErrSpherVelocity N9 Bytes Err/SpherVelocity*100
·         ErrOrientVelocity N10 Bytes Err/OrientVelocity*100
·         ErrCartAccel N11 Bytes Err/CartAccel*100
·         ErrSpherAccel N12 Bytes Err/SpherAccel*100
·         ErrOrientAccel N13 Bytes Err/OrientAccel*100
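The one-byte General field and the two-byte SpatialAttitudeMask can be sketched as bit-packing operations. Bit positions follow the table above; packing the mask row by row with the least-significant bit first is an assumption, since the table only says the matrix is stored "by rows".

```python
# Illustrative bit-packing for the one-byte General field and the
# two-byte SpatialAttitudeMask described above. Bit positions follow
# the table (bit 0 CoordType, bits 1-2 ObjectTypeID, bits 3-5
# MediaTypeID, bit 6 Precision, bit 7 reserved); least-significant-bit-
# first packing of the mask is an assumption.

def pack_general(coord_type: int, object_type: int, media_type: int,
                 precision: int) -> int:
    assert coord_type in (0, 1) and precision in (0, 1)
    assert 0 <= object_type <= 3 and 0 <= media_type <= 7
    return (coord_type
            | (object_type << 1)
            | (media_type << 3)
            | (precision << 6))

def pack_mask(rows: list) -> bytes:
    """rows: Cartesian, Spherical, Orientation;
    columns: Position, Velocity, Acceleration."""
    bits = [b for row in rows for b in row]
    value = sum(1 << i for i, b in enumerate(bits) if b)
    return value.to_bytes(2, "little")

# Cartesian coordinates, Generic object, Visual media, single precision:
assert pack_general(0, 1, 1, 0) == 10
# Only Cartesian Position and Velocity present:
assert pack_mask([[True, True, False],
                  [False, False, False],
                  [False, False, False]]) == b"\x03\x00"
```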

1.1.4.5       Data Types

Object Types, Media Types, and Coordinate Types are required.

1.1.4.6       To Respondents

Respondents are requested to comment on Functional Requirements and Object Type ID, Media Type ID, and Coordinate Type ID.

1.1.5        Point of View

1.1.5.1       Definition

Position and Orientation of an Object, e.g., a Virtual Human in a Virtual Environment.

1.1.5.2       Functional Requirements

The static subset of Spatial Attitude.

1.1.5.3       Syntax

https://schemas.mpai.community/OSD/V1.1/data/SpatialAttitude.json

https://schemas.mpai.community/OSD/V1.1/data/CoordinateTypeID.json

https://schemas.mpai.community/OSD/V1.1/data/ObjectTypeID.json

https://schemas.mpai.community/OSD/V1.1/data/MediaTypeID.json

1.1.5.4       Semantics

Table 8 provides the semantics of the components of Point of View. The following should be noted:

  1. Position is provided either in Cartesian (X,Y,Z) or Spherical (r,φ,θ) Coordinates.
  2. The Euler angles are indicated by (α,β,γ).

 

Table 8 – Semantics of Point of View

 

Header N1 Bytes Point of View Data Type header
·         Standard-Point of View 9 Bytes The characters “OSD-OPV-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
PointOfViewID N4 Bytes Identifier of the Point of View.
General    
·         CoordType bit 0 0: Cartesian, 1: Spherical
·         ObjectType bit 1-2 00: Digital Human

01: Generic

10 and 11: reserved

·         MediaType bit 3-5 000: Audio; 001: Visual; 010: Audio-Visual; 011: Haptic; 100: Smell; 101: RADAR; 110 LiDAR; 111: Ultrasound.
·         Precision bit 6 0: single precision; 1: double precision
·         Reserved bit 7 reserved
·         PointOfViewMask 2 Bytes Vector of booleans: Cartesian Position, Spherical Position, Orientation.
Position and Orientation
·         CartPosition (X,Y,Z) 12/24 Bytes Array (in metres)
·         SpherPosition (r,φ,θ) 12/24 Bytes Array (in metres and degrees)
·         Orient (α,β,γ) 12/24 Bytes Array (in degrees)
Errors
·         ErrCartPosition N5 Bytes Err/CartPosition*100
·         ErrSpherPosition N6 Bytes Err/SpherPosition*100
·         ErrOrientation N7 Bytes Err/Orientation*100
DescrMetadata N8 Bytes Descriptive Metadata

1.1.5.5       Data Types

Object Types, Media Types, and Coordinate Types are required.

1.1.5.6       To Respondents

Respondents are requested to comment on Functional Requirements and Object Type ID, Media Type ID, and Coordinate Type ID.

1.1.6        Audio Object

1.1.6.1       Definition

An Object with Audio perceptibility attributes.

1.1.6.2       Functional Requirements

An Audio Object supports:

  1. The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
  2. The ID of the Audio Object.
  3. The ID(s) of Parent Object(s) supporting two cases:
    1. The Parent Object has spawned two (or more) Objects.
    2. Two (or more) Parent Objects have merged into one.
  4. The Audio Object-specific Data:
    1. The ID of Audio Object Data Format.
    2. The length in Bytes of the Audio Object.
    3. The URI of the Data of the Audio Object.
  5. The Audio Object Space-Time Attributes.
  6. The Audio Object Attributes:
    1. The ID of Audio Object Attributes’ Data Formats.
    2. The length in Bytes of the Audio Object Attributes.
    3. The URI of the Data of the Audio Object Attributes.

 

MPAI has specified Audio Object in Technical Specification: Context-based Audio Enhancement (MPAI-CAE) – Use Cases (CAE-USC) V2.2. It is reported here for the reader’s convenience.

1.1.6.3       Syntax

https://schemas.mpai.community/CAE1/V2.2/data/AudioAttributeID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json


https://schemas.mpai.community/CAE1/V2.2/data/AudioAttributeFormatID.json

1.1.6.4       Semantics

Label Size Description
Header N1 Bytes
·         Standard-AudioObject 9 Bytes The characters “CAE-AOB-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
AudioObjectID N5 Bytes Identifier of the Audio Object.
ParentAudioObjects N6 Bytes Identifier(s) of Parent Audio Objects.
AudioObjectData N7 Bytes Data associated to Audio Object.
·         AudioObjectFormatID N8 Bytes Audio Object Format Identifier
·         AudioObjectLength N9 Bytes Number of Bytes in Audio Object
·         AudioObjectDataURI N10 Bytes URI of Data of Audio Object
AudioObjectSpaceTimeAttributes N11 Bytes Space-Time Attributes of Audio Object
AudioObjectAttributes[] N14 Bytes Attributes of Audio Object
·         AudioObjectAttributeID N15 Bytes ID of Attribute of Audio Object
·         AudioObjectAttributeFormatID N16 Bytes ID of Attribute Format of Audio Object
·         AudioObjectAttributeLength N17 Bytes Number of Bytes in Audio Object Attribute
·         AudioObjectAttributeDataURI N18 Bytes URI of Data of Audio Object Attribute
DescrMetadata N19 Bytes Descriptive Metadata
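A hypothetical Audio Object instance assembled from the fields above; every identifier, format, and URI is invented for illustration, and the schemas listed in the Syntax clause remain authoritative.

```python
import json

# A hypothetical Audio Object instance; all identifiers, formats, and
# URIs are invented. Field names mirror the semantics table above.
audio_object = {
    "Header": "CAE-AOB-V2.2",
    "MInstanceID": "minstance-01",
    "AudioObjectID": "aob-0007",
    # Two Parent Objects that merged into this one (one of the two
    # parent cases listed in the Functional Requirements):
    "ParentAudioObjects": ["aob-0001", "aob-0002"],
    "AudioObjectData": {
        "AudioObjectFormatID": "fmt-wav",
        "AudioObjectLength": 480000,
        "AudioObjectDataURI": "https://example.com/audio/aob-0007.wav",
    },
    "AudioObjectAttributes": [
        {
            "AudioObjectAttributeID": "attr-0001",
            "AudioObjectAttributeFormatID": "attrfmt-0001",
            "AudioObjectAttributeLength": 8,
            "AudioObjectAttributeDataURI": "https://example.com/attr/0001",
        }
    ],
}
assert len(json.loads(json.dumps(audio_object))["ParentAudioObjects"]) == 2
```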

1.1.6.5       Data Formats

Audio Object requires the specification of

  1. Audio Formats
  2. Audio Attributes
  3. Audio Attribute Formats.

1.1.6.6       To Respondents

MPAI advises Respondents that the Audio Object Functional Requirements have been developed considering the needs of the various application domains of its Technical Specifications. The current draft specification supports them all. An application not needing some functionalities is allowed to drop them.

MPAI requests:

  1. Comments on Functional Requirements and their support by the JSON Syntax.
  2. Comments on existing Audio Formats, Attributes, and Attribute Formats and Proposal for new entries.

1.1.7        Visual Object

1.1.7.1       Definition

An Object with Visual perceptibility attributes.

1.1.7.2       Functional Requirements

A Visual Object supports:

  1. The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
  2. The ID of the Visual Object.
  3. The ID(s) of Parent Object(s) supporting two cases:
    1. The Parent Object has spawned two (or more) Objects.
    2. Two (or more) Parent Objects have merged into one.
  4. The Visual Object-specific Data:
    1. The ID of Visual Object Data Format.
    2. The length in Bytes of the Visual Object.
    3. The URI of the Data of the Visual Object.
  5. The Visual Object Space-Time Attributes.
  6. The Visual Object Attributes:
    1. The ID of Visual Object Attributes’ Data Formats.
    2. The length in Bytes of the Visual Object Attributes.
    3. The URI of the Data of the Visual Object Attributes.

1.1.7.3       Syntax

https://schemas.mpai.community/OSD/V1.1/data/VisualObject.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

https://schemas.mpai.community/PAF/V1.1/data/VisualAttributeID.json

https://schemas.mpai.community/PAF/V1.1/data/VisualAttributeFormatID.json

1.1.7.4       Semantics

Label Size Description
Header N1 Bytes
·         Standard-Visual Object 9 Bytes The characters “CAE-VOB-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
VisualObjectID N5 Bytes Identifier of the Visual Object.
ParentVisualObjects N6 Bytes Identifier(s) of Parent Visual Objects.
VisualObjectData N7 Bytes Data associated to Visual Object.
·         VisualObjectFormatID N8 Bytes Visual Object Format Identifier
·         VisualObjectLength N9 Bytes Number of Bytes in Visual Object
·         VisualObjectDataURI N10 Bytes URI of Data of Visual Object
VisualObjectSpaceTimeAttributes N11 Bytes Space-Time of Visual Object
VisualObjectAttributes[] N14 Bytes Attributes of Visual Object
·         VisualObjectAttributeID N15 Bytes ID of Attribute of Visual Object
·         VisualObjectAttributeFormatID N16 Bytes ID of Attribute Format of Visual Object
·         VisualObjectAttributeLength N17 Bytes Number of Bytes in Visual Object Attribute
·         VisualObjectAttributeDataURI N18 Bytes URI of Data of Visual Object Attribute
DescrMetadata N19 Bytes Descriptive Metadata

1.1.7.5       Data Formats

Visual Object requires the specification of

  1. Visual Formats
  2. Visual Attributes
  3. Visual Attribute Formats.

1.1.7.6       To Respondents

MPAI advises Respondents that the Visual Object Functional Requirements have been developed considering the needs of the various application domains of its Technical Specifications. The current draft specification supports them all. An application not needing some functionalities is allowed to drop them.

MPAI requests:

  1. Comments on Functional Requirements and their support by the JSON Syntax.
  2. Comments on existing Visual Formats, Attributes, and Attribute Formats and Proposal for new entries.

1.1.8        Audio-Visual Object

1.1.8.1       Definition

A Data Type representing an object with both Audio and Visual perceptibility attributes.

1.1.8.2       Functional Requirements

Audio-Visual Object supports:

  1. The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
  2. The IDs of the Audio Objects and their Space-Time Attributes in the Audio-Visual Scene, which may differ from the Objects’ own Space-Time Attributes.
  3. The IDs of the Visual Objects and their Space-Time Attributes in the Audio-Visual Scene, which may differ from the Objects’ own Space-Time Attributes.
  4. The Audio-Visual Object’s Space-Time Attributes.

1.1.8.3       Syntax

https://schemas.mpai.community/OSD/V1.1/data/AudioVisualObject.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

 

1.1.8.4       Semantics

Label Size Description
Header N1 Bytes
·         Standard-Item 9 Bytes The characters “OSD-AVO-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
AVObjectID N5 Bytes Identifier of AV Object.
AudioObjectID N6 Bytes Identifier of Audio Object
AudioObjectSpaceTimeAttributes N7 Bytes Space-Time Attributes of Audio Object
VisualObjectID N8 Bytes Identifier of Visual Object
VisualObjectSpaceTimeAttributes N9 Bytes Space-Time Attributes of Visual Object
DescrMetadata N10 Bytes Descriptive Metadata

1.1.8.5       Data Formats

Audio-Visual Object requires the specification of

  1. Audio-Visual Formats
  2. Audio-Visual Attributes
  3. Audio-Visual Attribute Formats.

1.1.8.6       To Respondents

MPAI requests:

  1. Comments on Functional Requirements and their support by the JSON Syntax.
  2. Comments on existing Audio-Visual Formats, Attributes, and Attribute Formats and Proposal for new entries.

1.1.9        Audio-Visual Basic Scene Descriptors

1.1.9.1       Definition

A Data Type including the Objects of an Audio-Visual Scene and their arrangement in the Scene.

1.1.9.2       Functional Requirements

Audio-Visual Basic Scene Descriptors include:

  1. The ID of a Virtual Space (M-Instance) where it is or is intended to be located.
  2. The ID of the Audio-Visual Scene Descriptors.
  3. The number of Objects in the Scene.
  4. The Space-Time Attributes of the Scene Descriptors.
  5. The Audio Objects including their inherent ID, Spatial Attitude, and Scene-specific Attributes.
  6. The Visual Objects including their inherent ID, Spatial Attitude, and Scene-specific Attributes.
  7. The AV Objects including their inherent ID, Spatial Attitude, and Scene-specific Attributes.

1.1.9.3       Syntax

https://schemas.mpai.community/OSD/V1.1/data/AudioVisualBasicSceneDescriptors.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json


1.1.9.4       Semantics

Label Size Description
Header N1 Bytes
·         Standard-AVScene 9 Bytes The characters “OSD-AVS-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
AVBasicSceneDescriptorsID N5 Bytes Identifier of the AV Basic Scene Descriptors.
ObjectCount N6 Bytes Number of Objects in Scene
AVBasicSceneDescriptorsSpaceTime N7 Bytes Space-Time information of the Scene Descriptors
AVSceneAudioObjects[] N8 Bytes Set of Audio Objects
·         AVSceneAudioObjectID N9 Bytes ID of Audio Object
·         AVSceneAudioObjectSpaceTime N10 Bytes Space-Time information of Audio Object
·         AVSceneAudioObjectAttributes[] N11 Bytes Set of Attributes of Audio Object
o  AVSceneAudioObjectAttributeID N12 Bytes ID of Attribute of Audio Object
o  AVSceneAudioObjectAttributeFormatID N13 Bytes ID of Attribute Format of Audio Object
o  AVSceneAudioObjectAttributeDataLength N14 Bytes Length of Attribute Data of Audio Object
o  AVSceneAudioObjectAttributeDataURI N15 Bytes URI of Attribute Data of Audio Object
AVSceneVisualObjects[] N16 Bytes Set of Visual Objects
·         AVSceneVisualObjectID N17 Bytes ID of Visual Object
·         AVSceneVisualObjectSpaceTime N18 Bytes Space-Time information of Visual Object
·         AVSceneVisualObjectAttributes[] N19 Bytes Set of Attributes of Visual Object
o  AVSceneVisualObjectAttributeID N20 Bytes ID of Attribute of Visual Object
o  AVSceneVisualObjectAttributeFormatID N21 Bytes ID of Attribute Format of Visual Object
o  AVSceneVisualObjectAttributeDataLength N22 Bytes Length of Attribute Data of Visual Object
o  AVSceneVisualObjectAttributeDataURI N23 Bytes URI of Attribute Data of Visual Object
AVSceneAVObjects[] N24 Bytes Set of AV Objects
·         AVSceneAVObjectID N25 Bytes ID of AV Object
·         AVSceneAVObjectSpaceTime N26 Bytes Space-Time information of AV Object
·         AVSceneAVObjectAttributes[] N27 Bytes Set of Attributes of AV Object
o  AVSceneAVObjectAttributeID N28 Bytes ID of Attribute of AV Object
o  AVSceneAVObjectAttributeFormatID N29 Bytes ID of Attribute Format of AV Object
o  AVSceneAVObjectAttributeDataLength N30 Bytes Length of Attribute Data of AV Object
o  AVSceneAVObjectAttributeDataURI N31 Bytes URI of Attribute Data of AV Object
DescrMetadata N32 Bytes Descriptive Metadata
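One consequence of the semantics above is that ObjectCount should match the total number of Objects actually listed. A sketch of that consistency check follows; that ObjectCount covers Audio, Visual, and AV Objects together is an assumption the table does not state explicitly, and the scene value is invented.

```python
# Sketch of a consistency check implied by the semantics above:
# ObjectCount should equal the number of Objects actually listed.
# That ObjectCount covers Audio, Visual, and AV Objects together is
# an assumption; the scene value below is invented for illustration.

def object_count_consistent(scene: dict) -> bool:
    listed = (len(scene.get("AVSceneAudioObjects", []))
              + len(scene.get("AVSceneVisualObjects", []))
              + len(scene.get("AVSceneAVObjects", [])))
    return scene["ObjectCount"] == listed

scene = {
    "ObjectCount": 3,
    "AVSceneAudioObjects": [{"AVSceneAudioObjectID": "a1"}],
    "AVSceneVisualObjects": [{"AVSceneVisualObjectID": "v1"}],
    "AVSceneAVObjects": [{"AVSceneAVObjectID": "av1"}],
}
assert object_count_consistent(scene)
```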

1.1.9.5       Data Formats

No new Data Formats

1.1.9.6       To Respondents

Respondents are invited to:

  1. Comment or elaborate on the Functional Requirements.
  2. Propose extensions to or alternative formats for the Audio-Visual Basic Scene Descriptors.

1.1.10    Audio-Visual Scene Descriptors

1.1.10.1   Definition

A Data Type including the Objects and Basic Scenes of an Audio-Visual Scene and their arrangement in the Scene.

1.1.10.2   Functional Requirements

Audio-Visual Scene Descriptors include Basic Scenes in addition to Objects.

1.1.10.3   Syntax

https://schemas.mpai.community/OSD/V1.1/data/AudioVisualSceneDescriptors.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

https://schemas.mpai.community/OSD/V1.1/data/AVBasicSceneDescriptors.json

https://schemas.mpai.community/OSD/V1.1/data/AVSceneDescriptors.json

 

1.1.10.4   Semantics

Label Size Description
Header N1 Bytes
·         Standard-AVScene 9 Bytes The characters “OSD-AVS-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
AVBasicSceneDescriptorsID N5 Bytes Identifier of the AV Basic Scene Descriptors.
ObjectCount N6 Bytes Number of Objects in Scene
AVBasicSceneDescriptorsSpaceTime N7 Bytes Space-Time information of the Scene Descriptors
AVSceneAudioObjects[] N8 Bytes Set of Audio Objects
·         AVSceneAudioObjectID N9 Bytes ID of Audio Object
·         AVSceneAudioObjectSpaceTime N10 Bytes Space-Time information of Audio Object
·         AVSceneAudioObjectAttributes[] N11 Bytes Set of Attributes of Audio Object
o  AVSceneAudioObjectAttributeID N12 Bytes ID of Attribute of Audio Object
o  AVSceneAudioObjectAttributeFormatID N13 Bytes ID of Attribute Format of Audio Object
o  AVSceneAudioObjectAttributeDataLength N14 Bytes Length of Attribute Data of Audio Object
o  AVSceneAudioObjectAttributeDataURI N15 Bytes URI of Attribute Data of Audio Object
AVSceneVisualObjects[] N16 Bytes Set of Visual Objects
·         AVSceneVisualObjectID N17 Bytes ID of Visual Object
·         AVSceneVisualObjectSpaceTime N18 Bytes Space-Time information of Visual Object
·         AVSceneVisualObjectAttributes[] N19 Bytes Set of Attributes of Visual Object
o  AVSceneVisualObjectAttributeID N20 Bytes ID of Attribute of Visual Object
o  AVSceneVisualObjectAttributeFormatID N21 Bytes ID of Attribute Format of Visual Object
o  AVSceneVisualObjectAttributeDataLength N22 Bytes Length of Attribute Data of Visual Object
o  AVSceneVisualObjectAttributeDataURI N23 Bytes URI of Attribute Data of Visual Object
AVSceneAVObjects[] N24 Bytes Set of AV Objects
·         AVSceneAVObjectID N25 Bytes ID of AV Object
·         AVSceneAVObjectSpaceTime N26 Bytes Space-Time information of AV Object
·         AVSceneAVObjectAttributes[] N27 Bytes Set of Attributes of AV Object
o  AVSceneAVObjectAttributeID N28 Bytes ID of Attribute of AV Object
o  AVSceneAVObjectAttributeFormatID N29 Bytes ID of Attribute Format of AV Object
o  AVSceneAVObjectAttributeDataLength N30 Bytes Length of Attribute Data of AV Object
o  AVSceneAVObjectAttributeDataURI N31 Bytes URI of Attribute Data of AV Object
DescrMetadata N32 Bytes Descriptive Metadata

1.1.10.5   Data Formats

No new Data Formats

1.1.10.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the Functional Requirements.
  2. Propose extensions to or alternative formats for the Audio-Visual Scene Descriptors.

1.1.11    Audio Scene Descriptors

The specification of Audio (Basic) Scene Descriptors is derived from that of the Audio-Visual (Basic) Scene Descriptors by removing Visual Objects and Audio-Visual Objects. The Headers are, respectively:

Audio Basic Scene Descriptors

"Header": {
  "type": "string",
  "pattern": "^CAE-ABS-V[0-9]{1,2}[.][0-9]{1,2}$"
},

Audio Scene Descriptors

"Header": {
  "type": "string",
  "pattern": "^CAE-ASD-V[0-9]{1,2}[.][0-9]{1,2}$"
},
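The two Header patterns above can be exercised directly with a regular-expression engine; the example header strings are invented.

```python
import re

# The two Header patterns quoted above, straight-quoted; the example
# header strings are invented for illustration.
ABS = re.compile(r"^CAE-ABS-V[0-9]{1,2}[.][0-9]{1,2}$")
ASD = re.compile(r"^CAE-ASD-V[0-9]{1,2}[.][0-9]{1,2}$")

assert ABS.match("CAE-ABS-V1.0")
assert ASD.match("CAE-ASD-V12.34")
assert not ABS.match("CAE-ASD-V1.0")    # wrong standard code
assert not ASD.match("CAE-ASD-V1.234")  # minor version longer than 2 digits
```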

1.1.12    Visual Scene Descriptors

The specification of Visual (Basic) Scene Descriptors is derived from that of the Audio-Visual (Basic) Scene Descriptors by removing Audio Objects and Audio-Visual Objects. The Headers are, respectively:

Visual Basic Scene Descriptors

"Header": {
  "type": "string",
  "pattern": "^OSD-VBS-V[0-9]{1,2}[.][0-9]{1,2}$"
},

Visual Scene Descriptors

"Header": {
  "type": "string",
  "pattern": "^OSD-VSD-V[0-9]{1,2}[.][0-9]{1,2}$"
},

1.1.13    Body Descriptors

1.1.13.1   Definition

Body Descriptors digitally represent the body of:

  1. A human having a multimodal conversation with the HCI, when the HCI estimates the human’s Personal Status.
  2. An Avatar embodying the HCI in the Portable Avatar generated by the Personal Status Display AIM and rendered by the Audio-Visual Scene Rendering AIM.

1.1.13.2   Functional Requirements

MPAI has specified HAnim as the format of Body Descriptors.

1.1.13.3   Syntax

The JSON Syntax of HAnim is being developed.

1.1.13.4   Semantics

The Semantics of HAnim is being developed.

1.1.13.5   Data Formats

Body Descriptors.

1.1.13.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Body Descriptors.
  2. Propose extensions to the identified technologies.
  3. Propose new technologies.

1.1.14    Face Descriptors

1.1.14.1   Definition

The Face Descriptors are the Descriptors of the face of:

  1. A human having a multimodal conversation with the HCI, when the HCI estimates the human’s Personal Status.
  2. The Avatar embodying the HCI in the Portable Avatar generated by the Personal Status Display AIM and rendered by the Audio-Visual Scene Rendering AIM.
  3. A human, for the purpose of finding their Instance ID.

1.1.14.2   Functional Requirements

MPAI has specified the format of Face Descriptors based on the Action Units of the Facial Action Coding System (FACS), originally developed by Carl-Herman Hjortsjö, adopted by Paul Ekman and Wallace V. Friesen (1978), and updated by Ekman, Friesen, and Joseph C. Hager (2002).

1.1.14.3   Syntax

https://schemas.mpai.community/PAF/V1.1/PortableAvatarFormat.json

1.1.14.4   Semantics

Label Size Description
Header N1 Bytes
·         Standard 9 Bytes The characters “PAF-FAD-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
FaceDescriptorsID N5 Bytes Identifier of the Face Descriptors.
FaceDescriptors N6 Bytes Set of Face Descriptors (Action Units).
Action Units Description Facial muscle generating the Action
1 Inner Brow Raiser Frontalis, pars medialis
2 Outer Brow Raiser Frontalis, pars lateralis
4 Brow Lowerer Corrugator supercilii, Depressor supercilii
5 Upper Lid Raiser Levator palpebrae superioris
6 Cheek Raiser Orbicularis oculi, pars orbitalis
7 Lid Tightener Orbicularis oculi, pars palpebralis
9 Nose Wrinkler Levator labii superioris alaeque nasi
10 Upper Lip Raiser Levator labii superioris
11 Nasolabial Deepener Zygomaticus minor
12 Lip Corner Puller Zygomaticus major
13 Cheek Puffer Levator anguli oris (a.k.a. Caninus)
14 Dimpler Buccinator
15 Lip Corner Depressor Depressor anguli oris (a.k.a. Triangularis)
16 Lower Lip Depressor Depressor labii inferioris
17 Chin Raiser Mentalis
18 Lip Puckerer Incisivii labii superioris and Incisivii labii inferioris
20 Lip stretcher Risorius with platysma
22 Lip Funneler Orbicularis oris
23 Lip Tightener Orbicularis oris
24 Lip Pressor Orbicularis oris
25 Lips part Depressor labii inferioris or relaxation of Mentalis, or Orbicularis oris
26 Jaw Drop Masseter, relaxed Temporalis and internal Pterygoid
27 Mouth Stretch Pterygoids, Digastric
28 Lip Suck Orbicularis oris
41 Lid droop Relaxation of Levator palpebrae superioris
42 Slit Orbicularis oculi
43 Eyes Closed Relaxation of Levator palpebrae superioris; Orbicularis oculi, pars palpebralis
44 Squint Orbicularis oculi, pars palpebralis
45 Blink Relaxation of Levator palpebrae superioris; Orbicularis oculi, pars palpebralis
46 Wink Relaxation of Levator palpebrae superioris; Orbicularis oculi, pars palpebralis
61 Eyes turn left Lateral rectus, medial rectus
62 Eyes turn right Lateral rectus, medial rectus
63 Eyes up Superior rectus, Inferior oblique
64 Eyes down Inferior rectus, Superior oblique
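The Action Unit numbering above lends itself to a simple lookup. The reading of AU6 + AU12 as a smile is a commonly cited FACS combination, used here only as an illustration, not as part of the specification.

```python
# A small lookup over a few of the Action Units tabulated above.
ACTION_UNITS = {
    1: "Inner Brow Raiser",
    2: "Outer Brow Raiser",
    6: "Cheek Raiser",
    12: "Lip Corner Puller",
    45: "Blink",
}

def describe(aus: list) -> list:
    """Map Action Unit numbers to their names."""
    return [ACTION_UNITS[a] for a in aus]

# AU6 + AU12 is the combination commonly read as a smile.
assert describe([6, 12]) == ["Cheek Raiser", "Lip Corner Puller"]
```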

1.1.14.5   Data Formats

Action Units are a Data Format.

1.1.14.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Face Descriptors.
  2. Propose extensions to the identified technologies or new ones.

1.1.15    Face ID

1.1.15.1   Definition

The Identifier of a human inferred from a Face Object captured from a target human. The scope of the ID could cover the members of an authorised group, such as the members of a family, specific employees of a company, or the customers of a car rental company.

1.1.15.2   Functional Requirements

MPAI has specified the format of Instance ID that is agnostic of the nature of the Object to be Identified. Face is treated in the same way as any other object that is identified as a member of a class of objects.

1.1.15.3   Syntax

No Syntax provided as Instance ID is sufficient for the identified needs.

1.1.15.4   Semantics

No Semantics provided.

1.1.15.5   Data Formats

Instance ID is one Data Format.

1.1.15.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Instance ID format for Face Identifier.
  2. Propose extensions to the identified technologies or new ones.

1.1.16    Speaker ID

1.1.16.1   Definition

The Identifier of the human inferred from their utterances. The Speaker ID may be derived by analysing speech segments of the speaker under consideration. The scope of the ID may cover the members of an authorised group, such as the members of a family, specific employees of a company, or the customers of a car rental company.

1.1.16.2   Functional Requirements

MPAI has specified the format of the Instance ID, which is agnostic to the nature of the Object being identified. Speech is treated in the same way as any other object identified as a member of a class of objects.

1.1.16.3   Syntax

No Syntax provided as Instance ID is sufficient for the identified needs.

1.1.16.4   Semantics

No Semantics provided.

1.1.16.5   Data Formats

Instance ID is one Data Format.

1.1.16.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Instance ID format for Speaker Identifier.
  2. Propose extensions to the identified technologies or new ones.

1.1.17    Audio Object ID

1.1.17.1   Definition

The Identifier uniquely associated with a particular class of audio objects, e.g., a voice, a piece of music, a coded audio sound (e.g., a siren), a natural sound, etc.

1.1.17.2   Functional Requirements

MPAI has specified the format of the Instance ID, which is agnostic to the nature of the Object being identified.

1.1.17.3   Syntax

No Syntax provided as Instance ID is sufficient for the identified needs.

1.1.17.4   Semantics

No Semantics provided.

1.1.17.5   Data Formats

Instance ID is one Data Format.

1.1.17.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Instance ID format.
  2. Propose extensions to the identified technologies or new ones.

1.1.18    Visual Object ID

1.1.18.1   Definition

The Identifier uniquely associated with a particular class of visual objects, e.g., human, hammer, screwdriver, etc.

1.1.18.2   Functional Requirements

MPAI has specified the format of the Instance ID, which is agnostic to the nature of the Object being identified.

1.1.18.3   Syntax

No Syntax provided as Instance ID is sufficient for the identified needs.

1.1.18.4   Semantics

No Semantics provided.

1.1.18.5   Data Formats

Instance ID is one Data Format.

1.1.18.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Instance ID format.
  2. Propose extensions to the identified technologies or new ones.

1.1.19    Meaning

1.1.19.1   Definition

The digital representation of an input text as syntactic and semantic information.

1.1.19.2   Functional Requirements

Meaning is used to extract information from text that helps the Entity Dialogue Processing AIM produce a response.

MPAI has specified Meaning (a.k.a. Text Descriptors).

1.1.19.3   Syntax

https://schemas.mpai.community/MMC/V2.2/TextDescriptors.json

 

1.1.19.4   Semantics

Label Size Description
Header N1 Bytes
·         Standard 9 Bytes The characters “MMC-TXD-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
MInstanceID N4 Bytes Identifier of M-Instance.
TextDescriptors N5 Bytes Set of Descriptors of the input Text.
·         POS_tagging N6 Bytes Results of POS (Part of Speech, e.g., noun, verb, etc.) tagging, including information on the question’s POS tagging set and tagged results.
·         NE_tagging N7 Bytes Results of NE (Named Entity, e.g., Person, Organisation, Fruit, etc.) tagging, including information on the question’s tagging set and tagged results.
·         Dependency_tagging N8 Bytes Results of dependency (structure of the sentence, e.g., subject, object, head of relation, etc.) tagging, including information on the question’s dependency tagging set and tagged results.
·         SRL_tagging N9 Bytes Results of SRL (Semantic Role Labelling) tagging, including information on the question’s SRL tagging set and tagged results. SRL indicates the semantic structure of the sentence, such as agent, location, patient role, etc.
DescrMetadata N10 Bytes Descriptive Metadata
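The table above can be illustrated with a hypothetical JSON instance for a short sentence. The field names mirror the semantics table; the tag sets (Penn Treebank POS labels, simple NE and role labels) and all tag values are illustrative assumptions, not mandated by MPAI.

```python
# Hypothetical Meaning (Text Descriptors) instance for "Stop the car".
# Field names follow the semantics table above; tag sets and values
# are illustrative assumptions.
import json

meaning = {
    "Header": "MMC-TXD-V2.2",          # "MMC-TXD-V" + major "." minor
    "MInstanceID": "m-instance-001",
    "TextDescriptors": {
        "POS_tagging": {
            "tagset": "PennTreebank",
            "tags": [["Stop", "VB"], ["the", "DT"], ["car", "NN"]],
        },
        "NE_tagging": {"tagset": "simple", "tags": [["car", "VEHICLE"]]},
        "Dependency_tagging": {
            "tagset": "UD",
            "tags": [["Stop", "root"], ["car", "obj"]],
        },
        "SRL_tagging": {"tagset": "PropBank", "tags": [["car", "patient"]]},
    },
    "DescrMetadata": {},
}

serialized = json.dumps(meaning)  # what would travel between AIMs
```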

1.1.19.5   Data Formats

The Text Descriptors specified above are a Data Format for a specific type of Descriptors, namely Meaning.

1.1.19.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI specification of Text Descriptors as Meaning.
  2. Propose extensions to it or a new specification.

1.1.20    Personal Status

1.1.20.1   Definition

Personal Status (PS) indicates a set of three Factors (Cognitive State, Emotion, Social Attitude) conveyed by one or more Modalities (Text, Speech, Face, and Gesture Modalities).
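The Factor-by-Modality structure of Personal Status can be sketched as a small grid. This is a minimal illustration under stated assumptions: the label values ("happy", "polite") and the helper function are hypothetical, while the three Factors and four Modalities come from the definition above.

```python
# Minimal sketch of the Personal Status structure: three Factors, each of
# which may be conveyed by up to four Modalities. Label values are
# illustrative, not MPAI-specified.

FACTORS = ("CognitiveState", "Emotion", "SocialAttitude")
MODALITIES = ("Text", "Speech", "Face", "Gesture")

def make_personal_status(**factor_labels):
    """Build a Factor-by-Modality grid; unset entries stay None."""
    status = {f: {m: None for m in MODALITIES} for f in FACTORS}
    for factor, (modality, label) in factor_labels.items():
        if factor not in FACTORS or modality not in MODALITIES:
            raise ValueError(f"Unknown factor/modality: {factor}/{modality}")
        status[factor][modality] = label
    return status

ps = make_personal_status(Emotion=("Face", "happy"),
                          SocialAttitude=("Speech", "polite"))
```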

1.1.20.2   Functional Requirements

Personal Status (PS) is used to assign a label, according to a given classification, to the internal state of an Entity – human or Machine.

MPAI has developed the full specification of Personal Status.

1.1.20.3   Syntax

https://schemas.mpai.community/MMC/V2.2/data/PersonalStatus.json

https://schemas.mpai.community/OSD/V1.1/data/Time.json

1.1.20.4   Semantics

1.1.20.5   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI Personal Status format.
  2. Propose extensions of the labels or new sets of labels for the Factors, or new technologies.

1.1.21    Avatar Model

1.1.21.1   Definition

The Model of an Avatar selected as the human-like representation of the HCI.

1.1.21.2   Functional Requirements

The Avatar Model is an element of MPAI’s Portable Avatar.

1.1.21.3   Syntax

No specific syntax.

1.1.21.4   Semantics

No specific semantics.

1.1.21.5   Data Formats

An Avatar Model can be expressed as a glTF data file.
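For orientation, the smallest valid glTF 2.0 payload is just an `asset` record with the mandatory `version` property; a real Avatar Model would additionally carry meshes, skins, and animations. The sketch below only illustrates the container format the Data Format points at; the `generator` string is an arbitrary example value.

```python
# Minimal glTF 2.0 container: the "asset" object with "version" is the
# only property required by the glTF 2.0 specification. An Avatar Model
# would add meshes, skins, materials, and animations on top of this.
import json

minimal_gltf = {
    "asset": {"version": "2.0", "generator": "example"},
}

gltf_text = json.dumps(minimal_gltf)  # contents of a .gltf file
```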

1.1.21.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the MPAI-specified technologies identified in this subsection.
  2. Propose extensions to the identified technologies or new ones.

1.1.22    Speech Model

1.1.22.1   Definition

A Data Type representing a model capable of generating speech according to a specified set of features.

1.1.22.2   Functional Requirements

The generated speech can be perceived as:

  1. Having been generated by a specific human.
  2. Having a specific language or dialect intonation.
  3. Not having any specific connotation.

 

It can be implemented as a Neural Network trained to generate utterances with specific Speech Descriptors.
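A hypothetical interface for such a model is sketched below: text plus a set of Speech Descriptors (speaker identity, language, neutral connotation, matching the three perceptions listed above) map to a waveform. All names are assumptions, and the synthesis itself is stubbed out rather than implemented.

```python
# Hypothetical Speech Model interface. Descriptor names mirror the three
# requirements above; the synthesis is a stub, not a real neural network.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpeechDescriptors:
    speaker_id: Optional[str] = None   # perceived as a specific human
    language: Optional[str] = None     # language or dialect intonation
    neutral: bool = False              # no specific connotation

class SpeechModel:
    def __init__(self, sample_rate: int = 16000):
        self.sample_rate = sample_rate

    def synthesise(self, text: str, desc: SpeechDescriptors) -> List[float]:
        # A real implementation would run a trained neural network here.
        if desc.neutral and desc.speaker_id:
            raise ValueError("neutral speech cannot mimic a specific speaker")
        return [0.0] * self.sample_rate  # one second of silence as placeholder

model = SpeechModel()
audio = model.synthesise("Hello", SpeechDescriptors(language="en-GB"))
```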

1.1.22.3   Syntax

No syntax provided.

1.1.22.4   Semantics

No Semantics provided.

1.1.22.5   Data Formats

No Data Formats provided.

1.1.22.6   To Respondents

Respondents are requested to:

  1. Comment on the characterisation of Speech Model.
  2. Propose Speech Model Formats.

1.1.23    Output Audio

1.1.23.1   Definition

Output Audio is Audio information produced by a digital device such as the Audio-Visual Rendering AIM.

1.1.23.2   Functional Requirements

Output Audio can be:

  1. Single channel.
  2. Multiple channels.
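The single- vs multi-channel distinction can be illustrated with interleaved PCM frames. Interleaving (L R L R ...) is the conventional layout for multichannel audio; nothing in this section mandates it, so the sketch below is an assumption.

```python
# Sketch: multi-channel Output Audio as interleaved PCM frames.
# The interleaved (L R L R ...) layout is conventional, not MPAI-mandated.
def interleave(channels):
    """Interleave equal-length channel sample lists into one frame sequence."""
    if len({len(c) for c in channels}) != 1:
        raise ValueError("all channels must have the same length")
    return [sample for frame in zip(*channels) for sample in frame]

mono = [0.1, 0.2, 0.3]                            # single channel: as-is
stereo = interleave([[0.1, 0.2], [-0.1, -0.2]])   # two channels: L R L R
```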

1.1.23.3   Syntax

No Syntax provided.

1.1.23.4   Semantics

No Semantics provided.

1.1.23.5   Data Formats

Data Formats and Attributes are required.

1.1.23.6   To Respondents

Proposals of Formats and Attributes are requested.

1.1.24    Output Visual

1.1.24.1   Definition

Output Visual is Visual information produced by a digital device such as the Audio-Visual Rendering AIM.

1.1.24.2   Functional Requirements

No Functional Requirements provided.

1.1.24.3   Syntax

No Syntax provided.

1.1.24.4   Semantics

No Semantics provided.

1.1.24.5   Data Formats

Data Formats and Attributes are required.

1.1.24.6   To Respondents

Proposals of Formats and Attributes are requested.

1.1.25    HCI-AMS Messages

1.1.25.1   Definition

The HCI-AMS Messages request that the Autonomous Motion Subsystem (AMS) execute specified actions.

1.1.25.2   Functional Requirements

The HCI sends Messages to the AMS, based on messages from humans or a Remote HCI, to:

  1. Request possible Routes connecting the current place to the destination; the request may include:
    • Desired arrival time.
    • Intermediate stops between origin and destination.
  2. Request to:
    • Execute a Route.
    • Suspend a Route.
    • Resume a Route.
    • Change a Route.
  3. Request to see/hear an M-Location corresponding to a U-Location from a Point of View.

1.1.25.3   Syntax

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

https://schemas.mpai.community/OSD/V1.1/data/SpatialAttitude.json

https://schemas.mpai.community/CAV2/V1.0/data/RouteCommand.json

 

1.1.25.4   Semantics

Label Size Description
Header N1 Bytes
·         Standard 9 Bytes The characters “CAV-HAM-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
HCIAMSMessageID N4 Bytes Identifier of HCI-AMS Message.
HCIAMSMessageData N5 Bytes Data in HCI-AMS Message.
·         DestinationAndArrivalTime N6 Bytes Route Endpoint and arrival time.
·         IntermediateStops N7 Bytes Stops between origin and destination.
·         SelectedRoute N8 Bytes ID of Route.
·         RouteCommands N9 Bytes “Execute”, “Suspend”, “Resume”, “Change”.
DescrMetadata N10 Bytes Descriptive Metadata
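Composing an HCI-AMS Message according to the semantics table above might look as follows. This is a hedged sketch: the field names mirror the table, but the helper function, the value formats (place names, timestamp string), and the version in the Header are assumptions.

```python
# Hypothetical composition of an HCI-AMS Message per the semantics table.
# Field names follow the table; value formats are illustrative assumptions.
ROUTE_COMMANDS = ("Execute", "Suspend", "Resume", "Change")

def make_hci_ams_message(message_id, destination, arrival_time,
                         stops=(), route_id=None, command=None):
    if command is not None and command not in ROUTE_COMMANDS:
        raise ValueError(f"Unknown Route Command: {command}")
    return {
        "Header": "CAV-HAM-V1.0",  # "CAV-HAM-V" + major "." minor
        "HCIAMSMessageID": message_id,
        "HCIAMSMessageData": {
            "DestinationAndArrivalTime": {"destination": destination,
                                          "arrivalTime": arrival_time},
            "IntermediateStops": list(stops),
            "SelectedRoute": route_id,
            "RouteCommands": command,
        },
        "DescrMetadata": {},
    }

msg = make_hci_ams_message("msg-42", "Piazza Castello", "2025-01-01T10:00Z",
                           stops=["Via Roma"], route_id="route-7",
                           command="Execute")
```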

1.1.25.5   Data Formats

No Format required.

1.1.25.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the Functional Requirements of the HCI-AMS Messages identified above.
  2. Extend the list of Functional Requirements.

1.1.26    Ego-Remote HCI Messages

1.1.26.1   Definition

Information exchanged between the HCI of the Ego CAV and a peer HCI of a Remote CAV.

1.1.26.2   Functional Requirements

  1. Messages from a Remote HCI to the Ego HCI have the same payload as those from the Ego HCI to a Remote HCI.
  2. The Ego HCI may:
    • Send messages to a Remote HCI.
    • Request a particular M-Location from a Remote HCI or a CAV-Aware entity (e.g., a Roadside Unit, a Store-and-Forward entity, etc.).
    • Select the appropriate Level of Detail for transmission of requested M-Location Data.

1.1.26.3   Syntax

https://schemas.mpai.community/OSD/V1.1/data/AudioVisualSceneDescriptors.json

1.1.26.4   Semantics

Label Size Description
Header N1 Bytes
·         Standard 9 Bytes The characters “MMM-ERH-V”
·         Version N2 Bytes Major version – 1 or 2 characters
·         Dot-separator 1 Byte The character “.”
·         Subversion N3 Bytes Minor version – 1 or 2 characters
EgoToRemoteHCIMessageID N4 Bytes Identifier of EgoRemoteHCIMessage.
EgoToRemoteHCIMessageData N5 Bytes Data of Ego-Remote-HCI Message.
·         MLocationRequest N6 Bytes
o  ULocationID N7 Bytes ID of the U-Location whose M-Location is requested.
o  MLocationFormatID N8 Bytes ID of the M-Location Format.
·         FullEnvironmentRepresentation N9 Bytes AV Scene Descriptors with semantics.
·         GenericMessage N10 Bytes
DescrMetadata N11 Bytes Descriptive Metadata.
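The Header shared by these semantics tables — a 9-character standard string, a 1-2 character major version, a dot separator, and a 1-2 character minor version — can be parsed as sketched below, assuming numeric version fields; the function name is hypothetical.

```python
# Sketch: parsing the Header of an Ego-Remote HCI Message, assuming the
# layout from the semantics table: "MMM-ERH-V" (9 chars) + major "." minor,
# with numeric 1-2 character version fields (an assumption).
import re

def parse_header(header):
    m = re.fullmatch(r"(MMM-ERH-V)(\d{1,2})\.(\d{1,2})", header)
    if not m:
        raise ValueError(f"Malformed header: {header}")
    standard, major, minor = m.groups()
    return {"standard": standard, "version": int(major), "subversion": int(minor)}

h = parse_header("MMM-ERH-V1.0")
```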

1.1.26.5   Data Formats

MPAI has defined U-Location as a spatial volume in the Universe.

The payload of a Generic Message may have other formats.

1.1.26.6   To Respondents

Respondents are invited to:

  1. Comment or elaborate on the Functional Requirements of the Ego-Remote HCI Messages identified above.
  2. Propose Data Formats of U-Location and Generic Message.