1 Definition

The Genomic Processing Type defines the allowable operations, methods, and processing descriptors used for Genomic (DNA/RNA) data.
It reuses Common Definitions for: Header, Algorithm, Algorithms, FeatureClass, Features.

2 Functional Requirements

The Genomic Processing Type shall:

  • Fix Domain to the constant value "Genomics".
  • Validate Operation against genomics‑specific enumerations.
  • Validate Method against the enumerated genomics computational techniques.
  • Allow Algorithm to be a string identifier or an AlgorithmObject.
  • Allow Algorithms to be an array of Algorithm items.
  • Require Features to be a non‑empty array of unique strings.
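
The requirements above can be sketched as a small validator. This is a minimal illustration, not the normative schema: the function and constant names are hypothetical, the enumerated values are copied from the Semantics table of this page, and it is assumed here that Domain, Operation, Method, and Features are always present while Algorithm and Algorithms are optional.

```python
from typing import Any

# Enumerations copied from the Semantics table (tuples, so that any value
# type can be tested for membership without raising).
OPERATIONS = (
    "QualityControl", "Alignment", "VariantCalling", "VariantFiltering",
    "Annotation", "GeneExpressionQuantification", "DifferentialExpression",
    "CopyNumberAnalysis", "HaplotypeReconstruction", "EpigenomicProcessing",
)
METHODS = (
    "FastQC", "Cutadapt", "Trimmomatic", "BWA", "Bowtie2", "STAR", "HISAT2",
    "GATK", "FreeBayes", "DeepVariant", "ANNOVAR", "VEP", "Salmon",
    "Kallisto", "DESeq2", "EdgeR", "CNVkit",
)


def is_algorithm(item: Any) -> bool:
    """True for a string identifier or an object with a string Name."""
    if isinstance(item, str):
        return True
    return isinstance(item, dict) and isinstance(item.get("Name"), str)


def validate_genomic_processing_type(doc: dict) -> list[str]:
    """Return a list of violations; an empty list means the checks passed."""
    errors = []
    if doc.get("Domain") != "Genomics":
        errors.append("Domain must be the constant 'Genomics'")
    if doc.get("Operation") not in OPERATIONS:
        errors.append("Operation is not in the genomics enumeration")
    if doc.get("Method") not in METHODS:
        errors.append("Method is not in the genomics enumeration")
    # Assumption: Algorithm and Algorithms are optional, checked only if present.
    if "Algorithm" in doc and not is_algorithm(doc["Algorithm"]):
        errors.append("Algorithm must be a string or an AlgorithmObject")
    if "Algorithms" in doc:
        items = doc["Algorithms"]
        if not (isinstance(items, list) and all(is_algorithm(a) for a in items)):
            errors.append("Algorithms must be an array of Algorithm items")
    features = doc.get("Features")
    if not (
        isinstance(features, list)
        and len(features) > 0
        and all(isinstance(f, str) for f in features)
        and len(set(features)) == len(features)
    ):
        errors.append("Features must be a non-empty array of unique strings")
    return errors
```

Collecting all violations in a list, rather than stopping at the first one, lets a caller report every problem in a single pass.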

3 Syntax

https://schemas.mpai.community/AIH1/V1.0/data/GenomicProcessingType.json

4 Semantics

| Label | Description |
| --- | --- |
| Header | Genomic Processing Type Header, Standard “AIH‑GNT‑Vx.y”. |
| Domain | Constant value "Genomics". Processing Type applies exclusively to genomic data. |
| Operation | Specifies the genomics‑specific processing step. Enumerated list includes: QualityControl, Alignment, VariantCalling, VariantFiltering, Annotation, GeneExpressionQuantification, DifferentialExpression, CopyNumberAnalysis, HaplotypeReconstruction, EpigenomicProcessing. |
| QualityControl | Operation performing read‑level QC (quality checks, trimming, adapter removal). |
| Alignment | Operation mapping sequencing reads to a reference genome. |
| VariantCalling | Operation identifying SNPs, indels, or structural variants. |
| VariantFiltering | Operation filtering variants using quality thresholds or rules. |
| Annotation | Operation adding functional/clinical annotations to variants or genes. |
| GeneExpressionQuantification | Operation quantifying gene/isoform expression from RNA‑seq. |
| DifferentialExpression | Operation comparing expression levels across groups/conditions. |
| CopyNumberAnalysis | Operation detecting genomic amplifications or deletions (CNV). |
| HaplotypeReconstruction | Operation phasing variants into haplotypes. |
| EpigenomicProcessing | Operation analysing methylation or chromatin accessibility signals. |
| Method | Processing technique used to implement the operation. Must be one of: FastQC, Cutadapt, Trimmomatic, BWA, Bowtie2, STAR, HISAT2, GATK, FreeBayes, DeepVariant, ANNOVAR, VEP, Salmon, Kallisto, DESeq2, EdgeR, CNVkit. |
| Algorithm | String identifier or AlgorithmObject from CommonDefinitions. Represents the algorithm used. |
| AlgorithmObject.Name | Required algorithm name (e.g., “GATK‑HC”, “DeepVariant‑Model”). |
| AlgorithmObject.Version | Optional version identifier. |
| AlgorithmObject.Params | Free‑form object containing algorithm configuration parameters. |
| Algorithms | Array of Algorithm entries, each either a string ID or an AlgorithmObject. |
| FeatureClass | Category describing type of genomic features (e.g., variant features, expression features, CNV features). |
| Features | Non‑empty array of unique genomic feature names (e.g., SNP_Count, TPM_FoldChange, CNV_Events). |
| Trace | Provenance information and Time. |
| DescrMetadata | Descriptive Metadata |
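
To make the labels above concrete, the following sketch builds one possible instance and serializes it to JSON. It is illustrative only: the flat key layout is an assumption (the normative structure is the JSON schema referenced under Syntax), the Version and Params values are hypothetical, the Header keeps the "Vx.y" placeholder exactly as written in the table, and Trace and DescrMetadata are omitted because their structure is not described on this page.

```python
import json

# Illustrative instance; enumerated values and example names come from the
# Semantics table, everything marked "hypothetical" is invented for the demo.
example = {
    "Header": "AIH-GNT-Vx.y",          # version placeholder left unresolved
    "Domain": "Genomics",              # constant value
    "Operation": "VariantCalling",     # one of the enumerated operations
    "Method": "GATK",                  # one of the enumerated methods
    "Algorithm": {                     # AlgorithmObject form
        "Name": "GATK-HC",             # required
        "Version": "0.0-hypothetical", # optional, hypothetical value
        "Params": {"min_base_quality": 20},  # free-form, hypothetical
    },
    "Algorithms": [                    # mixes both allowed item forms
        "GATK-HC",                     # string identifier
        {"Name": "DeepVariant-Model"}, # AlgorithmObject with only Name
    ],
    "FeatureClass": "variant features",
    "Features": ["SNP_Count", "CNV_Events"],  # non-empty, unique strings
}

text = json.dumps(example, indent=2)
print(text)
```

The Algorithms array deliberately mixes a string identifier with an AlgorithmObject, since the table allows either form for each entry.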