1.0.6 CC0 [https://creativecommons.org/publicdomain/zero/1.0/]. Begum Durgahee Best Practices 1.) The class hierarchy, object- and datatype-property hierarchies are modeled to match SIO's hierarchies. 2.) Every class and property has a label (rdfs:label) as well as a description (rdfs:comment). The description may contain an example of how a class/property is being applied. 3.) Classes that are directly related to either GFF3, GTF, GVF or VCF specification have a link-out to the specification's document as provenance indicator (via rdfs:isDefinedBy). 4.) Link-outs to Wikipedia are provided for classes wherever possible. 5.) Ontology terms are encoded in the URIs using camel case, i.e. letters following a white space in an ontology term are capitalized followed by the removal of the white space. Chris Mungall CODAMONO Erick Antezana Francesco Strozzi Genomic Feature and Variation Ontology (GFVO) Joachim Baran Joachim Baran Karen Eilbeck Michel Dumontier Raoul Bonnal Robert Hoehndorf Stephen Kan (GitHub: Helisquinde) Takatomo Fujisawa The Genomic Feature and Variation Ontology (GFVO) is modeled to represent genomic data using the Resource Description Framework (RDF). It captures the contents of data files that adhere to the Generic Feature Format Version 3 (GFF3, http://www.sequenceontology.org/resources/gff3.html), the General Transfer Format (GTF, http://mblab.wustl.edu/GTF22.html), the Genome Variation Format Version 1 (GVF, http://www.sequenceontology.org/resources/gvf.html), and the Variant Call Format (VCF, http://vcftools.sourceforge.net/specs.html). The creation of the ontology was inspired by previous work of Robert Hoehndorf on RDF2OWL (http://code.google.com/p/rdf2owl). Toshiaki Katayama https://github.com/BioInterchange/Ontologies describes Links to an entity for which supportive information is being provided. has annotation Links to additional annotations about an entity. has attribute Links out to aggregate information for an entity. 
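The camel-case rule from best practice 5 can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the tooling used to build GFVO; the function name `to_camel_case` is hypothetical, and leaving characters other than white space (hyphens, for instance) untouched is an assumption.

```python
def to_camel_case(term: str) -> str:
    """Capitalize each letter that follows white space, then drop the white space."""
    words = term.split()
    if not words:
        return ""
    # The first word is kept as written; later words only get their first
    # letter upper-cased, so acronyms such as "DNA" stay intact.
    return words[0] + "".join(w[0].upper() + w[1:] for w in words[1:])


print(to_camel_case("has attribute"))  # hasAttribute
print(to_camel_case("is part of"))     # isPartOf
```

The results match the property names "hasAttribute" and "isPartOf" that the class descriptions refer to.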
has evidence References an entity or resource that provides supporting/refuting evidence. has first part Denotes the first entity of an ordered part relationship. has identifier Links out to an identifier. has input Links out to an entity that is the input of a "Process" subclass. has last part Denotes the last entity of an ordered part relationship. has member Denotes membership for "Collection", "Catalog" and "File" instances. has ordered part Denotes a compositional relationship to other entities, where the ordering of the composition of entities carries meaning. has output Links out to an entity that is the output of a "Process" subclass. has part Denotes a compositional relationship to other entities. has participant Denotes the participation of other entities in processes. has quality Links out to an entity that provides qualitative information. has source Denotes information origin. is about References an entity about which information is provided. is affected by Denotes that an entity is affected by another entity. is after Denotes the trailing occurrence or succession of the subject in regards to the object. is attribute of Denotes that an entity is an attribute of the entity that this property links out to. is before Denotes the leading occurrence or precedence of the subject in regards to the object. is created by Denotes the process or method that created an entity. is described by Provides a description of the subject via reference to an object that provides further information on the subject. is located on Denotes the location of a genomic feature on a landmark. is part of Denotes that an entity is an intrinsic component of an encapsulating entity. is participant in Denotes participation with another entity. is refuted by References an entity or resource that provides refuting evidence. is source of Denotes that an entity is the source of the entity that this property links out to. 
is spatiotemporally related to Denotes spatio-temporal relations to other entities. is supported by References an entity or resource that provides supporting evidence. is temporarily part of Denotes a temporally constrained "isPartOf" relationship. The temporal restriction expresses that the relationship is not universally true. This property can be used to express "Derives_from" relations in GFF3. references References another entity or resource. refers to References an entity, where additional information is provided to augment the reference. has value Representation of any literal that is associated with a GFVO class instance. Domain restrictions might apply. For example, "Codon Sequence" entities restrict "has value" to be a non-empty string consisting of A, C, G, or T letters, and whose length is a multiple of 3. Alias http://en.wikipedia.org/wiki/Pseudonym An alias is an alternative name for an entity. The use of an alias is mostly secondary, whereas instances of the "Name" class should be used to denote primary names. Encodes for the "Alias" attribute in GFF3 and GVF. Allele Count Count of a specific allele in genotypes. Encodes for "AC" additional information in VCF files. Allele Frequency 1.0 0.0 0.0 1.0 http://en.wikipedia.org/wiki/Allele_frequency Proportion of a particular gene allele in a gene pool or genotype. Encodes for "AF" additional information in VCF files. Amino Acid [A-Z] "Amino Acid" encodes for the "Variant_aa" and "Reference_aa" attributes in GVF files. Linking an "Amino Acid" instance to a "Reference Sequence" or "Sequence Variant" instance denotes the genomic context of the amino acid. http://en.wikipedia.org/wiki/Amino_acid Ancestral Sequence Denotes an ancestral allele of a feature. May be used to denote the "ancestral allele" ("AA" additional information) of VCF formatted files. 
http://en.wikipedia.org/wiki/Ancestral_reconstruction Array Comparative Genomic Hybridization Feature provenance is based on array-comparative genomic hybridization. Used by the "data-source" structured pragma in GVF. Attribute An attribute denotes characteristics of an entity. At this stage, "Quality" is the only direct subclass of "Attribute", whose subclasses denote qualitative properties such as sex ("Female", "Male", "Hermaphrodite"), zygosity ("Hemizygous", "Heterozygous", "Homozygous"), etc. The object property "has quality" (or subproperties thereof) should be utilized to express qualities of entities. The "hasAttribute" object property should be used to denote relationships to "Object" or "Process" instances, unless there is a better object property suitable to represent the relationship between entities. Average Coverage http://en.wikipedia.org/wiki/Shotgun_sequencing#Coverage Average coverage depth for a genomic locus (a region or single base pair), i.e. the average number of reads representing a given nucleotide in the reconstructed sequence. Captures the "technology-platform" structured pragma in GVF files ("Average_coverage" tag). Base Quality Root mean square base quality. Accounts for "BQ" additional information in VCF files. Biological Entity A biological entity is an entity that contains genomic material or utilizes genomic material during its existence. Genomic material itself is represented as subclasses of "Chemical Entity". Biopolymer Sequencing Information about features and variants is based on biopolymer sequencing. This class is not directly instantiated, but its subclasses "DNA Sequencing" and "RNA Sequencing" are used to describe the "data-source" structured pragma in GVF. http://en.wikipedia.org/wiki/Sequencing Breakpoint A breakpoint describes the source or destination of a zero-length sequence alteration. 
These alterations are typically insertions, deletions or translocations according to the GVF specification (see "Breakpoint_detail" in http://sequenceontology.org/resources/gvf.html). Breakpoint coordinates should be provided using classes of the Feature Annotation Location Description Ontology. The class encodes for the "Breakpoint_detail" and "Breakpoint_range" attributes in GVF. Catalog A catalog is a specialization of a "Collection", where all of its contents are of the same type. The requirement of same type cannot be enforced formally via this ontology; data providers need to verify this condition manually or programmatically, or alternatively, use the more generic "Collection" class instead. Cell A cell is a biological unit that in itself forms a living organism or is part of a larger organism that is composed of many other cells. The subclasses "Germline Cell" and "Somatic Cell" can be used to denote the biological material that was used in an experiment. http://en.wikipedia.org/wiki/Cell_(biology) Chemical Entity A chemical entity is an entity related to biochemistry. This class is typically not instantiated, but instead, its subclasses "Amino Acid", "Chromosome", "Peptide Sequence", etc., are used to represent specific chemical entities. Chromosome A chromosome can be used as an abstract representation of a (not necessarily named) chromosome to represent ploidy within a data set. The "Chromosome" instance is then used for denoting the locus of phased genotypes. For placing genomic features ("Feature" class instances) on a chromosome, contig, scaffold, etc., please see the "Landmark" class. It is encouraged that Sequence Ontology terms are used to annotate a Landmark with a biological type (such as "chromosome", "contig", etc.). Encodes for "sequencing-scope" pragma in GVF. http://en.wikipedia.org/wiki/Chromosome Circular Helix A circular helix structure. Can be used to indicate a true "Is_circular" attribute in GFF3 and GVF. 
http://en.wikipedia.org/wiki/Circular_DNA Coding Frame Offset 0 2 http://en.wikipedia.org/wiki/Reading_frame Coding frame offset of a genomic feature that is a coding sequence or other genomic feature that contributes to transcription and translation. A feature's coding frame offset can be either 0, 1, or 2. It is referred to as "frame" in GTF, but called "phase" in GFF3 and GVF. "phase" is defined in GVF, but unused. Codon Sequence ([ACGT]{3})+ http://en.wikipedia.org/wiki/Codon A codon sequence is a nucleotide sequence underlying a potential amino acid sequence. Codon sequences are three bases of length or multiples thereof. Encodes for "Variant_codon" and "Reference_codon" attributes in GVF. Collection A collection is a container for genomic data. A collection may contain information about genomic data including -- but not limited to -- contents of GFF3, GTF, GVF and VCF files. The latter are better represented by "File" class instances, whereas the result of unions or intersections between different "File" class instances should be captured within this format-independent "Collection" class. When importing data whose provenance is not file based, instances of "Collection" should be utilized (e.g., database exports). Comment A comment is a remark about a piece of information, an observation or statement. In the context of GFF3, GVF, etc., genomic feature and variation descriptions, "isAfter" and "isBefore" relationships should be used to indicate where a comment is situated between pragma or feature statements of GFF3, GTF, GVF or VCF files. For specific descriptions or textual annotations of genomic features, the use of the "Note" class is encouraged. Conditional Genotype Quality Conditional genotype quality expressed in form of a "Phred Score". It denotes the score of the genotype being wrong under the assumption/condition that the genomic site is a sequence variant. Encodes for "GQ" additional information in VCF files. 
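The value restrictions given for "Codon Sequence" (the pattern `([ACGT]{3})+`) and "Coding Frame Offset" (0, 1, or 2) can be checked before a literal is attached via "has value". The following Python sketch shows one way to do so; the function names are hypothetical and are not part of GFVO or any GFVO tooling.

```python
import re

# Pattern taken from the "Codon Sequence" entry: one or more A/C/G/T triplets.
CODON_SEQUENCE = re.compile(r"([ACGT]{3})+")


def is_valid_codon_sequence(value: str) -> bool:
    """True if the whole string consists of complete A/C/G/T triplets."""
    return CODON_SEQUENCE.fullmatch(value) is not None


def is_valid_coding_frame_offset(value: int) -> bool:
    """True if the offset is one of the three permitted values."""
    return value in (0, 1, 2)


print(is_valid_codon_sequence("ATGGCC"))  # True
print(is_valid_codon_sequence("ATGG"))    # False
print(is_valid_coding_frame_offset(3))    # False
```

`fullmatch` is used rather than `match` so that trailing bases that do not complete a triplet are rejected.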
Contig A contig is a contiguous DNA sequence that has been assembled from shorter overlapping DNA segments. "Contig" is a specialization of a "Collection" and should be used to aggregate features, but not for indicating that a "Landmark" is representing a contig. It is encouraged that the latter is annotated by a term of the Sequence Ontology. Encodes for "sequencing-scope" in GVF and the "contig" information field in VCF. http://en.wikipedia.org/wiki/Contig Coverage 0.0 0.0 0 Number of nucleic acid sequence reads for a particular genomic locus (a region or single base pair). Accounts for "DP" additional information in VCF files. DNA Microarray Feature information is based on DNA microarray probes. Used by the "data-source" structured pragma in GVF. http://en.wikipedia.org/wiki/DNA_microarray_experiment DNA Sequence ([ACGT])+ A DNA sequence is a sequence of nucleic acids. It can be used to describe "FASTA" annotations in both GFF3 and GVF files as well as short sequences in VCF files. http://en.wikipedia.org/wiki/Dna_sequence DNA Sequencing http://en.wikipedia.org/wiki/DNA_sequencing Information about features and variants is based on DNA sequencing. Used by the "data-source" structured pragma in GVF. Exome Representation of an exome. Features that constitute the exome may be linked via one or more "Collection", "Catalog", "Contig", "Scaffold" or "File" instances. "Exome" can be used for describing data contents of the "sequencing-scope" pragma in GVF files. http://en.wikipedia.org/wiki/Exome Experimental Method An experimental method is a procedure that yields an experimental outcome (result). Experimental methods can be in vivo, in vitro or in silico procedures that are well described and can be referenced. Encodes for "source" column contents of GFF3, GTF, and GVF file formats as well as the "CHROM" column in VCF. Can be used to describe the "capture-method" pragma in GVF; it can describe "VALIDATED" additional information in VCF. 
External Reference A cross-reference to associate an entity to a representation in another database. Encodes for the "Dbxref" attribute in GFF3 and GVF. Can be used to describe the contents of the "source" column in GTF files. Captures the "genome-build" pragma, "source-method", "attribute-method", "phenotype-description", and "phased-genotypes" structured pragmas in GVF. Accounts for the "assembly" and "pedigreeDB" information fields, and "DB", "H2", "H3", "1000G" additional information in VCF. Feature The feature class captures information about genomic sequence features and variations. A genomic feature can be a large object, such as a chromosome or contig, down to single base-pair reference or variant alleles. Female Denoting sex of a female individual, who is defined as an individual producing ova. This quality can be used to encode for the "sex" pragma in GVF files. http://en.wikipedia.org/wiki/Female File A file represents the contents of a GFF3, GTF, GVF or VCF file. It can capture genomic meta-data that is specific to any of these file formats. The result of unions, intersections or other operations between "File" class instances should be captured with the generic "Collection" class, which is format independent. Forward Reference Sequence Frameshift Denotes a frameshift forward in the reference sequence. Encodes for the "Target" attribute in GFF3, encodes for "Target" attribute and "sequence-alignment" pragma in GVF, and, encodes "CIGAR" additional information in VCF. Fragment Read Platform Details about the fragment-read (single-end read) sequencing technology used to gather the data in a set. Encodes for the "technology-platform-read-type" pragma in GVF. Functional Specification A functional specification of bioinformatics data, i.e. the specification of genomic material that potentially has biological function. This class should not be directly instantiated, but instead, its subclass "Genotype" should be used. 
Gametic Phase Denotes the presence of information that required capturing the gametic phase. For diploid organisms, this quality indicates that information is available about which chromosome of a chromosome pair contributed data. "Gametic Phase" encodes for the "Phase" attribute in GVF. It encodes for "GT" and "PS" additional information in VCF. http://en.wikipedia.org/wiki/Gametic_phase Genome Representation of a genome. Genomic features that constitute the genome may be linked via one or more "Collection", "Catalog", "Contig", "Scaffold" or "File" instances. "Genome" can be used for describing data contents of the "genome-build" and "sequencing-scope" pragmas in GVF files. http://en.wikipedia.org/wiki/Genome Genome Analysis A genome analysis denotes the type of procedure that was carried out to derive information from a genome assembly. "Genome Analysis" can be instantiated for cases where an application of "FILTER" in VCF cannot be linked to "Genotyping" or "Variant Calling", which are subclasses of "Genome Analysis". If possible, further annotation should be provided to indicate the actually utilized filter type. http://en.wikipedia.org/wiki/Genomics#Genome_analysis Genomic Ascertaining Method Provides information about the source of data. Subclasses of the class can be used to encode for the "data-source" structured pragma in GVF. Genotype http://en.wikipedia.org/wiki/Genotype The genotype is the genetic information captured in a particular genome. It can also refer to one or more populations, if statistical distributions are provided that assign genetic codes to groups of individuals. A genotype is denoted by a string consisting of a slash-separated list of alleles ("has value" property). The length of the list is dependent on the ploidy of the studied species as well as the sequencing technique used. Example: "A/G" denotes a genotype with alleles "A" and "G". Encodes for the "Genotype" attribute in GVF and "GT" additional information in VCF. 
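The slash-separated "has value" literal of a "Genotype" can be split into its alleles with a few lines of Python. This is a minimal sketch under the assumption that every allele is a non-empty string; the function name `split_genotype` is hypothetical.

```python
from typing import List


def split_genotype(value: str) -> List[str]:
    """Split a slash-separated genotype literal such as "A/G" into its alleles."""
    alleles = value.split("/")
    # Assumption: an empty allele (e.g. "A//G" or a trailing slash) is malformed.
    if any(allele == "" for allele in alleles):
        raise ValueError(f"malformed genotype literal: {value!r}")
    return alleles


print(split_genotype("A/G"))  # ['A', 'G']
```

The length of the returned list corresponds to the number of alleles listed, which, as the "Genotype" entry notes, depends on ploidy and sequencing technique.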
Genotyping http://en.wikipedia.org/wiki/Genotyping Genotyping is the process of determining the genetics of an individual or sample. The genotype itself is expressed as the difference of genetic mark-up compared to a reference genome. Applicable to the "FILTER" information field in VCF. Germline Cell http://en.wikipedia.org/wiki/Germline The germline feature class captures information about genomic sequence features arising from germline cells. VCF files permit the explicit annotation with "Somatic Cell" via "SOMATIC" additional information. The absence of that field does not imply the presence of germline cell material though. Describes the "genomic-source" pragma in GVF. Haplotype A "Haplotype" is a collection of "Genotype" or "Sequence Variant" instances. It can imply that a set of genes is inherited as a group, or alternatively, that the set of genotypes or sequence variants has a biological function when acting together (e.g., there exists a disease association). Haplotype instances should only catalog a single type, i.e. either "Genotype" or "Sequence Variant" instances; they should not mix both types (see also "Catalog"). Encodes for "HQ" additional information in VCF. http://en.wikipedia.org/wiki/Haplotype Helix Structure http://en.wikipedia.org/wiki/DNA_helix "Helix Structure" denotes the physical shape of biopolymers. The subclasses "Circular Helix" and "Watson-Crick Helix" can be used for encoding the "Is_circular" attribute in GFF3 and GVF files. Hemizygous http://en.wikipedia.org/wiki/Zygosity#Hemizygous A sequence alteration with hemizygous alleles. This quality can be used to directly encode for the "Zygosity" attribute in GVF files and it indirectly describes genotypes in VCF files. Heritage http://en.wikipedia.org/wiki/Heredity Heritage denotes the passing of traits from parents or ancestors. Passed traits may not be visible as a phenotype, but instead, might only manifest as genetic inheritance. 
Hermaphrodite http://en.wikipedia.org/wiki/Hermaphrodite Denoting sex of an individual that contains both male and female gametes. Heterozygous http://en.wikipedia.org/wiki/Zygosity#Heterozygous A sequence alteration with heterozygous alleles. This quality can be used to directly encode for the "Zygosity" attribute in GVF files and it indirectly describes genotypes in VCF files. Homozygous http://en.wikipedia.org/wiki/Zygosity#Homozygous A sequence alteration with homozygous alleles. This quality can be used to directly encode for the "Zygosity" attribute in GVF files and it indirectly describes genotypes in VCF files. Identifier An identifier labels an entity with preferably a single term that is interpreted as an accession. An accession labels entities that are part of a collection of similar type. More generic naming of entities can be achieved using the "Label" class. Encodes for the "seqid" column in GFF3 and GVF; encodes for the "seqname" column in GTF and "CHROM" column in VCF. Captures the "ID" attribute in GFF3 and GVF. Suitable for expression values of "individual-id" and "technology-platform-machine-id" pragmas in GVF. Encodes for the "ID" key/value property in VCF. http://en.wikipedia.org/wiki/Identifier Information Content Entity An information content entity is a data structure or data type that requires background information or specific domain knowledge to be interpreted correctly. Information content entities can be of simple structure, such as "Label" that only requires the application of "has value" to be meaningful, or, they can be of complex structure such as "Locus" which becomes meaningful with multiple FALDO annotations. Label A label is a term or short list of terms that describe an entity for the purpose of distinguishing it from entities of similar type. It should be considered to utilize the "Identifier" class, if labels of entities are sufficiently unique to actually identify them. Encodes for the "PEDIGREE" information field in VCF. 
Landmark A landmark establishes a coordinate system for features. Landmarks can be chromosomes, contigs, scaffolds or other constructs that can harbor "Feature" class instances. For expressing ploidy within a data set, please refer to the "Chromosome" class. To annotate a landmark with a biological type, it is encouraged to use terms of the Sequence Ontology, but not the classes "Chromosome", "Scaffold" and "Contig". The latter classes are used for describing ploidy within a dataset as well as offering means of data aggregation. Encodes for the "seqid" column in GFF3 and GVF; encodes for the "seqname" column in GTF and "CHROM" column in VCF. Captures the "sequence-region" pragma in GFF3 and GVF as well as their "FASTA" annotation. Encodes for "DNA", "RNA" and "Protein" "##"-lines in GTF. Captures the "contig" information field in VCF. Likelihood A "Likelihood" is a probability of a certain event occurring. For use with "GL" and "GP" additional information in VCF files. Likelihood of Heterogeneous Ploidy "Likelihood of Heterogeneous Ploidy" expresses the likelihood of genotypes in absence of copy number data. Specifically designed to encode for values of "GLE" additional information in VCF files. Locus http://en.wikipedia.org/wiki/Locus_(genetics) A locus refers to a position within a designated genomic landmark. Actual locus coordinates should be provided using classes of the Feature Annotation Location Description Ontology. The class encodes for the "start", "end" and "strand" columns in GFF3, GTF, and GVF and for the "POS" column in VCF. It also encodes the "Start_range" and "End_range" attributes in GVF. Male http://en.wikipedia.org/wiki/Male Denoting sex of a male individual, who is defined as an individual producing spermatozoa. This quality can be used to encode for the "sex" pragma in GVF files. Mapping Quality Root mean square mapping quality. Encodes values of the "MQ" additional information in VCF files. 
Match Denotes a match between the reference sequence and target sequence. Encodes for the "Target" attribute in GFF3, encodes for "Target" attribute and "sequence-alignment" pragma in GVF, and, encodes "CIGAR" additional information in VCF. Material Entity A material entity represents a physical object. In the context of genomic features and variations, material entities are cells, organisms, sequences, chromosomes, etc. "Material Entity" should not be instantiated as such; instead, it is suggested that its subclasses "Genome", "DNA Sequence", "Sample Count", etc. be used. Maternal Heritage Maternal heritage is the passing of traits from a female to her descendants. Currently unused but might be applicable to phased genotypes in GVF and VCF files; included for future use. http://en.wikipedia.org/wiki/Maternal_effect Name A name assigns an entity a non-formal term (or multiples thereof) that can provide information about the entity's identity. Unlike an "Identifier", a name should not be considered unique. Encodes for the "feature" column in GTF. Captures the "genome-build" pragma in GFF3 and GVF. Captures the "population", "technology-platform-name" pragmas in GVF. Note A note is a short textual description about an entity. It provides a formal or semi-formal description of an entity, as opposed to a "Comment". Encodes for the "sample-description" pragma and "Comment" key/value pairs in structured attributes in GVF. Captures "Description" key/value pairs in information fields and "SB" information field in VCF. Number of Reads 0 Number of reads supporting a particular feature or variant. Can encode for "MQ0" additional information in VCF files, if additional annotations are provided to denote a mapping quality of zero for the given count. In GVF files, the class accounts for the "Variant_reads" attribute. Object An object is a concrete entity that realizes a concept and encapsulates data associated with said concept. 
Objects typically represent tangible entities, such as "Chromosome", "DNA Sequence", but also objects such as "Identifier", "Average Coverage" or other computational or mathematical entities. Since an object describes a large body of entities, its use is discouraged. Where applicable, one of its subclasses should be used instead. Paired End Read Platform Details about the paired-end read sequencing technology used to gather the data in a set. Encodes for the "technology-platform-read-type" pragma in GVF. Paternal Heritage http://en.wikipedia.org/wiki/Maternal_effect#Paternal_effect_genes Paternal heritage is the passing of traits from a male to his descendants. Currently unused but might be applicable to phased genotypes in GVF and VCF files; included for future use. Peptide Sequence ([A-Z])+ A peptide sequence is an ordered sequence of amino acid residues, but which may not necessarily be a protein sequence. For encoding sequences of proteins, the subclass "Protein Sequence" should be used. Encodes for "FASTA" annotation in GFF3 and GVF. http://en.wikipedia.org/wiki/Peptide_sequence Phenotype http://en.wikipedia.org/wiki/Phenotype A phenotype description represents additional information about a sequenced individual's phenotype. A sequenced individual is represented by instances of the "Sequenced Individual" class. Encodes for the "phenotype-description" structured pragma in GVF. Phred Score 0 http://en.wikipedia.org/wiki/Phred_score The Phred score can be used to assign quality scores to base calls of DNA sequences. GVF supports the use of Phred scores in the "score" column, but this information needs to be obtained/given by the data provider. In VCF files, the "QUAL" column and "PL", "HQ", and "PQ" additional information carries Phred scores that can be encoded as "Phred Score". 
GFVO's "Score" and "Phred Score" cannot be defined as equivalent to the Sequence Ontology terms "score" (SO:0001685) and "quality_value" (SO:0001686) due to differences in inheritance between the two ontology implementations. GFVO defines "Phred Score" as a subclass of "Score", but the Sequence Ontology defines "score" as a sibling of "quality_value". Prenatal Cell true A prenatal feature is purportedly associated with prenatal cells; the GVF specification declares this feature type under the pragma directive "##genomic-source", but does not describe its semantics, and the referenced Logical Observation Identifiers Names and Codes (LOINC, http://loinc.org) do not define the meaning or intended usage of the term "prenatal" either. http://en.wikipedia.org/wiki/Prenatal Process A process denotes a temporally dependent entity. It can be thought of as a function, where input data is transformed by an algorithm to produce certain output data. Since a process describes a large number of entities, its direct use is discouraged. At least "Experimental Method" or one of its subclasses should be used instead. Protein Sequence http://en.wikipedia.org/wiki/Peptide_sequence A protein sequence is a peptide sequence which represents the primary structure of a protein. Encodes for "sequencing-scope" pragma in GVF. Proteome Representation of a proteome. Features that constitute or contribute to the proteome may be linked via one or more "Collection", "Catalog", "Contig", "Scaffold" or "File" instances. It is envisioned that "Proteome" could be used for describing data contents of the "sequencing-scope" pragma in GVF files. http://en.wikipedia.org/wiki/Proteome Quality Quality is a specific attribute that is strongly associated with an entity, but whose values are varying and disjunct. 
Qualities are finite enumerations, such as sex ("Female", "Male", "Hermaphrodite"), heritage ("Maternal", "Paternal"), but they also make use of the "has value" datatype property such as "Coding Frame Offset" (either "0", "1" or "2"). For encoding numerical qualities, see "Base Quality" and "Mapping Quality", or, "Phred Score" and "Conditional Genotype Quality", which are sub-classes of "Score". Quantity A property of a phenomenon, body, or substance, where the property has a value that can be expressed by means of a number. This class is typically not directly instantiated, but instead, its subclasses "Allele Frequency", "Average Coverage", etc. are used. RNA Sequencing http://en.wikipedia.org/wiki/RNA-Seq Information about features and variants is based on RNA sequencing. Used by the "data-source" structured pragma in GVF. Reference Sequence http://en.wikipedia.org/wiki/Reference_sequence Denotes the reference sequence of a feature. The reference sequence is of importance when dealing with genomic variation data, which is expressed by the "Variant" class. Encodes for the "Reference_seq" and "Sequence_context" attributes in GVF and the "REF" column in VCF. Reference Sequence Gap Denotes a gap in the reference sequence for an alignment. Encodes for the "Target" attribute in GFF3, encodes for "Target" attribute and "sequence-alignment" pragma in GVF, and, encodes "CIGAR" additional information in VCF. Reverse Reference Sequence Frameshift Denotes a frameshift backwards (reverse) in the reference sequence. Encodes for the "Target" attribute in GFF3, encodes for "Target" attribute and "sequence-alignment" pragma in GVF, and, encodes "CIGAR" additional information in VCF. Sample A sample is a limited quantity of a chemical entity, which is typically used (destructively/non-destructively) in a scientific analysis or test. It can be applied to describe contents of the "sample-description" pragma in GVF files or the "SAMPLE" information field in VCF files. 
http://en.wikipedia.org/wiki/Sample_(material) Sample Count Number of samples in the dataset. Encodes for "NS" additional information in VCF files. Sample Mixture 1.0 0.0 0.0 1.0 http://en.wikipedia.org/wiki/Biopsy Sample mixture determines the proportion of various tissues/cell types in a biological sample that has been taken as part of a biopsy. The sum of various sample mixtures belonging to the same sample should equal 1. Expresses the "Mixture" key/value pair in "SAMPLE" fields in VCF. Scaffold http://en.wikipedia.org/wiki/Contig A scaffold is the aggregation of multiple contigs to form a larger continuous sequencing region. "Scaffold" is a specialization of a "Collection" and should be used to aggregate features, but not for indicating that a "Landmark" is representing a scaffold. It is encouraged that the latter is annotated by a term of the Sequence Ontology. Encodes for "sequencing-scope" in GVF. Score A measure that permits the ranking of entities. Directly encodes for the "score" column in GFF3, GTF and GVF files; if the actual scoring algorithm is known, then "Phred Score" might be used to encode for the values of the "score" column in GVF files. The class can encapsulate information of the "score-method" pragma in GVF files. For VCF files, the subclasses "Phred Score" and "Conditional Genotype Quality" should be used. GFVO's "Score" and "Phred Score" cannot be defined as equivalent to the Sequence Ontology terms "score" (SO:0001685) and "quality_value" (SO:0001686) due to differences in inheritance between the two ontology implementations. GFVO defines "Phred Score" as a subclass of "Score", but the Sequence Ontology defines "score" as a sibling of "quality_value". http://en.wikipedia.org/wiki/Score_(statistics) Sequence ([ACGTUWSMKRYBDHVN\-]+|\~[0-9]*|\.|!|\^) A sequence provides information about any biopolymer sequences. 
Subclasses are provided to denote specialized instances of sequences, such as "Codon Sequence", "Reference Sequence", "Protein Sequence", etc. Can be used to encode for the "sequencing-scope" pragma in GVF files. See subclasses for applications in both GFF3 and GVF files. Sequence Alignment http://en.wikipedia.org/wiki/Sequence_alignment A sequence alignment denotes the congruence of two sequences. In GFF3/GVF, a sequence alignment can be a nucleotide-to-nucleotide or protein-to-nucleotide alignment (see "The Gap Attribute", http://sequenceontology.org/resources/gff3.html). "Alignment Operation" class instances denote the actual steps that constitute the sequence alignment. Encodes for the "Target" attribute in GFF3/GVF files as well as the "sequence-alignment" pragma in GVF files. Can encode "CIGAR" additional information of VCF files. Sequence Alignment Operation A sequence alignment operation captures the type of alignment (see "Sequence Alignment") between a reference sequence and target sequence. Note that a "Sequence Alignment Operation" is situated in a linked list, where the order of the alignment operations is of significance. Its subclasses are used to encode for the "Target" attribute and "sequence-alignment" pragma in GVF, and they encode "CIGAR" additional information in VCF. http://en.wikipedia.org/wiki/Sequence_alignment Sequence Variant http://en.wikipedia.org/wiki/Mutation Describes specific sequence alterations of a genomic feature. A variant is related to "Reference" class instances, which denote the sequence that serves as a basis for sequence alteration comparisons. Encodes for the "Variant_seq" attribute in GVF. Captures the "ALT" column and "ALT" information field in VCF. Sequenced Individual An abstract representation of a particular individual for representing aggregated sequencing information. "Sequenced Individual" can also be used to denote complex heritage relationships in genomic samples.
Encodes for the "individual" attribute and "multi-individual" pragma in GVF. Sequencing Technology Platform http://en.wikipedia.org/wiki/Read_(Biology) Details about the sequencing/microarray technology used to gather the data in a set. Encodes for the "technology-platform-class" pragma and serves as a composite for aggregating information from the pragma statements "technology-platform-name", "technology-platform-version", "technology-platform-machine-id", "technology-platform-read-length", "technology-platform-read-type", "technology-platform-read-pair-span", "technology-platform-average-coverage", as well as the structured pragma "technology-platform" in GVF. Sex Biological sex of a sequenced individual. Subclasses "Female" and "Male" can be used to encode for the "sex" pragma in GVF files. The subclass "Hermaphrodite" is included for potential future use cases. http://en.wikipedia.org/wiki/Sex Somatic Cell The somatic feature class captures information about genomic sequence features arising from somatic cells. Encodes for the "genomic-source" pragma in GVF and "SOMATIC" additional information in VCF. http://en.wikipedia.org/wiki/Somatic Span 0 A span is an attribute denoting the number of nucleotides or peptides that an entity covers. This is directly used in conjunction with "Sequence Alignment Operation" subclasses to express the number of nucleotides a sequence alignment match ranges over, which can be used in conjunction with GFF3/GVF files. The class also covers "technology-platform-read-length" and "technology-platform-read-pair-span" pragmas in GVF files. Target Sequence Gap Denotes a gap in the target sequence for an alignment. Encodes for the "Target" attribute in GFF3, encodes for the "Target" attribute and "sequence-alignment" pragma in GVF, and encodes "CIGAR" additional information in VCF. Total Number of Alleles Total number of alleles in called genotypes. Encodes for "AN" additional information in VCF files.
Total Number of Reads Total number of reads covering a feature or variant. Covers the "Total_reads" attribute in GVF files and the "DP" additional information field in VCF files. Variant Calling http://en.wikipedia.org/wiki/SNV_calling_from_NGS_data Denotes the technique of calling genomic feature variants in a genome assembly. Applicable to the "FILTER" information field in VCF as well as the "variant-calling" pragma in GVF. Version A "Version" names a release of software, a dataset, or another resource. Versions can follow the common "major.minor.patch" version format, but are not restricted in any way. The version can also incorporate the dataset name (e.g., "HGNC19"). Encodes for the "gff-version" and "gvf-version" pragma statements in GFF3 and GVF, respectively. Encodes for the "gff-version" "##"-line type in GTF. Captures the "fileformat" meta-information line in VCF. Encodes for "file-version" and "technology-platform-version" pragmas in GVF. http://en.wikipedia.org/wiki/Versioning Watson-Crick Helix http://en.wikipedia.org/wiki/Non-helical_models_of_DNA_structure#Proposal_of_Watson.E2.80.93Crick_helical_structure Helical structure as first proposed by James Watson and Francis Crick, whose work was greatly influenced by discoveries of Rosalind Franklin and Maurice Wilkins. Can be used to indicate a false "Is_circular" attribute in GFF3 and GVF. Zygosity http://en.wikipedia.org/wiki/Zygosity Zygosity denotes the similarities of a specific allele in the genome of an organism. Subclasses can be utilized to directly encode zygosity (e.g., "Zygosity" attribute in GVF files), or encode zygosity indirectly by inferring it from genotype descriptions (as is the case in VCF files).
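The value pattern listed with the "Sequence" class can be tried out with a short script. The following is only a minimal sketch in Python: the pattern is copied verbatim from the "Sequence" entry, while the helper name `is_valid_sequence_value` is hypothetical. It assumes that the pattern is meant to match an entire value rather than a substring of it.

```python
import re

# Pattern copied verbatim from the "Sequence" entry of the ontology.
SEQUENCE_PATTERN = re.compile(r"([ACGTUWSMKRYBDHVN\-]+|\~[0-9]*|\.|!|\^)")


def is_valid_sequence_value(value: str) -> bool:
    """Hypothetical helper (not part of GFVO).

    Returns True if the whole string matches the "Sequence" pattern.
    Whole-value matching is an assumption made for this sketch.
    """
    return SEQUENCE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # A few candidate values; each is printed with its check result.
    for candidate in ["ACGTN", "AC-GT", "~12", ".", "acgt", "ACGTX", ""]:
        print(repr(candidate), is_valid_sequence_value(candidate))
```

Under this reading, the pattern accepts upper-case nucleotide codes (including "-"), a "~" optionally followed by digits, and the single characters ".", "!" and "^". Lower-case input and letters outside the listed set are rejected.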