---
layout: page
title: "An example RFC"
date: 2013-12-29 10:51
comments: true
sharing: true
footer: true
---

_This is an example of an ADAM RFC. It demonstrates the syntax of an RFC as well as example headings and structure. You can [view the source for this RFC online](https://raw.github.com/bigdatagenomics/bigdatagenomics.github.io/source/source/rfc/1/index.markdown)._

### Schema

The following schema defines how to store [FASTA formatted data](http://en.wikipedia.org/wiki/FASTA_format) in ADAM and captures all FASTA content.

```c
record ADAMFastaFragment {
  union { null, string } description = null;
  union { null, long } start = null;
  union { null, long } end = null;
  union { null, string } sequence = null;
}
```

All fields are optional and default to `null`.

### Performance Considerations

The `end` field can be elided because it can be inferred from `start` and the length of `sequence`. However, a pushdown predicate on the `start` and `end` positions would be faster than materializing the sequence to compute the end.

### Predicates

A commonly used predicate would find all sequences with a specific description whose `start` and `end` fall within a specified range.

### Common Operations

1. Reading the entire sequence.
2. Reading a portion of the sequence that falls in a specified range.

### Open Questions

1. Should we require that the `end` field always be specified for performance?
2. Should we break up the description into the superset of all sequence identifiers?

### Filename Extension

Once converted to ADAM, the file extension will be `.fasta.adam`.
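
### Example Sketch

The trade-off described under Performance Considerations and the range predicate described under Predicates can be sketched in a few lines of Python. This is only an illustration, not part of ADAM: the `FastaFragment` class, the `inferred_end` and `matches` helpers, and the sample data are all hypothetical, and the sketch assumes `end` is exclusive (`end = start + len(sequence)`), which the RFC does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FastaFragment:
    """Hypothetical in-memory mirror of the ADAMFastaFragment record."""
    description: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    sequence: Optional[str] = None


def inferred_end(fragment):
    """Return the stored end, or derive it from start and sequence length.

    Assumption: end is exclusive, so end = start + len(sequence).
    The fallback path has to read the sequence, which is the cost
    the RFC points out.
    """
    if fragment.end is not None:
        return fragment.end
    if fragment.start is None or fragment.sequence is None:
        return None
    return fragment.start + len(fragment.sequence)


def matches(fragment, description, range_start, range_end):
    """True if the description matches and the fragment lies in [range_start, range_end]."""
    end = inferred_end(fragment)
    if fragment.start is None or end is None:
        return False
    return (
        fragment.description == description
        and fragment.start >= range_start
        and end <= range_end
    )


# Made-up sample fragments for illustration only.
fragments = [
    FastaFragment("chr1", 0, 4, "ACGT"),
    FastaFragment("chr1", 4, None, "TTGA"),  # end omitted, must be inferred
    FastaFragment("chr2", 0, 4, "GGCC"),
]
hits = [f for f in fragments if matches(f, "chr1", 0, 8)]
```

When `end` is stored, `matches` never touches `sequence`. When it is omitted, the sequence must be loaded just to evaluate the range, which is the motivation behind the first open question.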