Manuel Lemos (mlemos@acm.org)
Igor Feghali (ifeghali@php.net)
Version control: @(#) $Id$
The schema description format is based on XML (eXtensible Markup Language). For those that are not familiar with it, XML is standard that specifies rules to define abstract data formats based on tagged text.
Like HTML (HyperText Markup Language), XML is also based on SGML (Standard Generalized Markup Language). SGML defines rules to structure data using special text tags.
SGML tags may be used to delimit data sections. Section begin tags are of the form <name attributes... > and end tags are of the form </name>. name is the name of the tag and attributes is a set of zero or more pairs of attribute names and the values associated with the respective tag.
XML is a little stricter in the way tags may be used. While with many SGML formats some end tags are optional, in XML end tags are always required. Also, when tag attributes are used, attribute values must be specified always between quotes. These XML requirements are usually known as well-formedness.
Another important detail about XML strictness is that tag names and attributes are case sensitive. This means that tags in upper case are distinct from tags in lower case.
Unlike a common (but mistaken) belief, XML is not meant just for describing data in documents that are meant to be displayed or printed. XML is a standard that defines rules for describing abstract data may be used to for any purpose.
Even though it may be used to schemas that may be displayed or printed, Metabase schema description format is meant to provide a way for developers to design their database schemas using a DBMS independent file format. Using this format, developers may describe relations and properties of tables, field, indexes, sequences, etc..
This format uses just a subset of the XML known as SML (Simplified Markup Language). SML formats complies with the same rules as any XML format but it does not use all its possibilities to make it simpler for users to write and understand the data. For instance, files written on SML do not use any tag attributes.
Metabase schema description format is also simpler to edit by hand because tags and constant data values should always be in lower case to save the user from holding frequently the keyboard shift key.
The Metabase schema description format lets the developers describe a set of database schema objects following the database objects hierarchy. This means that a database may contain tables and sequence objects, tables may contain fields and index objects and all these objects have their own attributes.
The definition of each database schema object contained within the begin and end tags of the respective container object. Therefore, the definition of each table and sequence has to be specified between the main database begin and end tags. Likewise, the definition of fields and indexes has to be specified between the respective table begin and end tags.
The properties of each schema object are also defined between the respective begin and end tags. The values of each property are also defined between the respective property being and end tags.
The values of the properties are subject of validation according to the type of each property and the context within which they are being defined.
Some properties define names of database schema objects. There are names that are accepted as valid for some DBMS that are not accepted by other DBMS. Metabase schema parser may optionally fail if such names are used to reduce the potential problems when using the same Metabase based application with different DBMS.
The schema object description tags are defined as follows:
The database tag should be always at the top of the schema object hierarchy. Currently it may contain the definition of two types of objects: table and sequence.
The database schema object may have the following properties:
Name of the database that is meant to be created when it is installed for the first time.
Default: none
Boolean flag that indicates whether the database manager class should create the specified database or use a previously installed database of the same name.
This property may have to be set to false if you are splitting your database definition in multiple schema description files and only the installation of the first description file is supposed to lead to the actual database creation operation.
Another circumstance on which this property may have to be set to false is when the DBMS driver does not support database creation or if this operation requires special database administrator permissions that may not be available to the database user.
Default: false
Boolean flag that indicates whether the database manager class should overwrite previously created structures.
Default: false
Charset/Encoding of database.
Default: none
Free text property meant for describing the purpose of the database. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
Additional database comments. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
The table is one of the main database schema objects. It may be used in a schema description multiple times, once per each table that is contained the database being described.
The table object definition may contain the declaration and initialization sections besides the properties. The table schema object may have the following properties:
Name of the table.
Default: none
Name that the table had before the last time it was renamed. This property is only meant to be used when the database is being upgraded and the table is meant to be renamed.
Beware that if this property is not specified and the name is changed, that is intended as if it is meant to drop the table with the previous name and create a new table without keeping the data that was previously stored in the table.
Default: table name
Free text property meant for describing the purpose of the table. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
Additional table comments. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
<table>
<name>users</name>
<declaration>
<field>
<name>id</name>
<type>integer</type>
<notnull>1</notnull>
<default>0</default>
</field>
<field>
<name>name</name>
<type>text</type>
<length><variable>user_name_length</variable></length>
</field>
</declaration>
<initialization>
<insert>
<field>
<name>id</name>
<value>1</value>
</field>
<field>
<name>name</name>
<value>administrator</value>
</field>
</insert>
</initialization>
</table>
declaration is one of the sections that is part of the table definition. This section is required because it must contain the definition of the table field and index objects.
field is one of the types of table definition object. It should be specified within the table declaration section for each field that the table should contain.
The field schema object may have the following properties:
Name of the field.
Default: none
Name that the field had before the last time it was renamed. This property is only meant to be used when the database is being upgraded and the field is meant to be renamed.
Default: field name
Type of the field. Valid type values are: integer, text, boolean, date, timestamp, time, float, decimal, clob and blob.
Default: none
Boolean flag property that specifies whether the table field should have a fixed length.
Default: false
Default value for the field. The allowed values depend on the type of the field. If this property is not specified the default value is assumed to be NULL. If there is a table index on the field being declared, the default value must be specified because indexes can not be created on table fields with NULL values.
Notice that some low end DBMS do not support specifying default values in the field definition. Make sure that the you either specify all the values for each new row that is inserted in the tables or that the DBMS actually supports default values.
Only some DBMS support default values for large object fields. For this reason it is not allowed to specify default values on the definition of this type of field.
Default: depends on field type
Boolean flag property that specifies whether the table field should be created with the constraint NOT NULL. As it suggests, this implies that it is not allowed to set this field value with NULL. The parser does not allow that an index is declared on a field that is not declared with the notnull flag property.
Default: false
Boolean flag property that specifies whether the field should be an auto incrementing not null integer field with a primary key. Note that some DBMS do not support auto increment natively in which case this is emulated with a trigger that updates a sequence.
Default: false
Boolean flag property that specifies whether an integer field should be declared as unsigned integer. Note that some DBMS do not support unsigned integer fields. In such case the DBMS driver will ignore this property but it issues a warning.
Default: false
Unsigned integer property that specifies the length of a text. If this property is specified the text field may not have a longer length. Text values shorter than the length are not padded. If this property is not specified the length limit is determined by the DBMS.
Default: none
Free text property meant for describing the purpose of the field. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
Additional field comments. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
<field>
<name>id</name>
<type>integer</type>
<notnull>1</notnull>
<default>0</default>
</field>
index is another type of table definition object. It should also be specified within the table declaration section for each field that the table should contain.
The index schema object may have the following properties:
Name of the index.
Only some DBMS support indexes on large object fields. For this reason it is not allowed to specify a large object field for indexes.
Default: none
Name that the index had before the last time it was renamed. This property is only meant to be used when the database is being upgraded and the field is meant to be renamed.
Default: index name
Boolean flag property that specifies whether the combination of the values of the index fields on each row of the table should unique.
Default: false
Boolean flag property that specifies whether the combination of the values of the index fields on each row of the table should be a primary key.
Default: false
field is a section that is part of the table index declaration. It should be used once per each field on which the index should be created.
The index field declaration may have the following properties:
Name of the field on which the index should be created.
Default: none
Type of field sorting that should be assumed when the index is created. On DBMS that support index sorting, queries that search tables on the fields specified by the given index may execute faster if the specified sorting type is chosen adequately. DBMS that do not support index sorting will ignore this property.
Valid sorting types are ascending and descending.
Default: ascending
The length attribute is only supported by the MySQL and MySQLi drivers.
Expects an integer.
Default: none
<index>
<name>users_index</name>
<field>
<name>id</name>
<sorting>ascending</sorting>
</field>
</index>
foreign was added in Schema version 0.7.3. [description here]
The foreign schema object may have the following properties:
Name of the foreign key.
Default: none
Name that the foreign key had before the last time it was renamed. This property is only meant to be used when the database is being upgraded and the field is meant to be renamed.
Default: foreign key name
Allow the litereal string values: full, partial or simple.
Default: none
Allow the litereal string values: cascade, set null, set default, restrict or no action.
Default: none
Allow the litereal string values: cascade, set null, set default, restrict or no action.
Default: none
Boolean flag property.
Default: false
Boolean flag property.
Default: false
foreign field should be used once per each field on which the foreign key should be created. Allow a string value indicating a field name.
references is a section that is part of the table foreign key declaration.
This section is optional. If it is absent, it is assumed that the referenced key is the primary key of referenced table.
The foreign references declaration may have the following properties:
Name of the table being referenced.
Default: none
Name of the field being referenced. It should be used once per each field on which the foreign key should reference.
The number and types of fields in the referenced key must match those of the foreign key.
Default: none
<foreign>
<name>my_foreign_key</name>
<field>costumer_id</field>
<field>country</field>
<references>
<table>costumers</table>
<field>id</field>
<field>country</field>
</references>
<ondelete>cascade</ondelete>
<onupdate>cascade</onupdate>
</foreign>
initialization is another section that is part of the table definition. This section is optional and may contain the definition of actions that should be executed when installing the database for the first time. update and delete was added in Schema version 0.7.0.
The available actions, in the execution order, are:
insert is a table initialization command that specifies the values of the fields of rows that should be inserted in the respective table after the database is created for the first time.
The insert initialization object do not have any property.
field is a section that is part of the table insert initialization. It should be used once per each field which should be filled with data.
It may have only the property name.
One, and only one, of the following objects should be used to specify the initial data:
NULL value. Anything inside null will be ignored.
Plain text data.
Take data from the specified column of the respective table.
Take the output of the specified DBMS function.
Recursive structure capable of mixing all of the above objects.
When doing an insert missing fields are implicitly initialized by the DBMS with the respective default values.
<insert>
<field>
<name>foo</name>
<value>1601</value>
</field>
<field>
<name>creation</name>
<function>
<name>NOW</name>
</function>
</field>
</insert>
select was added in Schema version 0.7.3 to acomplish the INSERT...SELECT syntax. It is optional and, when used, should be the only child of insert.
It may have only the property table that designates the table name in the select statement.
The insert select field object is defined exactly the same way as insert field. Here name goes to the fields portion of the insert statement and the next property goes to the fields portion of the select statement
The insert select where object, that is not required, must contain only one expression object.
In that case the expression object usually links operants by logical operators like AND, OR and others.
update is a table initialization command that replaces the values of the fields of rows, matched by a condition, in the respective table after the database is created for the first time.
The update initialization object do not have any property.
The update field object is defined exactly the same way as insert field
The update where object, that is not required, must contain only one expression object.
In that case the expression object usually links operants by logical operators like AND, OR and others.
<update>
<field>
<name>id</name>
<expression>
<column>id</column>
<operator>MINUS</operator>
<value>10</value>
</expression>
</field>
</update>
delete is a table initialization command that removes rows, matched by a condition, in the respective table after the database is created for the first time.
The delete command definition only contains one more where section used to select the rows to be deleted.
The delete initialization object do not have any property.
The delete where object must contain only one expression object.
In that case the expression object usually links operants by logical operators like AND, OR and others.
<delete>
<where>
<expression>
<expression>
<column>id</column>
<operator>GREATER THAN</operator>
<value>10</value>
</expression>
<operator>AND</operator>
<expression>
<column>id</column>
<operator>NOT EQUAL</operator>
<value>15</value>
</expression>
</expression>
</where>
</delete>
The function object will bind to a DBMS function, that can be called with any number of arguments.
It may have only the property name.
Any of the following objects should be used, how many times it is necessary, to provide the arguments:
Plain text data.
Take data from the specified column of the respective table.
You can pass a new function as an argument, although this isn't supported in the current release yet.
Recursive structure capable of mixing all of the above objects.
The expression object is a recursive structure used to relate pairs of value, column, function and expression objects.
The expression object do not have any property and has the exact form operant operator operant.
The expression operant isn't a object itself but may be represented by one of the following objects:
Plain text data.
Take data from the specified column of the respective table.
Take the output of the specified DBMS function.
Expressions are recursive, although this isn't supported in the current release yet.
Operators aren't implemented in a portable way yet. Currently the following descriptive operators are always binded to the same operators:
Operator | Binding |
PLUS | + |
MINUS | - |
TIMES | * |
DIVIDED | / |
EQUAL | = |
NOT EQUAL | != |
LESS THAN | < |
GREATER THAN | > |
LESS THAN OR EQUAL | <= |
GREATER THAN OR EQUAL | >= |
Any other value given to operator will be parsed with no translation.
Future versions of Schema may provide a common set of descriptive operators that are translated to the respective operator for your DBMS
The sequence is another main database schema object. It may be used in a schema description multiple times, once per each sequence that is contained the database being described.
A sequence is an object that only contains an integer value. A sequence value is automatically incremented after every time it is fetch the next sequence value. The fetch and increment actions constitute an atomic operation. This means that if two concurrent accesses to the database fetch the next sequence value, they will always get different values.
Sequences are useful to generate integer numbers that may be used to store in fields that must contain unique numbers.
The sequence object definition may contain one on section besides the properties. The sequence schema object may have the following properties:
Name of the sequence.
Default: none
Name that the sequence had before the last time it was renamed. This property is only meant to be used when the database is being upgraded and the sequence is meant to be renamed.
Default: sequence name
Integer property that specifies the value that the sequence will return when it is fetched the next sequence value.
Default:1
Free text property meant for describing the purpose of the sequence. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
Additional sequence comments. This field is not processed by the schema parser. It may contain text or other XML well formed tags that may be useful for embedding additional data structures for documentation applications or other purposes.
Default: none
The sequence on table field section specifies a table field on which the sequence value should be synchronized.
This information is meant to be used only by the database manager class when it is added a sequence later after the database has been installed for the first time. If the sequence on is specified, the database manager class will override the sequence start value with a value that is higher than the highest value in the specified field table. Therefore, the specified field table type must be integer.
<sequence>
<name>users_sequence</name>
<start>1</start>
<on>
<field>id</field>
<table>users</table>
</on>
</sequence>
The variable tag is not meant to define any database schema object but rather a means to replace property values with variables defined at parsing time.
For instance, if you have several tables that store user names in text fields with the same length limit, instead of hard coding the length value in all field definitions, you may use a variable reference. The parser will replace variable references found in property definitions by the text values of the respective values passed to the parser.
The parser will issue an error if it find a reference for a variable that is not defined.
<database>
<name/>
<create/>?
<overwrite/>?
<charset/>?
<description/>?
<comments/>?
<table>
<name/>
<was/>?
<description/>?
<comments/>?
<declaration>
<field>
<name/>
<was/>?
<type/>
<fixed/>?
<default/>?
<notnull/>?
<autoincrement/>?
<unsigned/>?
<length/>?
<description/>?
<comments/>?
<name/>
<was/>?
<unique/>?
<primary/>?
<field>
<name/>
<sorting/>?
<length/>?
<name/>
<was/>?
<match/>?
<ondelete/>?
<onupdate/>?
<deferrable/>?
<initiallydeferred/>?
<field/>+
<references>
<table/>
<field/>*
<insert>
{field}+ or {select}
{field}+
<where>
{expression}
<where>
{expression}
<name/>
<was/>?
<start/>?
<description/>?
<comments/>?
<on>
<table/>
<field/>
</database>
<field>
<name/>
<null/> or <value/> or <column/> or {function} or {expression}
</field>
<function>
<name/>
( <null/> or <value/> or <column/> or {function} or {expression} )+
</function>
<expression>
<null/> or <value/> or <column/> or {function} or {expression}
<operator/>
<null/> or <value/> or <column/> or {function} or {expression}
</expression>
<select>
<table/>
{field}+
<where>
{expression}
</select>