ALTO (Analyzed Layout and Text Object) stores layout information and OCR-recognized text for pages of any kind of printed document, such as books, journals and newspapers. ALTO is a standardized XML format for storing layout and content information. It is designed to be used as an extension schema to METS (Metadata Encoding and Transmission Standard), where METS provides metadata and structural information while ALTO contains content and physical information.

Describes general settings of the ALTO file, such as measurement units and metadata.

All measurement values inside the ALTO file are related to this unit, except the font size. Coordinates as used in HPOS and VPOS are absolute coordinates referring to the upper-left corner of a page. The upper-left corner of the page is defined as coordinate (0/0). Meaning of the values:
mm10: 1/10th of a millimeter
inch1200: 1/1200th of an inch
pixel: 1 pixel
The values for pixel relate to the resolution of the image on which the layout description is based. In case the original image is not known, the scaling factor can be calculated from the total width and height of the image and the corresponding information of the PAGE element.

Styles define properties of layout elements. A style defined in a parent element is used as the default style for all related child elements.

A text style defines font properties of text.

A paragraph style defines formatting properties of text blocks.
Indicates the alignment of the paragraph. Could be left, right, center or justify.
Left indent of the paragraph in relation to the column.
Right indent of the paragraph in relation to the column.
Line spacing between two lines of the paragraph. Measurement calculated from baseline to baseline.
Indent of the first line of the paragraph if this is different from the other lines. A negative value indicates an indent to the left, a positive value indicates an indent to the right.

Tags define properties of additional characteristics.
The tags are referenced from the related content element (Block or String) by the attribute TAGREF via the tag ID. This container element contains the individual elements for LayoutTags, StructureTags, RoleTags, NamedEntityTags and OtherTags.

The root layout element.

One page of a book or journal.
The area between the top line of print and the upper edge of the leaf. It may contain a page number or running title.
The area between the print space and the left border of a page. May contain margin notes.
The area between the print space and the right border of a page. May contain margin notes.
The area between the bottom line of letterpress or writing and the bottom edge of the leaf. It may contain a page number, a signature number or a catch word.
Rectangle covering the printed area of a page. Page number and running title are not part of the print space.

Any user-defined class, such as title page.
The number of the page within the document.
The page number that is printed on the page.
Gives brief information about the original page quality.
Gives more details about the original page quality, since the QUALITY attribute gives only brief and restrictive information.
Position of the page. Could be lefthanded, righthanded, cover, foldout or single if it has no special position.
A link to the processing description that has been used for this page.
Estimated percentage of OCR accuracy, in the range from 0 to 100.
Page Confidence: Confidence level of the OCR for this page. A value between 0 (unsure) and 1 (sure).

Group of available block types.
A block of text.
A picture or image.
A graphic used to separate blocks. Usually a line or rectangle.
A block that consists of other blocks.

Base type for any kind of block on the page.
Tells the rotation of the block, e.g. text or illustration. The value is in degrees counterclockwise.
The next block in reading sequence on the page.
Correction Status. Indicates whether manual correction has been done or not.
The correction status should be recorded at the highest level possible (Block, TextLine, String).

A sequence of chars. Strings are separated by white spaces or hyphenation chars.
Any alternative for the word.
Identifies the purpose of the alternative.
Type of the substitution (if any).
Content of the substitution.
Word Confidence: Confidence level of the OCR for this string. A value between 0 (unsure) and 1 (sure).
Confidence level of each character in that string. A list of numbers, one number between 0 (sure) and 9 (unsure) for each character.
Correction Status. Indicates whether manual correction has been done or not. The correction status should be recorded at the highest level possible (Block, TextLine, String).
Attribute to record the language of the string. The language should be recorded at the highest level possible.

A region on a page.
A list of points.

Describes the bounding shape of a block, if it is not rectangular.
A polygon shape.
An ellipse shape. HPOS and VPOS describe the center of the ellipse. HLENGTH and VLENGTH are the width and height of the described ellipse.
A circle shape. HPOS and VPOS describe the center of the circle.

Formatting attributes. Note that these attributes are assumed to be inherited from ancestor elements of the document hierarchy.
The font name.
The font size, in points (1/72 of an inch).
Font color as RGB value.
Serif or Sans-Serif.
Fixed or proportional.

Information to identify the image file from which the OCR text was created.
A unique identifier for the image file. This is drawn from MIX. This identifier must be unique within the local system. To facilitate file sharing or interoperability with other systems, fileIdentifierLocation may be added to designate the system or application where the identifier is unique.
A location qualifier, i.e., a namespace.

Information on how the text was created, including preprocessing, OCR processing, and postprocessing steps. Where possible, this draws from MIX's change history.
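
The two confidence scales described above for strings run in opposite directions: the word confidence goes from 0 (unsure) to 1 (sure), while the per-character list goes from 0 (sure) to 9 (unsure). The following Python sketch shows one way to read the per-character list and flag doubtful characters. It assumes the attribute is named CC (as in the published ALTO schema), that the digits may be written either whitespace-separated or contiguously, and that a cut-off of 5 is meaningful; the function names and the threshold are hypothetical choices for illustration only.

```python
from typing import List, Tuple


def parse_char_confidence(cc: str) -> List[int]:
    """Turn a per-character confidence value into a list of ints (0 = sure, 9 = unsure).

    Assumption: producers write either "0 1 9" or "019"; both are accepted here.
    """
    scores = []
    for ch in cc:
        if ch.isspace():
            continue  # separators between digits are ignored
        if ch not in "0123456789":
            raise ValueError(f"unexpected character {ch!r} in confidence list")
        scores.append(int(ch))
    return scores


def uncertain_characters(content: str, cc: str, threshold: int = 5) -> List[Tuple[int, str, int]]:
    """Return (index, character, score) for every character scored at or above threshold.

    The threshold is a hypothetical cut-off, not something the format defines.
    """
    scores = parse_char_confidence(cc)
    if len(scores) != len(content):
        raise ValueError("expected one confidence digit per character of the string")
    return [
        (index, char, score)
        for index, (char, score) in enumerate(zip(content, scores))
        if score >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical string content and confidence list.
    print(uncertain_characters("cat", "0 7 1"))
```

Because a higher per-character number means less certainty, the comparison uses "at or above" the threshold; a word-level check on the 0 to 1 scale would compare in the other direction.
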
A processing step.
Date or DateTime the image was processed.
Identifies the organization-level producer(s) of the processed image.
An ordinal listing of the image processing steps performed. For example, "image despeckling."
A description of any setting of the processing application. For example, for a multi-engine OCR application this might include the engines which were used. Ideally, this description should be adequate so that someone else using the same application can produce identical results.

Information about a software application. Where applicable, the preferred method for determining this information is by selecting Help --> About.
The name of the organization or company that created the application.
The name of the application.
The version of the application.
A description of any important characteristics of the application, especially for non-commercial applications. For example, if a non-commercial application is built using commercial components, e.g., an OCR engine SDK, those components should be mentioned here.

List of any combination of font styles.

A block that consists of other blocks.
A user-defined string to identify the type of composed block (e.g. table, advertisement, ...).
An ID to link to an image which contains only the composed block. The ID and the file link are defined in the related METS file.

A picture or image.
A user-defined string to identify the type of illustration, such as photo, map, drawing, chart, ...
A link to an image which contains only the illustration.

A graphic used to separate blocks. Usually a line or rectangle.

A block of text.
A single line of text.
A white space.
A hyphenation char. Can appear only at the end of a line.
Attribute to record the language of the text line.
Correction Status. Indicates whether manual correction has been done or not. The correction status should be recorded at the highest level possible (Block, TextLine, String).
Attribute deprecated. LANG should be used instead.
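
To make the text line structure described above concrete (strings separated by white spaces, with a hyphenation char allowed only at the end of a line), here is a minimal Python sketch that rebuilds the text of each line. It assumes the element names TextBlock, TextLine, String, SP and HYP and the CONTENT attribute as they appear in the published ALTO schema; the sample fragment and the helper names are hypothetical, and the namespace is stripped so the code does not depend on a particular ALTO version.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a text block, written without a namespace for brevity.
SAMPLE = """<TextBlock ID="TB1">
  <TextLine>
    <String CONTENT="Analyzed"/><SP/><String CONTENT="lay"/><HYP CONTENT="-"/>
  </TextLine>
  <TextLine>
    <String CONTENT="out"/><SP/><String CONTENT="text"/>
  </TextLine>
</TextBlock>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def line_text(text_line: ET.Element) -> str:
    """Concatenate the children of one text line in document order."""
    parts = []
    for child in text_line:
        name = local_name(child.tag)
        if name == "String":
            parts.append(child.get("CONTENT", ""))
        elif name == "SP":
            parts.append(" ")  # a white space between two strings
        elif name == "HYP":
            parts.append(child.get("CONTENT", "-"))  # hyphenation char at line end
    return "".join(parts)


def block_lines(block: ET.Element) -> list:
    """Return the reconstructed text of every text line inside the block."""
    return [line_text(el) for el in block.iter() if local_name(el.tag) == "TextLine"]


lines = block_lines(ET.fromstring(SAMPLE))
for line in lines:
    print(line)
```

The sketch keeps the hyphenation char in the output and does not rejoin the split word across lines; whether to do so is a decision for the consuming application.
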
Attribute to record the language of the text block.

The following tag types are available:
LayoutTag: criteria about arrangement or graphical appearance
StructureTag: criteria about grouping or formation
RoleTag: criteria about function or mission
NamedEntityTag: criteria about assignment of terms to their relationship / meaning (NER)
OtherTag: criteria about any other characteristic not listed above; the TYPE attribute is intended to be used for classification within those.

The XML data wrapper element XmlData is used to contain XML-encoded metadata. The content of an XmlData element can be in any namespace or in no namespace. As permitted by the XML Schema Standard, the processContents attribute value for the metadata in an XmlData is set to “lax”. Therefore, if the source schema and its location are identified by means of an XML schemaLocation attribute, then an XML processor will validate the elements for which it can find declarations. If a source schema is not identified, or cannot be found at the specified schemaLocation, then an XML validator will check for well-formedness, but otherwise skip over the elements appearing in the XmlData element.

Type can be used to classify and group the information within each tag element type.
Content / information value of the tag.
Description text for the tag information, for clarification.
Any URI for authority or description-relevant information.
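
As an illustration of how tags are referenced from content elements via their ID, the following Python sketch collects the tag elements of the five types listed above and resolves the references found on a block or string. Several details are assumptions: the reference attribute is taken to be TAGREFS with whitespace-separated IDs, and the tag attributes are taken to be ID, TYPE, LABEL, DESCRIPTION and URI, following the published ALTO schema (the text above calls the reference attribute TAGREF, so the name is a parameter). The sample document, the example.org URI and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element nesting is simplified and no namespace is used.
SAMPLE = """<alto>
  <Tags>
    <LayoutTag ID="LT1" LABEL="headline"/>
    <NamedEntityTag ID="NE1" TYPE="person" LABEL="Ada Lovelace"
                    URI="http://example.org/authority/ada"/>
  </Tags>
  <Layout><Page><PrintSpace>
    <TextBlock ID="TB1" TAGREFS="LT1">
      <TextLine>
        <String CONTENT="Ada" TAGREFS="NE1"/><SP/><String CONTENT="Lovelace" TAGREFS="NE1"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

TAG_KINDS = {"LayoutTag", "StructureTag", "RoleTag", "NamedEntityTag", "OtherTag"}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def collect_tags(root: ET.Element) -> dict:
    """Index every tag element by its ID."""
    tags = {}
    for el in root.iter():
        kind = local_name(el.tag)
        tag_id = el.get("ID")
        if kind in TAG_KINDS and tag_id:
            tags[tag_id] = {
                "kind": kind,
                "type": el.get("TYPE"),
                "label": el.get("LABEL"),
                "description": el.get("DESCRIPTION"),
                "uri": el.get("URI"),
            }
    return tags


def resolve_tagrefs(element: ET.Element, tags: dict, attr: str = "TAGREFS") -> list:
    """Return the tag records referenced by one content element (unknown IDs are skipped)."""
    return [tags[ref] for ref in element.get(attr, "").split() if ref in tags]


root = ET.fromstring(SAMPLE)
tags = collect_tags(root)
for el in root.iter():
    if el.get("TAGREFS"):
        labels = [tag["label"] for tag in resolve_tagrefs(el, tags)]
        print(local_name(el.tag), el.get("ID") or el.get("CONTENT"), labels)
```

Keeping the tags in a dictionary keyed by ID means each reference is resolved with a single lookup, and the tag type (layout, structure, role, named entity or other) stays available alongside the TYPE classification.
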