Tuesday, November 10, 2009

W3C XML Schema

-

-

Schema languages available


    1. RELAX NG
    2. Schematron
    3. Hook
    4. Examplotron
    5. W3C XML Schema Language

-

-

-

Differences between DTD and W3C XML Schema

-

DTD -    

        1. No support for strong data typing.
        2. No specific element occurrence constraints can be defined (only ?, * and +).
        3. The only schema language defined by the XML specification itself.
        4. Non-XML based.
        5. No explicit support for namespaces.
        6. Non-extensible (although parameter entities provide a little extensibility).
        7. DTDs are optional in XML.
        8. A Document Type Declaration (DOCTYPE) links a DTD to an XML document.

        9. The mechanism for associating an XML document with a schema varies according to the schema language.
       10. The association may be achieved via markup within the XML document itself, or via some external means.

-

-

-

Schema Usage

-

        Validation


                Validation can be considered a "firewall" against the diversity of XML.

                By validating documents against schemas, you can ensure that the documents'
                contents conform to your expected set of rules, simplifying the code needed to process them.

                XML is a good foundation for pipelines of transformations using widely available tools.
                Since each of these transformations introduces a risk of error, and each error is easier
                to fix when detected near its source, it is good practice to introduce check points
                in the pipeline where the documents are validated.

-

        Documentation

- 
                XML schemas are frequently used to document XML vocabularies, even when validation isn't a requirement.

                The machine-readability of schemas gives them several advantages as documentation.
                Human-readable documentation can be generated from the schema's formal description.
                Schema IDEs, for instance, provide graphical views that help to understand the structure of the documents.

-

        Querying Support

- 

                The first versions of XPath and XSLT were defined to work without any explicit understanding of the structure
                of the documents being manipulated. This has worked well, but has imposed performance and functionality limits.

                The second versions of XPath and XSLT and the first version of XQuery can rely on the availability of a W3C XML
                Schema to lift those limits.

                For example, an XQuery typeswitch expression can branch on the type of an XML element,
                which may be given by its xsi:type attribute.

-

        Data Binding

- 

                Although it isn't especially difficult to write applications that process XML documents using the
                SAX, DOM, and similar APIs, it is a low-level task, both repetitive and error-prone.

                1. Runtime binding tools do their best to perform a binding based on the structure of the documents and applications, discovered by introspection.
                2. Design-time binding tools rely on a model formalized in a schema of some kind.

-

        Guided Editing

- 

                The W3C is creating a standard API that can be used by guided editing applications to ask a schema processor
                which action can be performed at a certain location in a document—for instance:
                "Can I insert this new element here?",
                "Can I update this text node to this value?", etc.
                The Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification
                defines "Abstract Schemas" generic enough to cover both DTDs and W3C XML Schema (and potentially other
                schema languages as well). When finalized and widely adopted, this API should allow you to plug the schema
                processor of your choice into any editing application.

-

        Replacing scripts from end-user Forms (XForms)

                XForms, designed as the replacement for HTML forms, uses XML Schema for strong typing,
                so much of the input checking needs no scripting language.

-

-

-

W3C XML Schema Language

-

    An XML Schema consists of components such as type definitions and element declarations.

-

    Namespaces

        XSLT:             http://www.w3.org/1999/XSL/Transform
        XHTML:            http://www.w3.org/1999/xhtml
        XML Signature:    http://www.w3.org/2000/09/xmldsig#
        XML Schema:       http://www.w3.org/2001/XMLSchema
        XForms:           http://www.w3.org/2002/xforms

-

Top Element:    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>

Top Attributes:   

            targetNamespace

-

            attributeFormDefault (default=unqualified)

-
                When set to "qualified", locally declared attributes must be qualified with a namespace in instances.
                Since attributes don't take the default element namespace, a namespace prefix is required.

                An attribute must be qualified either because it is declared globally or
                because the attributeFormDefault attribute is set to "qualified".

                In other words, attributes that are required to be qualified must be explicitly prefixed.

                Controlling qualification on a declaration by declaration basis
         

                    It is also possible to control qualification on a declaration by declaration basis using the form attribute.
                    For example, to require that the locally declared attribute publicKey is qualified in instances,
                    we declare it in the following way:

-

                    Example

-
                    Requiring Qualification of Single Attribute
                    <schema xmlns="http://www.w3.org/2001/XMLSchema"
                        xmlns:po="http://www.example.com/PO1"
                        targetNamespace="http://www.example.com/PO1"
                        elementFormDefault="qualified"
                        attributeFormDefault="unqualified">
                      <!-- etc. -->
                      <element name="secure">
                        <complexType>
                          <sequence>
                              <!-- element declarations -->
                          </sequence>
                          <attribute name="publicKey" type="base64Binary" form="qualified"/>
                        </complexType>
                      </element>
                    </schema>

-

                    Example

-
                    Instance with a Qualified Attribute

                    <?xml version="1.0"?>
                    <purchaseOrder xmlns="http://www.example.com/PO1"
                               xmlns:po="http://www.example.com/PO1"
                               orderDate="1999-10-20">
                      <!-- etc. -->
                      <secure po:publicKey="GpM7">
                        <!-- etc. -->
                      </secure>
                    </purchaseOrder>

-

                    Notice that the value of the form attribute overrides the value of the attributeFormDefault
                    attribute for the publicKey attribute only.
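
To see what "qualified" means for a parser, the following is a minimal sketch using Python's standard xml.etree.ElementTree module on a trimmed copy of the instance above. It only inspects names; it does not validate against the schema. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

PO_NS = "http://www.example.com/PO1"

# Trimmed copy of the purchaseOrder instance shown above.
instance = """<?xml version="1.0"?>
<purchaseOrder xmlns="http://www.example.com/PO1"
               xmlns:po="http://www.example.com/PO1"
               orderDate="1999-10-20">
  <secure po:publicKey="GpM7"/>
</purchaseOrder>"""

root = ET.fromstring(instance)
secure = root.find(f"{{{PO_NS}}}secure")

# The unprefixed orderDate attribute is in no namespace, even though the
# element itself picks up the default namespace.
print(list(root.attrib))

# The prefixed po:publicKey attribute is reported with its namespace name.
print(list(secure.attrib))
```

The default namespace declaration applies to purchaseOrder and secure but not to orderDate, which is why a qualified attribute always needs an explicit prefix.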

-

            elementFormDefault (default=unqualified)

-

                If it is set to "qualified", nested elements must belong to the target namespace
                of the schema either through a default namespace declaration or an explicit prefix.   

                Note: the form attribute can be applied to an element declaration in the same manner as to an attribute declaration.


 

-

Associating a Schema with an XML document

    The XML Schema Definition language (XSD) defines four attributes for use in XML instance documents:

            xsi:schemaLocation        " { namespace schemaLocation.xsd }* "
            xsi:noNamespaceSchemaLocation
            xsi:type
            xsi:nil

    An XML instance document refers to a schema using xsi:schemaLocation or xsi:noNamespaceSchemaLocation.
    According to the World Wide Web Consortium (W3C) XML Schema Recommendation,
    an instance document can have both attributes specified.

   Example

-
        XSD

-

            <?xml version="1.0"?>
            <xsd:schema    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                                 targetNamespace="http://jethwani.com/clan"
                                 xmlns:ab="http://jethwani.com/clan">

                <xsd:element name="Name" type="ab:NameType"/>   
                <xsd:complexType name="NameType">
                    <xsd:sequence>
                        <xsd:element name="FirstName" type="xsd:string" form="qualified"/>
                        <xsd:element name="LastName" type="xsd:string"/>
                        <xsd:element name="Age" type="xsd:duration"/>
                    </xsd:sequence>
                </xsd:complexType>
            </xsd:schema>

-

        XML

-

            <?xml version="1.0"?>
            <ab:Name    xmlns:ab="http://jethwani.com/clan"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://jethwani.com/clan Name.xsd">
                <ab:FirstName>Bipin</ab:FirstName>
                <LastName>Jethwani</LastName>
                <Age>P28Y7M27DT13H44M12345.67890S</Age>
            </ab:Name>
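
The value of xsi:schemaLocation is a whitespace-separated list of namespace / location pairs. As a rough sketch (the function name schema_locations is made up for this note, and no schema is actually loaded), the pairs can be read from an instance with Python's standard library:

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(xml_text):
    """Return {namespace: location} from the root element's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    if len(tokens) % 2 != 0:
        raise ValueError("xsi:schemaLocation must contain namespace/location pairs")
    # Even positions hold namespace names, odd positions hold schema locations.
    return dict(zip(tokens[0::2], tokens[1::2]))

name_doc = """<?xml version="1.0"?>
<ab:Name xmlns:ab="http://jethwani.com/clan"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://jethwani.com/clan Name.xsd">
  <ab:FirstName>Bipin</ab:FirstName>
  <LastName>Jethwani</LastName>
  <Age>P28Y7M27DT13H44M12345.67890S</Age>
</ab:Name>"""

print(schema_locations(name_doc))
```

The locations are only hints; a schema processor is free to locate the schema for a namespace by other means.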

-

Annotation

-

    Most of the schema elements may contain an optional xs:annotation element as their first child element.

        xs:annotation
            (may contain any combination of)
            xs:documentation    (human readable)      (xml:lang is allowed)
            xs:appinfo          (machine readable)    (also used to extend the schema functionality, e.g. using Schematron constructs in appinfo)

            Encoding context sensitive help text (tool tip)
            -----------------------------------------------

            <xs:element name="fullName" type="xs:string">
                <xs:annotation>
                    <xs:appinfo>
                        <helpText>Enter person's full name.</helpText>
                    </xs:appinfo>
                </xs:annotation>
            </xs:element>
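
An application can pull such help text out of the schema itself. The sketch below assumes the helpText element is in no namespace, as in the fragment above, and wraps that fragment in an xs:schema element so that it parses on its own; help_texts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName" type="xs:string">
    <xs:annotation>
      <xs:appinfo>
        <helpText>Enter person's full name.</helpText>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>"""

def help_texts(xsd_text):
    """Map each named element declaration to the helpText found in its appinfo."""
    root = ET.fromstring(xsd_text)
    ns = {"xs": XS_NS}
    found = {}
    for decl in root.iter(f"{{{XS_NS}}}element"):
        help_el = decl.find("xs:annotation/xs:appinfo/helpText", ns)
        if help_el is not None and decl.get("name"):
            found[decl.get("name")] = (help_el.text or "").strip()
    return found

print(help_texts(schema_text))
```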

-

-

Types

-
 

        W3C XML Schema derives many of its predefined datatypes
        from a smaller set of "primitive" datatypes that have specific semantics and cannot be derived from other types.

-

        Lexical and Value Spaces

-
            W3C XML Schema introduced a decoupling between the data, as it can be read from
            the instance documents (the "lexical space"), and the value, as interpreted according to the datatype (the "value space").

            Each datatype has its own lexical and value spaces and its own rules to associate a lexical representation with a value;
            for many datatypes, a single value can have multiple lexical representations
            (for instance, the xs:float value "3.14116" can also be written equivalently
            as "03.14116", "3.141160", or ".314116E1").

            This distinction is important since the basic operations performed on the values
            (such as equality testing or sorting) are done on the value space. "3.14116" is considered to be equal to
            "03.14116" when the type is xs:float and is different when the type is xs:string.

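The difference between the two spaces can be tried out with a small sketch. It uses Python's built-in float as a stand-in for xs:float (an assumption: Python floats are 64-bit, xs:float is 32-bit, but the idea is the same) and plain string comparison as a stand-in for xs:string.

```python
lexical_forms = ["3.14116", "03.14116", "3.141160", ".314116E1"]

# Compared as strings (xs:string), every lexical form is a different value.
distinct_strings = set(lexical_forms)

# Compared as floating-point numbers, all forms map to one value.
distinct_numbers = {float(form) for form in lexical_forms}

print(len(distinct_strings), len(distinct_numbers))
```
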
anySimpleType 

   
            duration

                The lexical representation for duration is the [ISO 8601] extended format PnYnMnDTnHnMnS,
                where nY represents the number of years, nM the number of months, nD the number of days, 'T' is the
                date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds.
                The number of seconds can include decimal digits to arbitrary precision.

                The values of the Year, Month, Day, Hour and Minutes components are not restricted but allow an arbitrary integer.
                Similarly, the value of the Seconds component allows an arbitrary decimal.

                An optional preceding minus sign ('-') is allowed, to indicate a negative duration.
                If the sign is omitted a positive duration is indicated.

                For example, to indicate a duration of 1 year, 2 months, 3 days, 10 hours, and 30 minutes,
                one would write: P1Y2M3DT10H30M. One could also indicate a duration of minus 120 days as: -P120D.

                Reduced precision and truncated representations of this format are allowed provided they conform to the following:

                    - If the number of years, months, days, hours, minutes, or seconds in any expression equals zero,
                      the number and its corresponding designator may be omitted.
                    - However, at least one number and its designator must be present.
                    - The seconds part may have a decimal fraction.
                    - The designator 'T' shall be absent if all of the time items are absent.
                    - The designator 'P' must always be present.

                For example,
                    Allowed

-
                        P1347Y
                        P1347M
                        P1Y2MT2H
                        P0Y1347M
                        P0Y1347M0D
                        -P1347M

-
                    Not Allowed

-    
                        P-1347M
                        P1Y2MT

-

                Valid values for xs:duration include:

-
                    PT1004199059S
                    PT130S
                    PT2M10S
                    P1DT2S
                    -P1Y
                    P1Y2M3DT5H20M30.123S

-
                The following values are invalid:

-
                    1Y (the leading P is missing)
                    P1S (the T separator is missing)
                    P-1Y (all parts must be positive)
                    P1M2Y (the parts order is significant and Y must precede M)
                    P1Y-1M (all parts must be positive)

-

               dateTime    (CCYY-MM-DDThh:mm:ss)

                Description:

                        The xs:dateTime datatype defines a "specific instant of time."
                        This is a subset of what ISO 8601 calls a "moment of time."
                        All the fields must be present and may optionally be preceded by a sign and leading figures,
                        if needed, and followed by fractional digits for the seconds and a time zone.
                        The time zone may be specified using the letter
                        "Z," which identifies UTC, or by the difference of time with UTC.

                        Valid values for xs:dateTime include:

                            2001-10-26T21:32:52
                            2001-10-26T21:32:52+02:00
                            2001-10-26T19:32:52Z
                            2001-10-26T19:32:52+00:00
                            -2001-10-26T21:32:52
                            2001-10-26T21:32:52.12679

                        The following values are invalid:

                            2001-10-26 (all the parts must be specified)
                            2001-10-26T21:32 (all the parts must be specified)
                            2001-10-26T25:32:52+02:00 (the hours part (25) is out of range)
                            01-10-26T21:32 (all the parts must be specified)

                        In the valid examples given above, three have the same value:
                            2001-10-26T21:32:52+02:00
                            2001-10-26T19:32:52Z
                            2001-10-26T19:32:52+00:00

                        The first one (2001-10-26T21:32:52), which doesn't include a time zone specification,
                        is considered to have an indeterminate value between 2001-10-26T21:32:52-14:00 and 2001-10-26T21:32:52+14:00.
                        With the usage of daylight saving time, this range is subject to national regulations and may change.
                        The range was between -13:00 and +12:00 when the Recommendation was published,
                        but the Working Group has kept a margin to accommodate possible changes in the regulations.
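
A quick way to convince yourself that the three zoned values denote the same instant is to parse them with Python's datetime module. This is only a sketch for the simple forms: negative years and arbitrary fractional seconds are outside what datetime.fromisoformat handles, and parse_xs_datetime is a name made up here.

```python
from datetime import datetime

def parse_xs_datetime(text):
    """Parse a simple xs:dateTime lexical form into a datetime object."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

paris = parse_xs_datetime("2001-10-26T21:32:52+02:00")
zulu = parse_xs_datetime("2001-10-26T19:32:52Z")
utc = parse_xs_datetime("2001-10-26T19:32:52+00:00")
local = parse_xs_datetime("2001-10-26T21:32:52")

print(paris == zulu == utc)  # three lexical forms, one instant
print(local.tzinfo)          # no time zone information is attached
```

The value without a time zone cannot be ordered against the zoned values in Python either; comparing them with < raises a TypeError, which mirrors the "indeterminate" status described above.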

 

 

-             

              decimal    (. dot notation)
               

                    The following values are invalid:
                        1 234 (spaces are forbidden)
                        1.2E3 (exponent notation is forbidden)
                        +1,234 (delimiters between thousands are forbidden).

-

               double    (e or E notation)
                   

                        xs:float and xs:double are both primitive datatypes and represent IEEE single
                        (32 bits) and double (64 bits) precision floating-point types.
                        These store values as a mantissa and an exponent of a power of 2 (m x 2^e).

                        These datatypes accept several "special" values:
                            positive zero (0),
                            negative zero (-0) (which is less than positive 0 but greater than any negative value),
                            infinity (INF) (which is greater than any value),
                            negative infinity (-INF) (which is less than any value), and
                            "not a number" (NaN).

                            Valid values for xs:float and xs:double include:

                                123.456
                                +1234.456
                                -1.2344e56
                                -.45E-6
                                INF
                                -INF
                                NaN
                            The following values are invalid:

                                1234.4E 56 (spaces are forbidden)
                                1E+2.5 (the power of 10 must be an integer)
                                +INF (positive infinity doesn't expect a sign in XML Schema 1.0)
                                NAN (capitalization matters in special values)
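
Python's float() is more lenient than the xs:double lexical space (it accepts "nan", "inf" and surrounding spaces, for example), so a parser has to check the form first. The following is a sketch under the XML Schema 1.0 rules listed above; parse_xs_double is a hypothetical helper, not a library function.

```python
import math
import re

# Decimal digits with an optional fraction and an optional integer exponent.
NUMERIC = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")

# Special values are case-sensitive, and INF takes no plus sign here.
SPECIAL = {"INF": math.inf, "-INF": -math.inf, "NaN": math.nan}

def parse_xs_double(text):
    """Return the float for an xs:double lexical form, or raise ValueError."""
    if text in SPECIAL:
        return SPECIAL[text]
    if NUMERIC.fullmatch(text):
        return float(text)
    raise ValueError(f"not a valid xs:double: {text!r}")

print(parse_xs_double("-.45E-6"))
print(parse_xs_double("-INF"))
```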

-

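                integer
                    - nonNegativeInteger
                        Description:
                            0, 1, 2, ....

                    - nonPositiveInteger
                        Description:
                            ...., -2, -1, 0

                    - long
                        Description:
                            64-bit signed word.

                        - int
                            Description:
                                32-bit signed word.

                            - short
                                Description:
                                    16-bit signed word.

                                - byte
                                    Description:
                                        8-bit signed word.

                    - unsignedByte
                        Description:
                            8-bit unsigned word, derived from nonNegativeInteger
                            through unsignedLong, unsignedInt and unsignedShort.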
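
The value ranges of these fixed-width types follow directly from their bit widths. A short sketch (the RANGES table and in_range function are illustrative names):

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits):
    """Inclusive (min, max) of an unsigned integer with the given width."""
    return 0, 2 ** bits - 1

RANGES = {
    "long": signed_range(64),
    "int": signed_range(32),
    "short": signed_range(16),
    "byte": signed_range(8),
    "unsignedByte": unsigned_range(8),
}

def in_range(type_name, value):
    """Check whether an integer value fits the named XML Schema integer type."""
    low, high = RANGES[type_name]
    return low <= value <= high

for name, (low, high) in RANGES.items():
    print(name, low, high)
```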

 

 

-

-

-

-

Simple Types

-           

            Attributes are always of simple types.
            Elements of simple types cannot contain attributes or nested elements.
            We derive a simple type by restricting an existing simple type.
            The legal values of the new type are a subset of the existing type's values.
            Derivation by extension is reserved for complex types and has no equivalent for simple types.

           

                       
            

            <xsd:simpleType>    element is used to define and name a new simple type.

-

            <xsd:list>                List of atomic values, as in the built-in IDREFS, NMTOKENS and ENTITIES types.
                                         We cannot create a list type from an existing list type.
                itemType=""

-

            <xsd:union>
                memberTypes=""

-

           <xsd:restriction>      element to indicate the existing base type
                                         and to identify  the "facets" that constrain the range values.

                                            <xsd:enumeration> each value should be unique
                                            <xsd:minInclusive>
                                            <xsd:maxInclusive>
                                            <xsd:pattern>

            Example 1:
            ----------

-
            <xsd:simpleType name="myInteger">
                <xsd:restriction base="xsd:integer">
                    <xsd:minInclusive value="10000"/>
                    <xsd:maxInclusive value="99999"/>
                </xsd:restriction>
            </xsd:simpleType>

-

            Example 2:
            ----------

-
            <xsd:simpleType name="listOfMyIntType">
                <xsd:list itemType="myInteger"/>
            </xsd:simpleType>

            <?xml version="1.0"?>
            <listOfMyInt>20003 15037 95977 95945</listOfMyInt>
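
What a processor does with such a list value can be imitated in a few lines: split the content on whitespace, then check every item against the item type. This sketch hard-codes the myInteger facets from Example 1; the function names are made up for illustration.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")

def parse_my_integer(token):
    """Parse one myInteger item: an xsd:integer between 10000 and 99999 inclusive."""
    if not INTEGER.fullmatch(token):
        raise ValueError(f"not an integer: {token!r}")
    value = int(token)
    if not 10000 <= value <= 99999:
        raise ValueError(f"{value} violates minInclusive/maxInclusive")
    return value

def parse_list_of_my_int(text):
    """Parse a listOfMyIntType value: whitespace-separated myInteger items."""
    return [parse_my_integer(token) for token in text.split()]

print(parse_list_of_my_int("20003 15037 95977 95945"))
```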

-

            Example 3:
            ----------
            <xsd:simpleType name="SKU">
                <xsd:restriction base="xsd:string">
                    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
                </xsd:restriction>
            </xsd:simpleType>
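
The pattern facet must match the whole value, not just part of it. In Python the closest equivalent is re.fullmatch, as in this sketch (is_sku is a hypothetical name; XML Schema regular expressions differ from Python's in some details, but this pattern means the same in both):

```python
import re

# Same expression as the xsd:pattern above: three digits, a hyphen, two capitals.
SKU = re.compile(r"\d{3}-[A-Z]{2}")

def is_sku(text):
    """Return True if the whole string matches the SKU pattern."""
    return SKU.fullmatch(text) is not None

for candidate in ["926-AA", "926-aa", "x926-AA", "9260-AA"]:
    print(candidate, is_sku(candidate))
```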

-