Tuesday, November 10, 2009

W3C XML Schema

-

-

Schema languages available


    1. RELAX NG
    2. Schematron
    3. Hook
    4. Examplotron
    5. W3C XML Schema Language

-

-

-

Differences between DTD and W3C XML Schema

-

DTD -    

        1. No support for strong data typing.
        2. No specific element occurrence constraints can be defined.
        3. The only schema language defined by the XML specification itself.
        4. Non-XML based syntax.
        5. No explicit support for namespaces.
        6. Non-extensible (although parameter entities provide a bit of extensibility).
        7. DTDs are optional in XML.
        8. The Document Type Declaration (DOCTYPE) links a DTD to an XML document.

        9. For other schema languages, the mechanism for associating an XML document with a schema varies according to the schema language.
       10. The association may be achieved via markup within the XML document itself, or via some external means.

-

-

-

Schema Usage

-

        Validation


                Validation can be considered a "firewall" against the diversity of XML.

                By validating documents against schemas, you can ensure that the documents'
                contents conform to your expected set of rules, simplifying the code needed to process them.

                XML is a good foundation for pipelines of transformations using widely available tools.
                Since each of these transformations introduces a risk of error, and each error is easier
                to fix when detected near its source, it is good practice to introduce check points
                in the pipeline where the documents are validated.

-

        Documentation

- 
                XML schemas are frequently used to document XML vocabularies, even when validation isn't a requirement.

                The machine-readability of schemas gives them several advantages as documentation.
                Human-readable documentation can be generated from the schema's formal description.
                Schema IDEs, for instance, provide graphical views that help to understand the structure of the documents.

-

        Querying Support

- 

                The first versions of XPath and XSLT were defined to work without any explicit understanding of the structure
                of the documents being manipulated. This has worked well, but has imposed performance and functionality limits.

                The second version of XPath and XSLT and the first version of XQuery can rely on the availability of a W3C XML
                Schema to lift those limits.

                For example, the XQuery typeswitch expression can branch on the type of an XML element, and that type
                can come from the xsi:type attribute of the element.

-

        Data Binding

- 

                Although it isn't especially difficult to write applications that process XML documents using the
                SAX, DOM, and similar APIs, it is a low-level task, both repetitive and error-prone.

                Data binding tools take over this work and come in two kinds:

                1. Runtime binding tools do their best to perform a binding based on the structure of the documents and applications discovered by introspection.
                2. Design-time binding tools rely on a model formalized in a schema of some kind.

-

        Guided Editing

- 

                The W3C is creating a standard API that can be used by guided editing applications to ask a schema processor
                which action can be performed at a certain location in a document—for instance:
                "Can I insert this new element here?",
                "Can I update this text node to this value?", etc.
                The Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification
                defines "Abstract Schemas" generic enough to cover both DTDs and W3C XML Schema (and potentially other
                schema languages as well). When finalized and widely adopted, this API should allow you to plug the schema
                processor of your choice into any editing application.

-

        Replacing scripts from end-user Forms (XForms)

                XForms, the intended replacement for HTML forms, uses XML Schema for strong typing,
                so checking the type of an input value needs no scripting language.

-

-

-

W3C XML Schema Language

-

    An XML Schema consists of components such as type definitions and element declarations.

-

    Namespaces

        XSLT:             http://www.w3.org/1999/XSL/Transform
        XHTML:            http://www.w3.org/1999/xhtml
        XML Signature:    http://www.w3.org/2000/09/xmldsig#
        XML Schema:       http://www.w3.org/2001/XMLSchema
        XForms:           http://www.w3.org/2002/xforms

-

Top Element:    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>

Top Attributes:   

            targetNamespace

-

            attributeFormDefault (default=unqualified)

-
                When set to "qualified", locally declared attributes must be qualified with a namespace in instances.
                Since attributes don't take the default element namespace, a namespace prefix is required.

                Attributes must be qualified either because they are declared globally or
                because the attributeFormDefault attribute is set to "qualified".

                In fact, attributes that are required to be qualified must be explicitly prefixed.

                Controlling qualification on a declaration by declaration basis
         

                    It is also possible to control qualification on a declaration by declaration basis using the form attribute.
                    For example, to require that the locally declared attribute publicKey is qualified in instances,
                    we declare it in the following way:

-

                    Example

-
                    Requiring Qualification of Single Attribute
                    <schema xmlns="http://www.w3.org/2001/XMLSchema"
                        xmlns:po="http://www.example.com/PO1"
                        targetNamespace="http://www.example.com/PO1"
                        elementFormDefault="qualified"
                        attributeFormDefault="unqualified">
                      <!-- etc. -->
                      <element name="secure">
                        <complexType>
                          <sequence>
                              <!-- element declarations -->
                          </sequence>
                          <attribute name="publicKey" type="base64Binary" form="qualified"/>
                        </complexType>
                      </element>
                    </schema>

-

                    Example

-
                    Instance with a Qualified Attribute

                    <?xml version="1.0"?>
                    <purchaseOrder xmlns="http://www.example.com/PO1"
                               xmlns:po="http://www.example.com/PO1"
                               orderDate="1999-10-20">
                      <!-- etc. -->
                      <secure po:publicKey="GpM7">
                        <!-- etc. -->
                      </secure>
                    </purchaseOrder>

-

                    Notice that the value of the form attribute overrides the value of the attributeFormDefault
                    attribute for the publicKey attribute only.
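
A minimal sketch of how the qualified attribute looks to a namespace-aware parser, using Python's standard xml.etree.ElementTree (the choice of Python and the variable names are assumptions for illustration; the instance is a trimmed copy of the one above):

```python
import xml.etree.ElementTree as ET

PO_NS = "http://www.example.com/PO1"

# Trimmed copy of the purchaseOrder instance shown above.
instance = """
<purchaseOrder xmlns="http://www.example.com/PO1"
               xmlns:po="http://www.example.com/PO1"
               orderDate="1999-10-20">
  <secure po:publicKey="GpM7"/>
</purchaseOrder>
"""

root = ET.fromstring(instance)
secure = root.find(f"{{{PO_NS}}}secure")

# The qualified attribute is only reachable through its namespace name.
qualified_value = secure.get(f"{{{PO_NS}}}publicKey")
# Looking it up without a namespace finds nothing.
unqualified_value = secure.get("publicKey")
# orderDate is unqualified: the default namespace does not apply to attributes.
order_date = root.get("orderDate")

print(qualified_value, unqualified_value, order_date)
```

ElementTree only checks well-formedness here; it does not validate the instance against the schema.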

-

            elementFormDefault (default=unqualified)

-

                If it is set to "qualified", nested elements must belong to the target namespace
                of the schema either through a default namespace declaration or an explicit prefix.   

                Note: the form attribute can be applied to an element declaration in the same manner as to an attribute declaration.

 

-

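Associating a Schema with an XML document

    The XML Schema Definition language (XSD) defines four attributes for use in XML instance documents.
    They belong to the http://www.w3.org/2001/XMLSchema-instance namespace, conventionally bound to the xsi prefix.

            xsi:schemaLocation                 a whitespace-separated list of pairs: " { namespace schemaLocation.xsd }* "
            xsi:noNamespaceSchemaLocation      location of a schema that has no target namespace
            xsi:type                           explicitly names the type of an element
            xsi:nil                            marks an element as nil when its declaration is nillable

    XML Instance Document Refers to a Schema Using xsi:schemaLocation or xsi:noNamespaceSchemaLocation

        According to the World Wide Web Consortium (W3C) XML Schema Recommendation,
        an XML instance document can have both the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes specified.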

   Example

-
        XSD

-

            <?xml version="1.0"?>
            <xsd:schema    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                                 targetNamespace="http://jethwani.com/clan"
                                 xmlns:ab="http://jethwani.com/clan">

                <xsd:element name="Name" type="ab:NameType"/>   
                <xsd:complexType name="NameType">
                    <xsd:sequence>
                        <xsd:element name="FirstName" type="xsd:string" form="qualified"/>
                        <xsd:element name="LastName" type="xsd:string"/>
                        <xsd:element name="Age" type="xsd:duration"/>
                    </xsd:sequence>
                </xsd:complexType>
            </xsd:schema>

-

        XML

-

            <?xml version="1.0"?>
            <ab:Name    xmlns:ab="http://jethwani.com/clan"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://jethwani.com/clan Name.xsd">
                <ab:FirstName>Bipin</ab:FirstName>
                <LastName>Jethwani</LastName>
                <Age>P28Y7M27DT13H44M12345.67890S</Age>
            </ab:Name>
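
The effect of form="qualified" on FirstName, and the pair structure of xsi:schemaLocation, can be seen by parsing the instance above. This is a sketch with Python's standard xml.etree.ElementTree; the variable names are illustrative assumptions, and the schema file itself is never loaded:

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

instance = """
<ab:Name xmlns:ab="http://jethwani.com/clan"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://jethwani.com/clan Name.xsd">
  <ab:FirstName>Bipin</ab:FirstName>
  <LastName>Jethwani</LastName>
  <Age>P28Y7M27DT13H44M12345.67890S</Age>
</ab:Name>
"""

root = ET.fromstring(instance)

# Only FirstName (form="qualified") carries the target namespace;
# LastName and Age are local, unqualified elements.
child_tags = [child.tag for child in root]

# xsi:schemaLocation is a whitespace-separated list of (namespace, location) pairs.
tokens = root.get(f"{{{XSI_NS}}}schemaLocation").split()
schema_locations = dict(zip(tokens[0::2], tokens[1::2]))

print(child_tags)
print(schema_locations)
```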

-

Annotation

-

    Most of the schema elements may contain an optional xs:annotation element as their first child element.

        xs:annotation
            (may contain any combination of)
            xs:documentation    (human readable)      (xml:lang is allowed)
            xs:appinfo               (machine readable)    (also used to extend the schema functionality, e.g. by embedding Schematron constructs in appinfo)

            Encoding context sensitive help text (tool tip)
            -----------------------------------------------

            <xs:element name="fullName" type="xs:string">
                <xs:annotation>
                    <xs:appinfo>
                        <helpText>Enter person's full name.</helpText>
                    </xs:appinfo>
                </xs:annotation>
            </xs:element>
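
One way an application might pick up such help text is to read the schema as an ordinary XML document. The sketch below assumes Python's standard xml.etree.ElementTree and wraps the declaration above in a minimal xs:schema element so that the xs prefix is bound; help_by_element is a hypothetical name:

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName" type="xs:string">
    <xs:annotation>
      <xs:appinfo>
        <helpText>Enter person's full name.</helpText>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(schema_text)

# Map each declared element name to the help text found in its xs:appinfo.
help_by_element = {}
for declaration in schema.iter(f"{{{XS_NS}}}element"):
    help_node = declaration.find(
        f"{{{XS_NS}}}annotation/{{{XS_NS}}}appinfo/helpText"
    )
    if help_node is not None:
        help_by_element[declaration.get("name")] = help_node.text

print(help_by_element)
```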

-

-

Types

-
 

        W3C XML Schema derives many of its predefined datatypes
        from a smaller set of "primitive" datatypes that have a specific meaning and semantics and cannot be derived from other types.

-

        Lexical and Value Spaces

-
            W3C XML Schema introduced a decoupling between the data, as it can be read from
            the instance documents (the "lexical space"), and the value, as interpreted according to the datatype (the "value space").

            Each datatype has its own lexical and value spaces and its own rules to associate a lexical representation with a value;
            for many datatypes, a single value can have multiple lexical representations
            (for instance, the xs:float value "3.14116" can also be written equivalently
            as "03.14116", "3.141160", or ".314116E1").

            This distinction is important since the basic operations performed on the values
            (such as equality testing or sorting) are done on the value space: "3.14116" is considered to be equal to
            "03.14116" when the type is xs:float and different when the type is xs:string.

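The difference between the two spaces can be illustrated with a short Python sketch. Python's float is a 64-bit double rather than the 32-bit xs:float, so it only stands in for the idea of a value space; the variable names are assumptions:

```python
# Four lexical representations of the same floating-point value.
lexical_forms = ["3.14116", "03.14116", "3.141160", ".314116E1"]

# Value space of a float type: all four collapse to a single value.
float_values = {float(text) for text in lexical_forms}

# Value space of a string type: every spelling is a different value.
string_values = set(lexical_forms)

print(len(float_values), len(string_values))
```
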
anySimpleType 

   
            duration

                The lexical representation for duration is the [ISO 8601] extended format PnYnMnDTnHnMnS,
                where nY represents the number of years, nM the number of months, nD the number of days, 'T' is the
                date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds.
                The number of seconds can include decimal digits to arbitrary precision.

                The values of the Year, Month, Day, Hour and Minutes components are not restricted but allow an arbitrary integer.
                Similarly, the value of the Seconds component allows an arbitrary decimal.

                An optional preceding minus sign ('-') is allowed, to indicate a negative duration.
                If the sign is omitted a positive duration is indicated.

                For example, to indicate a duration of 1 year, 2 months, 3 days, 10 hours, and 30 minutes,
                one would write: P1Y2M3DT10H30M. One could also indicate a duration of minus 120 days as: -P120D.

                Reduced precision and truncated representations of this format are allowed provided they conform to the following:

                    - If the number of years, months, days, hours, minutes, or seconds in any expression equals zero,
                      the number and its corresponding designator may be omitted.
                    - However, at least one number and its designator must be present.
                    - The seconds part may have a decimal fraction.
                    - The designator 'T' shall be absent if all of the time items are absent.
                    - The designator 'P' must always be present.

-
                For example,
                    Allowed

-
                        P1347Y
                        P1347M
                        P1Y2MT2H
                        P0Y1347M
                        P0Y1347M0D
                        -P1347M

-
                    Not Allowed

-    
                        P-1347M
                        P1Y2MT

-

                Valid values for xs:duration include:

-
                    PT1004199059S
                    PT130S
                    PT2M10S
                    P1DT2S
                    -P1Y
                    P1Y2M3DT5H20M30.123S

-
                The following values are invalid:

-
                    1Y (the leading P is missing)
                    P1S (the T separator is missing)
                    P-1Y (all parts must be positive)
                    P1M2Y (the parts order is significant and Y must precede M)
                    P1Y-1M (all parts must be positive)
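
The lexical rules above can be approximated with a regular expression. This is a sketch in Python, not the normative definition; the pattern and the is_duration helper are assumptions written from the rules listed in this section:

```python
import re

# Optional sign, mandatory 'P', date parts in Y-M-D order, then an optional
# time part introduced by 'T' that must contain at least one item.
DURATION = re.compile(
    r"-?P(?=.)"                                   # 'P' must be followed by something
    r"(?:[0-9]+Y)?(?:[0-9]+M)?(?:[0-9]+D)?"       # years, months, days
    r"(?:T(?=[0-9])"                              # 'T' only if a time item follows
    r"(?:[0-9]+H)?(?:[0-9]+M)?(?:[0-9]+(?:\.[0-9]+)?S)?)?"
)


def is_duration(text):
    """Return True when text matches the xs:duration lexical form sketched above."""
    return DURATION.fullmatch(text) is not None


valid = ["PT1004199059S", "PT130S", "-P1Y", "P1Y2M3DT5H20M30.123S", "P1Y2MT2H", "P0Y1347M0D"]
invalid = ["1Y", "P1S", "P-1Y", "P1M2Y", "P1Y-1M", "P1Y2MT", "P-1347M", "P"]

print(all(is_duration(v) for v in valid), any(is_duration(v) for v in invalid))
```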

-

               dateTime    (CCYY-MM-DDThh:mm:ss)

                Description:

                        The xs:dateTime datatype defines a "specific instant of time."
                        This is a subset of what ISO 8601 calls a "moment of time."
                        All the fields must be present and may optionally be preceded by a sign and leading figures,
                        if needed, and followed by fractional digits for the seconds and a time zone.
                        The time zone may be specified using the letter
                        "Z," which identifies UTC, or by the difference of time with UTC.

                        Valid values for xs:dateTime include:

                            2001-10-26T21:32:52
                            2001-10-26T21:32:52+02:00
                            2001-10-26T19:32:52Z
                            2001-10-26T19:32:52+00:00
                            -2001-10-26T21:32:52
                            2001-10-26T21:32:52.12679

                        The following values are invalid:

                            2001-10-26 (all the parts must be specified)
                            2001-10-26T21:32 (all the parts must be specified)
                            2001-10-26T25:32:52+02:00 (the hours part (25) is out of range)
                            01-10-26T21:32 (all the parts must be specified)
                        Three of the valid examples given above denote the same value (the same instant):
                            2001-10-26T21:32:52+02:00
                            2001-10-26T19:32:52Z
                            2001-10-26T19:32:52+00:00

                        The first one (2001-10-26T21:32:52), which doesn't include a time zone specification,
                        is considered to have an indeterminate value between 2001-10-26T21:32:52-14:00 and 2001-10-26T21:32:52+14:00.
                        Because of daylight saving time, the range of offsets in use is subject to national regulations and may change.
                        The range was between -13:00 and +12:00 when the Recommendation was published,
                        but the Working Group has kept a margin to accommodate possible changes in the regulations.
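
The equality of the three time-zoned values can be checked with Python's datetime module. This is only a sketch: datetime.fromisoformat covers a subset of the xs:dateTime lexical space (no negative years, for instance), and parse_datetime is a hypothetical helper that rewrites "Z" as "+00:00" so that older Python versions can parse it:

```python
from datetime import datetime


def parse_datetime(lexical):
    """Parse a subset of xs:dateTime; 'Z' is rewritten as the offset +00:00."""
    return datetime.fromisoformat(lexical.replace("Z", "+00:00"))


same_instant = [
    "2001-10-26T21:32:52+02:00",
    "2001-10-26T19:32:52Z",
    "2001-10-26T19:32:52+00:00",
]
values = [parse_datetime(text) for text in same_instant]

# Different lexical forms, one point in the value space.
all_equal = all(value == values[0] for value in values)

# Without a time zone the value is "naive": it cannot be placed on the UTC timeline.
no_zone = parse_datetime("2001-10-26T21:32:52")

print(all_equal, no_zone.tzinfo)
```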

 

 

-             

              decimal    (. dot notation)
               

                    The following values are invalid:
                        1 234 (spaces are forbidden)
                        1. (the decimal separator is forbidden)
                        +1,234 (delimiters between thousands are forbidden).

-

               double    (e or E notation)
                   

                        xs:float and xs:double are both primitive datatypes and represent IEEE single
                        (32 bits) and double (64 bits) precision floating-point types.
                        These store the values in the form of a mantissa and an exponent of a power of 2 (m x 2^e).

                        These datatypes accept several "special" values:
                            positive zero (0),
                            negative zero (-0) (which is less than positive 0 but greater than any negative value),
                            infinity (INF) (which is greater than any value),
                            negative infinity (-INF) (which is less than any value), and
                            "not a number" (NaN).

                            Valid values for xs:float and xs:double include:

                                123.456
                                +1234.456
                                -1.2344e56
                                -.45E-6
                                INF
                                -INF
                                NaN
                            The following values are invalid:

                                1234.4E 56 (spaces are forbidden)
                                1E+2.5 (the power of 10 must be an integer)
                                +INF (positive infinity doesn't expect a sign)
                                NAN (capitalization matters in special values)

-

                integer
                    - nonPositiveInteger
                        Description:
                            ..., -2, -1, 0

                    - nonNegativeInteger
                        Description:
                            0, 1, 2, ...

                        - unsignedByte (derived through unsignedLong, unsignedInt and unsignedShort)
                            Description:
                                8-bit unsigned word.

                    - long
                        Description:
                            64-bit word.

                        - int
                            Description:
                                32-bit word.

                            - short
                                Description:
                                    16-bit word.

                                - byte
                                    Description:
                                        8-bit word.

 

 

-

-

-

-

Simple Types

-           

            Attributes are always of simple types.
            Elements of simple types cannot contain attributes or nested elements.
            We derive a new simple type by restricting an existing simple type (or by building a list or union of simple types).
            The legal range of values for the new type is a subset of the existing type's range of values.
            Derivation by extension is reserved for complex types and has no equivalent for simple types.

           

                       
            

            <xsd:simpleType>    element is used to define and name a new simple type.

-

            <xsd:list>                List of atomic values, like the built-in IDREFS, NMTOKENS and ENTITIES types.
                                         We cannot create a list type from an existing list type.
                itemType=""

-

            <xsd:union>
                memberTypes=""

-

           <xsd:restriction>      element to indicate the existing base type
                                         and to identify  the "facets" that constrain the range values.

                                            <xsd:enumeration> value should be unique
                                            <xsd:minInclusive>
                                            <xsd:maxInclusive>
                                            <xsd:pattern>

            Example 1:
            ----------

-
            <xsd:simpleType name="myInteger">
                <xsd:restriction base="xsd:integer">
                    <xsd:minInclusive value="10000"/>
                    <xsd:maxInclusive value="99999"/>
                </xsd:restriction>
            </xsd:simpleType>

-

            Example 2:
            ----------

-
            <xsd:simpleType name="listOfMyIntType">
                <xsd:list itemType="myInteger"/>
            </xsd:simpleType>

            <?xml version="1.0"?>
            <listOfMyInt>20003 15037 95977 95945</listOfMyInt>

-

            Example 3:
            ----------
            <xsd:simpleType name="SKU">
                <xsd:restriction base="xsd:string">
                    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
                </xsd:restriction>
            </xsd:simpleType>
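
To make the three examples concrete, the checks that a schema processor would apply can be imitated by hand. The Python sketch below is an illustration only; the function names are assumptions, and a real validator does considerably more (whitespace normalization, error reporting, and so on):

```python
import re

# Example 3: an XSD pattern is implicitly anchored, hence fullmatch.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def is_my_integer(text):
    """Example 1: an integer restricted to the range 10000..99999."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False
    return 10000 <= int(text) <= 99999


def is_list_of_my_int(text):
    """Example 2: a whitespace-separated list whose items are all myInteger."""
    return all(is_my_integer(item) for item in text.split())


def is_sku(text):
    """Example 3: three digits, a hyphen, two upper-case ASCII letters."""
    return SKU_PATTERN.fullmatch(text) is not None


print(is_list_of_my_int("20003 15037 95977 95945"), is_sku("926-AA"))
```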

-

Wednesday, October 21, 2009

Free XML Application Testing Tools

The following tools help to test sample XML applications.

 

Kernow 1.6

 

(http://kernowforsaxon.sourceforge.net/)

    XML Schema

                          Testing schemas is fun

    XQuery

                          Helped me a lot while I was creating XQuery modules for an Oracle OSB 10gR3 project for messaging between SAIP
                          (a SOAP-based client) and Oracle Beehive (a RESTful service provider).
                          However, for XML Schema based programs I had to switch back to JDeveloper, as Kernow asks to
                          upgrade the built-in Saxon to the validating (schema-aware) Saxon processor.

    XSLT

                        Running an XSLT program against an XML document in sandbox mode
                        is real fun. Supports Saxon extensions along with EXSLT
                        extensions.

Mozilla based RESTTest

 

                          Quick and easy way to test RESTful web service providers.
                          Truly amazing implementation, built on Apache HttpClient, I suppose.

 

Araxis Merge

http://www.araxis.com/merge/Download.html

                     Merge is the visual file comparison (diff), merging and folder synchronization application from Araxis.
                     Use it to compare and merge source code, web pages, XML and other text files with native application performance.
                     Directly open and compare the text from Microsoft Office (Word and Excel), OpenDocument, PDF and RTF files.
                     Compare images and binary files.
                     Synchronize folders.
                     Perform code reviews and audits.
                     Work with folder hierarchies containing thousands of files.
                     Merge integrates with many SCM (version control) systems and other applications.

XML

   -                                        

<?xml version=""  encoding=""  standalone="" ?>

 

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).

XML is a general-purpose specification for creating custom markup languages, also called XML applications. (XML is itself an application profile of SGML, optimized for automated parsing.)

 

Usage and alternatives


            Sharing structured data, especially via the Internet.
            Encoding documents.
            Serializing data (text-based alternatives for this use: JSON, YAML and S-Expressions).

-
                        JSON

-  
                                
                                  JavaScript Object Notation: a text-based, lightweight, human-readable
                                  data interchange format.
                                  Media type application/json, filename extension .json.
                                  The JSON format is often used for serializing and transmitting
                                  structured data over a network connection.
                                  Its main application is in Ajax web application programming,
                                  where it serves as an alternative to the XML format.
                                  It is also very much in use in RESTful web services.
                                  For instance, Oracle Beehive (Collaboration Suite)
                                  RESTful services use both XML and JSON formats.

                               Although JSON was based on a subset of the JavaScript programming language
                               (specifically, Standard ECMA-262 3rd Edition, December 1999[1]),
                               it is considered to be a language-independent data format.
                               Code for parsing and generating JSON data is readily available for a large
                               variety of programming languages.

 

                                    JSON Example

                                     {
                                         "firstName": "John",
                                         "lastName": "Smith",
                                         "address": {
                                             "streetAddress": "21 2nd Street",
                                             "city": "New York",
                                             "state": "NY",
                                             "postalCode": "10021"
                                         },
                                         "phoneNumbers": [
                                             { "type": "home", "number": "212 555-1234" },
                                             { "type": "fax", "number": "646 555-4567" }
                                         ]
                                     }

                            The equivalent for the above in XML:

                                    <Person firstName="John" lastName="Smith">
                                      <Address>
                                           <streetAddress>21 2nd Street</streetAddress>
                                           <city>New York</city>
                                           <state>NY</state>
                                           <postalCode>10021</postalCode>
                                       </Address>
                                      <phoneNumber type="home">
                                                                     212 555-1234
                                       </phoneNumber>
                                      <phoneNumber type="fax">
                                                                     646 555-4567
                                      </phoneNumber>
                                  </Person>
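
The correspondence between the two notations can be sketched in Python with the standard json and xml.etree.ElementTree modules. The mapping chosen here (names as attributes, phone numbers as repeated elements) mirrors the XML above but is only one of several possible conventions, and phoneNumbers is written as a JSON array:

```python
import json
import xml.etree.ElementTree as ET

person_json = """
{
  "firstName": "John",
  "lastName": "Smith",
  "address": {"streetAddress": "21 2nd Street", "city": "New York",
              "state": "NY", "postalCode": "10021"},
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
"""

person = json.loads(person_json)

# Scalar properties become attributes, nested objects become child elements.
root = ET.Element("Person", firstName=person["firstName"], lastName=person["lastName"])
address = ET.SubElement(root, "Address")
for key, value in person["address"].items():
    ET.SubElement(address, key).text = value

# Each item of the JSON array becomes one repeated phoneNumber element.
for phone in person["phoneNumbers"]:
    ET.SubElement(root, "phoneNumber", type=phone["type"]).text = phone["number"]

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```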

                            JSON Schema is a specification for a JSON-based format for defining the structure of JSON data.
                            JSON Schema is based on concepts from XML Schema, RelaxNG, and Kwalify.

                            Because XML is a general-purpose markup language, XML documents are syntactically more complex and larger in file size than the equivalent JSON, which, in contrast, is specifically designed for data interchange. Both lack an explicit mechanism for representing
                            large binary data such as image data (although binary data can be serialized in either case by applying a general-purpose binary-to-text encoding scheme).
                            The most used forms of binary-to-text encoding are:

                                hexadecimal
                                base64
                                quoted-printable
                                uuencoding
                                yEnc
                                Ascii85
                                BinHex
                                Percent encoding
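
As an illustration of the first two encodings in the list, the Python sketch below carries a few arbitrary bytes inside a JSON document as base64 text and recovers them unchanged. The payload and the property names are made up for the example:

```python
import base64
import json

# A few arbitrary bytes standing in for image data.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# Binary-to-text: base64 yields plain ASCII that is safe inside JSON or XML.
encoded = base64.b64encode(payload).decode("ascii")
document = json.dumps({"name": "icon.png", "data": encoded})

# The receiver reverses the steps to recover the original bytes.
decoded = base64.b64decode(json.loads(document)["data"])

# Hexadecimal is simpler but needs two characters per byte.
hex_text = payload.hex()

print(encoded, hex_text, decoded == payload)
```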

-

                        YAML

- 
                            Both functionally and syntactically, JSON is effectively a subset of YAML.[14]
                            The common YAML library (Syck) also parses JSON. 
                       

                  S-Expressions


                        
     

XML
    Version 1.0
     Version 1.1

    encoding is optional.
            Values are the character set names registered with IANA (Internet Assigned Numbers Authority).

                FYI: IANA also defines the media types (http://www.iana.org/assignments/mediatypes/).
                e.g.    (text/html)
                    (text/xml)        (not preferred)        (ASCII character set by default)
                    (application/xml)    (preferred)        (Unicode character set by default)
                    (image/jpeg)
                    (application/json)

            By default a Unicode encoding (UTF-8 or UTF-16) is assumed, and the parser may use the first few bytes of the file to guess the encoding.

            UTF-8 versus UTF-16
            A UTF-16 document requires a byte order mark (U+FEFF, which appears as the bytes 0xFE 0xFF or 0xFF 0xFE) preceding the XML declaration.
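
A rough sketch of how the first bytes can be used to guess the encoding, written in Python. The sniff_encoding function is a hypothetical helper that looks at byte order marks only; real parsers also inspect the bytes of the XML declaration, and UTF-32 is ignored here:

```python
import codecs


def sniff_encoding(data):
    """Guess the encoding of an XML byte stream from its byte order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 (with BOM)"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    # No byte order mark: UTF-8 is the default assumption.
    return "UTF-8 (assumed)"


declaration = '<?xml version="1.0" encoding="UTF-16"?><a/>'

big_endian = codecs.BOM_UTF16_BE + declaration.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + declaration.encode("utf-16-le")
plain = '<?xml version="1.0"?><a/>'.encode("utf-8")

print(sniff_encoding(big_endian), sniff_encoding(little_endian), sniff_encoding(plain))
```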

    standalone is optional.
            Default value is "no".
            Documents that have a DTD may use the value "yes" if the DTD doesn't change the document content.

Originally designed to meet the challenges of large-scale electronic publishing,
XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

    [[1]]    It is classified as an extensible language, because it allows the user to define the mark-up elements.

    [[2]]     XML's purpose is to aid information systems in sharing structured data, especially via the Internet,
        to encode documents, and to serialize data; in the last context, it compares with text-based serialization
        languages such as JSON, YAML and S-Expressions.

XML is recommended by the World Wide Web Consortium (W3C).

It is a fee-free open standard.

Monday, September 7, 2009

XQuery

XQuery is a query language (with some programming language features) that is designed to query collections of XML. XQuery provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases or office documents. XQuery is to XML what SQL is to relational databases. XQuery 1.0 is an extension of XPath 2.0. It supplements XPath with a SQL-like "FLWOR expression" for performing joins. XQuery is a W3C Recommendation. XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.). The language also provides syntax allowing new XML documents to be constructed.

XQuery 1.0 does not include features for updating XML documents or databases. It also lacks full text search capability. (These features are both under active development for a subsequent version of the language.)

XQuery uses "smiley faces" to begin and end comments. This cheerful notation was originally suggested by Jeni Tennison. Here is an example of a comment:

 (: Thanks, Jeni! :)

XQuery uses input functions to identify the data to be queried. There are two input functions:

doc()                  returns an entire document, identifying the document by a Uniform Resource Identifier (URI).
                          To be more precise, it returns the document node.

collection()       returns a collection, which is any sequence of nodes that is associated with a URI. 
                          This is often used to identify a database to be used in a query.

The document node does not have explicit syntax in XML, but XQuery provides an explicit document node constructor.

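The query document { } creates an empty document node. The following query creates a document node that contains a processing instruction, a comment, and a book element: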

    document {
        <?xml-stylesheet type="text/xsl" href="c:\temp\double-slash.xslt"?>,
        <!-- I love this book! -->,
        <book year="1977">
            <title>Harold and the Purple Crayon</title>
            <author>
                <last>Johnson</last>
                <first>Crockett</first>
            </author>
            <publisher>HarperCollins Juvenile Books</publisher>
            <price>14.95</price>
        </book>
    }