1.1 WHAT IS XML?

Extensible Markup Language (XML) is the latest buzzword on the Internet, but it is also a rapidly growing and maturing technology with real-world applications, particularly for the management, display, and organization of data. It is primarily concerned with the description and structuring of data.

The idea of a universal data format is not new. An early attempt to combine a universally acceptable data format with rich information storage capabilities was SGML (Standard Generalized Markup Language). The best known application of SGML is HTML (Hypertext Markup Language). The idea was that any HTML document (or web page) would be presentable in any application that was capable of understanding HTML (termed the Web Browser).

Unfortunately, SGML is such a complicated language that it is not well suited to data interchange over the Web. HTML, too, is limited in scope, in that it is intended only for displaying documents in a browser. XML was therefore developed to adapt SGML for describing specialized kinds of information. XML is actually a subset of SGML and fully compatible with it.

It is important to note, however, that XML is not really a language at all, but a standard for creating languages that meet the XML criteria. It thus describes a syntax that you would use to create your own languages. XML can be viewed in the IE5 web browser, since IE5 contains a built-in default style sheet that enables us to view XML documents.

XML is very flexible. Hence it is targeted to be the basis for defining data exchange languages, especially for communication over the Internet. It makes it very easy to work with data within applications but it also makes it easy to share this information with others. The coming chapters will highlight some factors about the use of XML in real world applications, as well as the reason why it is becoming the lingua franca for database applications.

1.2 WHY EXTENSIBLE?

Since we have full control over the creation of the XML document, we can shape the data any way we wish, so that it makes sense to our particular application. For example, instead of creating a text file to store the name John Doe, I might create an XML file like

John

Doe

If I do not wish for this level of flexibility, i.e. different tags for each part of the name, I can also write

John Fitzgerald Byers

We are thus free to structure the same data in different applications to suit the requirements of that application. If we want to create data in a way that only a particular computer program will use, we can do so. If we want to share data with other programs we can do so.

This is where the “Extensible” in XML comes from, in the freedom to use our own tags to describe data, to make it more comprehensible. Anyone is free to mark up data in any way using the language, even if others are doing it in totally different ways.
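
As a rough illustration of this freedom, the sketch below parses two different markups of the same name with Python's standard xml.etree.ElementTree module. The tag names name, first and last are assumptions chosen for this example; XML itself does not prescribe them.

```python
import xml.etree.ElementTree as ET

# Two hypothetical ways of marking up the same name. The tag names
# (name, first, last) are our own invention, not part of XML itself.
detailed = "<name><first>John</first><last>Doe</last></name>"
simple = "<name>John Doe</name>"

def full_name(xml_text):
    """Return the full name whichever of the two markups is used."""
    root = ET.fromstring(xml_text)
    parts = [child.text for child in root]  # one entry per child tag
    return " ".join(parts) if parts else root.text

print(full_name(detailed))
print(full_name(simple))
```

An application that only needs the whole name can accept either form, while one that sorts by surname would prefer the more detailed markup.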

Interchanging information becomes much easier if people use the same data format. XML therefore allows us to use various industry-standard vocabularies to describe various types of data. For example, Scalable Vector Graphics (SVG) is an XML vocabulary for describing 2-dimensional graphics, and MathML is an XML vocabulary for describing mathematics as a basis for machine-to-machine communication.

1.2.1 Hierarchies in XML

XML groups information in hierarchies. The items in the document relate to each other in parent/child and sibling/sibling relationships.

These items, the individual pieces of information in the data, are called elements.
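
The parent/child and sibling relationships can be made visible with a few lines of Python. This is only a sketch; the library, book and title element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: <library> is the parent of two <book> siblings,
# and each <book> is in turn the parent of one <title>.
doc = "<library><book><title>A</title></book><book><title>B</title></book></library>"
root = ET.fromstring(doc)

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:  # children of the same parent are siblings
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(root)))
```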

1.3 PIECES OF XML

Here are some of the important technologies that make up the XML family, each specification covering different aspects of communicating information.

- XML 1.0 is the base specification upon which the XML family is built. It describes the syntax that XML documents have to follow, the rules that XML parsers follow, and anything else you need to know to write an XML document.
- DTDs (Document Type Definitions) and Schemas provide ways to create templates for our document types.
- Namespaces provide a way to distinguish one XML vocabulary from another, which allows us to create richer documents.
- XPath describes a querying language for addressing parts of an XML document.
- CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language) are used to format XML documents for display.
- XLink and XPointer are languages used to link XML documents with one another, in a similar manner to HTML hyperlinks.
- DOM (Document Object Model) provides a traditional way to interface with XML documents, and SAX (Simple API for XML) is an alternative way for programmers to interface with XML documents from their code.

1.4 WHERE IT IS USED

1.4.1 Reducing Server Load

Web based applications can use XML to reduce the load on web servers. This can be done by keeping all the information on the client as long as possible and then sending the information to those servers in one big XML document.

1.4.2 Web Site Content

The W3C (World Wide Web Consortium) uses XML to write its specifications. These XML documents can then be transformed into HTML for display or transformed into a number of other presentation formats.

Some web sites also use XML entirely for their content where traditionally HTML would have been used.

XML is the basis for metadata (information about information, a special type of data) such as Microsoft’s Channel Definition Format (CDF) for describing Web push channels or Netscape’s Meta Content Framework (MCF).

1.4.3 Remote Procedure Calls

XML is also used for Remote Procedure Calls (RPCs), which allow objects on one computer to call objects on another computer to do work, allowing distributed computing. Using XML and HTTP for these calls allows this to occur even through a firewall, which would normally block such calls, providing greater opportunities for distributed computing.
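
To give a feel for what such a call looks like on the wire, the sketch below uses XML-RPC, one XML-over-HTTP protocol of this kind, through Python's standard xmlrpc.client module. Treating XML-RPC as representative is an assumption on my part, and the remote method add is hypothetical. No network connection is made; the code only encodes and decodes the XML body.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method add(2, 3) as an XML document.
request_body = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_body)

# The receiving side decodes the same XML back into the arguments
# and the name of the method to invoke.
params, method = xmlrpc.client.loads(request_body)
print(method, params)
```

Because the request is plain text carried over HTTP, it passes through the same port as ordinary web traffic.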

1.4.4 E-Commerce

E-Commerce is one of those buzzwords that you hear all over the place. Companies are discovering that by communicating via the Internet, instead of by more traditional methods, they can streamline their processes, decreasing costs and improving response times. Whenever one company needs to send data to another, XML is the perfect fit for the exchange format.

XML IN THE BROWSER

2.1 EXAMPLE XML DOCUMENT

Suppose we wish to create an XML document describing a library of books, with information on each book's name, author, price and other details. We use XML for such a project since it allows us to define a data format that can be reused in other files as well.

ISBN="8763-343-2343" >

Professional JINI

Sing Li

Wrox Publications

22/10/1999

XML Programming

Sudhir Ancha

Mann Publications

A few things to notice in the XML file:

The first line, which begins with <?xml and ends with ?>, is called the XML prolog. The XML version number should be stated at the start of every XML file. The rest of the line is optional; it tells the parser such things as the character encoding our text is in. Style sheet files, if any, are linked through a separate processing instruction.

In the above XML file, after the XML prolog, we have added one more line:

<!DOCTYPE Library>

Here DOCTYPE Library indicates that all the tags inside this XML file fall under the tag "Library", which means that "Library" is the parent, or root, of all other tags in the file. Each XML file can have only one DOCTYPE.

We have also added a comment for Book1, using the comment syntax <!-- comment text -->.

The element called "Book" has both attributes and further tags under it. In the above XML file, ISBN is an attribute of the Book element, and Title, Author and Publisher are sub-tags under it. Whether a given attribute or sub-tag must be present is defined by the DTD (Document Type Definition) file. For example, ISBN might be compulsory for the Book element if searching by ISBN is supported, while the date-published tag may be optional if there is no facility for finding the most recent books. I will explain how to create DTDs in a later section.

We have declared an empty tag under the second book. An empty tag written as <Tag/> is equivalent to writing <Tag></Tag>. This form can reduce the size of your XML file when no data is required between the tags.
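
The points above can be tried out with a short Python sketch. The document below is an approximation of the library example, not a copy of it: the Date_Published element name and the ISBN of the second book are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET

# An approximation of the library document. Date_Published and the
# second ISBN are placeholders, not taken from the original file.
library_xml = """<?xml version="1.0"?>
<!DOCTYPE Library>
<Library>
  <!-- Book1 -->
  <Book ISBN="8763-343-2343">
    <Title>Professional JINI</Title>
    <Author>Sing Li</Author>
    <Publisher>Wrox Publications</Publisher>
    <Date_Published>22/10/1999</Date_Published>
  </Book>
  <Book ISBN="0000-000-0000">
    <Title>XML Programming</Title>
    <Author>Sudhir Ancha</Author>
    <Publisher>Mann Publications</Publisher>
    <Date_Published/>
  </Book>
</Library>"""

root = ET.fromstring(library_xml)
for book in root.findall("Book"):
    # ISBN is read as an attribute, Title as a sub-tag.
    print(book.get("ISBN"), book.findtext("Title"))

# An empty tag and a start/end pair with nothing in between parse alike.
short_form = ET.fromstring("<Date_Published/>")
long_form = ET.fromstring("<Date_Published></Date_Published>")
print(ET.tostring(short_form) == ET.tostring(long_form))
```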

The above XML document may be called well formed XML.

Well formed XML is an XML document that meets certain grammatical rules outlined in the XML 1.0 specification. The tags must adhere to a set of rules for the document to be well formed; these rules are listed in the next section.

2.2 RULES FOR ELEMENTS

XML documents should adhere to the following rules to be well formed.

- Every start tag must have a matching end tag.
- Tags cannot overlap.
- XML documents can have only one root element.
- Element names must obey XML naming conventions.
- XML is case sensitive with respect to tags.
- XML will keep white space in your text.
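
A parser enforces these rules for us. The following sketch feeds a few hypothetical snippets, one per rule, to Python's standard ElementTree parser and reports which of them it accepts.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets: the first is well formed, the rest break one rule each.
checks = {
    "matching tags": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "overlapping tags": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Title>text</title>",
}
for label, snippet in checks.items():
    print(label, "->", is_well_formed(snippet))

# White space inside the text is kept exactly as written.
print(repr(ET.fromstring("<a>  two  spaces  </a>").text))
```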

2.3 XML PARSERS

The main reason for creating all these rules about the well formed-ness of XML documents is so that we can create a computer program to read in the data and easily tell markup from information. An XML processor is more commonly called a parser, since it simply parses XML and provides the application with any information it needs.

There are quite a number of XML parsers available. They include the Microsoft Internet Explorer parser, James Clark’s Expat, Vivid Creations' ActiveDOM and, popular among Java users, JavaSoft's XML parser and IBM's Xerces parser.
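
As one example of a parser handing information to an application, Python ships a binding to Expat in its standard xml.parsers.expat module. The sketch below registers two callbacks and records what the parser reports; the sample Book markup is an assumption made for this illustration.

```python
import xml.parsers.expat

events = []  # filled in by the callbacks below

def start_element(name, attrs):
    # Called by the parser for every start tag it recognizes.
    events.append(("start", name, attrs))

def character_data(data):
    # Called for the text between tags; whitespace-only chunks are skipped.
    if data.strip():
        events.append(("text", data))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.CharacterDataHandler = character_data
parser.Parse('<Book ISBN="8763-343-2343"><Title>Professional JINI</Title></Book>', True)

for event in events:
    print(event)
```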

2.4 FORMATTING XML DOCUMENTS

Viewing an XML document in the IE5 browser will certainly produce some kind of output, thanks to the built-in default style sheet in IE5. But this output will not conform to what you wanted in terms of color, design, formatting and general style. This is why we have style sheet languages such as Cascading Style Sheets (CSS) and XSL (Extensible Stylesheet Language), the style sheet language developed by the W3C for XML.

The need for formatting arises from the fact that in XML user defined tags rule the roost. Thus in our earlier book example, there is nothing built into the browser that will recognize that Book title, author, ISBN identification number et al, will appear in different columns, in different fonts or colors, or even on different lines. What we would see would be an unending line of words separated by whitespaces at the appropriate places.

Thus we develop the Content/Presentation paradigm with the use of stylesheet languages, which essentially embodies the idea that you separate the data from the way that data is displayed.

2.4.1 CASCADING STYLE SHEETS

CSS is a styling tool that can be used with XML as well as HTML documents. It lets us style individual tags the way we want: for example, the book name in bold, in blue, with a font size of 15, and the ISBN in red, italicized, with a font size of 18.

Since my aim in this seminar is to show the importance of XML in data storage applications, I will not be delving into the details of how we can make XML documents look attractive in a browser environment. Nevertheless, an example is presented below.

Consider the following example of an XML document which is used to display the news of the week on a website:

The Weekly News

Bush Warns Terrorists

By our correspondent

Beijing, China.

Saturday February 23 2002 9:43 IST

The President of the United States, George W. Bush, today issued a statement, in which he came down hard on the abductors, and supposedly the killers, of the kidnapped Washington Post correspondent Daniel Pearl.

A grim faced Bush told reporters that the agents of terror operating in Asia would not get away with what they had done to the American journalist.

Although the abductors have not released Pearls's body, American sources said they have received evidence of Pearl's murder, in the form of a video tape showing him being stabbed.

The news of Pearl's death comes as no surprise after the arrested Pakistani militant Omar Sheikh, said that Pearl had been murdered by his abductors.However it has been received by the world community with a mixture of shock, sadness and outrage. Pearl is survived by his reporter wife Marianne, who is pregnant with their first child.

As can be seen, the above is a well formed XML document. But it will appear on the website as a drab line-by-line account that will leave visitors with a bad taste for reasons other than the news it is displaying. This problem can be offset by the efficient use of a cascading style sheet that attaches a style to each of the tags. All that needs to be done is to link the CSS file to the above XML document with the line:

<?xml-stylesheet type="text/css" href="first.css"?>

first.css is the cascading style sheet document, which can be typed in a simple editor such as Notepad and then saved as a .css file. For example, the CSS file for the above document can be written as follows:

/*File name-first.css*/

hedline {
    display: block;
    width: 400px;
    border-bottom: 5px double black;
    text-align: right;
    font-family: Times, serif;
    font-size: 36pt;
    background-image: url("c:\news.bmp");
}

byline {
    display: inline;
    width: 200px;
    text-align: left;
    color: black;
    font-family: Times, serif;
    font-size: 14pt;
}

dateline {
    display: inline;
    width: 200px;
    text-align: right;
    color: black;
    font-family: Times, serif;
    font-size: 11pt;
    font-style: italic;
}

p {
    display: block;
    width: 400px;
    color: black;
    font-family: Times, serif;
    font-size: 12pt;
}

As can be seen in the above example, a CSS file names each tag of the corresponding XML document (in the same case) and specifies various properties such as font size, font family, color, background color, background image and the width of the area the tag occupies on the web page, each followed by the value of that property. For example, the statement:

background-image: url("c:\news.bmp");

in the “hedline” rule styles the document in such a way that the headline of the news item is presented on the background of the news.bmp image.

Other attributes like alignment, indentation, margins and padding, position of the content with respect to the browser (static, relative, absolute and fixed), height and width of the output and tables can also be specified similarly.

2.4.2 EXTENSIBLE STYLE SHEET LANGUAGE

XSL is a language that can transform XML documents into any text-based format, XML or otherwise. It is also used to create style sheets, similar to CSS. You can define the layout of the output document, and where in the input document to get the data from.

XSLT style sheets are built on structures called templates, which specify what to look for in the source tree, and what to put into the result tree.

XSLT is especially important in the area of E-Commerce. For instance, consider two companies that wish to communicate their data. “A” is a store and “B” fulfils A’s orders.

Three scenarios are then possible: A can use the same structure for its data as B uses, B can use the same data structure as A, or they can use whatever XML format they wish internally but transform their data to a common format whenever they communicate the information outside.

With XSLT, this kind of transformation becomes quite easy.

Consider the following example of a database consisting of student records. The XML document looks like this:

Shaivya Easwaren

6, Gagangiri Villa, Vidyasagar Colony, Gultekdi, Pune

115

020-4272832

shaivya_e@yahoo.com

Ranjana Rao

7/A, Pleasant Park,Bhairoba Nala,Hadapsar, Pune

126

020-6871580

rrao@chequemail.com

Namita Sane

4,Center Court, Prabhat Road, Deccan Gymkhana, Pune

132

020-5673275

namita_s@hotmail.com

Krushna Bagade

532/2, Adinath Society, Vithoba Chowk , Kothrud, Pune

104

020-5436119

krushna_b@yahoo.com

The above information needs to be displayed in tabular format, which is not possible if we simply open the file in the Internet Explorer 5.0 browser. Here is where XSLT comes to our rescue.

The xml-stylesheet statement at the top of the aforementioned XML document tells the browser that the style sheet used for styling the file is an .XSL file, and where to find the linked XSL file that contains the formatting information for the XML document.

The XSL file will look something like this:

Student table

Student Records

Name	Address	Roll no	Tel No.	E-mail ID

In the above .XSL file the first statement is necessary. It identifies the W3C standard used in the XSL document. Templates are the heart and soul of XSLT. Style sheets are simply a collection of these templates, which are applied to the input document to get the output document. Style sheets may have as many templates as are needed.

The section of the source tree to which the template applies is specified by the match attribute. In this case match="/" indicates that the template is matched against the document root.

Special XSLT elements indicate to the processor that it should do some work. In this case, an element called <xsl:for-each> is, in effect, a mini template applying to any XML element in the source tree matching its select attribute. The element called <xsl:value-of> is used to put the value of the XML element in the result tree.
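
The work done by the template elements described above can be imitated in plain Python. This is only a stand-in to show the idea, not XSLT itself: Python's standard library has no XSLT processor. It assumes the looping element is xsl:for-each and the value-copying element is xsl:value-of, and the element names students, student, name and rollno are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# A cut-down, hypothetical version of the student records document.
source = """<students>
  <student><name>Shaivya Easwaren</name><rollno>115</rollno></student>
  <student><name>Ranjana Rao</name><rollno>126</rollno></student>
</students>"""

def to_table(xml_text):
    """Build an HTML table with one row per student record."""
    root = ET.fromstring(xml_text)
    rows = ["<tr><th>Name</th><th>Roll no</th></tr>"]
    # This loop plays the role of an xsl:for-each over the student elements.
    for student in root.findall("student"):
        # Each findtext() call plays the role of an xsl:value-of.
        cells = [student.findtext("name"), student.findtext("rollno")]
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"

print(to_table(source))
```

The source tree is read once and a separate result tree is written, which is the same division of labor that an XSLT template expresses declaratively.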

When viewed in the browser, an XSL file appears in the form of a hierarchical tree, much like the XML document it is intended to style.

XSLT can also be used in conjunction with CSS. They provide a complementary functionality. XSLT can help you structure your pages in a wide number of formats. CSS can then balance this with easily modifiable media representations for those browsers that support it. XSLT’s primary domain is to provide transformation (i.e. programming) services to XML, while CSS takes the results of such transformations and makes XML into multimedia.

XML INTERFACES

3.1 DOCUMENT OBJECT MODEL AND XML-DOM

The Document Object Model (DOM) provides a means of working with XML documents and other types of documents through the use of code, and a way to interface with that code in the programs that we write. For instance DOM enables us to create documents and parts of documents, navigate through the document, move, copy and remove parts of the document, add or modify attributes.

Working with an object model makes working with information easier. An XML document is in fact structured very much like an object model, as seen in chapter 1: it is hierarchical, with nodes potentially having other nodes as children.

The DOM can model any XML document regardless of how it is structured. It is usually added as a layer between the XML parser and the application that needs information from the document, meaning that the parser reads the data from the XML document and then feeds the data into the DOM. The DOM is then used by a higher level application. The application can do whatever it desires with such information, including putting it into another proprietary object model if desired.

The DOM does not really deal with objects that much. It mainly works with interfaces. An interface is, by definition, a contract to support certain properties and methods, which can be applied to an object. Different programming languages may or may not use the term interface, or have a specific mechanism for providing interfaces, but the same concept can be applied to any language.
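
The operations listed at the start of this section can be sketched with xml.dom.minidom, the small DOM implementation in Python's standard library. The Library, Book, Title and Author names are assumptions carried over from the earlier library example.

```python
from xml.dom import minidom

# The parser reads the text and hands back a DOM tree.
dom = minidom.parseString("<Library><Book ISBN='1'><Title>Old</Title></Book></Library>")
root = dom.documentElement

# Navigate: the first <Book> element and the <Title> inside it.
book = root.getElementsByTagName("Book")[0]
title = book.getElementsByTagName("Title")[0]

# Modify an attribute and the text node under <Title>.
book.setAttribute("ISBN", "8763-343-2343")
title.firstChild.data = "Professional JINI"

# Create a new part of the document and attach it.
author = dom.createElement("Author")
author.appendChild(dom.createTextNode("Sing Li"))
book.appendChild(author)

# Remove a part of the document.
book.removeChild(title)

print(root.toxml())
```

Every call here goes through methods and properties defined by the DOM interfaces (Document, Element, Node), which is why much the same code can be written against other DOM implementations.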

3.2 SIMPLE API FOR XML (SAX)

The Simple API for XML or SAX was developed in order to enable more efficient analysis of large XML documents. The problem with DOM is that before you can use it to traverse a document, it has to build a massive in-memory map of it. This takes up space and time, and is inefficient if you wish to recover small amounts of information.

If we want to locate only specific parts of the document, a second approach is more appropriate. This is the way that SAX works: it is event-driven. Rather than parse the document into the DOM and then use the DOM to navigate around the document, we tell the parser to raise events whenever it finds something.

Known SAX interfaces such as DocumentHandler can be used to “catch” events passed to us by the parser. We can use this to extract some simple information from the XML document. It is also possible to implement error handling by making the DocumentHandler throw SAXExceptions whenever an error is detected in the parsing.

Sophisticated intelligent parsing allows us to report errors and throw exceptions as they are found. Error handling mechanisms in the parser can be supplemented by using the Locator object.

Thus SAX is an excellent API for analyzing and extracting information from large XML documents without incurring the time and space overheads associated with the DOM. The latest version of SAX is the SAX 2.0.
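
A minimal sketch of the event-driven style, using Python's standard xml.sax module. Note that this module exposes the SAX 2 ContentHandler interface rather than the DocumentHandler named above, and that the Library, Book and Title names are assumptions made for the example.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <Title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Title":
            self._inside = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._inside:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._inside = False

handler = TitleCollector()
xml.sax.parseString(
    b"<Library><Book><Title>Professional JINI</Title></Book>"
    b"<Book><Title>XML Programming</Title></Book></Library>",
    handler,
)
print(handler.titles)
```

No tree is built: the handler keeps only the titles it was asked for. If the input is not well formed, the parser raises xml.sax.SAXParseException, which the calling code can catch.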

3.3 NAMESPACES

Namespaces are the means by which we can differentiate elements and sometimes attributes of different XML document types from each other when combining them together into other documents or even when processing multiple documents simultaneously.

Because of the nature of XML, it is possible for any individual to create XML document types which describe the world in their own terms. What happens if company A uses a <title> element to denote a person’s name when XHTML already has a <title> element, which is used to describe the title of an HTML document? Further, how can one distinguish these from the <title> of a book? The answer is namespaces.

A traditional namespace is a set of zero or more names, each of which must be unique within the namespace and constructed according to the rules (if any) of the namespace. For example, the names of element types in an XML document inhabit a traditional namespace, as do the names of tables in a relational database and the names of class variables in a Java class. Traditional namespaces also occur outside the field of computer science; for example, the names of people could be thought to inhabit a traditional namespace, as could the names of species.

Different traditional namespaces are disjoint, i.e. they are not related. Because of this, a name in one traditional namespace does not collide with the same name in a different traditional namespace. This property is useful to applications that have multiple sets of names. By assigning each set of names to a different traditional namespace, they can allow the same name to occur in each set of names without fear of collision. For example, in the following XML document, there is no conflict between the three different uses of the name Value.

<AuctionItem>
   <Title Value="486Laptop"/>
   <Category Value="Computers"/>
   <Value>$100</Value>
</AuctionItem>

This is because an XML document has one traditional namespace for element type names and, for each element type, one traditional namespace for the names of the attributes that apply to that element type. Thus, the two Value attribute names don't conflict because each is assigned to a different traditional namespace: the first to the attribute namespace for the Title element type and the second to the attribute namespace for the Category element type. Furthermore, neither of the Value attribute names conflicts with the Value element type name, because element type names are kept in a traditional namespace that is separate from the attribute namespaces.

The XML namespaces recommendation does not define anything except a two-part naming system for element names and attributes. As an example of how XML namespaces are used to resolve naming conflicts in XML documents that contain element types and attributes from multiple XML languages, consider the following two XML documents:

<?xml version="1.0" ?>
<Address>
   <Street>Wilhelminenstr. 7</Street>
   <City>Darmstadt</City>
   <State>Hessen</State>
   <Country>Germany</Country>
   <PostalCode>D-64285</PostalCode>
</Address>

and:

<?xml version="1.0" ?>
<Server>
   <Name>OurWebServer</Name>
   <Address>123.45.67.8</Address>
</Server>

Each document uses a different XML language and each language defines an Address element type. Each of these Address element types is different; that is, each has a different content model, a different meaning, and is interpreted by an application in a different way. This is not a problem as long as these element types exist only in separate documents. But what if they are combined in the same document, such as a list of departments, their addresses, and their Web servers? How does an application know which Address element type it is processing?

The answer is to assign each language (including its Address element type) to a different namespace. This allows us to continue using the Address name in each language, but to distinguish between the two different element types. By assigning each Address name to an XML namespace, we actually change the name to a two-part name consisting of the name of the XML namespace plus the name Address. This means that any code that recognizes just the name Address will need to be changed to recognize the new two-part name. However, this only needs to be done once, as the two-part name is universally unique.

The name of the XML namespace is a URI. This allows XML namespaces to provide a two-part naming system for element types and attributes. The first part of the name is the URI used to identify the XML namespace, called the namespace name. The second part is the element type or attribute name itself, called the local part or local name. Together, they form the universal name. For example:

<Department>
   <Name>DVS1</Name>
   <addr:Address xmlns:addr="http://www.tu-darmstadt.de/ito/addresses">
      <addr:Street>Wilhelminenstr. 7</addr:Street>
      <addr:City>Darmstadt</addr:City>
      <addr:State>Hessen</addr:State>
      <addr:Country>Germany</addr:Country>
      <addr:PostalCode>D-64285</addr:PostalCode>
   </addr:Address>
   <serv:Server xmlns:serv="http://www.tu-darmstadt.de/ito/servers">
      <serv:Name>OurWebServer</serv:Name>
      <serv:Address>123.45.67.8</serv:Address>
   </serv:Server>
</Department>

Thus, each universal name is unique, meeting the requirement that each element type in an XML document have a unique name.
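
The two-part names can be observed directly with Python's standard ElementTree parser, which writes a universal name as the namespace URI in braces followed by the local name. The sketch below uses a shortened version of the department document; the namespace URIs are illustrative identifiers only and are never fetched.

```python
import xml.etree.ElementTree as ET

ADDR = "http://www.tu-darmstadt.de/ito/addresses"
SERV = "http://www.tu-darmstadt.de/ito/servers"

# A shortened department document with both Address element types.
doc = f"""<Department>
  <Name>DVS1</Name>
  <addr:Address xmlns:addr="{ADDR}">
    <addr:City>Darmstadt</addr:City>
  </addr:Address>
  <serv:Server xmlns:serv="{SERV}">
    <serv:Address>123.45.67.8</serv:Address>
  </serv:Server>
</Department>"""
root = ET.fromstring(doc)

# ElementTree spells a universal name as {namespace-URI}local-name.
postal = root.find(f"{{{ADDR}}}Address")
server_ip = root.find(f"{{{SERV}}}Server/{{{SERV}}}Address")

print(postal.tag)
print(server_ip.tag)
print(server_ip.text)
```

Both elements have the local name Address, yet their universal names differ, so an application can tell them apart without looking at the prefixes.
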

Thus we have seen the functions of various XML interfaces and the modularity and reusability each of them brings to the language.

XML DATA

4.1 INTRODUCTION

XML documents follow a tree structure. A tree is a natural structure that is richer than a simple flat list yet also respectful of cognitive and data processing requirements for economy and simplicity. Valid XML documents belong to classes (document types) that determine the tree structure and other properties of their member documents. The properties of the classes themselves comprise the document type definitions or DTDs, which serve the same role for documents that schemas do for databases.

XML-Data is a notation in the form of an XML document that is both an alternative to markup declarations for writing DTDs and a means of augmenting DTDs with additional capabilities. For instance,

- XML-Data supports rich data types, allowing for tighter validation of data and reduced application effort.
- Through the namespaces facility, XML-Data improves expressiveness, ensuring the existence of uniquely qualified names.
- XML-Data provides for greater and more efficient semantic capabilities by incorporating the concept of inheritance, enabling one schema to be based on another. For instance, a bookstore purchase order schema could be based on a general-purpose E-Commerce schema.

Other benefits of XML-Data, which uses XML instance syntax, include:

- The same tools that are used to parse XML can be used to parse the XML-Data notation.
- As the syntax is very similar to HTML, it is easy for HTML authors to write and read.
- It is easily extensible.

Schemas define the characteristics of classes of objects. Syntactic schemas are used for classes that are strictly syntactic, like XML. Conceptual schemas are used for classes that indicate concepts or relations among concepts, like RDBMS.

Schemas are composed of declarations for

- Element: indicates the containment of a single element type (property).
- Empty, Any, String and Mixed content: the names are self-explanatory. Mixed content is a mixture of parsed character data and one or more elements.
- Group: a set or sequence of elements.
- Constraints and additional properties: min and max constraints, domain and range constraints, etc.

4.2 XML-SPECIFIC ELEMENTS

1) ATTRIBUTES

The XML syntax allows certain properties to be expressed in a form called attributes. An attribute may be given a default value.

2) ENTITY DECLARATION ELEMENT TYPES

Entities are a shorthand mechanism similar to macros in a programming language.

3) EXTERNAL DECLARATIONS ELEMENT TYPE

The extDcls declaration gives a clean mechanism for importing fragments from other schemas.

4) DATATYPES

A datatype indicates that the contents of an element can be interpreted both as a string and, more specifically, as an object such as a number or a date. In other words, the element's contents can be parsed or interpreted to yield an object more specific than a string. Some common data types, their parse types, storage types in memory etc. are given in Table 4.1. XML-Data datatypes include all the highly popular types and all the built-in types of popular database and programming languages like SQL, Visual Basic, C, C++, and Java.

TABLE 4.1 : SPECIFIC DATATYPES IN XML-Data

<table class="MsoNormalTable" border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr> <td>NAME</td> <td>PARSE TYPE</td> <td>STORAGE TYPE</td> <td>EXAMPLE</td> </tr>
<tr> <td>string</td> <td>PCDATA</td> <td>String (Unicode)</td> <td>Greek letters</td> </tr>
<tr> <td>number</td> <td>A number, with no limit on its digits, and optional sign, fraction and exponent</td> <td>String</td> <td>15, 3.14, 123.456E+10</td> </tr>
<tr> <td>int</td> <td>A number with optional sign, no fraction, no exponent</td> <td>32-bit signed binary</td> <td>1, 58502, -13</td> </tr>
<tr> <td>float</td> <td>Same as for number</td> <td>64-bit IEEE 754</td> <td>.31415926E+1</td> </tr>
<tr> <td>fixed.14.4</td> <td>Same as number, but no more than 14 digits to the left of the decimal point and no more than 4 to the right</td> <td>64-bit signed binary</td> <td>12.0044</td> </tr>
<tr> <td>boolean</td> <td>"1" or "0"</td> <td>Bit</td> <td>0, 1 (1 == true)</td> </tr>
<tr> <td>char</td> <td>String</td> <td>1 Unicode character (16 bits)</td> <td></td> </tr>
<tr> <td>string.ansi</td> <td>String with only ASCII characters</td> <td>Unicode or single-byte string</td> <td>I am Shaivya.</td> </tr>
<tr> <td>bin.hex</td> <td>Hexadecimal digits representing octets</td> <td>No specified size</td> <td></td> </tr>
<tr> <td>uri</td> <td>Universal Resource Identifier</td> <td>Per W3C spec</td> <td>http://www.gossamer.org/fluky</td> </tr>
</tbody>
</table>

Other datatypes
include:

- i1: 8-bit binary number with optional sign, no fraction, no exponent.
- i2: 16-bit binary.
- i4: 32-bit binary.
- i8: 64-bit binary.
- ui1-ui8: unsigned binary.
- r4: IEEE 754 4-byte float.
- r8: IEEE 754 8-byte float.

4.3 EXAMPLE XML SCHEMA

The Schema element type:

All schema declarations are contained within a schema element type like this:

<?xml version="1.0"?>
<?xml:namespace name="urn:uuid:BDC6E-11d1-00AA00CC14822/" as="s"/?>
<s:schema id="ExampleSchema">
    <!-- schema goes here -->
</s:schema>

The heart of the XML-Data schema is the elementType declaration, which defines a class of objects (or "type of element" in XML terminology). The id attribute serves the dual role of identifying the definition and naming the specific class.

<elementType id="author">
    <description>The description subelement may be used to provide a human readable description of the element's purpose. In this case it is the person who wrote the book.</description>
</elementType>

Consider the following schema, which describes a Book object. Each contained element may be required or optional and may occur multiple times, as indicated by the occurs attribute, which may have the values "REQUIRED", "OPTIONAL", "ZEROORMORE" or "ONEORMORE". It has a default of "REQUIRED". The schema

<elementType id="Book">
    <element type="#title" occurs="OPTIONAL"/>
    <element type="#author" occurs="ONEORMORE"/>
    <attribute name="copyright"/>
</elementType>

describes an instance such as

<Book copyright="1922">
    <title>Hitchhiker's Guide to the Galaxy</title>
    <author>Douglas Adams</author>
</Book>

Here each instance of “book” may contain a title and must contain one or more authors.
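The occurs rule can be checked mechanically. The following is a minimal Python sketch, not part of XML-Data itself: the `BOOK_OCCURS` dictionary, the `OCCURS_RANGE` table and the `check_occurs` function are hypothetical names introduced here to mirror the Book elementType, and only the standard library's ElementTree parser is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical Python mirror of the Book elementType:
# child element name -> value of its "occurs" attribute.
BOOK_OCCURS = {"title": "OPTIONAL", "author": "ONEORMORE"}

# Allowed (minimum, maximum) counts for each occurs value; None means unbounded.
OCCURS_RANGE = {
    "REQUIRED": (1, 1),
    "OPTIONAL": (0, 1),
    "ZEROORMORE": (0, None),
    "ONEORMORE": (1, None),
}

def check_occurs(instance_xml, occurs_by_child):
    """Return a list of violation messages; an empty list means the counts are valid."""
    root = ET.fromstring(instance_xml)
    problems = []
    for child_name, occurs in occurs_by_child.items():
        count = len(root.findall(child_name))
        low, high = OCCURS_RANGE[occurs]
        if count < low or (high is not None and count > high):
            problems.append(f"{child_name}: found {count}, expected {occurs}")
    return problems

valid_book = """<Book copyright="1922">
  <title>Hitchhiker's Guide to the Galaxy</title>
  <author>Douglas Adams</author>
</Book>"""
missing_author = "<Book><title>Untitled</title></Book>"

print(check_occurs(valid_book, BOOK_OCCURS))      # []
print(check_occurs(missing_author, BOOK_OCCURS))  # ['author: found 0, expected ONEORMORE']
```

A book without a title passes, because title is OPTIONAL, while a book without any author is reported.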

Consider another example of the schema to make the idea clearer.

<?xml:namespace name="urn:uuid:BDC6E-11d1-00AA00CC14822/" as="s"/?>

Consider an instance of the above schema, that is, the information regarding a single book:

<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowley</author>
    <title>The Unofficial Guide to Intergalactic Travel <titlePart>A Spoof on Martians and Everything Otherworldly</titlePart></title>
</Book>

Then the schema defines an instance of Book to have an optional title and one or more authors. The name element has a content model of any, meaning that free text is not allowed, but any arrangement of subelements is valid. The content model of title is mixed, allowing a free intermixture of characters and any number of titlePart elements. The author, name and titlePart elements have a content model of string.
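To see what mixed content means for a program reading the data, here is a small Python sketch using the standard ElementTree parser. It assumes the book instance is tagged with Book, author, title and titlePart elements, as the paragraph above describes; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed tagging of the book instance (element names taken from the description).
book_xml = """<Book>
  <author>Henry Ford</author>
  <author>Samuel Crowley</author>
  <title>The Unofficial Guide to Intergalactic Travel <titlePart>A Spoof on Martians and Everything Otherworldly</titlePart></title>
</Book>"""

book = ET.fromstring(book_xml)
title = book.find("title")

# String content: the whole value of each author is plain text.
authors = [author.text for author in book.findall("author")]

# Mixed content: free text sits in title.text (and in each child's .tail),
# while the titlePart children are ordinary subelements.
leading_text = (title.text or "").strip()
title_parts = [part.text for part in title.findall("titlePart")]

print(authors)       # ['Henry Ford', 'Samuel Crowley']
print(leading_text)  # The Unofficial Guide to Intergalactic Travel
print(title_parts)   # ['A Spoof on Martians and Everything Otherworldly']
```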

Mapping between schemas

Syntactic schemas often have fewer elements compared to explicitly conceptual ones. It is also easier to design a schema that merely covers syntax than a well thought out conceptual data model. An effect of this is that many practical schemas will not contain all the elements that a conceptual schema would, either for reasons of economy or because the schema was simply syntactic. But it is useful to make the implicit explicit, or more general over time, so that more generic processors can make use of the data.

Thus we can add mapping information to the syntactic schema using a mapping statement. It tells a processor how to interpolate the implied elements, thereby creating a conceptual or RDBMS data model.

Thus schemas are an alternative way to constrain the nature and structure of data items in XML, as well as the relationships among those data items. They also provide several advantages over DTDs.

XML AND DATABASES

5.1 USING XML IN AN N-TIER APPLICATION

The N-tier architecture typically has the following logical layers:

- Data services, where all data for the application is stored (usually a database).

- Data Objects, which handle the communication between the database and Business Objects.

- Business Objects, which take care of the business logic in your application, and are responsible for communication between the presentation and data layers.

- Presentation, which is responsible for communication between the user and the business logic tier.

If XML is to be used in the above client-server environment, the presentation layer is going to be using XML for its data needs. Our business objects will also be using XML, both to communicate with the presentation tier and with each other. So we might as well go all the way and have our data objects return XML instead of recordsets. When updating the database, the Business Object could pass XML to the data objects, which would parse the XML and pull out the appropriate data to insert into the database. This means that, potentially, any time one object communicates with another it would use XML as the common language.
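The update direction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the data services tier, and the `save_customer` function and the `<customer>` element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def save_customer(connection, customer_xml):
    """Data object: parse the XML sent by a business object and insert it."""
    customer = ET.fromstring(customer_xml)
    account_number = int(customer.findtext("account_number"))
    last_name = customer.findtext("last_name")
    # Parameter placeholders keep the XML values out of the SQL text itself.
    connection.execute(
        "INSERT INTO Customer (account_number, last_name) VALUES (?, ?)",
        (account_number, last_name),
    )
    connection.commit()

# In-memory database standing in for the data services tier.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Customer (account_number INTEGER, last_name TEXT)")

# XML a business object might pass down (element names are hypothetical).
save_customer(
    connection,
    "<customer><account_number>1952</account_number><last_name>Doe</last_name></customer>",
)
rows = connection.execute("SELECT account_number, last_name FROM Customer").fetchall()
print(rows)  # [(1952, 'Doe')]
```

The business object never sees SQL; it only hands over an XML document, which is the point of using XML as the common language between tiers.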

5.2 RETURNING XML FROM A DATA OBJECT

We will be using Visual Basic to write the data object, ADO to connect to the database, and Microsoft’s XML parser, MSXML (Section 5.3.1), to create an XML document with the results of the query.

Dim cnnDatabaseConnection As ADODB.Connection

Set cnnDatabaseConnection = New ADODB.Connection

' Here we connect to the database...

Dim strSQL As String

strSQL = "SELECT last_name FROM Customer WHERE account_number=1952"

We are now ready to execute our SQL. We call the Execute method of the connection, which returns a Recordset object holding the result of the query.

Dim rsResult as ADODB.Recordset

Set rsResult=cnnDatabaseConnection.Execute(strSQL)

We then create a small XML document and populate it via the DOM with the values returned by our SQL query.

Dim objXML As MSXML.DOMDocument

Set objXML = New MSXML.DOMDocument

objXML.loadXML "<root><lastname/></root>"

The XML document looks like this:

<root><lastname/></root>

And finally the last step is to get the value from our recordset and add it to the XML document.

objXML.selectSingleNode("/root/lastname").Text = rsResult("last_name").Value

MSXML provides a property of the Document object called xml which returns a string containing the XML document that this DOM is modeling. All the data object now has to do is to return the text from that property and we are done.
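For readers without Visual Basic at hand, the same flow can be sketched in Python. This is an analogue rather than the approach described above: SQLite replaces ADO, ElementTree replaces MSXML, and the `get_last_name_xml` function and the sample row are assumptions made for the illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def get_last_name_xml(connection, account_number):
    """Return the customer's last name wrapped in a small XML document."""
    row = connection.execute(
        "SELECT last_name FROM Customer WHERE account_number = ?",
        (account_number,),
    ).fetchone()

    # Build <root><lastname/></root>, the shape addressed by /root/lastname.
    root = ET.Element("root")
    last_name = ET.SubElement(root, "lastname")
    if row is not None:
        last_name.text = row[0]

    # Serialise the tree to a string, much as the xml property does in MSXML.
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample data in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Customer (account_number INTEGER, last_name TEXT)")
connection.execute("INSERT INTO Customer VALUES (1952, 'Doe')")

print(get_last_name_xml(connection, 1952))  # <root><lastname>Doe</lastname></root>
```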

5.3 DATABASE VENDORS AND XML

Although XML and databases are both data-centric technologies, they are not in competition with each other, contrary to popular belief. XML is best used to communicate data and a database is best used to store and retrieve it, which makes the two complementary rather than competitive. Database vendors realize that XML will never replace the database but will become more closely integrated with it. Recognizing the power and flexibility of XML, they are building support for it right into their products.

5.3.1 MICROSOFT’S XML TECHNOLOGIES

Microsoft has been big on XML since the very beginning. Some of its technologies providing XML support are:

- MSXML

The Internet Explorer browser comes bundled with the MSXML COM-based parser, which provides a DOM interface. It provides validating and non-validating modes as well as support for XML namespaces. It also provides support for XSL transformations.

- Visual Basic Code generator

It can read XML Schema documents and produce Visual Basic code to match the schema. In effect, you can automatically build the basics of an Object Model based on an XML document type.

- SQL Server

There is XML support built into SQL Server, Microsoft's Relational Database Management System. SQL Server provides the capability to perform an SQL query through an HTTP request via an ISAPI filter for Internet Information Server (Microsoft's Web Server). Not only can you get data from SQL Server using XML, you can also put it in using updategrams. These are XML files containing, in a prescribed format, the information you want to put into the database.
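As a rough sketch of what an SQL query carried in an HTTP request might look like, the Python below only builds and decodes such a URL; it does not contact any server. The `sql` and `root` parameter names and the `FOR XML AUTO` clause follow the URL-access convention as we understand it for SQL Server 2000 and should be checked against the version in use; the host, virtual directory and `build_sql_url` helper are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_sql_url(base_url, sql, root_element="root"):
    """Build a URL that carries an SQL query in its query string."""
    query = urlencode({"sql": sql, "root": root_element})
    return f"{base_url}?{query}"

# Hypothetical server and virtual directory names.
url = build_sql_url(
    "http://localhost/customers",
    "SELECT last_name FROM Customer WHERE account_number=1952 FOR XML AUTO",
)
print(url)

# Round trip: the web server would decode the same SQL text on arrival.
decoded = parse_qs(urlsplit(url).query)
print(decoded["sql"][0])
```

Encoding the query with `urlencode` matters because spaces and the `=` sign inside the SQL text would otherwise be misread as part of the URL syntax.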

5.3.2 ORACLE’S XML TECHNOLOGIES

- XML parsers:

The first tool available from Oracle is the XML parser. Oracle provides parsers written in Java, C, C++, and PL/SQL. These parsers provide a DOM interface, a SAX interface, both validating and non-validating support, support for namespaces and fully compliant support for XSLT.

- Code Generators:

Oracle offers Java and C++ class generating applications, similar to the Visual Basic code generator. However, these generators work from DTDs and not schemas, meaning they are fully conformant with W3C specifications.

- XML SQL Utility for Java

The XDK (XML Developer's Kit) also provides the XML SQL Utility for Java, which can generate an XML document from an SQL query, either in text form or as a DOM. It can also take in XML documents and use the information to update the database, like SQL Server 2000.

- XSQL Servlet

This servlet takes in an XML document that contains SQL queries, like the XML templates used by SQL Server. It can optionally perform XSL transformations on the results, so the results can potentially be any type of file that can be returned from an XSLT transformation, including XML and HTML. Because it is a servlet, it can run on any web server that has a Java virtual machine and can host servlets.

APPLICATIONS OF XML

WHAT YOU CAN DO WITH XML

The potential areas of application of XML can be classified into three main ones: three tier web applications, multi-platform electronic publishing and electronic commerce or EDI.

The following are just a few examples of the exciting new technologies and improvements enabled by XML:

1) Internet Search Engines:

Imagine a search engine that understands and uses contextual information when performing a full-text search. Searching for information about the Java programming language would no longer yield links to coffee sites or the island of Java. This is because a search for the term "Java" is narrowed down to those fields tagged as a "programming language". As a result, the speed and accuracy of the search are dramatically improved. Widespread use of XML repository technology on Web servers will play a vital role in easing the "information overload" currently suffered by Internet users. Of course, all of these benefits require a sophisticated, scalable and fast repository. This repository must be able to manage the rich XML links and understand XML structure, so that it indexes text based on its context and use in a document.
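A toy version of such a context-aware search, written in Python, may make the idea concrete. Everything in it is hypothetical: the `<pages>` document, the `kind` attribute used as the contextual tag, the example.com URLs and the `search` function are invented for the illustration and do not come from any real search engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical index: each page declares what kind of thing its topic is.
catalog_xml = """<pages>
  <page url="http://example.com/a"><topic kind="programming language">Java</topic></page>
  <page url="http://example.com/b"><topic kind="beverage">Java</topic></page>
  <page url="http://example.com/c"><topic kind="island">Java</topic></page>
</pages>"""

def search(xml_text, term, kind):
    """Return the URLs of pages whose topic matches both the term and its context."""
    root = ET.fromstring(xml_text)
    hits = []
    for page in root.findall("page"):
        for topic in page.findall("topic"):
            if topic.get("kind") == kind and topic.text == term:
                hits.append(page.get("url"))
    return hits

print(search(catalog_xml, "Java", "programming language"))  # ['http://example.com/a']
```

A plain full-text search for "Java" would return all three pages; the tag narrows the result to one.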

2) Electronic Commerce:

The long-expected rise of electronic commerce has been stymied by the difficulty consumers encounter in finding the desired product among the myriad vendors setting up shop on the Internet, all with different product lines, prices, on-line viewing capabilities, delivery options and so forth. So-called intelligent agents have not helped, because they have an even harder time than humans in trying to make sense of the digital morass presented by HTML. With XML repository technology, on-line stores can present product information in a standard, structured format, independent of page design.

Electronic commerce is obviously focused on financial transactions. Using HTML, the user must manually wade through a page to extract relevant data like price, tax, etc. And unlike text, numbers have no inherent context. In other words, the word "price" means something, but how do you know whether a given number is a price, a tax, part of an address or anything else? XML creates this association, making both human and machine interpretation practical. XML is the catalyst that will finally unleash the explosive potential of electronic commerce. The XML-aware query facilities of the repository make it possible to retrieve relevant information directly and re-purpose it as needed for processing by an automatic agent or a user. By reducing the time needed to locate a product, a price, or any other relevant information on the Internet, XML repositories will play an important role in making on-line shopping more efficient and enjoyable.
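The association between a number and its meaning is easy to demonstrate. In the Python sketch below the `<product>` document, its element names and its figures are hypothetical; the point is only that an agent can pick out the price and the tax by name instead of guessing from page layout.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical product record: each number is labelled by its element name.
product_xml = """<product>
  <name>Desk lamp</name>
  <price currency="USD">24.50</price>
  <tax currency="USD">1.96</tax>
</product>"""

product = ET.fromstring(product_xml)

# Decimal avoids binary floating point rounding in money arithmetic.
price = Decimal(product.findtext("price"))
tax = Decimal(product.findtext("tax"))
total = price + tax

print(product.findtext("name"), total)  # Desk lamp 26.46
```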

3) Electronic Data Interchange (EDI):

EDI (Electronic Data Interchange) works by providing a collection of standard message formats and an element dictionary, giving businesses a simple way to exchange data via any electronic messaging service.

XML/EDI provides a standard framework for exchanging different types of data (for example, an invoice, a healthcare claim or a project status report). The information may sit in a transaction, be exchanged via an Application Program Interface (API), or appear in web automation, a database portal, a catalog, a workflow document or a message. In each case it can be searched, decoded, manipulated and displayed consistently and correctly. This is achieved by first implementing EDI dictionaries and then extending the vocabulary via on-line repositories to include our business language, rules and objects. Thus, by combining XML and EDI, we create a powerful new paradigm that differs from either XML or EDI alone.

4) Data Re-purposing:

By breaking documents into discrete elements, it becomes very easy for individuals to extract the truly relevant information from several sources and reassemble it into any format (e.g. web page, document, presentation, whatever). This helps to address the current information overload, because the user receives only the relevant information. In fact, the information might even be assembled by a personal agent. This ability also facilitates the acceleration of learning since it becomes much easier to assemble the "current" body of work on a particular subject, and then take it a step further, pushing the development of human knowledge ever forward.
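As a small illustration of re-purposing, the Python sketch below pulls the headline elements out of two differently structured sources and reassembles them as an HTML list. The source documents, the `headline` element name and the `collect_headlines` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources with different structures but a shared element name.
sources = [
    "<report><headline>Quarterly results</headline><body>Full text of the report</body></report>",
    "<memo><headline>Office move</headline><note>Internal only</note></memo>",
]

def collect_headlines(xml_documents):
    """Extract every headline and reassemble them as an HTML unordered list."""
    digest = ET.Element("ul")
    for text in xml_documents:
        for headline in ET.fromstring(text).iter("headline"):
            item = ET.SubElement(digest, "li")
            item.text = headline.text
    return ET.tostring(digest, encoding="unicode")

print(collect_headlines(sources))
# <ul><li>Quarterly results</li><li>Office move</li></ul>
```

The same extracted elements could just as well be written into a presentation or a printed document; only the final assembly step changes.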

These are just a few of the exciting technologies enabled by XML. Looking at these examples, it is easy to understand why XML is creating such excitement in the Internet community. As software developers begin to implement XML applications, however, they will have to turn these ideas into reality while keeping up with the ever-shortening development cycles characteristic of Web development. In many cases, developers will find that their prototypes work fine in the test laboratory but do not scale to real-world conditions of concurrent usage and data volume. XML's rich interlinking and hierarchical naming structure introduces a whole new set of requirements that bring solutions based on the file system or on relational architectures to their knees. An XML-savvy object repository, designed to be embedded in XML applications of all types, or in the operating system itself, is the only solution that provides the functionality and scalability required to realize this vision of a new generation of networked applications.

PERSPECTIVE................!!!!!!!!!!!!!!!!!!!!!!!

September 18, 2008

XML BASED SERVERS

