Monday, October 31, 2011

2 - XML vs HTML

Questions

  1. Have a look at the documents below. Some are XML documents, some are HTML documents, some may be neither. See if you can decide which are which.
  2. Make a list of the distinctive characteristics of an XML document, in terms of things that you can spot when looking at the code.
  3. Assume that (with suitable changes) all these documents could become XML documents. Imagine, and describe, applications that could use these documents.

Answers

This is an HTML document: 

<head>
    <title>The Edwin Smith Surgical Papyrus</title>
    <meta name="Keywords" content="Egypt, egypt, egypt travel, medical, Egyptian, egyptian, Papyrus, Edwin Smith, Surgical, literature, stories, instructions, Memphis, Nile, Cairo, Alexandria, Admonitions of Ipuwer, pharaoh,  pharaonic">
    <meta name="Description" content="The Edwin Smith Surgical Papyrus on ancient Egyptian medical treatments">
</head>
<body background="Back25.jpg" bgcolor="#FFFFFF" text="#000000" link="#808000" vlink="#008080">
    <table border="0" cellspacing="1" width="565">
        <tr>
            <td>
                <center>
                    <!-- START ADCYCLE IFRAME RICH MEDIA CACHE-BUST CODE for Top of Member Pages -->

                    <script language="javascript"><!--
                        var id = 305; var jar = new Date(); var s = jar.getSeconds(); var m = jar.getMinutes();
                        var flash = s * m + id; var cgi = 'http://ads.touregypt.net/cgi-bin/adcycle';
                        var p = '<iframe src="' + cgi + '/adcycle.cgi?gid=1&t=_top&id=' + flash + '&type=iframe" ';
                        p += 'height=60 width=468 border=0 marginwidth=0 marginheight=0 hspace=0 ';
                        p += 'vspace=0 frameborder=0 scrolling=no>';
                        p += '<a href="' + cgi + '/adclick.cgi?gid=1&id=' + flash + '" target="_top">';
                        p += '<img src="' + cgi + '/adcycle.cgi?gid=1&id=' + flash + '" width=468 height=60 ';
                        p += 'border=1 alt="Click to Visit"></a></iframe>'; document.write(p); // -->
                    </script>

                    <noscript>
                        <a href="http://ads.touregypt.net/cgi-bin/adcycle/adclick.cgi?gid=1&id=305" target="_top">
                            <img src="http://ads.touregypt.net/cgi-bin/adcycle/adcycle.cgi?gid=1&id=305" width="468"
                                height="60" border="1"></a></noscript>
                    <!-- END ADCYCLE IFRAME RICH MEDIA CODE -->
                </center>
            </td>
        </tr>
    </table>
    <table border="0" width="570">
        <tr>
            <td>
                <p align="center">
                    <font size="3"><b>The Edwin Smith Surgical Papyrus</b> </font>
                    <p>
                        <font size="3">The Edwin Smith Surgical Papyrus, dating from the seventeenth century
                            B.C., is one of the oldest of all known medical papyri. Its differs fundamentally
                            from the others in the following ways: </font><font face="verdana,arial,helvetica"
                                size="3">
                                <p>
                            </font>
                        <ol>
                            <li><font size="3">The seventeen columns on the recto comprise part of a surgical treatise,
                                the first thus far discovered in the ancient Orient, whether in Egypt or Asia. It
                                is therefore the oldest known surgical treatise. </font>
                                <li><font size="3">

This is an XML document:

<!DOCTYPE advertisement SYSTEM "advertisement.dtd">
<?xml-stylesheet type="text/xsl" href="ad1.xsl" ?>
<advertisement action="update">
      <id version="2">
            NYT.19980701.12345.107
      </id>
      <status value="accepted"></status>
      <expiration>
            19980731
      </expiration>
      <reference>
            Ad to sell Linda's car.
      </reference>
      <comment>
            Up sold to add Friday repeat.
      </comment>
      <contact id="contact1">
            <name>
                  John Smyth
            </name>
            <address>
                  <address_line>c/o Bat Accessories, Inc.</address_line>
                  <address_line>Hitchcock Building, 80th Floor</address_line>
                  <address_line>1313 Mockingbird Lane</address_line>
                  <city>New York</city>,
                  <state>NY</state>
                  <postal>10000-1234</postal>
                  <country>USA</country>
            </address>
            <phone>
                  19085551212
            </phone>
            <fax>
                  19085551213
            </fax>
            <email>
                  jsymth@batacc.com
            </email>
            <url>
                  http://www.batacc.com/~smyth
            </url>
      </contact>
      <source>
            <updated>
                  <timestamp>
                        19980701 12290200
                  </timestamp>
                  <userid>
                        JK1892
                  </userid>
            </updated>
            <created>
                  <timestamp>
                        19980701 12225800
                  </timestamp>
                  <userid>
                        JK1892
                  </userid>
            </created>
            <base version="1">
                  NYT.19980621.90810.98
            </base>
      </source>
      <advertiser>
            <account type="transient">
                  19085551212-1
            </account>
            <contact_ref link="contact1"></contact_ref>
            <payment>
                  <charge>
                        <charge_card brand="amex"></charge_card>
                        <charge_account>3710-111111-99995</charge_account>
                        <charge_expiration>19991231</charge_expiration>
                        <contact_ref link="contact1"></contact_ref>
                        <charge_authorization status="allowed">4561</charge_authorization>
                  </charge>
            </payment>
      </advertiser>
      <coding>
            <automotive>
                  <auto_side value="sell">sell</auto_side>
                  <auto_category value="used">used</auto_category>
                  <auto_year>1991</auto_year>
                  <auto_make>Saab</auto_make>
                  <auto_model>900 Convertible</auto_model>
                  <auto_mileage>72000</auto_mileage>
                  <auto_price>$13,900</auto_price>
                  <auto_exterior>white</auto_exterior>
                  <auto_interior>gray leather</auto_interior>
                  <auto_body value="convertible">convertible</auto_body>
                  <auto_vin>372AB918098910X</auto_vin>
            </automotive>
            <contact>
                  <name></name>
                  <phone>19085551212</phone>
            </contact>
      </coding>
      <text>
            <font size="10">
                  <center>
                        <keyword name="auto_make" punct=" ">SAAB </keyword>
                        <keyword name="auto_model" punct=" ">900SE </keyword>
                  </center>
            </font>
            <keyword name="auto_year" punct=" ">1997 </keyword>
            <keyword name="auto_exterior" punct=" ">yellow </keyword>
            <keyword name="auto_body" punct=", ">convertible, </keyword>
            <keyword name="auto_mileage" format="9'k miles'" scale="1000"
                                    punct=", ">14k miles, </keyword>
            Auto, PL, PW, AC, power leather Seats
            Showroom cond. Assume lease.
            <center>
                  Call
                  <keyword name="phone" format="T999-999-9999" punct=" ">
                        212-333-3333
                  </keyword>
            </center>
      </text>
      <publication name="nytimes">
            <pub_alias>
                  981011301
            </pub_alias>
            <pub_price>
                  $128.00
            </pub_price>
            <pub_options>
                  <claim>
                        7
                  </claim>
                  <columns>
                        1
                  </columns>
                  <forwarding collect="email">
                        Please email replies to <mailbox>T1234</mailbox>@nytimes.com
                        <rate basis="Email forwarding service charge--Full run"
                                    unit="ad">$25.00
                        </rate>
                  </forwarding>
                  <tearsheet>
                        <rate basis="Tear sheet service charge" unit="recipient">$20.00</rate>
                  </tearsheet>
                  <shading>
                        <rate basis="Shading premium" unit="standard">20%</rate>
                  </shading>
            </pub_options>
            <class>
                  3720
                  <title>Autos/Vans/Sports Utilities</title>
                  <classword>Automotive</classword>
                  <classword>For Sale</classword>
                  <classword>Used</classword>
                  <lines>
                        4
                  </lines>
                  <sortkey>
                        SAAB91900
                  </sortkey>
                  <zone>
                        M
                        <title>Full Run</title>
                  </zone>
                  <rundate>
                        19980719
                        <rate basis="Automotive, Open, Sunday NY Region"
                                    unit="line">$23.10
                        </rate>
                        <instance>
                              <edition>BASE</edition>
                              <section>12</section>
                              <page>22</page>
                              <column>9</column>
                              <offset>17.85</offset>
                        </instance>
                        <instance>
                              <edition>LI</edition>
                              <section>12</section>
                              <page>18</page>
                              <column>9</column>
                              <offset>17.85</offset>
                        </instance>
                        <instance>
                              <edition>NJ</edition>
                              <section>12</section>
                              <page>18</page>
                              <column>9</column>
                              <offset>17.85</offset>
                        </instance>
                        <instance>
                              <edition>NY/LI</edition>
                              <section>12</section>
                              <page>18</page>
                              <column>9</column>
                              <offset>17.85</offset>
                        </instance>
                        <instance>
                              <edition>WC</edition>
                              <section>12</section>
                              <page>18</page>
                              <column>9</column>
                              <offset>17.85</offset>
                        </instance>
                  </rundate>
                  <rundate>
                        19980724
                        <rate basis="Automotive, Open, Weekday NY Region--
                                    Sunday ad repeated on Friday (within 7 days)"
                                    unit="line">$8.90
                        </rate>
                        <rate basis="Automotive, Open, Weekday NY Region"
                                    unit="line" type="comparison">$15.20
                        </rate>
                        <instance>
                              <edition>METRO</edition>
                              <section>6</section>
                              <page>14</page>
                              <column>6</column>
                              <offset>5.15</offset>
                        </instance>
                  </rundate>
            </class>
      </publication>
</advertisement>


This is an HTML document:

<head>
    <title>Display</title>

    <script language="JavaScript">
        function loadFile() {
            var filename
            var selectionValue
            selectionValue = document.forms[0].selectList.selectedIndex
            filename = document.forms[0].selectList.options[selectionValue].value
            parent.rightFrame.location = filename
        }
    </script>

</head>
<body>
    <h4>
        XML File Chooser</h4>
    <p>
        Select the file you wish to see displayed in the right-hand frame.
        <form name="selectForm">
        <p>
            <select name="selectList">
                <option value="countryList/countryList.xml">Country Data
                    <option value="playerList/playerList.xml">
                Baseball Player Data
            </select>
            <p>
                <input type="BUTTON" value="Load Document" onclick="loadFile()">
        </form>
</body>

This is an XML document:



<!DOCTYPE story SYSTEM "storyxsl.dtd">
<story>
            <title>Freedom's Dream</title>
            <author>by Charles White</author>
            <copyright>Copyright 1996, 1999 by Charles White</copyright>
            <section>
                        <para>Had it been a dream, Antron Crimea's memory of the clenched fist piercing the
sky of a tumultuous, thundering crowd would have been bearable solitude. As
it was though, the reality brought him to another place, to a distance only
something like a dream could take him.</para>
                        <para>&quot;The crowd forgot everything,&quot; is how Antron described the situation to his
psychiatrist, <link id="ChesapeakeLink">
Chesapeake Alert.</link>
Antron remembered the rhythm, the pulse,
everything. After all this time the energy of the crowd still seemed to reverberate through
his head.</para>
                        <para>Chesapeake Alert was nothing but a large bulbous mass of jelly-like flesh; a
brain plopped down on an empty, expensive slice of carpet. And though he
had no legitimate locomotive capabilities of his own, he was aware of the
movements of a billion others.</para>
                        <para>Antron's hundred legs crawled around what was left of the carpet in the kind of
pace unknown to you or I. His earlier confusion had long ago been dissolved
by the righteous events of what he had seen during the course of events Billy
Freedom had ignited.</para>
                        <para>&quot;Sometimes betrayal is a necessity,&quot;said Chesapeake. &quot;Startling. And
expensive. It must be weighed carefully.&quot;</para>
            </section>
            <auto-link xml:link="simple" actuate="user" href="sec_2.xml" show="replace">click here to continue</auto-link>
</story>

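This is neither an XML nor an HTML document. It is SGML: the element declarations carry SGML tag-minimization flags (- -, - O, O O) and most end tags are omitted, which XML does not allow: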

<!DOCTYPE my.dtd [
    <!ELEMENT anthology - - (poem+)> <!ELEMENT poem - - (title?, stanza+)> <!ELEMENT
title - O (#PCDATA) > <!ELEMENT stanza - O (line+) > <!ELEMENT line O O (#PCDATA)
> ]>
<my.dtd>
<anthology>
         <poem><title>The SICK ROSE
         <stanza>
              <line>O Rose thou art sick.
              <line>The invisible worm,
              <line>That flies in the night
              <line>In the howling storm:
         <stanza>
              <line>Has found out thy bed
              <line>Of crimson joy:
              <line>And his dark secret love
              <line>Does thy life destroy.
          <poem>
              <!-- more poems go here    -->

    </anthology>
</my.dtd>

This is an XML document:


<!DOCTYPE countryCollection SYSTEM "countryList.dtd">

<countrylist>
      <country>
            <officialName>United States of America</officialName>
            <label>Common Names:</label>      
            <commonName>United States</commonName>
            <commonName>U.S.</commonName>
            <label>Capital:</label>     
            <capital>Washington, D.C.</capital>
            <label>Major Cities:</label>                  
            <majorCity> Los Angeles </majorCity>
            <majorCity> New York </majorCity>       
            <majorCity> Chicago </majorCity>        
            <majorCity> Dallas </majorCity>         
            <label>Bordering Bodies of Water:</label>                 
            <borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater>
            <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>           
            <borderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater>   
            <label>Bordering Countries:</label>           
            <borderingCountry> Canada </borderingCountry>                   
            <borderingCountry> Mexico </borderingCountry>
</country>
      <country>
            <officialName> Japan </officialName>
            <label>Common Names:</label>                  
            <commonName> Japan </commonName>
            <label>Capital:</label>     
            <capital>Tokyo</capital>
            <label>Major Cities:</label>                  
            <majorCity> Nagoya </majorCity>
            <majorCity> Osaka </majorCity>          
            <majorCity> Kobe </majorCity>     
            <label>Bordering Bodies of Water:</label>
            <borderingBodyOfWater> Sea of Japan </borderingBodyOfWater>
            <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>           
      </country>
      <country>
            <officialName> Republic of Kenya </officialName>
            <label>Common Names:</label>                  
            <commonName> Kenya </commonName>
            <label>Capital:</label>     
            <capital> Nairobi </capital>
            <label>Major Cities:</label>                  
            <majorCity> Mombasa </majorCity>
            <majorCity> Lamu </majorCity>
            <majorCity> Malindi </majorCity>        
            <majorCity> Kisumu </majorCity>         
            <label>Bordering Bodies of Water:</label>
          
            <borderingBodyOfWater> Indian Ocean </borderingBodyOfWater>
      </country>
</countrylist>

This is an HTML document:


<head>


    <script language="JavaScript">
        //The global variable containing the XML string we'll examine. Normally, global variables are to be avoided. But here, it's the easiest way to
        //approach the problem, since it would be easy to have a server-side script include the contents of the XML file as a single line here.
        gXMLString = " <officialName> United States of America </officialName> <commonName> United States </commonName> <commonName> U.S. </commonName> <capital> Washington, D.C. </capital> <majorCity> Los Angeles </majorCity> <majorCity> New York </majorCity> <majorCity> Chicago </majorCity> <majorCity> Dallas </majorCity> <borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater> <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater> <borderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater> <borderingCountry> Canada </borderingCountry> <borderingCountry> Mexico </borderingCountry>"

        function findTagsPresent() {
            var arrayOfPieces = new Array()
            arrayOfPieces = gXMLString.split(" ")
            numberOfPieces = arrayOfPieces.length
            var tagsPresent = new Array()
            var tagsPresentCounter
            tagsPresentCounter = 0
            for (i = 0; i < numberOfPieces; i++) {

                if ((arrayOfPieces[i].indexOf("<") == 0) && (arrayOfPieces[i].indexOf(">") == (arrayOfPieces[i].length - 1)) && (arrayOfPieces[i].indexOf("</") == -1))
                // If that's the case, then we've found an opening tag.
                {
                    var arrayLength
                    arrayLength = tagsPresent.length
                    var foundIt
                    foundIt = false
                    for (j = 0; j < arrayLength; j++) {
                        if (tagsPresent[j] == arrayOfPieces[i]) {
                            foundIt = true
                            break
                        }
                    }
                    if (foundIt != true)
                    //And if that's the case, it's not already in tagsPresent
                    {
                        tagsPresent[tagsPresentCounter] = arrayOfPieces[i]
                        tagsPresentCounter++
                    }
                }
            }
            return tagsPresent
        }
        function writeListOfTagsPresentWithCheckboxes() {
            var listOfTags
            listOfTags = findTagsPresent()
            var listLength
            listLength = listOfTags.length
            var numberOfCheckBoxes
            numberOfCheckBoxes = 0
            for (i = 0; i < listLength; i++) {
                var tagStringLength
                tagStringLength = listOfTags[i].length
                var strippedTagString
                strippedTagString = listOfTags[i].substring(1, (tagStringLength - 1))
                document.write("<BR>")
                document.write("<INPUT TYPE='checkbox' NAME='box" + i + "' VALUE='" + strippedTagString + "'> &nbsp;")
                document.write(strippedTagString)
                numberOfCheckBoxes++
            }
            document.write("<P>")
            document.write("<INPUT TYPE='button' value='Display' onClick='displaySelectedXMLData(" + numberOfCheckBoxes + ")'>")
        }
        function contentsTaggedThisWay(tagString) {
            var arrayOfPieces = new Array()
            arrayOfPieces = gXMLString.split(" ")
            var numberOfPieces
            numberOfPieces = arrayOfPieces.length
            var taggedData
            taggedData = ""
            var i
            i = 0
            while (i < numberOfPieces) {
                if (arrayOfPieces[i] == ("<" + tagString + ">")) {
                    var foundEndTag
                    taggedData += "<BR>"
                    foundEndTag = false
                    var j
                    j = 1
                    while (!(foundEndTag)) {
                        if (arrayOfPieces[(i + j)] == ("</" + tagString + ">")) {
                            foundEndTag = true
                        }
                        else {
                            taggedData += arrayOfPieces[(i + j)]
                            taggedData += " "
                            j++
                        }
                    }
                }
                i++
            }
            return taggedData
        }
        function displaySelectedXMLData(numberOfBoxes) {
            var stringToWrite
            stringToWrite = ""
            parent.rightFrame.location.reload()
            stringToWrite = "<HTML> <HEAD> </HEAD> <BODY>"
            var i
            i = 0
            while (i < numberOfBoxes) {
                currentBoxName = "box" + i
                if (document.selectionForm.elements[currentBoxName].checked) {
                    stringToWrite += "<P><B>" + document.selectionForm.elements[currentBoxName].value + "</B>"
                    stringToWrite += contentsTaggedThisWay(document.selectionForm.elements[currentBoxName].value)
                }
                i++
            }
            stringToWrite += "</BODY> </HTML>"
            parent.rightFrame.document.write(stringToWrite)
        } 
    </script>

</head>
<body>
    <h4>
        Data Chooser</h4>
    <p>
        Choose the tags whose data you want to display.
        <form name="selectionForm">

        <script language="JavaScript">
            writeListOfTagsPresentWithCheckboxes()
        </script>

        <p>
        </form>
</body>

Characteristics of XML

The defining characteristic of XML is that it is used to store data. XML gives that data a structure: an XML document adheres to a user-defined structure, and each document can be checked against it to confirm that the data is arranged correctly.

XML is also well suited to data of a hierarchical nature. A set of data may be best described by a predetermined hierarchy and may also contain relationships between items. The "advertisement" example shows this: it contains elements such as "id", "status", "reference" and "comment", each with its own attributes and sub-hierarchies, and its "contact_ref" elements point back to the "contact" element whose id is "contact1".
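To make the idea of a hierarchy concrete, here is a minimal Python sketch that walks a trimmed fragment in the style of the "advertisement" example and lists each element with its attributes. The fragment, the `outline` helper and the variable names are assumptions made for illustration; they are not part of the original exercise.

```python
import xml.etree.ElementTree as ET

# A trimmed fragment in the style of the "advertisement" example above.
AD_XML = """
<advertisement action="update">
    <id version="2">NYT.19980701.12345.107</id>
    <status value="accepted"></status>
    <contact id="contact1">
        <name>John Smyth</name>
        <address>
            <city>New York</city>
            <state>NY</state>
        </address>
    </contact>
</advertisement>
"""


def outline(element, depth=0):
    """Return one indented line per element: its name plus any attributes."""
    attrs = " ".join(f'{key}="{value}"' for key, value in element.attrib.items())
    line = "  " * depth + element.tag + (f" [{attrs}]" if attrs else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(AD_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))

# Nested data is reached by following the hierarchy down from the root.
city = root.findtext("contact/address/city")
print(city)
```

The indentation of the printed outline mirrors the nesting of the elements, and the path "contact/address/city" reads like a route through that hierarchy.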

When defining the structure of an XML file, you must give each item of data a descriptive label. The result is a self-describing structure that is easy for humans to read and, at the same time, machine-friendly. The structure can be validated against a DTD or an XSD (XML Schema) file to ensure that the document is arranged correctly; the DOCTYPE declarations at the top of the "advertisement" and "story" examples name the DTD files that would be used for this.
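One thing a program can check quickly is whether a document is well-formed, that is, whether every start tag has a matching end tag and the elements nest properly. The sketch below uses Python's standard xml.etree.ElementTree parser for this; the `is_well_formed` helper and both sample strings are assumptions made for illustration. Note that this only tests well-formedness. It does not validate a document against a DTD or XSD, which needs a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every start tag has a matching end tag and the elements nest properly.
closed = "<country><capital>Tokyo</capital></country>"

# The <line> end tags are left out, in the style of the "anthology" example.
unclosed = "<stanza><line>O Rose thou art sick.<line>The invisible worm,</stanza>"

closed_ok = is_well_formed(closed)
unclosed_ok = is_well_formed(unclosed)
print(closed_ok, unclosed_ok)
```

A missing end tag is one of the easiest things to spot by eye when deciding whether a document can be XML, and it is also the first thing an XML parser will complain about.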

Applications of XML

The simple yet powerful nature of XML has led to its adoption worldwide in a multitude of very different systems. Although these systems may operate in totally different ways, they store and exchange data in a standard way: XML.

The provided examples illustrate this. They could serve systems such as an advertisement system, a book system (defining the structure of a book, e.g. title, author, sections and paragraphs) and a country information bank.
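As an illustration of the country information bank, the following Python sketch answers a simple question ("which countries border a given body of water?") from a shortened copy of the country list. The shortened data, the `countries_bordering` function and the `capitals` dictionary are assumptions made for this example; a real application would load the full file instead.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the country list example above.
COUNTRY_XML = """
<countrylist>
    <country>
        <officialName>United States of America</officialName>
        <capital>Washington, D.C.</capital>
        <borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater>
        <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>
    </country>
    <country>
        <officialName> Japan </officialName>
        <capital>Tokyo</capital>
        <borderingBodyOfWater> Sea of Japan </borderingBodyOfWater>
        <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>
    </country>
    <country>
        <officialName> Republic of Kenya </officialName>
        <capital> Nairobi </capital>
        <borderingBodyOfWater> Indian Ocean </borderingBodyOfWater>
    </country>
</countrylist>
"""


def countries_bordering(root, water):
    """Return the official names of countries that list the given body of water."""
    names = []
    for country in root.findall("country"):
        # The source data pads its text with spaces, so strip before comparing.
        waters = [w.text.strip() for w in country.findall("borderingBodyOfWater")]
        if water in waters:
            names.append(country.findtext("officialName").strip())
    return names


root = ET.fromstring(COUNTRY_XML)
pacific = countries_bordering(root, "Pacific Ocean")

# The same data can feed a second view, such as a lookup table of capitals.
capitals = {
    country.findtext("officialName").strip(): country.findtext("capital").strip()
    for country in root.findall("country")
}
print(pacific)
print(capitals["Republic of Kenya"])
```

Because every item of data is labelled, the same document can serve both queries without any change to the file itself.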






