- Have a look at the documents below. Some are XML documents, some are HTML documents, some may be neither. See if you can decide which are which.
- Make a list of the distinctive characteristics of an XML document, in terms of things that you can spot when looking at the code.
- Assume that (with suitable changes) all these documents could become XML documents. Imagine, and describe, applications that could use these documents.
Answers
This is an HTML document:
<head>
<title>The Edwin Smith Surgical Papyrus</title>
<meta name="Keywords" content="Egypt, egypt, egypt travel, medical, Egyptian, egyptian, Papyrus, Edwin Smith, Surgical, literature, stories, instructions, Memphis, Nile, Cairo, Alexandria, Admonitions of Ipuwer, pharaoh, pharaonic">
<meta name="Description" content="The Edwin Smith Surgical Papyrus on ancient Egyptian medical treatments">
</head>
<body background="Back25.jpg" bgcolor="#FFFFFF" text="#000000" link="#808000" vlink="#008080">
<table border="0" cellspacing="1" width="565">
<tr>
<td>
<center>
<!-- START ADCYCLE IFRAME RICH MEDIA CACHE-BUST CODE for Top of Member Pages -->
<script language="javascript"><!--
var id = 305; var jar = new Date(); var s = jar.getSeconds(); var m = jar.getMinutes();
var flash = s * m + id; var cgi = 'http://ads.touregypt.net/cgi-bin/adcycle';
var p = '<iframe src="' + cgi + '/adcycle.cgi?gid=1&t=_top&id=' + flash + '&type=iframe" ';
p += 'height=60 width=468 border=0 marginwidth=0 marginheight=0 hspace=0 ';
p += 'vspace=0 frameborder=0 scrolling=no>';
p += '<a href="' + cgi + '/adclick.cgi?gid=1&id=' + flash + '" target="_top">';
p += '<img src="' + cgi + '/adcycle.cgi?gid=1&id=' + flash + '" width=468 height=60 ';
p += 'border=1 alt="Click to Visit"></a></iframe>'; document.write(p); // -->
</script>
<noscript>
<a href="http://ads.touregypt.net/cgi-bin/adcycle/adclick.cgi?gid=1&id=305" target="_top">
<img src="http://ads.touregypt.net/cgi-bin/adcycle/adcycle.cgi?gid=1&id=305" width="468"
height="60" border="1"></a></noscript>
<!-- END ADCYCLE IFRAME RICH MEDIA CODE -->
</center>
</td>
</tr>
</table>
<table border="0" width="570">
<tr>
<td>
<p align="center">
<font size="3"><b>The Edwin Smith Surgical Papyrus</b> </font>
<p>
<font size="3">The Edwin Smith Surgical Papyrus, dating from the seventeenth century
B.C., is one of the oldest of all known medical papyri. It differs fundamentally
from the others in the following ways: </font><font face="verdana,arial,helvetica"
size="3">
<p>
</font>
<ol>
<li><font size="3">The seventeen columns on the recto comprise part of a surgical treatise,
the first thus far discovered in the ancient Orient, whether in Egypt or Asia. It
is therefore the oldest known surgical treatise. </font>
<li><font size="3">
This is an XML document:
<!DOCTYPE advertisement SYSTEM "advertisement.dtd">
<?xml-stylesheet type="text/xsl" href="ad1.xsl" ?>
<advertisement action="update">
<id version="2">
NYT.19980701.12345.107
</id>
<status value="accepted"></status>
<expiration>
19980731
</expiration>
<reference>
Ad to sell Linda's car.
</reference>
<comment>
Up sold to add Friday repeat.
</comment>
<contact id="contact1">
<name>
John Smyth
</name>
<address>
<address_line>c/o Bat Accessories, Inc.</address_line>
<address_line>Hitchcock Building, 80th Floor</address_line>
<address_line>1313 Mockingbird Lane</address_line>
<city>New York</city>,
<state>NY</state>
<postal>10000-1234</postal>
<country>USA</country>
</address>
<phone>
19085551212
</phone>
<fax>
19085551213
</fax>
<email>
jsymth@batacc.com
</email>
<url>
http://www.batacc.com/~smyth
</url>
</contact>
<source>
<updated>
<timestamp>
19980701 12290200
</timestamp>
<userid>
JK1892
</userid>
</updated>
<created>
<timestamp>
19980701 12225800
</timestamp>
<userid>
JK1892
</userid>
</created>
<base version="1">
NYT.19980621.90810.98
</base>
</source>
<advertiser>
<account type="transient">
19085551212-1
</account>
<contact_ref link="contact1"></contact_ref>
<payment>
<charge>
<charge_card brand="amex"></charge_card>
<charge_account>3710-111111-99995</charge_account>
<charge_expiration>19991231</charge_expiration>
<contact_ref link="contact1"></contact_ref>
<charge_authorization status="allowed">4561</charge_authorization>
</charge>
</payment>
</advertiser>
<coding>
<automotive>
<auto_side value="sell">sell</auto_side>
<auto_category value="used">used</auto_category>
<auto_year>1991</auto_year>
<auto_make>Saab</auto_make>
<auto_model>900 Convertible</auto_model>
<auto_mileage>72000</auto_mileage>
<auto_price>$13,900</auto_price>
<auto_exterior>white</auto_exterior>
<auto_interior>gray leather</auto_interior>
<auto_body value="convertible">convertible</auto_body>
<auto_vin>372AB918098910X</auto_vin>
</automotive>
<contact>
<name></name>
<phone>19085551212</phone>
</contact>
</coding>
<text>
<font size="10">
<center>
<keyword name="auto_make" punct=" ">SAAB </keyword>
<keyword name="auto_model" punct=" ">900SE </keyword>
</center>
</font>
<keyword name="auto_year" punct=" ">1997 </keyword>
<keyword name="auto_exterior" punct=" ">yellow </keyword>
<keyword name="auto_body" punct=", ">convertible, </keyword>
<keyword name="auto_mileage" format="9'k miles'" scale="1000"
punct=", ">14k miles, </keyword>
Auto, PL, PW, AC, power leather Seats
Showroom cond. Assume lease.
<center>
Call
<keyword name="phone" format="T999-999-9999" punct=" ">
212-333-3333
</keyword>
</center>
</text>
<publication name="nytimes">
<pub_alias>
981011301
</pub_alias>
<pub_price>
$128.00
</pub_price>
<pub_options>
<claim>
7
</claim>
<columns>
1
</columns>
<forwarding collect="email">
Please email replies to <mailbox>T1234</mailbox>@nytimes.com
<rate basis="Email forwarding service charge--Full run"
unit="ad">$25.00
</rate>
</forwarding>
<tearsheet>
<rate basis="Tear sheet service charge" unit="recipient">$20.00</rate>
</tearsheet>
<shading>
<rate basis="Shading premium" unit="standard">20%</rate>
</shading>
</pub_options>
<class>
3720
<title>Autos/Vans/Sports Utilities</title>
<classword>Automotive</classword>
<classword>For Sale</classword>
<classword>Used</classword>
<lines>
4
</lines>
<sortkey>
SAAB91900
</sortkey>
<zone>
M
<title>Full Run</title>
</zone>
<rundate>
19980719
<rate basis="Automotive, Open, Sunday NY Region"
unit="line">$23.10
</rate>
<instance>
<edition>BASE</edition>
<section>12</section>
<page>22</page>
<column>9</column>
<offset>17.85</offset>
</instance>
<instance>
<edition>LI</edition>
<section>12</section>
<page>18</page>
<column>9</column>
<offset>17.85</offset>
</instance>
<instance>
<edition>NJ</edition>
<section>12</section>
<page>18</page>
<column>9</column>
<offset>17.85</offset>
</instance>
<instance>
<edition>NY/LI</edition>
<section>12</section>
<page>18</page>
<column>9</column>
<offset>17.85</offset>
</instance>
<instance>
<edition>WC</edition>
<section>12</section>
<page>18</page>
<column>9</column>
<offset>17.85</offset>
</instance>
</rundate>
<rundate>
19980724
<rate basis="Automotive, Open, Weekday NY Region--
Sunday ad repeated on Friday (within 7 days)"
unit="line">$8.90
</rate>
<rate basis="Automotive, Open, Weekday NY Region"
unit="line" type="comparison">$15.20
</rate>
<instance>
<edition>METRO</edition>
<section>6</section>
<page>14</page>
<column>6</column>
<offset>5.15</offset>
</instance>
</rundate>
</class>
</publication>
</advertisement>
This is an HTML document:
<head>
<title>Display</title>
<script language="JavaScript">
function loadFile() {
var filename
var selectionValue
selectionValue = document.forms[0].selectList.selectedIndex
filename = document.forms[0].selectList.options[selectionValue].value
parent.rightFrame.location = filename
}
</script>
</head>
<body>
<h4>
XML File Chooser</h4>
<p>
Select the file you wish to see displayed in the right-hand frame.
<form name="selectForm">
<p>
<select name="selectList">
<option value="countryList/countryList.xml">Country Data
<option value="playerList/playerList.xml">
Baseball Player Data
</select>
<p>
<input type="BUTTON" value="Load Document" onclick="loadFile()">
</form>
</body>
This is an XML document:
<!DOCTYPE story SYSTEM "storyxsl.dtd">
<story>
<title>Freedom's Dream</title>
<author>by Charles White</author>
<copyright>Copyright 1996, 1999 by Charles White</copyright>
<section>
<para>Had it been a dream, Antron Crimea's memory of the clenched fist piercing the
sky of a tumultuous, thundering crowd would have been bearable solitude. As
it was though, the reality brought him to another place, to a distance only
something like a dream could take him.</para>
<para>"The crowd forgot everything," is how Antron described the situation to his
psychiatrist, <link id="ChesapeakeLink">
Chesapeake Alert.</link>
Antron remembered the rhythm, the pulse,
everything. After all this time the energy of the crowd still seemed to reverberate through
his head.</para>
<para>Chesapeake Alert was nothing but a large bulbous mass of jelly-like flesh; a
brain plopped down on an empty, expensive slice of carpet. And though he
had no legitimate locomotive capabilities of his own, he was aware of the
movements of a billion others.</para>
<para>Antron's hundred legs crawled around what was left of the carpet in the kind of
pace unknown to you or I. His earlier confusion had long ago been dissolved
by the righteous events of what he had seen during the course of events Billy
Freedom had ignited.</para>
<para>"Sometimes betrayal is a necessity," said Chesapeake. "Startling. And
expensive. It must be weighed carefully."</para>
</section>
<auto-link xml:link="simple" actuate="user" href="sec_2.xml" show="replace">click here to continue</auto-link>
</story>
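This is neither an HTML nor an XML document; it is SGML, as the tag-omission markers in its DTD (such as "- O" and "O O") and the missing end tags show:
<!DOCTYPE my.dtd [
<!ELEMENT anthology - - (poem+)>
<!ELEMENT poem - - (title?, stanza+)>
<!ELEMENT title - O (#PCDATA) >
<!ELEMENT stanza - O (line+) >
<!ELEMENT line O O (#PCDATA) >
]>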
<my.dtd>
<anthology>
<poem><title>The SICK ROSE
<stanza>
<line>O Rose thou art sick.
<line>The invisible worm,
<line>That flies in the night
<line>In the howling storm:
<stanza>
<line>Has found out thy bed
<line>Of crimson joy:
<line>And his dark secret love
<line>Does thy life destroy.
</poem>
<!-- more poems go here -->
</anthology>
</my.dtd>
This is an XML document:
<!DOCTYPE countryCollection SYSTEM "countryList.dtd">
<countrylist>
<country>
<officialName>United States of America</officialName>
<label>Common Names:</label>
<commonName>United States</commonName>
<commonName>U.S.</commonName>
<label>Capital:</label>
<capital>Washington, D.C.</capital>
<label>Major Cities:</label>
<majorCity> Los Angeles </majorCity>
<majorCity> New York </majorCity>
<majorCity> Chicago </majorCity>
<majorCity> Dallas </majorCity>
<label>Bordering Bodies of Water:</label>
<borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater>
<borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>
<borderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater>
<label>Bordering Countries:</label>
<borderingCountry> Canada </borderingCountry>
<borderingCountry> Mexico </borderingCountry>
</country>
<country>
<officialName> Japan </officialName>
<label>Common Names:</label>
<commonName> Japan </commonName>
<label>Capital:</label>
<capital>Tokyo</capital>
<label>Major Cities:</label>
<majorCity> Nagoya </majorCity>
<majorCity> Osaka </majorCity>
<majorCity> Kobe </majorCity>
<label>Bordering Bodies of Water:</label>
<borderingBodyOfWater> Sea of Japan </borderingBodyOfWater>
<borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>
</country>
<country>
<officialName> Republic of Kenya </officialName>
<label>Common Names:</label>
<commonName> Kenya </commonName>
<label>Capital:</label>
<capital> Nairobi </capital>
<label>Major Cities:</label>
<majorCity> Mombasa </majorCity>
<majorCity> Lamu </majorCity>
<majorCity> Malindi </majorCity>
<majorCity> Kisumu </majorCity>
<label>Bordering Bodies of Water:</label>
<borderingBodyOfWater> Indian Ocean </borderingBodyOfWater>
</country>
</countrylist>
This is an HTML document:
<head>
<script language="JavaScript">
//The global variable containing the XML string we'll examine. Normally, global variables are to be avoided. But here, it's the easiest way to
//approach the problem, since it would be easy to have a server-side script include the contents of the XML file as a single line here.
gXMLString = " <officialName> United States of America </officialName> <commonName> United States </commonName> <commonName> U.S. </commonName> <capital> Washington, D.C. </capital> <majorCity> Los Angeles </majorCity> <majorCity> New York </majorCity> <majorCity> Chicago </majorCity> <majorCity> Dallas </majorCity> <borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater> <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater> <borderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater> <borderingCountry> Canada </borderingCountry> <borderingCountry> Mexico </borderingCountry>"
function findTagsPresent() {
var arrayOfPieces = new Array()
arrayOfPieces = gXMLString.split(" ")
var numberOfPieces = arrayOfPieces.length
var tagsPresent = new Array()
var tagsPresentCounter
tagsPresentCounter = 0
for (var i = 0; i < numberOfPieces; i++) {
if ((arrayOfPieces[i].indexOf("<") == 0) && (arrayOfPieces[i].indexOf(">") == (arrayOfPieces[i].length - 1)) && (arrayOfPieces[i].indexOf("</") == -1))
// If that's the case, then we've found an opening tag.
{
var arrayLength
arrayLength = tagsPresent.length
var foundIt
foundIt = false
for (var j = 0; j < arrayLength; j++) {
if (tagsPresent[j] == arrayOfPieces[i]) {
foundIt = true
break
}
}
if (foundIt != true)
//And if that's the case, it's not already in tagsPresent
{
tagsPresent[tagsPresentCounter] = arrayOfPieces[i]
tagsPresentCounter++
}
}
}
return tagsPresent
}
function writeListOfTagsPresentWithCheckboxes() {
var listOfTags
listOfTags = findTagsPresent()
var listLength
listLength = listOfTags.length
var numberOfCheckBoxes
numberOfCheckBoxes = 0
for (var i = 0; i < listLength; i++) {
var tagStringLength
tagStringLength = listOfTags[i].length
var strippedTagString
strippedTagString = listOfTags[i].substring(1, (tagStringLength - 1))
document.write("<BR>")
document.write("<INPUT TYPE='checkbox' NAME='box" + i + "' VALUE='" + strippedTagString + "'> ")
document.write(strippedTagString)
numberOfCheckBoxes++
}
document.write("<P>")
document.write("<INPUT TYPE='button' value='Display' onClick='displaySelectedXMLData(" + numberOfCheckBoxes + ")'>")
}
function contentsTaggedThisWay(tagString) {
var arrayOfPieces = new Array()
arrayOfPieces = gXMLString.split(" ")
var numberOfPieces
numberOfPieces = arrayOfPieces.length
var taggedData
taggedData = ""
var i
i = 0
while (i < numberOfPieces) {
if (arrayOfPieces[i] == ("<" + tagString + ">")) {
var foundEndTag
taggedData += "<BR>"
foundEndTag = false
var j
j = 1
// Stop at the end of the string even if the closing tag is missing.
while (!foundEndTag && (i + j) < numberOfPieces) {
if (arrayOfPieces[(i + j)] == ("</" + tagString + ">")) {
foundEndTag = true
}
else {
taggedData += arrayOfPieces[(i + j)]
taggedData += " "
j++
}
}
}
i++
}
return taggedData
}
function displaySelectedXMLData(numberOfBoxes) {
var stringToWrite
stringToWrite = ""
parent.rightFrame.location.reload()
stringToWrite = "<HTML> <HEAD> </HEAD> <BODY>"
var i
i = 0
while (i < numberOfBoxes) {
var currentBoxName = "box" + i
if (document.selectionForm.elements[currentBoxName].checked) {
stringToWrite += "<P><B>" + document.selectionForm.elements[currentBoxName].value + "</B>"
stringToWrite += contentsTaggedThisWay(document.selectionForm.elements[currentBoxName].value)
}
i++
}
stringToWrite += "</BODY> </HTML>"
parent.rightFrame.document.write(stringToWrite)
}
</script>
</head>
<body>
<h4>
Data Chooser</h4>
<p>
Choose the tags whose data you want to display.
<form name="selectionForm">
<script language="JavaScript">
writeListOfTagsPresentWithCheckboxes()
</script>
<p>
</form>
</body>
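The Data Chooser above locates data by splitting gXMLString on spaces, so it only works while every tag is separated from its content by spaces and carries no attributes. The sketch below does the same kind of extraction with a regular expression and applies it to the country list. The helper names textOf and capitalsOf are hypothetical, and the approach still assumes that the requested elements have no attributes and do not contain elements of the same name. That holds for the country list but not for XML in general, where a real XML parser (for example the browser's DOMParser) is the safer choice.

```javascript
// Hypothetical helper: returns the trimmed text inside every <tagName>...</tagName>
// in the string. Assumes the element has no attributes and does not contain
// another element with the same name.
function textOf(xml, tagName) {
  // Escape characters such as "." that have a special meaning in regular expressions.
  const name = tagName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`<${name}>([\\s\\S]*?)</${name}>`, "g");
  return Array.from(xml.matchAll(pattern), (match) => match[1].trim());
}

// Hypothetical application: pair each country's first common name with its capital.
function capitalsOf(countryListXml) {
  return textOf(countryListXml, "country").map((countryXml) => ({
    name: textOf(countryXml, "commonName")[0],
    capital: textOf(countryXml, "capital")[0],
  }));
}
```

Applied to the country list above, capitalsOf would pair United States with Washington, D.C., Japan with Tokyo and Kenya with Nairobi. Unlike the space-splitting approach, it does not care whether the tags are padded with spaces.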
Characteristics of XML
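The defining characteristic of XML is that it is used to describe and store data, rather than to describe how a page should look. When storing this data, XML gives it an explicit structure: each document follows a user-defined structure and can be checked against it to confirm that the data is arranged correctly. XML is also well suited to hierarchical data, where a set of data is best described as a predetermined tree of nested elements that may also contain relationships. The "advertisement" example demonstrates this, with elements such as "id", "status", "reference" and "comment", their attributes, and sub-hierarchies such as "contact" and "address"; it also expresses a relationship, since each contact_ref element with link="contact1" points back to the contact element with id="contact1".
When looking at the code, an XML document can usually be recognised by tag names invented to describe the data (such as <officialName> or <auto_make>) rather than a fixed vocabulary like <p> and <table>; by every start tag having a matching end tag; by elements nesting properly inside a single root element; by case-sensitive names and quoted attribute values; and often by a DOCTYPE declaration pointing to a DTD written for that data (such as advertisement.dtd) or a processing instruction such as <?xml-stylesheet ...?>. HTML, by contrast, tolerates unclosed tags such as <p>, <li> and <option>, as the HTML examples show.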
When defining the structure of your XML file, you give each item of data its own descriptive label. This produces a structure that is easy for humans to read but is also machine-friendly. The structure can be validated against a DTD or an XML Schema (XSD) file, which ensures that the file has the right structure: the advertisement, story and country-list examples each refer to an external DTD in their DOCTYPE declaration, and the "anthology" example shows a DTD written inline (although that DTD uses SGML tag-omission rules, which XML does not allow).
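One characteristic that can be spotted in the code, and also checked by a program, is well-formedness: start and end tags must match and nest properly, attribute values must be quoted, and there must be exactly one root element. The following is a deliberately simplified sketch of such a check in JavaScript. The function name looksWellFormed is hypothetical, and this is not a real XML parser: it ignores entities, CDATA sections, character rules and DTD validation, and it does not inspect text outside the root element.

```javascript
// Hypothetical, simplified well-formedness check (not a full XML parser).
// It only verifies that tags are balanced and properly nested, that
// attribute values are quoted, and that there is a single root element.
function looksWellFormed(text) {
  // Alternatives, in order: comment, processing instruction, other "<!...>"
  // declaration (e.g. DOCTYPE), end tag, start or empty-element tag, and
  // finally a bare "<" that does not begin any recognisable tag.
  const token =
    /<!--[\s\S]*?-->|<\?[\s\S]*?\?>|<![^>]*>|<\/([A-Za-z_][\w.:-]*)\s*>|<([A-Za-z_][\w.:-]*)(?:\s+[A-Za-z_][\w.:-]*\s*=\s*(?:"[^"]*"|'[^']*'))*\s*(\/?)>|</g;
  const open = []; // names of currently open elements, innermost last
  let roots = 0;
  for (const [whole, endName, startName, selfClosing] of text.matchAll(token)) {
    if (whole === "<") return false; // e.g. a tag with an unquoted attribute value
    if (endName !== undefined) {
      // An end tag must close the most recently opened element (case-sensitive).
      if (open.pop() !== endName) return false;
    } else if (startName !== undefined) {
      if (open.length === 0) roots++;
      if (selfClosing !== "/") open.push(startName);
    }
  }
  return open.length === 0 && roots === 1;
}
```

For example, looksWellFormed('<status value="accepted"></status>') returns true, while looksWellFormed('<p>One<p>Two') returns false because neither <p> is ever closed, which is exactly the kind of markup HTML accepts and XML does not.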
Applications of XML
The simplicity and power of XML have led to its adoption worldwide in a multitude of very different systems. Although these systems may operate in totally different ways, they store and exchange their data in a standard way: XML. The provided examples illustrate this with an advertisement system (a newspaper classified ad together with its contact, payment, coding and publication details), a book system (defining the structure of a book, e.g. title, author, sections and paragraphs, with a link to the next section) and a country information bank (queried by tag name in the Data Chooser page).
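As an illustration of what the advertisement system could do with its document, the sketch below recomputes the publication price from the <lines> count and the per-line <rate> given for each <rundate>. It assumes, rather than knows from the example, that pub_price is simply lines multiplied by the rate and summed over the run dates, ignoring rates marked type="comparison" and the optional services under <pub_options>. With the figures in the example this gives 4 × $23.10 + 4 × $8.90 = $128.00, which matches its pub_price. The function name priceFromRundates is hypothetical, and like the Data Chooser it uses string matching rather than a real XML parser.

```javascript
// Hypothetical sketch: recompute an advertisement's price from its run dates.
// Assumption: price = <lines> x (first per-line, non-comparison <rate>) for each
// <rundate>; optional services in <pub_options> are ignored.
function priceFromRundates(adXml) {
  const linesMatch = /<lines>\s*(\d+)\s*<\/lines>/.exec(adXml);
  if (linesMatch === null) throw new Error("no <lines> element found");
  const lines = Number(linesMatch[1]);

  let totalCents = 0; // work in whole cents to avoid floating-point rounding errors
  for (const [, rundate] of adXml.matchAll(/<rundate>([\s\S]*?)<\/rundate>/g)) {
    for (const [, attributes, amount] of rundate.matchAll(/<rate([^>]*)>\s*\$([\d.]+)/g)) {
      if (/unit="line"/.test(attributes) && !/type="comparison"/.test(attributes)) {
        totalCents += lines * Math.round(Number(amount) * 100);
        break; // only the first charged rate for each run date
      }
    }
  }
  return totalCents / 100;
}
```

Keeping the arithmetic in whole cents is a design choice: adding 23.10 and 8.90 directly as floating-point numbers can produce values such as 127.99999999999999 instead of 128.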