8

 

The DTD

 

Validations in XML

 

So far, we have only read XML files, without catering to special cases such as those where an entity has been used or where the data has to be validated against the declared elements. The XmlTextReader class is the best choice for reading an XML file, except where data has to be validated or an entity has to be replaced with its value. For such purposes, the XmlValidatingReader class is better suited. This class is derived from XmlReader, and it supports three types of validation: DTD, XDR schema and XSD schema.

 

This class is used when the primary task is to validate data, to resolve general entities, or to supply default attributes.

 

a.cs

using System;
using System.Xml;

public class zzz
{
    public static void Main()
    {
        XmlValidatingReader r = null;
        XmlParserContext p;

        // DocType name 'vijay'; the internal subset declares the entity pr
        p = new XmlParserContext(null, null, "vijay", null, null, "<!ENTITY pr '100'>", "", "", XmlSpace.None);

        // Read the element fragment, resolving &pr; through the context p
        r = new XmlValidatingReader("<vijay mukhi='great' price='Rs &pr;'></vijay>", XmlNodeType.Element, p);
        r.ValidationType = ValidationType.None;   // expand entities, but do not validate

        r.MoveToContent();                         // position the reader on the element 'vijay'
        while (r.MoveToNextAttribute())            // visit each attribute in turn
        {
            Console.WriteLine("{0} = {1}", r.Name, r.Value);
        }
        r.Close();
    }
}

 

Output

mukhi = great

price = Rs 100

 

To create the object p of type XmlParserContext, the constructor with nine parameters of XmlParserContext class is called. The nine parameters are as follows:

     The first parameter is the XmlNameTable to be used. It has a value of null.

     The second parameter is the XmlNamespaceManager. It also has a value of null.

     The third parameter is the DocType name, i.e. the name of the root tag, 'vijay'.

     The fourth parameter is the public identifier (pubid) of an external DTD file.

     The fifth parameter is the system identifier (sysid) of an external DTD file.

     The sixth parameter is the internal DTD subset, where the ENTITY declaration <!ENTITY pr '100'> has been created. It states that wherever the name 'pr' appears preceded by '&' and followed by a semicolon, i.e. as &pr;, it must be replaced with the string '100'.

     The seventh parameter in sequence is the location from where the fragment is to be loaded, i.e. the base URI.

     The eighth parameter stands for the xml:lang scope.

     The ninth parameter stands for the xml:space scope.

The parameters to this constructor of the XmlValidatingReader class (the XML fragment, the node type of the fragment and the parser context) are similar to those of XmlTextReader, which we had encountered earlier. The class itself is derived from XmlReader and implements the IXmlLineInfo interface.

 

The ValidationType property can be set to any one of five values:

 

1. The first is Auto, which validates only when the DTD or schema information is found.

 

2. The second is DTD, which validates based on the instructions found in the DTD.

 

3. The third is None, which creates a non-validating XML 1.0 parser: default attributes are reported and general entities are resolved, but the DOCTYPE is not used for validation. Thus, if the root tag is changed from 'vijay' to 'vijay1', no errors will be generated. If the ValidationType statement is commented out, the default value, Auto, takes effect, and the mismatched root tag generates the following exception:

 

"Unhandled Exception: System.Xml.Schema.XmlSchemaException: The root element name must match the DocType name. An error occurred at (1, 2)."

 

4. The fourth is Schema, which validates as per XSD schemas.

 

5. The fifth option is XDR, which validates as per the XDR schemas. In our program we have set this property to a value of None.

 

Once the required properties are set, the MoveToContent function moves the reader to the first content node, which here is the element 'vijay'. The next function, MoveToNextAttribute, moves to the next attribute and returns True; when no attributes remain, it returns False. Since the reader is positioned on the element itself, the first call behaves like the MoveToFirstAttribute function.

 

The while loop repeats twice, since there are two attributes. The Name and Value properties of the first attribute are displayed as 'mukhi' and 'great'. This is very similar to what we have observed in the earlier program. The name of the second attribute is displayed as 'price'. However, its value differs from the text in the fragment, because it contains the entity reference &pr;. The XmlValidatingReader replaces the entity pr with the string '100' before the value is displayed. Therefore, the output is displayed as 'price' and 'Rs 100'.
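XmlValidatingReader was later marked obsolete in the .NET Framework, in favour of readers created by XmlReader.Create with an XmlReaderSettings object. The sketch below is our own substitution using that newer API, not the program above: the ENTITY declaration moves from the XmlParserContext into an internal DTD subset inside the string, and DtdProcessing must be set to Parse so that &pr; can be expanded. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EntityAttributeDemo
{
    // Reads the attributes of the root element. The DTD is parsed, so a
    // general entity reference in an attribute value (here &pr;) is expanded.
    public static Dictionary<string, string> ReadRootAttributes(string xml)
    {
        var attributes = new Dictionary<string, string>();
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;  // the default, Prohibit, would reject the DOCTYPE
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            r.MoveToContent();                 // skip the DOCTYPE, stop on <vijay>
            while (r.MoveToNextAttribute())    // the first call lands on the first attribute
            {
                attributes[r.Name] = r.Value;  // Value holds the expanded text
            }
        }
        return attributes;
    }

    public static void Main()
    {
        // Same fragment as before, with the ENTITY declaration moved into the DOCTYPE
        string xml = "<!DOCTYPE vijay [<!ENTITY pr '100'>]>" +
                     "<vijay mukhi='great' price='Rs &pr;'></vijay>";
        foreach (KeyValuePair<string, string> a in ReadRootAttributes(xml))
        {
            Console.WriteLine("{0} = {1}", a.Key, a.Value);
        }
    }
}
```

The value of price arrives with &pr; already replaced, just as with XmlValidatingReader.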

 

a.cs

using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class zzz
{
    public static void Main()
    {
        // Read b.xml and wrap that reader in a validating reader
        XmlTextReader r = new XmlTextReader("b.xml");
        XmlValidatingReader v = new XmlValidatingReader(r);
        v.ValidationType = ValidationType.DTD;                        // validate against the DTD
        v.ValidationEventHandler += new ValidationEventHandler(abc);  // report problems to abc

        while (v.Read()) { }   // reading the whole file performs the validation
        v.Close();
    }

    // Called once for every validation error or warning
    public static void abc(object s, ValidationEventArgs a)
    {
        Console.WriteLine("Severity:{0}", a.Severity);
        Console.WriteLine("Message:{0}", a.Message);
    }
}

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE vijay1 >

<vijay>

</vijay>

 

Output

Severity:Error

Message:The root element name must match the DocType name. An error occurred at file:///c:/csharp/b.xml(3, 2).

Severity:Error

Message:The 'vijay' element is not declared. An error occurred at file:///c:/csharp/b.xml(3, 2).

In the above program, an XmlTextReader object r is created first, and it is then passed to the constructor of XmlValidatingReader to create the object v. The ValidationType of the object v is set to DTD. The function abc is added to the ValidationEventHandler event, so that it gets called whenever an error occurs. The while loop calls the Read function until the end of the file is reached; the file is validated as it is read, and the function abc is notified of every error encountered.

 

In the function abc, the values contained in the properties - Severity and Message, of the ValidationEventArgs parameter 'a', are printed. The Severity property reveals whether it is an error or warning, whereas, the Message property contains the precise text of the error or warning.

 

In the above case, the first error is generated because the DOCTYPE declares the root element to be 'vijay1', whereas the actual root element is 'vijay'. The second error arises because the DTD contains no declaration for the element 'vijay'. When no error message is displayed, it may be inferred that no errors have been found.
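The same check can be sketched with XmlReaderSettings (again our substitution for XmlValidatingReader), where ValidationEventHandler is an event of the settings object. In this hypothetical DtdValidationDemo, the handler collects the severity and message instead of printing them, and the document is held in a string, so no b.xml is needed. The exact wording of the messages may vary between .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DtdValidationDemo
{
    // Validates a document against the DTD named in its DOCTYPE and returns
    // one "Severity: message" string per problem; an empty list means valid.
    public static List<string> Validate(string xml)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;   // read the DOCTYPE
        settings.ValidationType = ValidationType.DTD;   // and validate against it
        // Plays the role of abc: called for every error or warning found
        settings.ValidationEventHandler += (sender, e) =>
            problems.Add(e.Severity + ": " + e.Message);
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }   // validation happens as the reader advances
        }
        return problems;
    }

    public static void Main()
    {
        // The b.xml of the text, held in a string: DOCTYPE says vijay1, root is vijay
        string xml = "<?xml version=\"1.0\" ?><!DOCTYPE vijay1 ><vijay></vijay>";
        foreach (string p in Validate(xml))
        {
            Console.WriteLine(p);
        }
    }
}
```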

 

The DTD

 

The above C# program remains unchanged from here on, while we create our own DTD file. Therefore, we shall modify only the files b.xml and b.dtd.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE vijay SYSTEM "b.dtd" >

<vijay />

 

b.dtd

<!ELEMENT vijay >

 

A DTD is generally very lengthy, so an internal DTD is rarely used. When one is used, its declarations have to be placed within square brackets [] inside the DOCTYPE declaration. To use an external DTD, we write the keyword SYSTEM, followed by the name of the DTD file, which is b.dtd in this case.

 

In b.dtd, the element 'vijay' is declared by writing the reserved characters '<!', followed by the keyword ELEMENT, and finally the element name 'vijay'. When we run the C# program 'a', the following error is generated:

 

Output

Unhandled Exception: System.Xml.XmlException: This is an invalid content model. Line 1, position 17.

 

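Since the error lies in the DTD file itself, it is not passed to the function abc; instead, an unhandled exception is thrown. The error occurred because the ELEMENT statement is incomplete: it names the element, but does not specify what the element may contain.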
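The distinction matters in practice: a mistake in the DTD itself is a syntax error and is thrown as an XmlException, whereas a document that merely breaks the rules of the DTD is reported to the validation event handler. A sketch of both outcomes, assuming the XmlReaderSettings API and an internal DTD subset in place of b.dtd (DtdSyntaxDemo and Check are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DtdSyntaxDemo
{
    // Returns "valid", "invalid", or "XmlException: <message>".
    public static string Check(string xml)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => problems.Add(e.Message);
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex)
        {
            // A malformed declaration never reaches the event handler:
            // the parser throws as soon as it reads the DOCTYPE.
            return "XmlException: " + ex.Message;
        }
        return problems.Count == 0 ? "valid" : "invalid";
    }

    public static void Main()
    {
        // Incomplete ELEMENT declaration: no content specification
        Console.WriteLine(Check("<!DOCTYPE vijay [<!ELEMENT vijay >]><vijay/>"));
        // Corrected declaration
        Console.WriteLine(Check("<!DOCTYPE vijay [<!ELEMENT vijay EMPTY>]><vijay/>"));
    }
}
```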

 

b.dtd

<!ELEMENT vijay EMPTY>

 

The addition of the keyword EMPTY salvages the situation. It declares that the element named 'vijay' is an empty element, i.e. one that can have no content.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE vijay SYSTEM "b.dtd" >

<vijay>

</vijay>

 

Output

Severity:Error

Message:Element 'vijay' has invalid child element '#PCDATA'. An error occurred at file:///c:/csharp/b.xml(3, 8).

 

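The DTD file states, with absolute clarity, that the ELEMENT 'vijay' is EMPTY, i.e. it may contain nothing at all, not even white space. In b.xml, however, the open tag <vijay> and the close tag </vijay> are on separate lines, and the line break between them counts as character data, reported as '#PCDATA'. Therefore, an error message is generated, which, as usual, is far from intelligible. Writing <vijay/>, or <vijay></vijay> with nothing in between, satisfies the DTD.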
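To see the rule in isolation, the sketch below validates three bodies against <!ELEMENT vijay EMPTY>, using the XmlReaderSettings API and an internal DTD subset instead of b.dtd (both our assumptions; EmptyElementDemo is a hypothetical name). Only <vijay/> is expected to pass: the line break and the text are both content, which XML 1.0 forbids in an EMPTY element.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EmptyElementDemo
{
    // Counts the validation problems reported for a document.
    public static int CountProblems(string xml)
    {
        int count = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => count++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return count;
    }

    public static void Main()
    {
        string dtd = "<!DOCTYPE vijay [<!ELEMENT vijay EMPTY>]>";
        string[] bodies =
        {
            "<vijay/>",            // no content at all
            "<vijay>\n</vijay>",   // a line break: white space is content too
            "<vijay>hi</vijay>"    // text
        };
        foreach (string body in bodies)
        {
            Console.WriteLine("{0,-20} problems: {1}",
                body.Replace("\n", "\\n"), CountProblems(dtd + body));
        }
    }
}
```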

 

Instead of using tags such as 'vijay', let us consider a DTD that has been implemented in real life. This one is used for the WML, or the Wireless Markup Language. The rules or syntax of WML are available as a DTD.

 

In our book titled 'WML and WMLScript', we have endeavoured to elucidate the concept of a DTD. You are at liberty to refer to the book. However, we must caution you that, the approach and the explanation used here is entirely at variance with the one used in the earlier book.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

</wml>

 

b.dtd

<!ELEMENT wml EMPTY>

 

Output

Severity:Error

Message:Element 'wml' has invalid child element '#PCDATA'. An error occurred at file:///C:/csharp/b.xml(3, 6).

 

The word 'vijay' has merely been replaced by the word 'wml'. The error generated is akin to the earlier one. At this juncture, we introduce a 'card' into the DTD file.

 

b.dtd

<!ELEMENT wml (card)>

 

Output

Severity:Error

Message:Element 'wml' has incomplete content. Expected 'card'. An error occurred at file:///c:/csharp/b.xml(4, 3).

 

Every WML document must commence with the root tag 'wml'. In the DTD file, we have placed the word 'card' within round brackets after wml. This signifies that the wml element must contain exactly one element called 'card'. Since there is no card in the XML file, an error is reported, stating that a card is expected and that, in its absence, the content of the wml element is incomplete.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card />

</wml>

 

Output

Severity:Error

Message:The 'card' element is not declared. An error occurred at file:///c:/csharp/b.xml(4, 2)

 

We add the card tag as a single tag to our XML file, in an endeavour to eliminate the error. But, as we have not specified 'card' as a valid element in the DTD file, yet another error message is displayed. Unless 'card' appears as an ELEMENT in the DTD file, it is not possible to use it in the XML file. Therefore, we now include 'card' as an EMPTY element in b.dtd

 

b.dtd

<!ELEMENT wml (card)>

<!ELEMENT card EMPTY>

 

Now, all the errors just vanish. In the DTD file, we had affirmed that the element 'card' shall be empty i.e. it will not have any content.

 

The XML file depicted below generates an error, because the 'card' element is no longer empty: the line break between <card> and </card> counts as content.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

</card>

</wml>

 

Output

Severity:Error

Message:Element 'card' has invalid child element '#PCDATA'. An error occurred at file:///C:/csharp/b.xml(4, 7).

 

The error message displayed here is very similar to the one seen earlier with the wml tag, i.e. "Element 'wml' has invalid child element '#PCDATA'".

A slight modification to the XML file is desirable, before we endeavour to eliminate the error.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

hi

</card>

</wml>

 

Output

Severity:Error

Message:Element 'card' has invalid child element 'Text'. An error occurred at file:///c:/csharp/b.xml(4, 7).

 

Inserting the word 'hi' between the card tags results in a slightly altered error message: in place of '#PCDATA', we now see 'Text'. The following modification to the DTD file eliminates both error messages.

 

b.dtd

<!ELEMENT wml (card)>

<!ELEMENT card (#PCDATA)>

 

To eradicate the errors, the word EMPTY is replaced with #PCDATA, enclosed within round brackets. PCDATA stands for Parsed Character Data. In plain English, it represents text, which the parser reads and in which it recognises markup such as entity references. Thus, we are at liberty to write as many lines of text as we want within the card tag. Even if the word 'hi' is removed from within the tags, no error is generated.

 

Our DTD expects a root tag of wml, which must contain exactly one card tag; the card tag, in turn, may contain any amount of text. Inserting anything else within the wml tag is a sure recipe for disaster.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

</card>

<card>

</card>

</wml>

 

Output

Severity:Error

Message:Element 'wml' has invalid content. Expected ''. An error occurred at file:///c:/csharp/b.xml(6, 2).

 

The above error has occurred because, the DTD clearly specifies that the root tag wml must have one, and only one, occurrence of the tag called 'card' within it. Here, we have created two tags, thereby, causing the error.

 

b.dtd

<!ELEMENT wml (card)*>

<!ELEMENT card (#PCDATA)>

 

The * symbol, placed after the round brackets, indicates that the group may occur any number of times, from zero upwards. Thus, the XML file can now have either no card elements or as many as we like. If you do not give credence to this statement of ours, you may either delete all the card elements from the XML file, or add numerous cards. Either way, no error will be generated.

 

b.dtd

<!ELEMENT wml (card)+>

<!ELEMENT card (#PCDATA)>

 

Replacing the symbol * with a + changes the meaning from 'zero or more' to 'one or more'. The only difference between the * symbol and the + symbol is that the + sign mandates at least one occurrence of the element, whereas the * sign makes it optional. Thus, in the above XML file, at least one card element is required.

 

 

 

b.dtd

<!ELEMENT wml (card)?>

<!ELEMENT card (#PCDATA)>

 

The last of the special characters is the symbol ?, which specifies that the element occurs 'zero or one' times. Thus, in the XML file, we may have either one card element or none at all. The presence of two or more cards will generate an error. You should try out various combinations for each of the symbols *, + and ?.
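The four forms of the wml declaration can be compared side by side. The sketch below assumes the XmlReaderSettings API and builds each document in memory with an internal DTD subset (OccurrenceDemo, CountErrors and MakeDoc are hypothetical names). It validates a wml element holding zero, one and two cards against (card), (card)*, (card)+ and (card)?, and prints a small table.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class OccurrenceDemo
{
    // Returns the number of validation messages raised for the document.
    public static int CountErrors(string xml)
    {
        int errors = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => errors++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    // Builds a document whose wml element holds 'cards' card elements,
    // with wml declared using the given content model.
    public static string MakeDoc(string model, int cards)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE wml [<!ELEMENT wml " + model + "><!ELEMENT card (#PCDATA)>]><wml>");
        for (int i = 0; i < cards; i++) sb.Append("<card/>");
        sb.Append("</wml>");
        return sb.ToString();
    }

    public static void Main()
    {
        string[] models = { "(card)", "(card)*", "(card)+", "(card)?" };
        foreach (string model in models)
        {
            var line = new StringBuilder(model.PadRight(9));
            for (int n = 0; n <= 2; n++)
            {
                bool ok = CountErrors(MakeDoc(model, n)) == 0;
                line.Append("  " + n + " card(s): " + (ok ? "valid  " : "invalid"));
            }
            Console.WriteLine(line);
        }
    }
}
```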

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

<p> hi </p>

</card>

</wml>

 

b.dtd

<!ELEMENT wml (card)*>

<!ELEMENT card (p)>

<!ELEMENT p (#PCDATA)>

 

No error is generated because, in the DTD file, we have now stated that the card element must contain a single p element, which in turn can contain any text. We have, however, done away with the provision of placing text directly within the card tag.

 

Let us now modify both files once again.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

<p> <b/> </p>

</card>

</wml>

 

 

b.dtd

<!ELEMENT wml (card)*>

<!ELEMENT card (p)>

<!ELEMENT p (br | b)>

<!ELEMENT br EMPTY>

<!ELEMENT b EMPTY>

 

The DTD appears considerably more complicated. The p tag is now capable of containing only two tags, br and b; text is no longer allowed. The | sign signifies the OR condition, which implies that either tag b or tag br is allowed. Both these tags are declared as EMPTY. To summarise, our DTD states that the p tag must contain a single tag, either b or br.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

<p> <b/> <br/></p>

</card>

</wml>

 

Output

Severity:Error

Message:Element 'p' has invalid content. Expected ''. An error occurred at file:///c:/csharp/b.xml(5, 11).

 

All is not well, because we are allowed to place either a 'b' or a 'br', but not both together. To remedy the situation, we place a * symbol after the group (br | b) in the declaration of p. The DTD below also adds a * after (p) in the declaration of card.

 

b.dtd

<!ELEMENT wml (card)*>

<!ELEMENT card (p)*>

<!ELEMENT p (br | b)*>

<!ELEMENT br EMPTY>

<!ELEMENT b EMPTY>

 

 

 

The above DTD provides us the flexibility of having any number of p tags within any number of cards. Each p tag, in turn, may contain as many b or br tags as desired, in any order.
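A sketch of the difference between (br | b) and (br | b)*, under the same assumptions as before: the XmlReaderSettings API in place of XmlValidatingReader, and an internal DTD subset in place of b.dtd, with p as the root for brevity. ChoiceDemo and IsValid are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ChoiceDemo
{
    // True when the document raises no DTD validation messages.
    public static bool IsValid(string xml)
    {
        int problems = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => problems++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems == 0;
    }

    public const string Rest = "<!ELEMENT br EMPTY><!ELEMENT b EMPTY>]>";
    public const string Once = "<!DOCTYPE p [<!ELEMENT p (br | b)>" + Rest;   // exactly one br or b
    public const string Many = "<!DOCTYPE p [<!ELEMENT p (br | b)*>" + Rest;  // any number, any order

    public static void Main()
    {
        string[] docs =
        {
            Once + "<p><b/></p>",
            Once + "<p><b/><br/></p>",
            Many + "<p><b/><br/><b/></p>",
            Many + "<p></p>",
            Many + "<p>hi</p>"          // text is not allowed in element content
        };
        foreach (string doc in docs)
        {
            Console.WriteLine("{0} -> {1}", doc, IsValid(doc) ? "valid" : "invalid");
        }
    }
}
```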

 

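By replacing the b tag with #PCDATA, a p tag is in a position to accommodate any number of br tags, as well as any amount of text. In such a mixed-content declaration, #PCDATA must be the first item in the group, and when element names are listed alongside it, the group must be followed by *.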

 

<!ELEMENT p (#PCDATA | br)*>
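The mixed-content rule can be checked the same way. This sketch (hypothetical names; XmlReaderSettings API and internal DTD subset assumed) accepts text interleaved with br, rejects an element not named in the group, and shows that a declaration with #PCDATA in the wrong position is refused by the parser with an XmlException rather than reported as a validation error.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MixedContentDemo
{
    // Returns "valid", "invalid", or "XmlException" for the document.
    public static string Check(string xml)
    {
        int problems = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => problems++;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            return "XmlException";   // the DTD itself could not be parsed
        }
        return problems == 0 ? "valid" : "invalid";
    }

    public const string Mixed = "<!DOCTYPE p [<!ELEMENT p (#PCDATA | br)*>" +
                                "<!ELEMENT br EMPTY><!ELEMENT b EMPTY>]>";

    public static void Main()
    {
        Console.WriteLine(Check(Mixed + "<p>one<br/>two<br/>three</p>"));  // text and br, freely mixed
        Console.WriteLine(Check(Mixed + "<p>one<b/></p>"));                // b is not in the group
        Console.WriteLine(Check("<!DOCTYPE p [<!ELEMENT p (br | #PCDATA)*>" +
                                "<!ELEMENT br EMPTY>]><p/>"));             // #PCDATA not first
    }
}
```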

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card />

<head />

</wml>

 

b.dtd

<!ELEMENT wml (card,head)>

<!ELEMENT card EMPTY>

<!ELEMENT head EMPTY>

 

The above DTD file requires the wml tag to contain a card tag, which must be strictly followed by a head tag. The comma signifies that one tag is to be followed by the other. If we omit the head tag from the XML file, the following error message will be generated:

 

Output

Severity:Error

Message:Element 'wml' has incomplete content. Expected 'head'. An error occurred at file:///C:/csharp/b.xml(5, 3).

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<head />

<card />

</wml>

Output

Severity:Error

Message:Element 'wml' has invalid content. Expected 'card'. An error occurred at file:///c:/csharp/b.xml(4, 2).

 

If the order of the tags is interchanged, an error is reported, since the card tag must be followed by the head tag. Besides, each tag may appear only once; multiple occurrences of either tag will also result in an error.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card />

<card />

<head />

</wml>

 

b.dtd

<!ELEMENT wml (card+,head?)>

<!ELEMENT card EMPTY>

<!ELEMENT head EMPTY>

 

When the plus sign is inserted after card, more than one card tag can be used in the file. The ? sign denotes 'zero or one' occurrence of the head tag. Thus, we can have one or more card tags, followed by either a single head tag or none at all. If the head tag is present, it must be placed after the last card tag, since the order of the tags is sacrosanct.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card />

<head />

<card />

</wml>

 

 

Output

Severity:Error

Message:Element 'wml' has invalid content. Expected ''. An error occurred at file:///c:/csharp/b.xml(6, 2).

 

The Draconian restrictions imposed by the DTD file prohibit us from altering the sequence of the above tags. The card tag has to come first, followed by the head tag. We cannot interchange a head tag with a card tag.  So, the only solution to this problem is to abide by the stipulated sequence.
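Both sequence declarations can be exercised in one go. In this sketch, the XmlReaderSettings API and an internal DTD subset are our assumptions, and SequenceDemo, IsValid and Doc are hypothetical names; each row pairs a content model for wml with a body, and the program reports whether the combination is valid.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SequenceDemo
{
    // True when the document raises no DTD validation messages.
    public static bool IsValid(string xml)
    {
        int problems = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => problems++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems == 0;
    }

    // Wraps a wml body in a DOCTYPE whose wml declaration uses the given model.
    public static string Doc(string model, string body)
    {
        return "<!DOCTYPE wml [<!ELEMENT wml " + model + ">" +
               "<!ELEMENT card EMPTY><!ELEMENT head EMPTY>]>" +
               "<wml>" + body + "</wml>";
    }

    public static void Main()
    {
        string[,] cases =
        {
            { "(card,head)",   "<card/><head/>" },
            { "(card,head)",   "<head/><card/>" },         // wrong order
            { "(card,head)",   "<card/>" },                // head missing
            { "(card+,head?)", "<card/><card/><head/>" },
            { "(card+,head?)", "<card/><card/>" },
            { "(card+,head?)", "<card/><head/><card/>" }   // card after head
        };
        for (int i = 0; i < cases.GetLength(0); i++)
        {
            Console.WriteLine("{0,-14} {1,-24} {2}", cases[i, 0], cases[i, 1],
                IsValid(Doc(cases[i, 0], cases[i, 1])) ? "valid" : "invalid");
        }
    }
}
```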

 

b.dtd

<!ELEMENT wml (card+,head?,template*)*>

<!ELEMENT card EMPTY>

<!ELEMENT head EMPTY>

<!ELEMENT template EMPTY>

 

In the DTD file, we have added a * symbol to the entire set of tags, which make up the wml element. The set consists of the following individual elements in a sequential order:

     One or more card tags.

     Zero or one head tag.

     Zero to many template tags.

 

This set can be repeated any number of times, and each repetition may take many forms, as long as the specified order is kept. Thus, the card and head can appear together, or the card can appear by itself without the head tag, or the template tag may not be present at all, and so on. Every occurrence, however, needs to begin with a card tag.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card aa="hi"/>

</wml>

 

 

 

 

b.dtd

<!ELEMENT wml (card)>

<!ELEMENT card EMPTY>

<!ATTLIST card aa CDATA #IMPLIED>

 

In the above example, the card tag has an attribute called aa initialized to 'hi'. To declare an attribute, we include the keyword ATTLIST, which is short for 'attribute list', in the DTD file. This is followed by the name of the element that the attribute is associated with. Then, the name of the attribute, aa, is specified, followed by its type, which is CDATA, i.e. character data, in our case. The last parameter, #IMPLIED, makes the attribute aa optional. Therefore, even if you remove it, no error will be generated.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card />

</wml>

 

b.dtd

<!ELEMENT wml (card)>

<!ELEMENT card EMPTY>

<!ATTLIST card aa CDATA #IMPLIED bb CDATA #REQUIRED>

 

Output

Severity:Error

Message:The required attribute 'bb' is missing. An error occurred at file:///c:/csharp/b.xml(4, 2).

 

The error message clearly mentions that the attribute bb is missing. The keyword #REQUIRED demands the presence of the attribute bb whenever the card tag is used. Further, as shown above, several attributes can be declared in a single ATTLIST, one after the other. The order in which the attributes are written in the XML file, however, is not significant.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card bb="no"/>

</wml>

 

No errors are generated, since the mandatory attribute bb has been specified. The attribute aa can be omitted, since it is declared as #IMPLIED.
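A sketch that runs four variants of the card tag through the ATTLIST declaration above, again assuming the XmlReaderSettings API and an internal DTD subset (AttributeDemo and IsValid are hypothetical names):

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeDemo
{
    public const string Dtd = "<!DOCTYPE wml [<!ELEMENT wml (card)><!ELEMENT card EMPTY>" +
                              "<!ATTLIST card aa CDATA #IMPLIED bb CDATA #REQUIRED>]>";

    // True when a wml document holding the given card tag is valid.
    public static bool IsValid(string card)
    {
        int problems = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => problems++;
        string xml = Dtd + "<wml>" + card + "</wml>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems == 0;
    }

    public static void Main()
    {
        string[] cards =
        {
            "<card/>",                   // bb missing
            "<card bb='no'/>",           // only the required attribute
            "<card bb='no' aa='hi'/>",   // both; the order does not matter
            "<card aa='hi'/>"            // aa alone is not enough
        };
        foreach (string card in cards)
        {
            Console.WriteLine("{0,-26} {1}", card, IsValid(card) ? "valid" : "invalid");
        }
    }
}
```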

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card aa="no"/>

</wml>

 

b.dtd

<!ELEMENT wml (card)>

<!ELEMENT card EMPTY>

<!ATTLIST card aa  (hi | bye ) "bye">

 

Output

Severity:Error

Message:'no' is not in the enumeration list. An error occurred at file:///c:/csharp/b.xml(4, 7).

 

The values assigned to an attribute can be restricted to a specific set. This is achieved by listing the values in the ATTLIST declaration, using the OR sign (|) as the separator. The attribute aa can only be assigned the value 'hi' or 'bye'. Specifying any other value results in an error.

 

If the attribute is not initialized, it assumes the default value of 'bye'.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card aa="hi"/>

</wml>

 

 

The error disappears because the attribute has been assigned a value of 'hi'.
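Besides checking the value, the reader also supplies the default. The sketch below assumes the XmlReaderSettings API and an internal DTD subset, and uses the hypothetical helpers IsValid and ValueOfAa: for <card/>, GetAttribute("aa") is expected to return 'bye', taken from the DTD.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EnumeratedAttributeDemo
{
    public const string Dtd = "<!DOCTYPE wml [<!ELEMENT wml (card)><!ELEMENT card EMPTY>" +
                              "<!ATTLIST card aa (hi | bye) 'bye'>]>";

    static XmlReaderSettings MakeSettings()
    {
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        return settings;
    }

    // True when a wml document holding the given card tag is valid.
    public static bool IsValid(string card)
    {
        int problems = 0;
        XmlReaderSettings settings = MakeSettings();
        settings.ValidationEventHandler += (sender, e) => problems++;
        using (XmlReader reader = XmlReader.Create(new StringReader(Dtd + "<wml>" + card + "</wml>"), settings))
        {
            while (reader.Read()) { }
        }
        return problems == 0;
    }

    // Value reported for aa on the card, including a default supplied by the DTD.
    // Call only with valid cards: no handler is attached, so an error would throw.
    public static string ValueOfAa(string card)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(Dtd + "<wml>" + card + "</wml>"), MakeSettings()))
        {
            reader.ReadToFollowing("card");
            return reader.GetAttribute("aa");
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<card aa='no'/>"));   // 'no' is not in the list
        Console.WriteLine(IsValid("<card aa='hi'/>"));
        Console.WriteLine(ValueOfAa("<card/>"));         // default taken from the DTD
    }
}
```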

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card aa="hi"/>

</wml>

 

b.dtd

<!ELEMENT wml (card)*>

<!ELEMENT card EMPTY>

<!ATTLIST card aa ID #IMPLIED>

 

We have created an attribute aa, with a data type of ID. This does not result in any error.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card aa="hi"/>

<card aa="hi"/>

</wml>

 

Output

Severity:Error

Message:'hi' is already used as an ID. An error occurred at file:///c:/csharp/b.xml(5, 7).

 

The card tag can be used multiple times, due to the presence of the * sign in the DTD file. Declaring the attribute aa with the type ID guarantees that no two elements in the document carry the same value for it. The error message conveys that 'hi' has already been used as an ID, and hence, it cannot be used again.

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card aa="hi"/>

<card aa="hi1"/>

</wml>

 

If we assign a different value to the attribute, the error is dispensed with. Thus, a data type of ID guarantees that the attribute shall never have a duplicate value.
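The ID rule can be sketched the same way (XmlReaderSettings API and internal DTD subset assumed; IdDemo and IsValid are hypothetical names). Since aa is #IMPLIED, a card without it is also acceptable; only a repeated value is an error.

```csharp
using System;
using System.IO;
using System.Xml;

public static class IdDemo
{
    public const string Dtd = "<!DOCTYPE wml [<!ELEMENT wml (card)*><!ELEMENT card EMPTY>" +
                              "<!ATTLIST card aa ID #IMPLIED>]>";

    // True when a wml document with the given cards is valid.
    public static bool IsValid(string cards)
    {
        int problems = 0;
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, e) => problems++;
        string xml = Dtd + "<wml>" + cards + "</wml>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<card aa='hi'/><card aa='hi'/>"));    // duplicate ID
        Console.WriteLine(IsValid("<card aa='hi'/><card aa='hi1'/>"));   // distinct IDs
        Console.WriteLine(IsValid("<card aa='hi'/><card/>"));            // aa omitted
    }
}
```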

 

b.xml

<?xml version="1.0" ?>

<!DOCTYPE wml SYSTEM "b.dtd" >

<wml>

<card>

Hi &sonal;

</card>

</wml>

 

b.dtd

<!ELEMENT wml (card)*>

<!ELEMENT card (#PCDATA)*>

<!ENTITY sonal "hi" >

 

Entities have been touched upon earlier. Here, the entity reference &sonal; is replaced with the text 'hi', so the card reads 'Hi hi'. This is called an Entity Reference. The DTD file declares the entity using the keyword ENTITY, followed by the entity name 'sonal' and its replacement text 'hi'.
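Finally, a sketch of the entity being expanded as the card's text is read. It assumes the XmlReaderSettings API (entities are only expanded when DtdProcessing is Parse) and an internal DTD subset in place of b.dtd; EntityTextDemo and ReadCardText are hypothetical names. ReadElementContentAsString concatenates the text of the card, with &sonal; already replaced.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityTextDemo
{
    // Returns the text content of the first card element.
    public static string ReadCardText(string xml)
    {
        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;   // the DTD must be read to know &sonal;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.ReadToFollowing("card");              // move to the first card element
            return reader.ReadElementContentAsString();  // text with entities expanded
        }
    }

    public static void Main()
    {
        string xml = "<!DOCTYPE wml [<!ELEMENT wml (card)*><!ELEMENT card (#PCDATA)*>" +
                     "<!ENTITY sonal 'hi'>]><wml><card>Hi &sonal;</card></wml>";
        Console.WriteLine(ReadCardText(xml));
    }
}
```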