Examples Showing Why <XMP> and <LISTING> Are Bad For SGML

An SGML-based parsing and validating engine will find 48 errors in this file. Let us start with the three samples for which I substituted <PRE> for <XMP> in my calling page for this HTML tag/element (this uses <XMP>):

St. <B>Pat</B>rick was a <B>gent</B>leman
Who through <B>strat</B>egy &amp; <B>stealth</B>
Drove <B>all</B> the snakes from <B>Ir</B>eland,
Here's a <B>toast</B>ing to his <B>health</B>;
But <B>not</B> too many <B>toast</B>ings
Lest you <B>lose</B> yourself &amp; <B>then</B>
For<B>get</B> the good St. <B>Pat</B>rick
&amp; see <B>all</B> those snakes a<B>gain</B>.

When passed through the validator, each instance of the </B> closing tag creates an error. In a literal character data portion of the document, an SGML parser turns off as much as it can, so that it does not even see the opening <B> tags. But since it is looking for the closing </XMP> tag it must check all closing tags, so it does see those, and each is reported as an "end tag for element 'B' which is not open." On the browser screen, however, the sample looks just like the <PRE> sample with which I replaced it (this uses <PRE>):

St. <B>Pat</B>rick was a <B>gent</B>leman
Who through <B>strat</B>egy &amp; <B>stealth</B>
Drove <B>all</B> the snakes from <B>Ir</B>eland,
Here's a <B>toast</B>ing to his <B>health</B>;
But <B>not</B> too many <B>toast</B>ings
Lest you <B>lose</B> yourself &amp; <B>then</B>
For<B>get</B> the good St. <B>Pat</B>rick
&amp; see <B>all</B> those snakes a<B>gain</B>.

Notice that while there are 16 such errors in lines 14 through 21 (the <XMP> sample), there are no such errors in lines 33 through 40 (the corresponding lines in the <PRE> sample). So this file is already not valid HTML on account of the closing tags within the <XMP> above. The same follows for the next of these three examples (this uses <XMP>):

The <STRONG> tag (ending with </STRONG>) is more emphatic
than the <EM> tag (which ends with </EM>).  Use the &amp;
character entity to place an ampersand in the text.

The two errors in the above sample (on lines 48 and 49) arise from the same situation as before, this time with the <STRONG> and <EM> tags, for which again only the closing tags are detected. In the valid "Working XMP and LISTING Tag Element Examples" file, I replaced that with (this uses <PRE>):

The <STRONG> tag (ending with </STRONG>) is more emphatic
than the <EM> tag (which ends with </EM>).  Use the &amp;
character entity to place an ampersand in the text.

And again notice that there are no errors generated for lines 57 and 58 which used <PRE>. The third case introduces some different errors (this uses <XMP>):

... and then there comes the </BODY> tag & the
</HTML> tag which close out the document.

In this case, the closing </BODY> tag on line 64 tells the validator that the end of the BODY of the document has been reached. After that may come only white space, comments, the closing </HTML> tag, or a <PLAINTEXT> tag and its content. Nothing else belongs here, hence the complaint here and on line 65 that "character data is not allowed here." This (together with what happens on the next line, with what looks to the validator like a closing </HTML> tag) also means that any later attempt to open another element in this file is an error. That explains all the "document type does not allow element 'X' here" messages for lines 67, 96, 100, 106, 121, 137, 151, 159, 161, 169, and 182, where 'X' is variously 'P', 'PRE', 'XMP', 'LISTING', and 'PLAINTEXT'. The last of these is a problem only because of the closing </HTML> tag on line 65; a closing </BODY> tag alone would not have caused a problem for a subsequent <PLAINTEXT> tag.

A couple more errors are generated by this </BODY> tag because XMP resides entirely within the BODY of a document. When the BODY ends, any section within it, such as a P or PRE or DIV or, as here, an XMP section, must also end. But the XMP end tag is not here; it comes later, and the end tag is not optional, hence the message "end tag for 'XMP' omitted, but its declaration does not permit this." Then, when the closing </XMP> tag is encountered on line 66, the XMP section is seen as not open (it was already closed with the BODY), hence the message "end tag for element 'XMP' which is not open."

From the standpoint of the SGML validator, this apparent closing of the BODY and of the HTML document also means that the real ending tags for these elements further down in the file are reported as errors on lines 181 and 192. In the valid "Working XMP and LISTING Tag Element Examples" file, this was simulated using <PRE> (this uses <PRE>):

... and then there comes the </BODY> tag & the
</HTML> tag which close out the document.

And again notice that there are no errors flagged for lines 97 and 98. Everything after the </BODY> and </HTML> tags in the above <XMP> sample, from an SGML standpoint, has no business being in the file at all, since according to those closing tags, the body and html text have ended, so what is all this garbage coming after? But <PRE> does not cause this problem.
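The validator behaviour described for the first two samples can be imitated with a short Python sketch. This is only an illustration, not the validator's actual algorithm: the function name stray_end_tags and the regular expression are my own assumptions, and the sketch simply ignores opening tags, collects every closing tag in a literal section, and lists those that do not match the element that opened it.

```python
import re

# Hypothetical approximation of an end tag: "</", a letter, further name
# characters, optional white space, then ">".
END_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9.-]*)\s*>")


def stray_end_tags(literal_text, element="XMP"):
    """List end-tag names in literal_text that do not close `element`.

    Opening tags are never looked at, mirroring the description that the
    parser only watches for closing tags inside the literal section.
    """
    names = [m.group(1).upper() for m in END_TAG.finditer(literal_text)]
    return [name for name in names if name != element.upper()]


POEM = "\n".join([
    "St. <B>Pat</B>rick was a <B>gent</B>leman",
    "Who through <B>strat</B>egy &amp; <B>stealth</B>",
    "Drove <B>all</B> the snakes from <B>Ir</B>eland,",
    "Here's a <B>toast</B>ing to his <B>health</B>;",
    "But <B>not</B> too many <B>toast</B>ings",
    "Lest you <B>lose</B> yourself &amp; <B>then</B>",
    "For<B>get</B> the good St. <B>Pat</B>rick",
    "&amp; see <B>all</B> those snakes a<B>gain</B>.",
])

SECOND = ("The <STRONG> tag (ending with </STRONG>) is more emphatic\n"
          "than the <EM> tag (which ends with </EM>).")

print(len(stray_end_tags(POEM)))   # 16
print(stray_end_tags(SECOND))      # ['STRONG', 'EM']
```

The counts agree with the 16 errors and the two errors reported for those samples; the </BODY> and </HTML> case produces further, different messages that this sketch does not model.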

This also has to do with a fundamental difference between how HTML is meant to be displayed (and is displayed on nearly all browsers and most other user agents) and how such things are supposed to be handled in SGML. In proper SGML, any ending tag of any kind should terminate the raw data display section, since an "END TAG OPEN" (</n, where "n" is any lowercase or uppercase alphabetic character) is what ends such spans of raw displayed data. But in HTML, in proper and correct implementations of these elements, it is only the tag itself (</XMP> or </LISTING>) which can trigger the end of the literal text span. Indeed, in a correct HTML (but non-SGML) application, everything between the opening tag and the closing tag is to be disregarded as anything but mere text to display, so that while SGML sees the end of the file in the closing </HTML> tag above, HTML applications merely see that as so much text to display.
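The two termination rules can be put side by side in a small Python sketch. The function names sgml_literal_end and html_literal_end are hypothetical, and the regular expressions are simplified stand-ins for what a real SGML parser or browser does; the point is only to show where each rule would stop treating the text as literal.

```python
import re


def sgml_literal_end(text):
    """Index where the SGML rule stops: the first "</" followed by a letter."""
    match = re.search(r"</[A-Za-z]", text)
    return match.start() if match else len(text)


def html_literal_end(text, element="XMP"):
    """Index where the HTML rule stops: only the element's own end tag."""
    pattern = r"</" + re.escape(element) + r"\s*>"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.start() if match else len(text)


content = ("Do you see the closing EM tag here? </EM> "
           "Or does it only close for XMP?</XMP>")

print(repr(content[:sgml_literal_end(content)]))
# 'Do you see the closing EM tag here? '
print(repr(content[:html_literal_end(content)]))
# 'Do you see the closing EM tag here? </EM> Or does it only close for XMP?'
```

Under the first rule the literal text is cut short at </EM>; under the second it runs on until </XMP>, which is what the test samples further down are meant to reveal in a browser.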

One other limitation of <XMP> and <LISTING> is that the content of the element cannot contain an example of its own closing tag. For example, if one attempts to show a closing </XMP> within an XMP element, there is no way to show it except with the actual closing tag itself, which then closes the element prematurely and is not displayed. The same problem exists with </LISTING> within a LISTING element. One should, however, be able to show either of these closing tags within the other, for example a closing </LISTING> tag within an XMP element, or vice versa. The SGML-friendly <PRE> element has no such problem, since character entities can stand in for the opening and closing angle brackets, so that the closing tag is displayed rather than acted upon. <PLAINTEXT>, being a fundamentally different tag, should not have this problem either (though it does on some browsers): it accepts no closing tag, so one entered in its content should simply be displayed, along with any other sort of closing tag.
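The entity substitution that <PRE> calls for can be sketched in Python as follows. The function name escape_for_pre is my own invention; the four replacements are the ones this page names (& to &amp;, < to &lt;, > to &gt;, and " to &#34;), with the ampersand handled first so that the entities added afterwards are not escaped a second time.

```python
def escape_for_pre(raw):
    """Escape raw text so it can be pasted into a PRE element unchanged."""
    # Ampersands must go first; otherwise the "&" of "&lt;" and friends
    # would itself be turned into "&amp;".
    out = raw.replace("&", "&amp;")
    out = out.replace("<", "&lt;").replace(">", "&gt;")
    out = out.replace('"', "&#34;")
    return out


print(escape_for_pre("Show </XMP> & </LISTING> here"))
# Show &lt;/XMP&gt; &amp; &lt;/LISTING&gt; here
```

Python's standard html.escape does a similar job, though it writes the double quote as &quot; rather than &#34; and, by default, also escapes the apostrophe.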

In the following examples, I have attempted to close the section first with the invalid tag "</NN>", which would be enough to close out the literal section from an SGML standpoint, then with the valid closing HTML tag "</EM>", and then with whichever closing tags do not match the one that opened the section. Since only closing tags are noted, the validator finds errors on the same closing tags without opening tags as it did for the above examples, and such is the reason for the errors found on lines 154, 155, 156, 157, 164, 165, 166, 167, 186, 187, 188, and 189. Notice also that the validator's scan of literal text sections is so sketchy that it fails to notice that "NN" is not a legal HTML element. Finally, it also generates the error "character data is not allowed here" after the closing <PLAINTEXT> tag, since PLAINTEXT does not belong to the BODY but comes after. See here what your browser does with these cases (this uses <XMP>):

This is raw text, which I will attempt to close with just any ending tag that should be correct for SGML, but in a proper implementation of HTML will not end until the closing XMP tag comes along. Do you see the closing NN tag here? </NN> Do you see the closing EM tag here? </EM> Do you see the closing LISTING tag here </LISTING> Do you see the closing PLAINTEXT tag here </PLAINTEXT> Or does it only close for XMP (which is correct HTML)?

And now the same for <LISTING> (this uses <LISTING>):

This is raw text, which I will attempt to close with just any ending tag that should be correct for SGML, but in a proper implementation of HTML will not end until the closing LISTING tag comes along. Do you see the closing NN tag here? </NN> Do you see the closing EM tag here? </EM> Do you see the closing XMP tag here </XMP> Do you see the closing PLAINTEXT tag here </PLAINTEXT> Or does it only close for LISTING (which is correct HTML)?

It is because these two elements (and <PLAINTEXT> for similar reasons) force HTML user agents to work in a manner not consistent with proper SGML that there has been so much pressure to eliminate these actually rather convenient tags. Imagine not having to go through all of your raw text, converting every & to &amp; and then every < to &lt;, every > to &gt; and " to &#34;, just so you can put it in a <PRE> section, and then also having to use stylesheets or whatnot to get the size right when replacing <LISTING> with <PRE>! I end this file with yet another test using <PLAINTEXT> (this uses <PLAINTEXT>):

This is raw text, which I will attempt to close with just any ending tag that might be correct for SGML, but in a proper implementation of HTML will not end at all, or at least only until the closing PLAINTEXT tag comes along. Do you see the closing NN tag here? </NN> Do you see the closing EM tag here? </EM> Do you see the closing XMP tag here </XMP> Do you see the closing LISTING tag here </LISTING> Do you see the closing PLAINTEXT tag here </PLAINTEXT> Do you see this as raw text or normal or not at all? </HTML>