A Note on Making the DTDs Available for Use

I have provided usable copies of the seven pre-HTML 2 DTDs (plus three "reconstructions" of DTDs that would fit the forms of HTML used from October 1990 through about May 24, 1993). Anyone who wishes to try writing HTML (or HTML+) compliant with the various DTDs that were proposed, experimented with, or actually meant to define HTML or HTML+ at a given time can invoke them. In most cases I found that at least some adaptation was necessary to put the DTD in a usable form. For the oldest forms, the DTD I prepared cannot be a fully reliable indicator of what would have passed for that version of the language.

For example, several of them include the SGML declaration. However, the various parsing/validating engines (W3C's, WDG's, and most likely most others) already have their own internal SGML declarations to which they have been coded, and they will not accept another. So one thing I have done here is to comment out this declaration wherever it is found.

Another major change to several DTDs concerns the external sources for entity declarations. Several of these early DTDs refer to outside files for entity declarations. Those files are either not found where declared, not found at all, or found only in rather sketchy and primitive draft forms. The HTML+ DTDs even omit altogether such basic and universally useful entities as &amp;, &lt;, &gt;, and &quot;. Even the draft for HTML 3.0 seems to lack all of these declarations, though it is probable that &amp;, &lt;, and &gt; were in one of the referenced files, or were intended to be drawn from them. This may account for the mysterious lack of &quot; in HTML 3.2. HTML 3.2 did include new left and right quotation marks, perhaps as a replacement for it, but &quot; should still have been provided for backward compatibility.

Following are my notes for each of the ten DTDs:

"html0.a.dtd"

This DTD is purely my own construction. It attempts to capture the form of HTML used in the earliest days, from October 1990 through about mid-January 1991. That particular "HTML" is easily distinguished by several features:

- hand-entered lower-case tags
- no <NEXTID> tags
- no clear distinction between "HEAD" and "BODY"
- no <A> tag carrying both NAME and HREF attributes
- only the crudest and simplest few HTML elements

One problem with this version is that URL strings in this era always appeared without quotation marks. As a result, any URL containing characters such as "/" cannot be parsed correctly. This DTD can be seen here and is invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.a.dtd">
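The quotation-mark problem can be illustrated with constructed examples, not taken from any surviving 1990 file. SGML lets an attribute value go unquoted only if it consists entirely of name characters. In the SGML declarations used for HTML, that means letters, digits, and a few punctuation marks such as "." and "-", but not "/".

```sgml
<!-- Accepted: every character of the unquoted value is a name character -->
<a href=Overview.html>Overview</a>

<!-- Rejected: "/" is not a name character, so this value cannot be
     left unquoted and the validator reports an error -->
<a href=hypertext/WWW/TheProject.html>The project</a>

<!-- Accepted: the same value inside quotation marks -->
<a href="hypertext/WWW/TheProject.html">The project</a>
```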

For the variant seen towards the end of that period (in a few files from early January 1991) that use the <P> element as a container tag, it can be invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.a.dtd" [ <!ENTITY % HTML.zero.b "INCLUDE"> ]>
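The switch in the internal subset works through SGML parameter entities and marked sections. The fragment below is a hypothetical sketch of the mechanism, not the actual text of html0.a.dtd; the entity name P.content is made up for illustration. It relies on two SGML rules:

- The document's internal subset is read before the external DTD.
- The first declaration of an entity is the one that counts.

Together, these let the declaration in the <!DOCTYPE> override the DTD's default.

```sgml
<!-- Default: the marked section below is ignored.  A document can
     switch it on by declaring <!ENTITY % HTML.zero.b "INCLUDE"> in
     its internal subset, which is read before this external DTD. -->
<!ENTITY % HTML.zero.b "IGNORE">

<![ %HTML.zero.b; [
<!-- Later variant: P is a container for text and anchors.
     (A is assumed to be declared elsewhere in the DTD.) -->
<!ENTITY % P.content "(#PCDATA | A)*">
]]>

<!-- Earlier form: P is an empty paragraph separator.  If P.content
     was already declared inside the marked section, this second
     declaration is ignored. -->
<!ENTITY % P.content "EMPTY">

<!ELEMENT P - O %P.content;>
```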

There never was any W3C attempt at making any file for this DTD.

"html0.c.dtd"

This DTD is purely my own construction. It attempts to capture the form of HTML that the NeXT HTML editor generated from January 23, 1991 through November 23, 1992. That particular "HTML" is easily distinguished by several features:

- <NEXTID> tags with only a number as their attribute
- ALL CAPITALS for HTML tags and attributes
- use of the tag set current in the earliest HTML documentation
- invariable use of the NAME attribute in every <A> tag, without that attribute ever starting on a new line

This DTD has several problems. <XMP>, <LISTING>, and <PLAINTEXT> are allowed, and their contents may be ill-formed in ways that prematurely terminate the example or listing section or trigger other errors. It also continues the problem of the previous DTD stage: a URL without quotation marks that contains certain characters will trigger errors. The <NEXTID> problem is worked around only by giving its attribute an enumerated list of possible values, each value merely a number. This allows the number to appear by itself, as it always does in this period, but unfortunately limits it to values under 128. This DTD can be seen here and is invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.c.dtd">
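The enumerated-value workaround for <NEXTID> relies on SGML attribute minimization. A value declared as CDATA must be written with its attribute name (N=5). A value drawn from a token group may, under SHORTTAG, appear with the name and "=" omitted. The sketch below is not the actual declaration in html0.c.dtd. The attribute name N is borrowed from the later HTML 2.0 DTD and is an assumption here, and the value group is shortened.

```sgml
<!-- NEXTID as an empty element whose only attribute, N, takes a value
     from an enumerated group.  The group is shortened here; the list
     in html0.c.dtd runs up to the limit of 127 mentioned above. -->
<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N (0|1|2|3|4|5|6|7) #REQUIRED>

<!-- In a document: because 5 belongs to the token group of N,
     SHORTTAG lets the attribute name and "=" be omitted. -->
<NEXTID 5>
```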

There never was any W3C attempt at making any file for this DTD.

"html0.d.dtd"

This DTD is purely my own construction. It attempts to capture the form of HTML that the NeXT HTML editor generated from November 26, 1992 through at least about May 24, 1993. That particular "HTML" is easily distinguished by its use of <HEADER> tags instead of <HEAD> tags. <HEAD> is what every other form uses wherever the head of a document is set apart with element tags at all. This is the only DTD I provide that allows this element. It also allows the highlighted phrase elements (<HP0>, <HP1>, <HP2>, and <HP3>), which were first mentioned (but rarely if ever implemented) during the previous HTML era, from January 23, 1991 through November 23, 1992. This DTD can be seen here and is invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.d.dtd">
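The following is a hypothetical sketch of how these era-specific elements can be declared. An element-name group lets the four highlighted-phrase elements share one declaration. The content models are illustrative guesses, not the text of html0.d.dtd.

```sgml
<!-- HEADER plays the role HEAD plays elsewhere.  The content model
     is an illustrative guess; TITLE and NEXTID are assumed to be
     declared elsewhere in the DTD. -->
<!ELEMENT HEADER - - (TITLE? & NEXTID?)>

<!-- One declaration covers all four highlighted-phrase elements. -->
<!ENTITY % hp "HP0 | HP1 | HP2 | HP3">
<!ELEMENT (%hp;) - - (#PCDATA)>
```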

There is no "original" surviving file for this DTD, and it is unlikely that any has ever existed.

"html0.f.dtd"

This is the very oldest HTML DTD that survives. It was clearly just an experiment with DTDs, well before HTML began to follow any DTD to any real degree. For my own copy, I have made no change to the DTD at all. It does not include any SGML declaration, so there is none to comment out, and it references no external files for entities or anything else. It has no errors of the sort that would render it unusable as an SGML DTD, and in any case it was very primitive and not meant to be used seriously. In one file I illustrate the fullest extent of the HTML language it specifies (each tag, attribute, and entity), along with its idiosyncrasies and limitations. This DTD can be seen here and is invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.f.dtd">

The original DTD can be seen here.

"html0.h.dtd"

This is the DTD prepared by Dan Connolly in January 1993, and the first that seems to be functional in a practically useful form. At least one HTML file written by Dan Connolly has been positively identified as having been written to the standard of this DTD. However, it was not endorsed by Tim Berners-Lee; his name is not on it, as it is on the next DTD. Nor was it followed by the then-current version of the NeXT HTML editor, which was generating HTML to what I described above as the html0.d "standard". I therefore still regard it as belonging to HTML 0 and not HTML 1. Dan Connolly used it for at least that one file, and possibly a few others, but no one else ever did.

For my copy of this file, I commented out the SGML declaration. I also deleted the <!DOCTYPE> declarator, which wrapped the DTD so that it could be affixed to the start of an HTML file. In one file I illustrate the fullest extent of the HTML language it specifies (each tag, attribute, and entity), along with its idiosyncrasies and limitations. This DTD can be seen here and is invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.h.dtd">

The original DTD can be seen here.
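Deleting the <!DOCTYPE> declarator can be sketched as two alternative forms of the same DTD. The declarations themselves are stood in for by a placeholder comment, so neither form below is the actual text of html0.h.dtd.

```sgml
<!-- Form 1, as distributed: the declarations sit inside a document
     type declaration, so this text only works pasted at the very
     start of a document, ahead of the document's own content. -->
<!DOCTYPE HTML [
<!-- element, attribute, and entity declarations -->
]>

<!-- Form 2, as an external file: the wrapper is removed and only the
     declarations remain.  Each document then supplies its own wrapper:
     <!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.h.dtd"> -->
<!-- element, attribute, and entity declarations -->
```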

"html1.k.dtd"

This DTD is included as part of the description of HTML published as an "Internet Draft" by the IIIR Working Group at the end of June 1993. This marks the first attempt to pair a narrative description of the current version of HTML with a DTD as the actual definition of that version of HTML. It therefore meets my criterion for what I call "HTML 1." The document is signed by both Dan Connolly and Tim Berners-Lee, so it can be regarded as having the endorsement of both of them, however long it may have taken for any HTML editors to catch up with it (or surpass it).

The DTD itself occupies only pages 31-36 of this document. Besides eliminating the narrative portions preceding and following the DTD, my cleanup involved two further steps. I removed the pagination marks interspersed among the DTD contents. I also corrected a few lines that had accidentally wrapped because they were too long for the narrow ASCII text file format. Once again the SGML declaration had to be commented out, and the <!DOCTYPE> declarator that would let the DTD be affixed to the start of a file was deleted.

This DTD also contains a couple of minor errors. The entity declarations for &amp; and &lt; are incorrect, and &quot;, though mentioned in the narrative, is not included at all. In addition, comparing the language described in the narrative with the language actually defined in the DTD reveals several mistakes:

- The COMPACT attribute is supposed to stand alone as an attribute but instead takes a value.
- COMPACT is not applied to <DL>, as it should be and as it is to the other list types ("HTML 1.m" would have the exact opposite error).
- <IMG> should also have ALT and ALIGN attributes.

I therefore have two versions of this file available. One has the DTD exactly as it is, with its errors (the parsing/validating engines can at least still use it). The other is corrected to the clear intent of its authors. In one file I illustrate the fullest extent of the HTML language it specifies (each tag, attribute, and entity), along with its idiosyncrasies and limitations, without the corrections; another file does the same with the corrections. These DTDs can be seen here (straight) or here (corrected), and can be invoked in a file by either of:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html1.k.dtd">

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html1.kc.dtd">

The original DTD can be seen here, on pages 31 through 36.
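In SGML terms, the corrections amount to declarations along the following lines. Declaring COMPACT with a one-token group lets it stand alone as <DL COMPACT>, because SHORTTAG allows a value from a token group to appear without its attribute name. This is a sketch, not necessarily the text of html1.kc.dtd. The set of list elements and the ALIGN values follow the later HTML 2.0 DTD and are assumptions here.

```sgml
<!-- COMPACT as a stand-alone attribute on every list type: the only
     allowed value is the token COMPACT, so <DL COMPACT> is a valid
     minimized form of COMPACT="COMPACT". -->
<!ATTLIST (DL|UL|OL|MENU|DIR) COMPACT (COMPACT) #IMPLIED>

<!-- IMG with the ALT and ALIGN attributes described in the narrative;
     the ALIGN values are borrowed from HTML 2.0 as an assumption. -->
<!ATTLIST IMG
        SRC   CDATA               #REQUIRED
        ALT   CDATA               #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED>
```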

"html1.m.dtd"

This DTD was published along with a set of files describing HTML in May 1994. It shows some advances over the initial draft of "HTML 1," and is actually very close to HTML 2.0. To make this DTD usable for validating files, the pointer to the ISO Latin entity file had to be corrected to point to a copy of the file that actually exists. The only other fix is that COMPACT should go on all list types and not just <DL>. I therefore have two versions of this file available. One has the DTD exactly as it is except for the fixed entity file reference. The other is corrected to the clear intent of its authors. In one file I illustrate the fullest extent of the HTML language it specifies (each tag, attribute, and entity), along with its idiosyncrasies and limitations, without the corrections; another file does the same with the corrections. These DTDs can be seen here (straight) or here (corrected), and can be invoked in a file by either of:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html1.m.dtd">

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html1.mc.dtd">

The original DTD can be seen here.
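The repair to the entity file pointer can be sketched as follows. The public identifier shown is the one the HTML 2.0 DTD uses for the Latin 1 set. The system identifier "ISOlat1.ent" is a hypothetical placeholder, not the location used in my copy.

```sgml
<!-- The public identifier names the entity set; the system identifier
     after it tells the parser where to fetch the file.  "ISOlat1.ent"
     is a hypothetical placeholder location. -->
<!ENTITY % ISOlat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML"
                          "ISOlat1.ent">
%ISOlat1;
```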

"html1.q.dtd"

This DTD is the oldest surviving DTD for HTML+, dating from November 1993. It has a number of problems that keep even parsing/validating engines from accepting it as a valid DTD. Even the "straight" version I have posted therefore has several modifications made just so it can be used. Not only did the SGML declaration have to be commented out and the <!DOCTYPE> declarator removed, but there were also problems with the language details.

For one, consider the main HTMLPLUS element declaration. For it to accept actual <HEAD> and <BODY> tags or content, its contents have to be specifically the <HEAD> and then <BODY> elements. The original instead offers a choice between two forms: this pair of elements with explicit tags, or their contents in succession with no opening or closing <HEAD> or <BODY> tags at all. The <HEAD> and <BODY> element declarations themselves must then specify that both the opening and closing tags are optional. This slightly changes the intended language, but it enables the DTD to recognize and accept document content, so the change had to be made.

Another problem is that the <IMAGE> element declares that it accepts as content any combination of "A" or "%text;". However, "A" is already one of the possible contents of "%text;", so the extra "A" only causes parsing problems and makes no real intended change to the language.

Yet another is that the contents of the <BOX> element are ambiguous. A "%math;" could occur either before or after "OVER", and the parser has no way to tell which is meant. By regrouping the parentheses, I have made it so that the second "%math;" can only be meant if it comes after an "OVER"; otherwise the first "%math;" is clearly the one intended. This results in no difference from the language plainly intended.

Finally, <FORM> attempts to include #PCDATA directly within itself, and this makes it unable to recognize elements within the form. I therefore prepared a different content declaration that excludes #PCDATA, and forms can now be parsed successfully.

The corrected version fixes everything above, plus typographical errors that would have made it impossible to use the <ISINDEX>, <NEXTID>, <ADDRESS>, and <BLOCKQUOTE> elements in a file, or even the &amp; and &lt; character entities. It also restores the missing &quot; entity. I therefore have two versions of this file available: one with only the necessary things fixed, and the other corrected to the clear intent of its authors. In one file I illustrate the fullest extent of the HTML+ language it specifies (each tag, attribute, and entity), along with its idiosyncrasies and limitations, without the corrections; another file does the same with the corrections. These DTDs can be seen here (straight) or here (corrected), and can be invoked in a file by either of:

<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.q.dtd">

<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.qc.dtd">

The original DTD can be seen here.
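The content-model repairs described above can be sketched as SGML declarations. The "before" forms are reconstructed for illustration rather than quoted from the November 1993 DTD. The minimization flags, and the %math; and %text; entities, are assumed to be defined elsewhere.

```sgml
<!-- BOX, ambiguous form: after some %math; content, a further %math;
     token could continue the first group or, with OVER skipped, start
     the second one, and the parser cannot decide which.
<!ELEMENT BOX - - ((%math;)*, OVER?, (%math;)*)>
-->

<!-- BOX, unambiguous form: the second %math; group can only be entered
     after an OVER has been seen (assuming %math; never includes OVER). -->
<!ELEMENT BOX - - ((%math;)*, (OVER, (%math;)*)?)>

<!-- IMAGE, ambiguous form: A is already one of the alternatives inside
     %text;, so naming it again lets an A match in two places.
<!ELEMENT IMAGE - - (A | %text;)*>
-->
<!ELEMENT IMAGE - - (%text;)*>

<!-- HTMLPLUS: HEAD and BODY are always present as elements; HEAD and
     BODY are then declared with both tags omissible ("O O") so that a
     document written without those tags still parses. -->
<!ELEMENT HTMLPLUS O O (HEAD, BODY)>
```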

"html1.s.dtd"

This is the last surviving (and possibly the last ever) HTML+ DTD, dating from April 1994. For this file, the SGML declaration is again commented out and the <!DOCTYPE> declarator deleted. The pointers to the ISO Latin character entities and the math symbol entities also had to be updated to the actual locations of these files. The four basic entities (&amp;, &lt;, &gt;, and &quot;) are added to this version as well.

By default, this DTD excludes all but the most basic features unless extra entity declarations in the document's <!DOCTYPE> declaration enable them. I therefore created another version, called "html1.t", with the defaults for all of these features set to enable them. That gives two versions of this file: one with only the necessary things fixed, and the other with all features enabled by default.

The DTD has a further problem: if figures are enabled, MATH constructs get excluded even when MATH is enabled. I therefore also provide a pair of corrected DTDs in which this problem is fixed. Between two files, file1 and file2, I illustrate the fullest extent of the HTML+ language it specifies with all features enabled (each tag, attribute, and entity), along with its idiosyncrasies and limitations, without the corrections. One of them has to exclude MATH and the other has to exclude figures. Another file illustrates the language with the corrections. These DTDs can be seen here (straight), here (features enabled by default), here (corrected), or here (corrected and features enabled by default), and can be invoked in a file by any one of:

<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.s.dtd">

<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.t.dtd">

<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.sc.dtd">

<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.tc.dtd">

The original DTD can be seen here.
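Enabling features from the document side works the same way as the HTML.zero.b switch for html0.a.dtd. A parameter entity declared in the internal subset overrides the DTD's default because the internal subset is read first and the first entity declaration wins. The entity names below are hypothetical placeholders; the real names are whatever html1.s.dtd tests in its marked sections.

```sgml
<!DOCTYPE HTMLPLUS SYSTEM "http://www.the-pope.com/html1.s.dtd" [
<!-- Hypothetical switch names; substitute the parameter entities
     that html1.s.dtd actually tests.  Because the internal subset is
     read first, these declarations override the DTD's own defaults. -->
<!ENTITY % HTML.forms   "INCLUDE">
<!ENTITY % HTML.figures "INCLUDE">
]>
```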

"html3.0.dtd"

This is the final draft of HTML 3.0 as it stood when it was abandoned in March 1995. In the copy I provide, the ISO Latin entity and math symbol entity file pointers are updated. I have also added the basic HTML entities (&amp;, &lt;, &gt;, and &quot;) and deleted an external reference to a supposed "w3c-style" notation. For this DTD it was not necessary to comment out or delete any SGML declarations or <!DOCTYPE> declarators, as none were included in this draft. This DTD can be seen here and is invoked in a file by:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html3.0.dtd">

The original DTD can be seen here.
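The four basic entities added to the html1.s and html3.0 copies (and restored in html1.qc) can be declared the way the HTML 2.0 DTD later declared them. This is a sketch of the technique, not necessarily the exact text of my copies.

```sgml
<!-- The four basic entities, declared in the HTML 2.0 style.  The
     character reference in each literal is resolved when the
     declaration is read; declaring the entity CDATA then keeps the
     resulting character from being parsed again as markup. -->
<!ENTITY amp  CDATA "&#38;" -- ampersand    -->
<!ENTITY lt   CDATA "&#60;" -- less than    -->
<!ENTITY gt   CDATA "&#62;" -- greater than -->
<!ENTITY quot CDATA "&#34;" -- double quote -->
```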

This file, "dtdnote.html," is HTML 4.01 Transitional compliant.
The "HTML 0.f" demonstration file "html0.f.html" is "HTML 0.f" compliant, but with one warning.
The "HTML 0.h" demonstration file "hthl.html" is "HTML 0.h" compliant, but with one warning.
The "HTML 1.k" demonstration file "htkl.html" is "HTML 1.k" (uncorrected DTD) compliant, but with one warning.
The "HTML 1.k" demonstration file "htklc.html" is "HTML 1.kc" (corrected DTD) compliant, but with one warning.
The "HTML 1.m" demonstration file "htkla.html" is "HTML 1.m" (uncorrected DTD) compliant, but with one warning.
The "HTML 1.m" demonstration file "htklac.html" is "HTML 1.mc" (corrected DTD) compliant, but with one warning.
The "HTML 1.q" demonstration file "htpl.html" is "HTML 1.q" (uncorrected DTD) compliant, but with one warning.
The "HTML 1.q" demonstration file "htplc.html" is "HTML 1.qc" (corrected DTD) compliant, but with one warning.
The "HTML 1.s" demonstration file "htpla.html" is "HTML 1.s" (uncorrected DTD) compliant, but with one warning.
The "HTML 1.s" demonstration file "htplb.html" is "HTML 1.s" (uncorrected DTD) compliant, but with one warning.
The "HTML 1.s" demonstration file "htplac.html" is "HTML 1.sc" (corrected DTD) compliant, but with one warning.

